Le Studio d'IA pour le climat de Mila vise à combler l’écart entre la technologie et l'impact afin de libérer le potentiel de l'IA pour lutter contre la crise climatique rapidement et à grande échelle.
Hugo Larochelle nommé directeur scientifique de Mila
Professeur associé à l’Université de Montréal et ancien responsable du laboratoire de recherche en IA de Google à Montréal, Hugo Larochelle est un pionnier de l’apprentissage profond et fait partie des chercheur·euses les plus respecté·es au Canada.
Perspectives sur l’IA pour les responsables des politiques
Co-dirigé par Mila et le CIFAR, ce programme met en relation les décideur·euse·s avec des chercheur·euse·s de pointe en IA grâce à une combinaison de consultations ouvertes et d'exercices de test de faisabilité des politiques. La prochaine session aura lieu les 9 et 10 octobre.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Traditionally, constrained policy optimization with Reinforcement Learning (RL) requires learning a new policy from scratch for any new envi… (voir plus)ronment, goal or cost function, with limited generalization to new tasks and constraints. Given the sample inefficiency of many common deep RL methods, this procedure can be impractical for many real-world scenarios, particularly when constraints or tasks are changing. As an alternative, in the unconstrained setting, various works have sought to pre-train representations from offline datasets to accelerate policy optimization upon specification of a reward.
Such methods can permit faster adaptation to new tasks in a given environment, dramatically improving sample efficiency. Recently, zero-shot policy optimization has been explored by leveraging a particular
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Tradit… (voir plus)ionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.
This paper contributes a new approach for distributional reinforcement learning which elucidates
a clean separation of transition structure … (voir plus)and reward in the learning process. Analogous to how
the successor representation (SR) describes the expected consequences of behaving according to a
given policy, our distributional successor measure
(SM) describes the distributional consequences of
this behaviour. We formulate the distributional
SM as a distribution over distributions and provide theory connecting it with distributional and
model-based reinforcement learning. Moreover,
we propose an algorithm that learns the distributional SM from data by minimizing a two-level
maximum mean discrepancy. Key to our method
are a number of algorithmic techniques that are
independently valuable for learning generative
models of state. As an illustration of the usefulness of the distributional SM, we show that it
enables zero-shot risk-sensitive policy evaluation
in a way that was not previously possible.
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In th… (voir plus)is work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.