Portrait de Glen Berseth

Glen Berseth

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond
Robotique

Biographie

Glen Berseth est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principal de Mila – Institut québécois d'intelligence artificielle, détenteur d’une chaire en IA Canada-CIFAR et codirecteur du Laboratoire de robotique et d’IA intégrative de Montréal (REAL). Il a été chercheur postdoctoral à Berkeley Artificial Intelligence Research (BAIR), où il a travaillé avec Sergey Levine. Ses recherches portent sur la résolution de problèmes de prise de décision séquentielle (planification) pour les systèmes d'apprentissage autonomes du monde réel (robots). Elles ont couvert les domaines de la collaboration humain-robot, du renforcement, ainsi que de l'apprentissage continu, multiagent et hiérarchique et du méta-apprentissage. Glen Berseth a fait paraître des articles dans les meilleures publications des domaines de la robotique, de l'apprentissage automatique et de l'animation informatique. Il donne également un cours sur l'apprentissage des robots à l'Université de Montréal et à Mila, couvrant les recherches les plus récentes sur les techniques d'apprentissage automatique pour la création de robots généralistes.

Étudiants actuels

Maîtrise recherche - UdeM
Maîtrise recherche - UdeM
Collaborateur·rice de recherche - Waterloo
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise professionnelle - UdeM
Stagiaire de recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :

Publications

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View.
Raj Ghugare
Matthieu Geist
Benjamin Eysenbach
Some reinforcement learning (RL) algorithms have the capability of recombining together pieces of previously seen experience to solve a task… (voir plus) never seen before during training. This oft-sought property is one of the few ways in which dynamic programming based RL algorithms are considered different from supervised learning (SL) based RL algorithms. Yet, recent RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question in the setting of goal-reaching problems. We show that the desirable stitching property corresponds to a form of generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen \emph{together} in the training data. Our analysis shows that this sort of generalization is different from \emph{i.i.d.} generalization. This connection between stitching and generalization reveals why we should not expect existing RL methods based on SL to perform stitching, even in the limit of large datasets and models. We experimentally validate this result on carefully constructed datasets. This connection suggests a simple remedy, the same remedy for improving generalization in supervised learning: data augmentation. We propose a naive \emph{temporal} data augmentation approach and demonstrate that adding it to RL methods based on SL enables them to stitch together experience so that they succeed in navigating between states and goals unseen together during training.
Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View
Raj Ghugare
Matthieu Geist
Benjamin Eysenbach
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-soug… (voir plus)ht property is one of the few ways in which RL methods based on dynamic-programming differ from RL methods based on supervised-learning (SL). Yet, certain RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question for the problems of achieving a target goal state and achieving a target return value. Our main result is to show that the stitching property corresponds to a form of combinatorial generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalisation reveals why we should not expect SL-based RL methods to perform stitching, even in the limit of large datasets and models. Based on this analysis, we construct new datasets to explicitly test for this property, revealing that SL-based methods lack this stitching property and hence fail to perform combinatorial generalization. Nonetheless, the connection between stitching and combinatorial generalisation also suggests a simple remedy for improving generalisation in SL: data augmentation. We propose a temporal data augmentation and demonstrate that adding it to SL-based methods enables them to successfully complete tasks not seen together during training. On a high level, this connection illustrates the importance of combinatorial generalization for data efficiency in time-series data beyond tasks beyond RL, like audio, video, or text.
Improving Intrinsic Exploration by Creating Stationary Objectives
Roger Creus Castanyer
Joshua Romoff
Intelligent Switching for Reset-Free RL
Janarthanan Rajendran
Intelligent Switching for Reset-Free RL
Janarthanan Rajendran
In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The \textit{resett… (voir plus)ing} assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (\textit{forward}) with learned resets by constructing a second (\textit{backward}) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC) which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Shivesh Khaitan
Ravi Tej Akella
John Dolan
Jeff Schneider
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Shivesh Khaitan
Ravi Tej Akella
John Dolan
Jeff Schneider
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Adaptive Resolution Residual Networks
Mahtab Sandhu
We introduce Adaptive Resolution Residual Networks (ARRNs), a form of neural operator that enables the creation of networks for signal-based… (voir plus) tasks that can be rediscretized to suit any signal resolution. ARRNs are composed of a chain of Laplacian residuals that each contain ordinary layers, which do not need to be rediscretizable for the whole network to be rediscretizable. ARRNs have the property of requiring a lower number of Laplacian residuals for exact evaluation on lower-resolution signals, which greatly reduces computational cost. ARRNs also implement Laplacian dropout, which encourages networks to become robust to low-bandwidth signals. ARRNs can thus be trained once at high-resolution and then be rediscretized on the fly at a suitable resolution with great robustness.
Improving Intrinsic Exploration by Creating Stationary Objectives
Roger Creus Castanyer
Joshua Romoff
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Adriana Hugessen
Roger Creus Castanyer
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However,… (voir plus) RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.