Portrait de Glen Berseth

Glen Berseth

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond
Robotique

Biographie

Glen Berseth est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principal de Mila – Institut québécois d'intelligence artificielle, détenteur d’une chaire en IA Canada-CIFAR et codirecteur du Laboratoire de robotique et d’IA intégrative de Montréal (REAL). Il a été chercheur postdoctoral à Berkeley Artificial Intelligence Research (BAIR), où il a travaillé avec Sergey Levine. Ses recherches portent sur la résolution de problèmes de prise de décision séquentielle (planification) pour les systèmes d'apprentissage autonomes du monde réel (robots). Elles ont couvert les domaines de la collaboration humain-robot, du renforcement, ainsi que de l'apprentissage continu, multiagent et hiérarchique et du méta-apprentissage. Glen Berseth a fait paraître des articles dans les meilleures publications des domaines de la robotique, de l'apprentissage automatique et de l'animation informatique. Il donne également un cours sur l'apprentissage des robots à l'Université de Montréal et à Mila, couvrant les recherches les plus récentes sur les techniques d'apprentissage automatique pour la création de robots généralistes.

Étudiants actuels

Maîtrise recherche - UdeM
Maîtrise recherche - UdeM
Collaborateur·rice de recherche - Waterloo
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Maîtrise recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise professionnelle - UdeM
Stagiaire de recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Co-superviseur⋅e :

Publications

Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa
Hongyao Tang
Mariano Phielipp
Santiago Miret
Martin Jägersand
Matthew E. Taylor
Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (voir plus)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.
Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa
Hongyao Tang
Mariano Phielipp
Santiago Miret
Martin Jägersand
Matthew E. Taylor
Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of p… (voir plus)olicies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with \textit{parameter efficient finetuning} (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning sub-sets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples to necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning as few as less than 1\% of total parameters will improve policy performance compared the zero-shot performance of the base pretrained a policy.
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bo Li
Xin Jin
Wenjun Zeng
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Siddarth Venkatraman
Mohsin Hasan
Minsu Kim
Luca Scimeca
Marcin Sendera
Nikolay Malkin
Any well-behaved generative model over a variable …
Solving Bayesian inverse problems with diffusion priors and off-policy RL
Luca Scimeca
Siddarth Venkatraman
Moksh J. Jain
Minsu Kim
Marcin Sendera
Mohsin Hasan
Luke Rowe
Sarthak Mittal
Pablo Lemos
Alexandre Adam
Jarrid Rector-Brooks
Nikolay Malkin
This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (voir plus)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.
Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Siddarth Venkatraman
Mohsin Hasan
Minsu Kim
Luca Scimeca
Marcin Sendera
Nikolay Malkin
Any well-behaved generative model over a variable …
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Matthew D Riemer
Gopeshh Subbaraj
Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectivel… (voir plus)y minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Arnav Kumar Jain
Harley Wiltzer
Jesse Farebrother
Sanjiban Choudhury
Towards Improving Exploration through Sibling Augmented GFlowNets
Kanika Madan
Alex Lamb
Exploration is a key factor for the success of an active learning agent, especially when dealing with sparse extrinsic terminal rewards and … (voir plus)long trajectories. We introduce Sibling Augmented Generative Flow Networks (SA-GFN), a novel framework designed to enhance exploration and training efficiency of Generative Flow Networks (GFlowNets). SA-GFN uses a decoupled dual network architecture, comprising of a main Behavior Network and an exploratory Sibling Network, to enable a diverse exploration of the underlying distribution using intrinsic rewards. Inspired by the ideas on exploration from reinforcement learning, SA-GFN provides a general-purpose exploration and learning paradigm that integrates with multiple GFlowNet training objectives and is especially helpful for exploration over a wide range of sparse or low reward distributions and task structures. An extensive set of experiments across a diverse range of tasks, reward structures and trajectory lengths, along with a thorough set of ablations, demonstrate the superior performance of SA-GFN in terms of exploration efficacy and convergence speed as compared to the existing methods. In addition, SA-GFN's versatility and compatibility with different GFlowNet training objectives and intrinsic reward methods underscores its broad applicability in various problem domains.
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control
Zhongyu Li
Xue Bin Peng
Pieter Abbeel
Sergey Levine
Koushil Sreenath
This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal rob… (voir plus)ots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world.The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Matthew Riemer
Gopeshh Raaj Subbaraj
Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectivel… (voir plus)y minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pok\'emon and Tetris.
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
Matthew Riemer
Gopeshh Raaj Subbaraj
Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectivel… (voir plus)y minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pok\'emon and Tetris.