Portrait de Glen Berseth

Glen Berseth

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage par renforcement
Apprentissage profond

Biographie

Glen Berseth est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principal de Mila – Institut québécois d'intelligence artificielle, détenteur d’une chaire en IA Canada-CIFAR et codirecteur du Laboratoire de robotique et d’IA intégrative de Montréal (REAL). Il a été chercheur postdoctoral à Berkeley Artificial Intelligence Research (BAIR), où il a travaillé avec Sergey Levine. Ses recherches portent sur la résolution de problèmes de prise de décision séquentielle (planification) pour les systèmes d'apprentissage autonomes du monde réel (robots). Elles ont couvert les domaines de la collaboration humain-robot, du renforcement, ainsi que de l'apprentissage continu, multiagent et hiérarchique et du méta-apprentissage. Glen Berseth a fait paraître des articles dans les meilleures publications des domaines de la robotique, de l'apprentissage automatique et de l'animation informatique. Il donne également un cours sur l'apprentissage des robots à l'Université de Montréal et à Mila, couvrant les recherches les plus récentes sur les techniques d'apprentissage automatique pour la création de robots généralistes.

Étudiants actuels

Maîtrise recherche - UdeM
Maîtrise professionnelle - UdeM
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Stagiaire de recherche - Polytechnic
Collaborateur·rice de recherche
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Postdoctorat - UdeM
Co-superviseur⋅e :
Maîtrise professionnelle - UdeM
Stagiaire de recherche - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM

Publications

Simplifying Constraint Inference with Inverse Reinforcement Learning
Adriana Hugessen
Harley Wiltzer
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
Siddharth Karamcheti
Soroush Nasiriany
Mohan Kumar Srirama
Lawrence Yunliang Chen
Kirsty Ellis
Peter David Fagan
Joey Hejna
Masha Itkina
Marion Lepert
Yecheng Jason Ma
Ye Ma
Patrick Tree Miller
Jimmy Wu
Suneel Belkhale
Shivin Dass … (voir 80 de plus)
Huy Ha
Arhan Jain
Abraham Lee
Youngwoon Lee
Marius Memmel
Sungjae Park
Ilija Radosavovic
Kaiyuan Wang
Albert Zhan
Kevin Black
Cheng Chi
Kyle Beltran Hatch
Shan Lin
Jingpei Lu
Jean Mercat
Abdul Rehman
Pannag R Sanketi
Archit Sharma
Cody Simpson
Quan Vuong
Homer Rich Walke
Blake Wulfe
Ted Xiao
Jonathan Heewon Yang
Arefeh Yavary
Tony Z. Zhao
Christopher Agia
Rohan Baijal
Mateo Guaman Castro
Daphne Chen
Qiuyu Chen
Trinity Chung
Jaimyn Drake
Ethan Paul Foster
Jensen Gao
David Antonio Herrera
Minho Heo
Kyle Hsu
Jiaheng Hu
Donovon Jackson
Charlotte Le
Yunshuang Li
K. Lin
Roy Lin
Zehan Ma
Abhiram Maddukuri
Suvir Mirchandani
Daniel Morton
Tony Khuong Nguyen
Abigail O'Neill
Rosario Scalise
Derick Seale
Victor Son
Stephen Tian
Emi Tran
Andrew E. Wang
Yilin Wu
Annie Xie
Jingyun Yang
Patrick Yin
Yunchu Zhang
Osbert Bastani
Jeannette Bohg
Ken Goldberg
Abhinav Gupta
Abhishek Gupta
Dinesh Jayaraman
Joseph J Lim
Jitendra Malik
Roberto Martín-Martín
Subramanian Ramamoorthy
Dorsa Sadigh
Shuran Song
Jiajun Wu
Michael C. Yip
Yuke Zhu
Thomas Kollar
Sergey Levine
Chelsea Finn
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and … (voir plus)robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models
Matthew D Riemer
Gopeshh Subbaraj
Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectivel… (voir plus)y minimize long-term regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokemon and Tetris.
Revisiting Successor Features for Inverse Reinforcement Learning
Arnav Kumar Jain
Harley Wiltzer
Jesse Farebrother
Sanjiban Choudhury
Amortizing intractable inference in diffusion models for vision, language, and control
Siddarth Venkatraman
Moksh J. Jain
Luca Scimeca
Minsu Kim
Marcin Sendera
Mohsin Hasan
Luke Rowe
Sarthak Mittal
Pablo Lemos
Emmanuel Bengio
Alexandre Adam
Jarrid Rector-Brooks
Nikolay Malkin
Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (voir plus)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bo Li
Xin Jin
Wenjun Zeng
Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control
Zhongyu Li
Xue Bin Peng
Pieter Abbeel
Sergey Levine
Koushil Sreenath
This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal rob… (voir plus)ots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world.The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View.
Raj Ghugare
Matthieu Geist
Benjamin Eysenbach
Some reinforcement learning (RL) algorithms have the capability of recombining together pieces of previously seen experience to solve a task… (voir plus) never seen before during training. This oft-sought property is one of the few ways in which dynamic programming based RL algorithms are considered different from supervised learning (SL) based RL algorithms. Yet, recent RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question in the setting of goal-reaching problems. We show that the desirable stitching property corresponds to a form of generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen \emph{together} in the training data. Our analysis shows that this sort of generalization is different from \emph{i.i.d.} generalization. This connection between stitching and generalization reveals why we should not expect existing RL methods based on SL to perform stitching, even in the limit of large datasets and models. We experimentally validate this result on carefully constructed datasets. This connection suggests a simple remedy, the same remedy for improving generalization in supervised learning: data augmentation. We propose a naive \emph{temporal} data augmentation approach and demonstrate that adding it to RL methods based on SL enables them to stitch together experience so that they succeed in navigating between states and goals unseen together during training.
Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View
Raj Ghugare
Matthieu Geist
Benjamin Eysenbach
Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-soug… (voir plus)ht property is one of the few ways in which RL methods based on dynamic-programming differ from RL methods based on supervised-learning (SL). Yet, certain RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question for the problems of achieving a target goal state and achieving a target return value. Our main result is to show that the stitching property corresponds to a form of combinatorial generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalisation reveals why we should not expect SL-based RL methods to perform stitching, even in the limit of large datasets and models. Based on this analysis, we construct new datasets to explicitly test for this property, revealing that SL-based methods lack this stitching property and hence fail to perform combinatorial generalization. Nonetheless, the connection between stitching and combinatorial generalisation also suggests a simple remedy for improving generalisation in SL: data augmentation. We propose a temporal data augmentation and demonstrate that adding it to SL-based methods enables them to successfully complete tasks not seen together during training. On a high level, this connection illustrates the importance of combinatorial generalization for data efficiency in time-series data beyond tasks beyond RL, like audio, video, or text.
Improving Intrinsic Exploration by Creating Stationary Objectives
Roger Creus Castanyer
Joshua Romoff
Intelligent Switching for Reset-Free RL
Darshan Patil
Janarthanan Rajendran
Intelligent Switching for Reset-Free RL
Darshan Patil
Janarthanan Rajendran
In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The \textit{resett… (voir plus)ing} assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (\textit{forward}) with learned resets by constructing a second (\textit{backward}) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC) which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.