
Glen Berseth

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Reinforcement Learning

Biography

Glen Berseth is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute.

He is a Canada CIFAR AI Chair and co-directs the Robotics and Embodied AI Lab (REAL). He was formerly a postdoctoral researcher at Berkeley Artificial Intelligence Research (BAIR), working with Sergey Levine.

Berseth’s research focuses on solving sequential decision-making problems (planning) for real-world autonomous learning systems (robots). More specifically, his work covers human-robot collaboration, reinforcement learning, and continual, meta-, multi-agent and hierarchical learning.

He has published in the top venues in robotics, machine learning and computer animation. He teaches a course on robot learning at Université de Montréal and at Mila, in which he covers the most recent research on machine learning techniques for creating generalist robots.

Current Students

PhD - Université de Montréal
Master's Research - Université de Montréal
Professional Master's - Université de Montréal
PhD - Université de Montréal
PhD - McGill University
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - Polytechnique Montréal
Collaborating researcher
PhD - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Professional Master's - Université de Montréal
Research Intern - Université de Montréal
PhD - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal

Publications

Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.
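For readers unfamiliar with text-based molecular design, the following is a minimal, hypothetical sketch of policy-gradient (REINFORCE) training over a toy token vocabulary. The vocabulary, reward function, and tabular per-position policy are placeholders for illustration only and do not reflect the ChemRLformer implementation.

# Hypothetical sketch: REINFORCE over a toy text vocabulary, standing in for
# RL-based molecule generation over SMILES-like strings. The vocabulary, reward,
# and policy below are placeholders, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = list("CNO()=#")                 # stand-in tokens; real work uses a SMILES grammar
MAX_LEN = 10
theta = np.zeros((MAX_LEN, len(VOCAB)))  # per-position logits acting as the "policy"

def sample_sequence(theta):
    tokens, steps = [], []
    for t in range(MAX_LEN):
        p = np.exp(theta[t]); p /= p.sum()
        i = rng.choice(len(VOCAB), p=p)
        tokens.append(i)
        steps.append((t, i, p))
    return tokens, steps

def reward(tokens):
    # placeholder property score: fraction of 'C' tokens
    # (real work scores docking energies, QED, etc.)
    return sum(VOCAB[i] == "C" for i in tokens) / MAX_LEN

for step in range(2000):
    tokens, steps = sample_sequence(theta)
    R = reward(tokens)
    for t, i, p in steps:                # REINFORCE: (R - baseline) * grad log pi
        grad = -p
        grad[i] += 1.0
        theta[t] += 0.1 * (R - 0.5) * grad

print("sampled:", "".join(VOCAB[i] for i in sample_sequence(theta)[0]))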
Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
Jensen Gao
Siddharth Reddy
Anca Dragan
Sergey Levine
Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user command signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
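As a rough illustration of the offline-pretraining-plus-online-fine-tuning recipe described above, here is a hedged sketch that reduces the interface to a linear decoder from a 128-dimensional command signal to a 2-dimensional action. The data, dimensions, and intent signal are synthetic stand-ins, not the paper's method.

# Hedged sketch: pre-train a command-to-action decoder offline, then fine-tune
# online from a trickle of per-user data. Everything below is synthetic.
import numpy as np

rng = np.random.default_rng(0)
D_CMD, D_ACT = 128, 2

# "Offline" phase: fit a decoder on logged (command, action) pairs with ridge regression.
X = rng.normal(size=(500, D_CMD))
true_W = rng.normal(size=(D_CMD, D_ACT))        # unknown mapping the user "intends"
Y = X @ true_W + 0.5 * rng.normal(size=(500, D_ACT))
W = np.linalg.solve(X.T @ X + 1.0 * np.eye(D_CMD), X.T @ Y)

# "Online" phase: fine-tune with gradient steps on each new (command, inferred intent) pair.
for step in range(200):
    x = rng.normal(size=D_CMD)
    y = x @ true_W                               # stand-in for inferred user intent
    err = x @ W - y
    W -= 0.001 * np.outer(x, err)                # SGD on squared error

print("post-fine-tuning error:", float(np.linalg.norm(X[:10] @ W - X[:10] @ true_W)))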
Torque-Based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer
Donghyeon Kim
Mathew Schwartz
Jaeheung Park
In this letter, we review the question of which action space is best suited for controlling a real biped robot in combination with sim-to-real training. Position control has been popular as it has been shown to be more sample efficient and intuitive to combine with other planning algorithms. However, for position control, gain tuning is required to achieve the best possible policy performance. We show that, instead, using a torque-based action space enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-real gap by taking advantage of torque control's inherent compliance. We also accelerate the training of the torque-based policy by pre-training it to remain upright by compensating for gravity. The letter showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot.
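The gravity-compensation pre-training idea can be illustrated with a deliberately simple sketch: regress a torque policy onto analytic gravity-compensation torques for a one-link pendulum before any RL. The model, features, and dynamics below are assumptions for illustration, not the biped controller from the paper.

# Illustrative sketch: pre-train a torque policy to reproduce gravity-compensation
# torques so it starts out able to hold itself up. A one-link pendulum stands in
# for the biped; the feature set and fit are placeholders.
import numpy as np

rng = np.random.default_rng(0)
m, g, l = 1.0, 9.81, 0.5

def gravity_torque(q):
    return m * g * l * np.cos(q)          # torque needed to hold the link at angle q

# Tiny feature-based policy: tau = w . [1, sin q, cos q]; fit by least squares.
q = rng.uniform(-np.pi, np.pi, size=1000)
feats = np.stack([np.ones_like(q), np.sin(q), np.cos(q)], axis=1)
w, *_ = np.linalg.lstsq(feats, gravity_torque(q), rcond=None)

q_test = 0.3
pred = np.array([1.0, np.sin(q_test), np.cos(q_test)]) @ w
print("predicted vs analytic torque:", float(pred), gravity_torque(q_test))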
Maximum State Entropy Exploration using Predecessor and Successor Representations
Arnav Kumar Jain
Lucas Lehnert
Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose …
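For context, the sketch below shows the standard tabular successor representation update on a toy chain MDP, the kind of visitation statistic such exploration methods build on. It is not the algorithm proposed in the paper; the environment and hyperparameters are assumptions.

# Illustrative sketch only: tabular successor representation (SR) learning on a
# 5-state chain under a random-walk behaviour policy.
import numpy as np

n_states, gamma, alpha = 5, 0.9, 0.1
SR = np.eye(n_states)                    # SR[s, s'] ~ expected discounted visits to s' from s
rng = np.random.default_rng(0)

s = 0
for step in range(5000):
    a = rng.choice([-1, 1])              # random-walk behaviour policy
    s_next = int(np.clip(s + a, 0, n_states - 1))
    onehot = np.eye(n_states)[s]
    # TD update toward the one-step indicator plus the discounted SR of the next state
    SR[s] += alpha * (onehot + gamma * SR[s_next] - SR[s])
    s = s_next

print(np.round(SR, 2))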
Robust and Versatile Bipedal Jumping Control through Reinforcement Learning
Zhongyu Li
Xue Bin Peng
Pieter Abbeel
Sergey Levine
Koushil Sreenath
Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning
Zhongyu Li
Xue Bin Peng
Pieter Abbeel
Sergey Levine
Koushil Sreenath
Towards Learning to Imitate from a Single Video Demonstration
Florian Golemo
Agents that can learn to imitate given video observation, without direct access to state or action information, are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improve the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.
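A hedged sketch of the core idea, a reward defined by distance between clip embeddings, is given below. The fixed random projection stands in for the learned Siamese recurrent encoder, and all shapes and data are illustrative assumptions.

# Hypothetical sketch: a distance-based imitation reward between an agent clip
# and a demonstration clip. The "encoder" is a fixed random projection, not the
# learned Siamese recurrent network described above.
import numpy as np

rng = np.random.default_rng(0)
FRAME_DIM, EMBED_DIM = 64 * 64, 32
W = rng.normal(scale=1.0 / np.sqrt(FRAME_DIM), size=(FRAME_DIM, EMBED_DIM))

def encode_clip(frames):
    # frames: (T, FRAME_DIM) flattened images; mean-pool per-frame embeddings
    return (frames @ W).mean(axis=0)

def imitation_reward(agent_frames, demo_frames):
    # higher reward when the agent's clip embedding is close to the demo's
    return -float(np.linalg.norm(encode_clip(agent_frames) - encode_clip(demo_frames)))

demo = rng.normal(size=(20, FRAME_DIM))
agent = demo + 0.1 * rng.normal(size=(20, FRAME_DIM))
print(imitation_reward(agent, demo), imitation_reward(rng.normal(size=(20, FRAME_DIM)), demo))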
Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot
Yandong Ji
Zhongyu Li
Yinan Sun
Xue Bin Peng
Sergey Levine
Koushil Sreenath
We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitations and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.
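As a toy illustration of the two-level structure (a planning policy that outputs a desired kicking motion and a motion policy that tracks it), the following sketch composes two placeholder functions. Neither is a learned policy from the paper, and the heuristics and dimensions are assumptions.

# Minimal sketch of hierarchical composition: a high-level "planner" picks a
# desired motion parameter; a low-level controller tracks it. No real dynamics.
import numpy as np

def planning_policy(ball_pos, target_pos):
    # choose a kick direction and strength from the geometry (placeholder heuristic)
    direction = (target_pos - ball_pos) / np.linalg.norm(target_pos - ball_pos)
    strength = min(1.0, np.linalg.norm(target_pos - ball_pos) / 5.0)
    return direction * strength            # the "desired kicking motion" parameter

def motion_policy(joint_state, desired_motion):
    # low-level tracking: a PD-like rule toward the commanded motion (placeholder)
    return 2.0 * (desired_motion.sum() - joint_state[0]) - 0.1 * joint_state[1]

ball, target = np.array([0.0, 0.0]), np.array([3.0, 1.0])
cmd = planning_policy(ball, target)
print("high-level command:", cmd, "low-level torque:", motion_policy(np.array([0.1, 0.0]), cmd))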
ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning
Sean Chen
Jensen Gao
Siddharth Reddy
Anca Dragan
Sergey Levine
Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural ‘default’ interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enables the interface to adapt to individual users. However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. We propose a hierarchical solution that learns efficiently from sparse user feedback: we use offline pre-training to acquire a latent embedding space of useful, high-level robot behaviors, which, in turn, enables the system to focus on using online user feedback to learn a mapping from user inputs to desired high-level behaviors. The key insight is that access to a pre-trained policy enables the system to learn more from sparse rewards than a naïve RL algorithm: using the pre-trained policy, the system can make use of successful task executions to relabel, in hindsight, what the user actually meant to do during unsuccessful executions. We evaluate our method primarily through a user study with 12 participants who perform tasks in three simulated robotic manipulation domains using a webcam and their eye gaze: flipping light switches, opening a shelf door to reach objects inside, and rotating a valve. The results show that our method successfully learns to map 128-dimensional gaze features to 7-dimensional joint torques from sparse rewards in under 10 minutes of online training, and seamlessly helps users who employ different gaze strategies, while adapting to distributional shift in webcam inputs, tasks, and environments.
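The hindsight relabeling idea that the abstract relies on can be sketched in a few lines. The episode fields and goal names below are hypothetical placeholders, not ASHA's data structures.

# Minimal sketch of hindsight relabeling: when an episode fails for the commanded
# goal, also store it as a success for the goal that was actually achieved, so
# sparse feedback still yields positive training data.
import numpy as np

def relabel(episode):
    """episode: dict with 'inputs' (user commands), 'goal', 'achieved_goal', 'success'."""
    if episode["success"]:
        return [episode]                          # already a positive example
    relabeled = dict(episode)
    relabeled["goal"] = episode["achieved_goal"]  # pretend the user meant what happened
    relabeled["success"] = True
    return [episode, relabeled]                   # keep the failure and add the hindsight success

ep = {"inputs": np.zeros((10, 128)), "goal": "open_shelf",
      "achieved_goal": "flip_switch", "success": False}
for e in relabel(ep):
    print(e["goal"], e["success"])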
Heterogeneous Crowd Simulation Using Parametric Reinforcement Learning
Kaidong Hu
Brandon Haworth
Vladimir Pavlovic
Petros Faloutsos
Mubbasir Kapadia
Agent-based synthetic crowd simulation affords the cost-effective large-scale simulation and animation of interacting digital humans. Model-based approaches have successfully generated a plethora of simulators with a variety of foundations. However, prior approaches have been based on statically defined models predicated on simplifying assumptions, limited video-based datasets, or homogeneous policies. Recent works have applied reinforcement learning to learn policies for navigation. However, these approaches may learn static homogeneous rules, are typically limited in their generalization to trained scenarios, and limited in their usability in synthetic crowd domains. In this article, we present a multi-agent reinforcement learning-based approach that learns a parametric predictive collision avoidance and steering policy. We show that training over a parameter space produces a flexible model across crowd configurations. That is, our goal-conditioned approach learns a parametric policy that affords heterogeneous synthetic crowds. We propose a model-free approach without centralization of internal agent information, control signals, or agent communication. The model is extensively evaluated. The results show policy generalization across unseen scenarios, agent parameters, and out-of-distribution parameterizations. The learned model has comparable computational performance to traditional methods. Qualitatively the model produces both expected (laminar flow, shuffling, bottleneck) and unexpected (side-stepping) emergent qualitative behaviours, and quantitatively the approach is performant across measures of movement quality.
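For intuition about what "parametric" means here, the sketch below shows one plausible input layout for a parameter-conditioned steering policy, with per-agent parameters concatenated alongside local observations and the goal direction. The dimensions, parameter names, and linear policy are assumptions for illustration, not the paper's specification.

# Illustrative sketch: a parameter-conditioned steering policy input. Per-agent
# parameters (e.g., preferred speed, radius) are appended to the observation so a
# single policy can express heterogeneous behaviours.
import numpy as np

rng = np.random.default_rng(0)

def policy_input(neighbour_offsets, goal_dir, agent_params):
    # neighbour_offsets: (K, 2) relative positions; goal_dir: (2,); agent_params: (P,)
    return np.concatenate([neighbour_offsets.ravel(), goal_dir, agent_params])

def linear_policy(x, W):
    return np.tanh(W @ x)                # 2-D steering command in [-1, 1]^2

K, P = 8, 3
x = policy_input(rng.normal(size=(K, 2)), np.array([1.0, 0.0]),
                 np.array([1.2, 0.3, 0.5]))   # assumed params: preferred speed, radius, "pushiness"
W = rng.normal(scale=0.1, size=(2, x.size))
print(linear_policy(x, W))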