Glen Berseth

Albert Zhan

PhD - Université de Montréal

albert.zhan@mila.quebec

Charlie Gauthier

PhD - Université de Montréal

Principal supervisor: Liam Paull

charlie.gauthier@mila.quebec

Elham Daneshmand

PhD - McGill University

Principal supervisor: Hsiu-Chin Lin

elham.daneshmand@mila.quebec

Esra'a Saleh

PhD - Université de Montréal

Co-supervisor: Aaron Courville

esraa.saleh@mila.quebec

Faisal Mohamed

Collaborating researcher - Université de Montréal

faisal.mohamed@mila.quebec

Collaborating researcher - Université de Montréal

florence.cloutier@mila.quebec

Hongyao Tang

Postdoctorate - Université de Montréal

tang.hongyao@mila.quebec

Jiajun Fan

Research Intern - Université de Montréal

jiajun.fan@mila.quebec

Léa Demeule

Master's Research - Université de Montréal

lea.demeule@mila.quebec

Michael Przystupa

Research Intern - Université de Montréal

michael.przystupa@mila.quebec

Parnika Parnika

Professional Master's - Université de Montréal

parnika.parnika@mila.quebec

Angela Hu

Research Intern - McGill University

qingchen.hu@mila.quebec

Raj Ghugare

Master's Research - Université de Montréal

raj.ghugare@mila.quebec

Roger Creus-Castanyer

Master's Research - Université de Montréal

roger.creus-castanyer@mila.quebec

PhD - Université de Montréal

siddarth.venkatraman@mila.quebec

Research Intern - Polytechnic

victor.gilbert@mila.quebec

Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

Blog Posts

February 15, 2022

Jędrzej Orbik

Charles Sun

Coline Devin

Glen Berseth

Publications

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Adriana Hugessen

Roger Creus Castanyer

2023-10-20

NeurIPS.cc/2023/Workshop/IMOL (oral)

openreview.net

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Raj Ghugare

Santiago Miret

Adriana Hugessen

Mariano Phielipp

Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.

2023-10-04

ArXiv (preprint)

Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

Jensen Gao

Siddharth Reddy

Anca Dragan

Sergey Levine

Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user command signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.

2023-10-01

2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Torque-Based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer

Donghyeon Kim

Mathew Schwartz

Jaeheung Park

In this letter, we review the question of which action space is best suited for controlling a real biped robot in combination with Sim2Real training. Position control has been popular as it has been shown to be more sample efficient and intuitive to combine with other planning algorithms. However, for position control, gain tuning is required to achieve the best possible policy performance. We show that, instead, using a torque-based action space enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-reality gap by taking advantage of torque control's inherent compliance. Also, we accelerate the torque-based-policy training process by pre-training the policy to remain upright by compensating for gravity. The letter showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot.

2023-10-01

IEEE Robotics and Automation Letters (published)

Maximum State Entropy Exploration using Predecessor and Successor Representations

Arnav Kumar Jain

Lucas Lehnert

Irina Rish

Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose

openreview.net

Robust and Versatile Bipedal Jumping Control through Reinforcement Learning

Zhongyu Li

Xue Bin Peng

Pieter Abbeel

Sergey Levine

Koushil Sreenath

2023-07-10

Robotics: Science and Systems XIX (published)

Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

Zhongyu Li

Xue Bin Peng

Pieter Abbeel

Sergey Levine

Koushil Sreenath

2023-01-01

arXiv.org (preprint)

Towards Learning to Imitate from a Single Video Demonstration

Florian Golemo

Chris Pal

Agents that can learn to imitate given video observations, without direct access to state or action information, are more applicable to learning in the natural world. However, formulating a reinforcement learning (RL) agent that facilitates this goal remains a significant challenge. We approach this challenge using contrastive training to learn a reward function comparing an agent's behaviour with a single demonstration. We use a Siamese recurrent neural network architecture to learn rewards in space and time between motion clips while training an RL policy to minimize this distance. Through experimentation, we also find that the inclusion of multi-task data and additional image encoding losses improves the temporal consistency of the learned rewards and, as a result, significantly improves policy learning. We demonstrate our approach on simulated humanoid, dog, and raptor agents in 2D and a quadruped and a humanoid in 3D. We show that our method outperforms current state-of-the-art techniques in these environments and can learn to imitate from a single video demonstration.

Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot

Yandong Ji

Zhongyu Li

Yinan Sun

Xue Bin Peng

Sergey Levine

Koushil Sreenath

We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitations and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.

2022-10-23

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

Sean Chen

Jensen Gao

Siddharth Reddy

Anca Dragan

Sergey Levine

Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural ‘default’ interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enables the interface to adapt to individual users. However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. We propose a hierarchical solution that learns efficiently from sparse user feedback: we use offline pre-training to acquire a latent embedding space of useful, high-level robot behaviors, which, in turn, enables the system to focus on using online user feedback to learn a mapping from user inputs to desired high-level behaviors. The key insight is that access to a pre-trained policy enables the system to learn more from sparse rewards than a naïve RL algorithm: using the pre-trained policy, the system can make use of successful task executions to relabel, in hindsight, what the user actually meant to do during unsuccessful executions. We evaluate our method primarily through a user study with 12 participants who perform tasks in three simulated robotic manipulation domains using a webcam and their eye gaze: flipping light switches, opening a shelf door to reach objects inside, and rotating a valve. The results show that our method successfully learns to map 128-dimensional gaze features to 7-dimensional joint torques from sparse rewards in under 10 minutes of online training, and seamlessly helps users who employ different gaze strategies, while adapting to distributional shift in webcam inputs, tasks, and environments.

2022-05-23

2022 International Conference on Robotics and Automation (ICRA) (published)

ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

Sean Andrew Chen

Jensen Gao

Siddharth Reddy

Anca Dragan

Sergey Levine

2022-02-05

ArXiv (preprint)

Heterogeneous Crowd Simulation Using Parametric Reinforcement Learning

Kaidong Hu

Michael Brandon Haworth

Vladimir Pavlovic

Petros Faloutsos

Mubbasir T. Kapadia

Agent-based synthetic crowd simulation affords the cost-effective large-scale simulation and animation of interacting digital humans. Model-based approaches have successfully generated a plethora of simulators with a variety of foundations. However, prior approaches have been based on statically defined models predicated on simplifying assumptions, limited video-based datasets, or homogeneous policies. Recent works have applied reinforcement learning to learn policies for navigation. However, these approaches may learn static homogeneous rules, are typically limited in their generalization to trained scenarios, and are limited in their usability in synthetic crowd domains. In this article, we present a multi-agent reinforcement learning-based approach that learns a parametric predictive collision avoidance and steering policy. We show that training over a parameter space produces a flexible model across crowd configurations. That is, our goal-conditioned approach learns a parametric policy that affords heterogeneous synthetic crowds. We propose a model-free approach without centralization of internal agent information, control signals, or agent communication. The model is extensively evaluated. The results show policy generalization across unseen scenarios, agent parameters, and out-of-distribution parameterizations. The learned model has comparable computational performance to traditional methods. Qualitatively, the model produces both expected (laminar flow, shuffling, bottleneck) and unexpected (side-stepping) emergent behaviours, and quantitatively the approach is performant across measures of movement quality.

2021-12-29

IEEE Transactions on Visualization and Computer Graphics (published)