Glen Berseth

roger.creus-castanyer@mila.quebec

Roger Creus-Castanyer

Master's Research - Université de Montréal

PhD - McGill University

Principal supervisor :

Hsiu-Chin Lin

elham.daneshmand@mila.quebec

Léa Demeule

Master's Research - Université de Montréal

lea.demeule@mila.quebec

Jiajun Fan

Research Intern - Université de Montréal

jiajun.fan@mila.quebec

charlie.gauthier@mila.quebec

Charlie Gauthier

PhD - Université de Montréal

Principal supervisor :

Liam Paull

Raj Ghugare

Master's Research - Université de Montréal

raj.ghugare@mila.quebec

Research Intern - Polytechnic

victor.gilbert@mila.quebec

Angela Hu

Research Intern - McGill University

qingchen.hu@mila.quebec

Adriana Knatchbull-Hugessen

Master's Research - Université de Montréal

adriana.knatchbull-hugessen@mila.quebec

faisal.mohamed@mila.quebec

Faisal Mohamed

Collaborating researcher - Université de Montréal

Professional Master's - Université de Montréal

parnika.parnika@mila.quebec

michael.przystupa@mila.quebec

Michael Przystupa

Research Intern - Université de Montréal

Esra'a Saleh

PhD - Université de Montréal

Co-supervisor :

Aaron Courville

esraa.saleh@mila.quebec

Hongyao Tang

Postdoctorate - Université de Montréal

tang.hongyao@mila.quebec

siddarth.venkatraman@mila.quebec

Siddarth Venkatraman

PhD - Université de Montréal

Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation

Albert Zhan

PhD - Université de Montréal

albert.zhan@mila.quebec

Blog Posts

February 15, 2022

Jędrzej Orbik

Charles Sun

Coline Devin

Glen Berseth

Publications

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Alexander Khazatsky

Karl Pertsch

Suraj Nair

Ashwin Balakrishna

Sudeep Dasari

Siddharth Karamcheti

Soroush Nasiriany

Mohan Kumar Srirama

Lawrence Yunliang Chen

Kirsty Ellis

Peter David Fagan

Joey Hejna

Masha Itkina

Marie Lepert

Ye Ma

Patrick Tree Miller

Jimmy Wu

Suneel Belkhale

S. Dass

Huy Ha

Arhan Jain

Abraham Lee

Youngwoon Lee

Marius Memmel

S. Park

Ilija Radosavovic

Kaiyuan Wang

Albert Zhan

Kevin Black

Cheng Chi

Kyle Beltran Hatch

Shan Lin

Jingpei Lu

Jean-Pierre Mercat

Abdul Rehman

Pannag R. Sanketi

Archit Sharma

C. Simpson

Q. Vương

Homer Rich Walke

Blake Wulfe

Ted Xiao

Jonathan Heewon Yang

Arefeh Yavary

Tony Z. Zhao

Christopher Agia

Rohan Baijal

Mateo Guaman Castro

D. Chen

Qiuyu Chen

Trinity Chung

Jaimyn Drake

Ethan Paul Foster

Jensen Gao

David Antonio Herrera

Minho Heo

Kyle Hsu

Jiaheng Hu

Donovon Jackson

Charlotte Le

Yunshuang Li

K. Lin

Roy Lin

Zehan Ma

Abhiram Maddukuri

Suvir Mirchandani

D. Morton

Tony Nguyen

Abigail O'Neill

R. Scalise

Derick Seale

Victor Son

Stephen Tian

Emi Tran

Andrew E. Wang

Yilin Wu

Annie Xie

Jingyun Yang

Patrick Yin

Yunchu Zhang

Osbert Bastani

Jeannette Bohg

Ken Goldberg

Abhinav Gupta

Abhishek Gupta

Dinesh Jayaraman

Joseph J. Lim

Jitendra Malik

Roberto Martín-Martín

Subramanian Ramamoorthy

Dorsa Sadigh

Shuran Song

Jiajun Wu

Michael C. Yip

Yuke Zhu

Thomas Kollar

Sergey Levine

Chelsea Finn

The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

2024-03-19

ArXiv (preprint)

arxiv.org

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Zhongyu Li

Xue Bin Peng

Pieter Abbeel

Sergey Levine

Koushil Sreenath

This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world. The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.

2024-01-30

ArXiv (preprint)

arxiv.org

Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View

Raj Ghugare

Matthieu Geist

Benjamin Eysenbach

Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-sought property is one of the few ways in which RL methods based on dynamic programming differ from RL methods based on supervised learning (SL). Yet, certain RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question for the problems of achieving a target goal state and achieving a target return value. Our main result is to show that the stitching property corresponds to a form of combinatorial generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalisation reveals why we should not expect SL-based RL methods to perform stitching, even in the limit of large datasets and models. Based on this analysis, we construct new datasets to explicitly test for this property, revealing that SL-based methods lack this stitching property and hence fail to perform combinatorial generalization. Nonetheless, the connection between stitching and combinatorial generalisation also suggests a simple remedy for improving generalisation in SL: data augmentation. We propose a temporal data augmentation and demonstrate that adding it to SL-based methods enables them to successfully complete tasks not seen together during training. On a high level, this connection illustrates the importance of combinatorial generalization for data efficiency in time-series data for tasks beyond RL, like audio, video, or text.

2024-01-16

ICLR.cc/2024/Conference (poster)

Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View

Raj Ghugare

Matthieu Geist

Benjamin Eysenbach

Some reinforcement learning (RL) algorithms have the capability of recombining pieces of previously seen experience to solve a task never seen before during training. This oft-sought property is one of the few ways in which dynamic programming based RL algorithms are considered different from supervised learning (SL) based RL algorithms. Yet, recent RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question in the setting of goal-reaching problems. We show that the desirable stitching property corresponds to a form of generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalization reveals why we should not expect existing RL methods based on SL to perform stitching, even in the limit of large datasets and models. We experimentally validate this result on carefully constructed datasets. This connection suggests a simple remedy, the same remedy for improving generalization in supervised learning: data augmentation. We propose a naive temporal data augmentation approach and demonstrate that adding it to RL methods based on SL enables them to stitch together experience so that they succeed in navigating between states and goals unseen together during training.

2024-01-16

ICLR.cc/2024/Conference (poster)

Improving Intrinsic Exploration by Creating Stationary Objectives

Roger Creus Castanyer

Joshua Romoff

2024-01-16

ICLR.cc/2024/Conference (poster)

Intelligent Switching for Reset-Free RL

Darshan Patil

Janarthanan Rajendran

Sarath Chandar Anbil Parthipan

In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The resetting assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (forward) with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.

2024-01-16

ICLR.cc/2024/Conference (poster)

Reasoning with Latent Diffusion in Offline Reinforcement Learning

Siddarth Venkatraman

Shivesh Khaitan

Ravi Tej Akella

John Dolan

Jeff Schneider

2024-01-16

ICLR.cc/2024/Conference (poster)

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Raj Ghugare

Santiago Miret

Adriana Hugessen

Mariano Phielipp

2024-01-16

ICLR.cc/2024/Conference (poster)

Adaptive Resolution Residual Networks

Léa Demeule

Mahtab Sandhu

We introduce Adaptive Resolution Residual Networks (ARRNs), a form of neural operator that enables the creation of networks for signal-based tasks that can be rediscretized to suit any signal resolution. ARRNs are composed of a chain of Laplacian residuals that each contain ordinary layers, which do not need to be rediscretizable for the whole network to be rediscretizable. ARRNs have the property of requiring a lower number of Laplacian residuals for exact evaluation on lower-resolution signals, which greatly reduces computational cost. ARRNs also implement Laplacian dropout, which encourages networks to become robust to low-bandwidth signals. ARRNs can thus be trained once at high-resolution and then be rediscretized on the fly at a suitable resolution with great robustness.

2023-10-31

NeurIPS.cc/2023/Workshop/DLDE (published)

Improving Intrinsic Exploration by Creating Stationary Objectives

Roger Creus Castanyer

Joshua Romoff

2023-10-27

ArXiv (preprint)