Robots are used worldwide in many industrial processes, and are getting better at helping humans every year. Machine learning algorithms are enhancing the capabilities of traditional robotics, and have become essential in making robots more adaptable to challenging situations.

Embodied machine learning seeks to emulate the ways in which humans process information. By using a wide variety of sensors on robotic hardware, researchers help robots perceive, analyze, interact with, and navigate through unpredictable physical environments. Mila researchers are tackling challenges such as better long-term planning for the use of robots in daily life, building representations of the world (including simultaneous localization and mapping), and creating better workflows to teach robotic agents new tasks.

Mila’s work also includes designing experimental machine learning algorithms to help robots perform better in industrial applications such as assembly and disassembly, meal preparation, and warehouse management.

Featured Projects

DROID is an initiative that aims to address the scarcity of comprehensive datasets in robotics, enhancing the development of manipulation algorithms for real-world applications.

ConceptGraphs is a mapping system that builds 3D scene-graphs of objects and their relationships, enabling robots to perform complex navigation and object manipulation tasks.

"AI can help us make robots more adaptable to unpredictable environments, which will lead to true robotics assistants in the real world."

Glen Berseth, Assistant Professor, Université de Montréal, Core Academic Member, Mila

Research Labs

Mila professors exploring the subject as part of their research.

Mila Faculty
Glen Berseth
Core Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Canada CIFAR AI Chair

Samira Ebrahimi Kahou
Affiliate Member
Assistant Professor, University of Calgary, Department of Electrical and Software Engineering
Canada CIFAR AI Chair

Maxime Gasse
Associate Industry Member
Senior Research Scientist, ServiceNow

Toby Dylan Hocking
Associate Academic Member

Xue (Steve) Liu
Associate Academic Member
Full Professor, McGill University, School of Computer Science

David Meger
Associate Academic Member
Associate Professor, McGill University, School of Computer Science

AJung Moon
Core Academic Member
Assistant Professor, McGill University, Department of Electrical and Computer Engineering

Eilif Benjamin Muller
Associate Academic Member
Assistant Professor, Université de Montréal, Department of Neurosciences
Canada CIFAR AI Chair

Chris Pal
Core Academic Member
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Canada CIFAR AI Chair

Liam Paull
Core Academic Member
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Canada CIFAR AI Chair

Doina Precup
Core Academic Member
Associate Professor, McGill University, School of Computer Science
Canada CIFAR AI Chair

Audrey Sedal
Associate Academic Member
Assistant Professor, McGill University, Department of Mechanical Engineering

Featured Video

Prof. Glen Berseth studies how machine learning can be used to train more adaptable robots that could help humanity meet its most pressing challenges.


ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Qiao Gu
Alihusein Kuwajerwala
Sacha Morin
Krishna Murthy
Bipasha Sen
Aditya Agarwal
Corban Rivera
William Paul
Kirsty Ellis
Rama Chellappa
Chuang Gan
Celso M de Melo
Joshua B. Tenenbaum
Antonio Torralba
Florian Shkurti
For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.
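To make the idea of a graph-structured scene representation concrete, here is a minimal Python sketch. It is not the ConceptGraphs implementation: the class names (ObjectNode, SceneGraph), the fields, and the keyword-based lookup are all hypothetical and chosen only for illustration. In particular, the keyword match stands in for the language-model queries the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectNode:
    """One object in the scene (hypothetical fields, for illustration only)."""
    node_id: int
    caption: str                          # free-text, open-vocabulary description
    centroid: Tuple[float, float, float]  # assumed position in metres (x, y, z)


@dataclass
class SceneGraph:
    """Objects as nodes, spatial relations as labelled edges."""
    nodes: Dict[int, ObjectNode] = field(default_factory=dict)
    # Each edge is a (subject_id, relation, object_id) triple.
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, caption, centroid):
        node = ObjectNode(len(self.nodes), caption, centroid)
        self.nodes[node.node_id] = node
        return node.node_id

    def relate(self, subject_id, relation, object_id):
        self.edges.append((subject_id, relation, object_id))

    def find(self, keyword):
        # Naive case-insensitive keyword match; a placeholder for a real
        # language-based query.
        key = keyword.lower()
        return [n for n in self.nodes.values() if key in n.caption.lower()]

    def related(self, node_id, relation):
        # Objects that `node_id` points to through `relation`.
        return [self.nodes[o] for s, r, o in self.edges
                if s == node_id and r == relation]


graph = SceneGraph()
table = graph.add_object("wooden dining table", (1.0, 2.0, 0.4))
mug = graph.add_object("red ceramic mug", (1.1, 2.1, 0.8))
graph.relate(mug, "on", table)

mugs = graph.find("mug")
supports = graph.related(mugs[0].node_id, "on")
print(supports[0].caption)
```

Storing one node per object, rather than one feature vector per 3D point, is what keeps a representation like this compact, and the explicit edges carry the spatial relationships that a planner can reason over.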
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy
Alihusein Kuwajerwala
Qiao Gu
Mohd Omama
Tao Chen
Shuang Li
Alaa Maalouf
Ganesh Subramanian Iyer
Soroush Saryazdi
Nikhil Varma Keetha
Ayush Tewari
Joshua B. Tenenbaum
Celso M de Melo
Madhava Krishna
Florian Shkurti
Antonio Torralba
Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent work, using text prompts. We address both these issues with ConceptFusion, a scene representation that is: (i) fundamentally open-set, enabling reasoning beyond a closed set of concepts, and (ii) inherently multi-modal, enabling a diverse range of possible queries to the 3D map, from language, to images, to audio, to 3D geometry, all working in concert. ConceptFusion leverages the open-set capabilities of today’s foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning, not needing any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by a margin of more than 40% on 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping.
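As a rough illustration of fusing per-view features into a 3D map and then querying it, the Python sketch below keeps a running mean of feature vectors for each 3D point and answers a query by cosine similarity. The running mean is an assumed, deliberately simple fusion rule, not necessarily the one used in ConceptFusion; the function names (fuse, cosine, query), the point identifiers, and the three-dimensional toy vectors are all hypothetical stand-ins for real foundation-model embeddings.

```python
import math


def fuse(points, point_id, feature):
    """Fold one view's feature into the running mean stored for a 3D point."""
    mean, count = points.get(point_id, ([0.0] * len(feature), 0))
    count += 1
    # Incremental mean update: new_mean = mean + (x - mean) / n
    mean = [m + (f - m) / count for m, f in zip(mean, feature)]
    points[point_id] = (mean, count)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query(points, query_feature):
    """Return the id of the point whose fused feature best matches the query."""
    return max(points, key=lambda pid: cosine(points[pid][0], query_feature))


# Toy map: two 3D points, each observed from two views.
points = {}
fuse(points, "p0", [1.0, 0.0, 0.0])
fuse(points, "p0", [0.8, 0.2, 0.0])
fuse(points, "p1", [0.0, 1.0, 0.0])
fuse(points, "p1", [0.0, 0.9, 0.1])

# The query vector could come from text, an image, or audio, provided it is
# embedded into the same feature space as the map.
best = query(points, [0.0, 1.0, 0.0])
print(best)
```

Because the query is only a vector comparison, nothing in this scheme depends on a fixed label set, which is the sense in which such a map is open-set.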
Hierarchical Reinforcement Learning for Precise Soccer Shooting Skills using a Quadrupedal Robot
Yandong Ji
Zhongyu Li
Yinan Sun
Xue Bin Peng
Sergey Levine
Koushil Sreenath
We address the problem of enabling quadrupedal robots to perform precise shooting skills in the real world using reinforcement learning. Developing algorithms to enable a legged robot to shoot a soccer ball to a given target is a challenging problem that combines robot motion control and planning into one task. To solve this problem, we need to consider the dynamics limitation and motion stability during the control of a dynamic legged robot. Moreover, we need to consider motion planning to shoot the hard-to-model deformable ball rolling on the ground with uncertain friction to a desired location. In this paper, we propose a hierarchical framework that leverages deep reinforcement learning to train (a) a robust motion control policy that can track arbitrary motions and (b) a planning policy to decide the desired kicking motion to shoot a soccer ball to a target. We deploy the proposed framework on an A1 quadrupedal robot and enable it to accurately shoot the ball to random targets in the real world.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
Siddharth Karamcheti
Soroush Nasiriany
Mohan Kumar Srirama
Lawrence Yunliang Chen
Kirsty Ellis
Peter David Fagan
Joey Hejna
Masha Itkina
Marie Lepert
Ye Ma
Patrick Tree Miller
Jimmy Wu
Suneel Belkhale
S. Dass
Huy Ha
Arhan Jain
Abraham Lee
Youngwoon Lee
Marius Memmel
S. Park
Ilija Radosavovic
Kaiyuan Wang
Albert Zhan
Kevin Black
Cheng Chi
Kyle Beltran Hatch
Shan Lin
Jingpei Lu
Jean-Pierre Mercat
Abdul Rehman
Pannag R. Sanketi
Archit Sharma
C. Simpson
Q. Vương
Homer Rich Walke
Blake Wulfe
Ted Xiao
Jonathan Heewon Yang
Arefeh Yavary
Tony Z. Zhao
Christopher Agia
Rohan Baijal
Mateo Guaman Castro
D. Chen
Qiuyu Chen
Trinity Chung
Jaimyn Drake
Ethan Paul Foster
Jensen Gao
David Antonio Herrera
Minho Heo
Kyle Hsu
Jiaheng Hu
Donovon Jackson
Charlotte Le
Yunshuang Li
K. Lin
Roy Lin
Zehan Ma
Abhiram Maddukuri
Suvir Mirchandani
D. Morton
Tony Nguyen
Abigail O'Neill
R. Scalise
Derick Seale
Victor Son
Stephen Tian
Emi Tran
Andrew E. Wang
Yilin Wu
Annie Xie
Jingyun Yang
Patrick Yin
Yunchu Zhang
Osbert Bastani
Jeannette Bohg
Ken Goldberg
Abhinav Gupta
Abhishek Gupta
Dinesh Jayaraman
Joseph J. Lim
Jitendra Malik
Roberto Martín-Martín
Subramanian Ramamoorthy
Dorsa Sadigh
Shuran Song
Jiajun Wu
Michael C. Yip
Yuke Zhu
Thomas Kollar
Sergey Levine
Chelsea Finn
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
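The headline figures in the abstract can be turned into a few per-unit averages with simple arithmetic. The Python sketch below uses only the numbers quoted above; because "76k" and "350 hours" are rounded, the results are rough averages and say nothing about how the data is actually distributed across scenes or collectors. The variable names are illustrative.

```python
# Figures as quoted in the abstract (rounded in the source).
trajectories = 76_000
hours = 350
scenes = 564
collectors = 50

avg_seconds = hours * 3600 / trajectories   # mean trajectory length in seconds
per_scene = trajectories / scenes           # mean trajectories per scene
per_collector = trajectories / collectors   # mean trajectories per collector

print(f"average trajectory length: {avg_seconds:.1f} s")
print(f"trajectories per scene: {per_scene:.0f}")
print(f"trajectories per collector: {per_collector:.0f}")
```

On these figures, a demonstration lasts about 16.6 seconds on average, with roughly 135 trajectories per scene and about 1,520 per data collector.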
