David Meger

Wesley Chung

PhD - McGill University

Co-supervisor :

chungwes@mila.quebec

Farnoosh Faraji

PhD - McGill University

Co-supervisor :

farnoosh.faraji@mila.quebec

Scott Fujimoto

PhD - McGill University

Principal supervisor :

fujimots@mila.quebec

Farinaz Mozifian

PhD - McGill University

Principal supervisor :

Joelle Pineau

mozifiam@mila.quebec

Louis Petit

Postdoctorate - McGill University

louis.petit@mila.quebec

Sahand Rezaei-Shoshtari

PhD - McGill University

Principal supervisor :

sahand.rezaei-shoshtari@mila.quebec

Jean-François Tremblay

PhD - McGill University

trembljf@mila.quebec

Steven Wang

Master's Research - McGill University

zhizun.wang@mila.quebec

Harley Wiltzer

PhD - McGill University

Co-supervisor :

Marc Gendron-Bellemare

wiltzerh@mila.quebec

Hanna Yurchyk

Master's Research - McGill University

hanna.yurchyk@mila.quebec

Publications

Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs

Faraz Lotfi

Farnoosh Faraji

Nikhil Kakodkar

Travis Manderson

2024-04-02

ArXiv (preprint)

Generalizable Imitation Learning Through Pre-Trained Representations

Wei-Di Chang

Francois R. Hogan

In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.

2023-11-15

ArXiv (preprint)

Imitation Learning from Observation through Optimal Transport

Wei-Di Chang

Scott Fujimoto

2023-10-02

ArXiv (preprint)

Uncertainty-aware hybrid paradigm of nonlinear MPC and model-based RL for offroad navigation: Exploration of transformers in the predictive model

Faraz Lotfi

Khalil Virji

Farnoosh Faraji

Lucas Berry

Andrew Holliday

In this paper, we investigate a hybrid scheme that combines nonlinear model predictive control (MPC) and model-based reinforcement learning (RL) for navigation planning of an autonomous model car across offroad, unstructured terrains without relying on predefined maps. Our innovative approach takes inspiration from BADGR, an LSTM-based network that primarily concentrates on environment modeling, but distinguishes itself by substituting LSTM modules with transformers to greatly elevate the performance of our model. Addressing uncertainty within the system, we train an ensemble of predictive models and estimate the mutual information between model weights and outputs, facilitating dynamic horizon planning through the introduction of variable speeds. Further enhancing our methodology, we incorporate a nonlinear MPC controller that accounts for the intricacies of the vehicle's model and states. The model-based RL facet produces steering angles and quantifies inherent uncertainty. At the same time, the nonlinear MPC suggests optimal throttle settings, striking a balance between goal attainment speed and managing model uncertainty influenced by velocity. In the conducted studies, our approach excels over the existing baseline by consistently achieving higher metric values in predicting future events and seamlessly integrating the vehicle's kinematic model for enhanced decision-making. The code and the evaluation data are available at https://github.com/FARAZLOTFI/offroad_autonomous_navigation/.

2023-10-01

ArXiv (preprint)

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Scott Fujimoto

Wei-Di Chang

Edward J. Smith

Shixiang Shane Gu

In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.

2023-09-21

NeurIPS.cc/2023/Conference (poster)

openreview.net

Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning

Zhizun Wang

In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model to address the challenge of achieving a common goal of multiple agents interacting in the same environment with reduced sample complexity. Due to scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.

2023-09-08

ArXiv (preprint)

Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators

Lucas Berry

This work introduces an efficient novel approach for epistemic uncertainty estimation for ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). Utilizing the pairwise-distance between model components, these estimators establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PaiDEs exhibit a remarkable capability to estimate epistemic uncertainty at speeds up to 100 times faster while covering a significantly larger number of inputs at once and demonstrating superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data,

2023-08-25

ArXiv (preprint)

Hypernetworks for Zero-shot Transfer in Reinforcement Learning

Sahand Rezaei-Shoshtari

Charlotte Morissette

Francois Hogan

In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and we seek to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.

2023-06-26

Proceedings of the AAAI Conference on Artificial Intelligence (published)

ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence

Dmitriy Rivkin

Nikhil Kakodkar

Oliver Limoyo

Xue (Steve) Liu

Francois Hogan

Our work examines the way in which large language models can be used for robotic planning and sampling in the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general purpose language (LM) and vision-language (VLM) models. Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.

2023-06-02

2023 IEEE International Conference on Robotics and Automation (ICRA) (published)

Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Prakash Panangaden

Sahand Rezaei-Shoshtari

Rosie Zhao

2023-05-09

ArXiv (preprint)

Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling

Lucas Berry

2023-02-02

ArXiv (preprint)

Learning active tactile perception through belief-space control

Jean-François Tremblay

Johanna Hansen

Francois Hogan

Robots operating in an open world can encounter novel objects with unknown physical properties, such as mass, friction, or size. It is desirable to be able to sense those properties through contact-rich interaction before performing downstream tasks with the objects. We propose a method for autonomously learning active tactile perception policies by learning a generative world model that leverages a differentiable Bayesian filtering algorithm, and by designing an information-gathering model predictive controller. We test the method on three simulated tasks: mass estimation, height estimation and toppling height estimation. Our method is able to discover policies which gather information about the desired property in an intuitive manner.

2022-05-12

ICRA.org/2022/Workshop/Contact-Rich (poster)