David Meger

Valliappan Chidambaram Adaikkappan

PhD - McGill

Wesley Chung

PhD - McGill

Co-supervisor:

Doina Precup

Farnoosh Faraji

PhD - McGill

Co-supervisor:

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Zina Kamel

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Sahand Rezaei-Shoshtari

PhD - McGill

Principal supervisor:

PhD - McGill

Research Master's - McGill

Steven Wang

Research Master's - McGill

PhD - McGill

Co-supervisor:

PhD - McGill

Publications

VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion

Simon Li

Anas El Houssaini

Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.

2026-01-31

IEEE Robotics and Automation Letters (published)
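The selection step described in the VOCALoco abstract can be sketched as a minimal "safe first, then cheapest" rule. The sketch assumes that each pre-trained policy already has two predicted values over the planning horizon: a viability probability and a cost of transport. The `PolicyPrediction` type, the policy names, and the 0.9 threshold are all hypothetical. The paper may combine the two predictions differently.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PolicyPrediction:
    # Hypothetical container: one entry per pre-trained locomotion policy.
    name: str
    viability: float          # predicted probability of safe execution over the horizon
    cost_of_transport: float  # predicted cost of transport over the same horizon


def select_policy(
    predictions: Sequence[PolicyPrediction],
    viability_threshold: float = 0.9,  # assumed safety cut-off, not taken from the paper
) -> Optional[PolicyPrediction]:
    """Return the lowest-cost policy among those predicted viable, or None if none are."""
    viable = [p for p in predictions if p.viability >= viability_threshold]
    if not viable:
        return None  # no policy predicted safe; a deployed system would need a fallback
    return min(viable, key=lambda p: p.cost_of_transport)


candidates = [
    PolicyPrediction("flat_walk", viability=0.40, cost_of_transport=0.8),
    PolicyPrediction("stair_climb", viability=0.95, cost_of_transport=1.6),
    PolicyPrediction("careful_step", viability=0.97, cost_of_transport=2.1),
]
print(select_policy(candidates).name)  # stair_climb
```

In this sketch, the cheapest policy overall ("flat_walk") is rejected because its predicted viability is too low. Filtering before minimizing means energy savings are never traded against safety.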

Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations

Charlotte Morissette

Anas El Houssaini

Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.

2026-01-01

arXiv (Cornell University) (preprint)
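The phrase "contraction pulls nearby flows closer" can be illustrated with a toy one-dimensional ODE. This is not the CDP construction. The example integrates dx/dt = drive(t) - lam * x with forward Euler from two nearby starting points. The shared time-varying input `drive` is a hypothetical stand-in for a learned drift. With lam = 0 the initial gap never shrinks. With lam > 0 the gap shrinks by a factor of (1 - lam * h) at every step.

```python
import math
from typing import Callable


def euler_flow(
    x0: float,
    lam: float,
    drive: Callable[[float], float],
    steps: int = 100,
    h: float = 0.01,
) -> float:
    """Integrate dx/dt = drive(t) - lam * x with forward Euler and return the final state."""
    x = x0
    for n in range(steps):
        x = x + h * (drive(n * h) - lam * x)
    return x


def drive(t: float) -> float:
    # Hypothetical shared input, playing the role of a learned drift term.
    return math.sin(3.0 * t)


for lam in (0.0, 5.0):
    gap = abs(euler_flow(1.0, lam, drive) - euler_flow(1.1, lam, drive))
    print(f"lam={lam}: final gap {gap:.6f}")
```

The two trajectories receive the same input, so their difference evolves as e_{n+1} = (1 - lam * h) * e_n. With lam = 5 and h = 0.01, the initial gap of 0.1 shrinks to roughly 0.1 * 0.95**100 after 100 steps. A perturbation introduced by solver error is damped in the same way.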

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Yash Jhaveri

Patrick Shafto

In the pursuit of finding an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart from their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a temperature decoupling gambit. Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy-derived objects, namely value functions and return distributions. In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated with its interpretable, diversity-preserving optimal policy.

2025-10-09

arXiv (preprint)
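The limiting behavior mentioned in this abstract ("the realized policy samples all optimal actions uniformly") can be seen in a plain Boltzmann (softmax) policy. The sketch holds a set of Q-values fixed and lowers the temperature. It does not reproduce the paper's temperature decoupling gambit or the dependence of soft values on the temperature. It only shows why a vanishing temperature spreads probability evenly over tied optimal actions and removes it from the others. The function name `boltzmann_policy` and the Q-values are illustrative.

```python
import math
from typing import List


def boltzmann_policy(q_values: List[float], temperature: float) -> List[float]:
    """Softmax of Q / temperature: the entropy-regularized greedy policy for fixed Q."""
    m = max(q_values)  # subtract the max so exp() cannot overflow
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 0.5, 1.0, -2.0]  # actions 0 and 2 are tied for optimal
for tau in (1.0, 0.1, 0.01):
    print(tau, [round(p, 3) for p in boltzmann_policy(q, tau)])
```

As the temperature falls, the mass on actions 1 and 3 decays like exp(-gap / temperature). The two optimal actions always receive equal weight, so the limit is the uniform distribution over the argmax.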

Large Pre-Trained Models for Bimanual Manipulation in 3D

Hanna Yurchyk

Wei-Di Chang

Gregory Dudek

2025-09-30

2025 IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids) (published)

Large Pre-Trained Models for Bimanual Manipulation in 3D

Hanna Yurchyk

Wei-Di Chang

Gregory Dudek

We investigate the integration of attention maps from a pre-trained Vision Transformer into voxel representations to enhance bimanual robotic manipulation. Specifically, we extract attention maps from DINOv2, a self-supervised ViT model, and interpret them as pixel-level saliency scores over RGB images. These maps are lifted into a 3D voxel grid, resulting in voxel-level semantic cues that are incorporated into a behavior cloning policy. When integrated into a state-of-the-art voxel-based policy, our attention-guided featurization yields an average absolute improvement of 8.2% and a relative gain of 21.9% across all tasks in the RLBench bimanual benchmark.

2025-09-24

arXiv (preprint)
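The "lifting" of pixel-level saliency into a 3D voxel grid can be sketched with a pinhole camera model. Each pixel is back-projected using its depth, and saliency is averaged per voxel. The sketch has several assumptions: an aligned depth image, known intrinsics (`fx`, `fy`, `cx`, `cy`), and a grid expressed in the camera frame with no extrinsic transform. The function name and the mean-pooling choice are hypothetical. The paper's exact lifting into its policy's voxel grid is not specified here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def lift_saliency_to_voxels(
    saliency: List[List[float]],  # H x W per-pixel scores (e.g. ViT attention)
    depth: List[List[float]],     # H x W depth in meters, aligned with saliency
    fx: float, fy: float, cx: float, cy: float,  # assumed pinhole intrinsics
    voxel_size: float,
) -> Dict[Voxel, float]:
    """Back-project every pixel to 3D (camera frame) and average saliency per voxel."""
    sums: Dict[Voxel, float] = defaultdict(float)
    counts: Dict[Voxel, int] = defaultdict(int)
    for v, row in enumerate(saliency):
        for u, score in enumerate(row):
            z = depth[v][u]
            if z <= 0:
                continue  # skip pixels with missing or invalid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            key = (
                math.floor(x / voxel_size),
                math.floor(y / voxel_size),
                math.floor(z / voxel_size),
            )
            sums[key] += score
            counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


sal = [[0.2, 0.4], [0.6, 0.8]]
dep = [[1.0, 1.0], [1.0, 1.0]]
grid = lift_saliency_to_voxels(sal, dep, fx=100.0, fy=100.0, cx=0.0, cy=0.0, voxel_size=0.5)
print(grid)  # all four pixels fall in one voxel
```

Averaging keeps the scores on the same scale as the 2D map. A sum or max per voxel would be an equally plausible design choice.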

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Yash Jhaveri

Patrick Shafto

2025-09-17

NeurIPS.cc/2025/Conference (poster)

Epistemic Uncertainty Estimation in Regression Ensemble Models with Pairwise Epistemic Estimators

Lucas Berry

This work introduces a novel approach, Pairwise Epistemic Estimators (PairEpEsts), for epistemic uncertainty estimation in ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). By utilizing the pairwise distances between model components, PaiDEs establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PairEpEsts can estimate epistemic uncertainty up to 100 times faster and demonstrate superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data, Pendulum, Hopper, Ant, and Humanoid, demonstrating PairEpEsts’ advantage over baselines in high-dimensional regression active learning.

2025-09-17

NeurIPS.cc/2025/Conference (poster)
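A pairwise-distance epistemic estimate can be sketched for an ensemble whose members each output a diagonal Gaussian. The sketch assumes the standard pairwise-distance estimator for mixture entropy, with uniform weights and the KL divergence as the distance. Under that estimator, the epistemic term is -Σ_i w_i log Σ_j w_j exp(-KL(p_i ‖ p_j)). The distance choice, the weights, and the function names are assumptions, not necessarily the paper's configuration. The point is that the estimate is closed-form over pairs of components, with no Monte Carlo sampling.

```python
import math
from typing import List, Sequence, Tuple

# (means, standard deviations) of one ensemble member's diagonal Gaussian output
Gaussian = Tuple[List[float], List[float]]


def kl_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over output dimensions."""
    total = 0.0
    for mu_p, sd_p, mu_q, sd_q in zip(p[0], p[1], q[0], q[1]):
        total += (
            math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5
        )
    return total


def pairwise_epistemic(components: Sequence[Gaussian]) -> float:
    """-sum_i w_i log sum_j w_j exp(-KL(p_i || p_j)), with uniform weights w = 1/M (nats)."""
    w = 1.0 / len(components)
    estimate = 0.0
    for p_i in components:
        inner = sum(w * math.exp(-kl_diag_gaussian(p_i, p_j)) for p_j in components)
        estimate -= w * math.log(inner)
    return estimate


agree = [([0.0], [1.0])] * 4  # identical members: no disagreement
disagree = [([10.0 * k], [0.1]) for k in range(4)]  # well-separated members
print(pairwise_epistemic(agree), pairwise_epistemic(disagree))
```

The estimate is 0 when all members agree. It approaches log M (here log 4) when the members are fully separated, which matches the range of the mutual information between the output and the member index.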

Generalizable Imitation Learning Through Pre-Trained Representations

Wei-Di Chang

Francois Hogan

Scott Fujimoto

Gregory Dudek

In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Vision Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in imitation learners.

2025-05-19

2025 IEEE International Conference on Robotics and Automation (ICRA) (published)
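The idea of "clustering appearance features into semantic concepts, forming stable keypoints" can be sketched in two steps. First, patch embeddings are grouped with k-means. Second, each cluster is reduced to the mean grid position of its patches. The abstract does not name the clustering algorithm, so k-means, the fixed initial centers, and the mean-position keypoint are all assumptions. The function names are hypothetical, and the tiny 2-D vectors stand in for real DINO patch features.

```python
from typing import List, Optional, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(f: Sequence[float], centers: List[List[float]]) -> int:
    return min(range(len(centers)), key=lambda k: _sq_dist(f, centers[k]))


def kmeans(features: Sequence[Sequence[float]], init_centers: Sequence[Sequence[float]],
           iters: int = 10) -> List[int]:
    """Plain k-means: alternate nearest-center assignment and mean updates."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [_nearest(f, centers) for f in features]
        for k in range(len(centers)):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:  # leave an empty cluster's center where it was
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_keypoints(labels: Sequence[int], positions: Sequence[Tuple[int, int]],
                      num_clusters: int) -> List[Optional[Tuple[float, float]]]:
    """Mean (row, col) of the patches assigned to each cluster, used here as a keypoint."""
    keypoints: List[Optional[Tuple[float, float]]] = []
    for k in range(num_clusters):
        pts = [p for p, lab in zip(positions, labels) if lab == k]
        if not pts:
            keypoints.append(None)
            continue
        keypoints.append((sum(r for r, _ in pts) / len(pts), sum(c for _, c in pts) / len(pts)))
    return keypoints


# Four patches on a 2 x 2 grid with toy 2-D "embeddings".
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = kmeans(features, init_centers=[features[0], features[2]])
print(labels, cluster_keypoints(labels, positions, 2))
```

Because the keypoint is tied to a semantic cluster rather than to raw pixel appearance, it stays meaningful when the object's color or texture changes. That is the generalization property the abstract emphasizes.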

Topological mapping for traversability-aware long-range navigation in off-road terrain

Jean-François Tremblay

Julie Alhosh

Louis Petit

Faraz Lotfi

Lara Landauro

Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topological map, which is a set of panoramic snapshots connected with edges containing traversability information. A novel traversability analysis method is demonstrated, predicting the existence of a safe path towards a target in an image. Navigating between nodes is done using goal-conditioned behavior cloning, leveraging the power of a pretrained vision transformer. An exploration planner is presented, efficiently covering an unknown off-road area with unknown traversability using a frontiers-based approach. The approach is successfully deployed to autonomously explore two 400 meters squared forest sites unseen during training, in difficult conditions for navigation.

2025-05-19

2025 IEEE International Conference on Robotics and Automation (ICRA) (published)
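Planning over a topological map can be sketched as follows. The map's nodes are panoramic snapshots, and its edges carry traversability information. The sketch runs Dijkstra's algorithm over edges whose predicted traversability clears a threshold. The edge format (neighbor, length in meters, traversability probability), the 0.5 threshold, and the function name are hypothetical. The paper's own planner, including its frontier-based exploration, is not reproduced here.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# node -> list of (neighbor, edge length in meters, predicted traversability in [0, 1])
Graph = Dict[str, List[Tuple[str, float, float]]]


def plan(graph: Graph, start: str, goal: str,
         min_traversability: float = 0.5) -> Optional[List[str]]:
    """Shortest path by length, using only edges predicted traversable enough."""
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            path = [goal]
            while path[-1] != start:  # walk predecessors back to the start
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, length, trav in graph.get(node, []):
            if trav < min_traversability:
                continue  # edge predicted unsafe: ignore it
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None  # goal unreachable through traversable edges


topo: Graph = {
    "A": [("B", 10.0, 0.9), ("C", 5.0, 0.2)],  # A -> C is short but likely blocked
    "B": [("D", 10.0, 0.8)],
    "C": [("D", 5.0, 0.9)],
}
print(plan(topo, "A", "D"))  # ['A', 'B', 'D']
```

Lowering the threshold to 0.0 would return the shorter route A, C, D through the likely blocked edge. Gating edges by traversability before minimizing distance keeps the planner from trading safety for path length.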

Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

Lucas Berry

Axel Brando

Wei-Di Chang

Juan Higuera

2025-05-01

arXiv (published)

Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

Lucas Berry

Axel Brando

Wei-Di Chang

Juan Higuera

2025-04-30

arXiv (published)