David Meger

Valliappan Chidambaram Adaikkappan

PhD - McGill University

Google Scholar

Wesley Chung

PhD - McGill University

Co-supervisor:

Doina Precup

Farnoosh Faraji

PhD - McGill University

Co-supervisor:

Master's Research - McGill University

Co-supervisor:

Hsiu-Chin Lin

Zina Kamel

Master's Research - McGill University

Co-supervisor:

Hsiu-Chin Lin

Arian Sargazi

PhD - McGill University

Junming (Clark) Shi

Master's Research - McGill University

Steven Wang

Master's Research - McGill University

Harley Wiltzer

PhD - McGill University

Co-supervisor:

Marc Gendron-Bellemare

Website

Google Scholar

Publications

Drift Q-Learning

Offline reinforcement learning requires improving a policy from fixed data while avoiding out-of-distribution actions with unreliable value estimates. Diffusion and flow policies handle this trade-off by modeling the behavior distribution to regularize the RL objective, but they require iterative denoising, solver integrations, and in more efficient variants, distillation or other approximations at inference. We propose DriftQL, which combines a drift-based behavioral regularizer with critic-driven policy improvement. The value signal biases the policy toward high-value regions of the data support, while attraction and repulsion together keep generated actions near the data and prevent collapse onto a single mode. DriftQL is implemented as a single network with a unified training objective and generates actions in a single forward pass. On D4RL and OGBench, DriftQL consistently outperforms diffusion and flow methods, advancing the state of the art. Under degraded data quality, where the baselines visibly struggle, DriftQL remains close to its clean-data performance, positioning it as a promising alternative to diffusion and flow-based methods while maintaining the simplicity and efficiency of deterministic approaches. Project page: https://driftql.github.io/

2026-05-28

arXiv (preprint)

Valliappan Chidambaram Adaikkappan

CA2: Code-Aware Agent for Automated Game Testing

Vincent Martineau

Joshua Romoff

Automated game testing is important for verifying game functionality, but it remains a costly and time-consuming process. Manual testing often misses edge cases, and current automated methods struggle to provide full code coverage. Prior work has explored reinforcement learning (RL) for game testing, but without leveraging internal code signals such as the call stack. We present Code Aware Agent (CA2), which uses call stack information to learn effective testing strategies. The agent receives the current function call trace along with the game state and learns to reach specific target functions. We instrument two types of environments, state-based and image-based, with support for efficient call stack extraction. Through experimental evaluation, we find that CA2 achieves consistent improvement over the non-code-aware baselines, which do not leverage call stack information. Our results show that incorporating code signals like the call stack enables more effective and targeted game testing.

2026-05-12

arXiv (preprint)

Tactile Modality Fusion for Vision-Language-Action Models

Charlotte Morissette

Wei-Di Chang

We propose TacFiLM, a lightweight modality-fusion approach that integrates visual-tactile signals into vision-language-action (VLA) models. While recent advances in VLA models have introduced robot policies that are both generalizable and semantically grounded, these models mainly rely on vision-based perception. Vision alone, however, cannot capture the complex interaction dynamics that occur during contact-rich manipulation, including contact forces, surface friction, compliance, and shear. While recent attempts to integrate tactile signals into VLA models often increase complexity through token concatenation or large-scale pretraining, the heavy computational demands of behavioural models necessitate more lightweight fusion strategies. To address these challenges, TacFiLM outlines a post-training finetuning approach that conditions intermediate visual features on pretrained tactile representations using feature-wise linear modulation (FiLM). Experimental results on insertion tasks demonstrate consistent improvements in success rate, direct insertion performance, completion time, and force stability across both in-distribution and out-of-distribution tasks. Together, these results support our method as an effective approach to integrating tactile signals into VLA models, improving contact-rich manipulation behaviours.

2026-03-14

arXiv (preprint)
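
The TacFiLM abstract above names feature-wise linear modulation (FiLM) as its fusion mechanism. The following is a minimal NumPy sketch of generic FiLM conditioning, not the authors' implementation: the function name `film`, the linear projections (`w_gamma`, `b_gamma`, `w_beta`, `b_beta`), and all shapes are assumptions chosen for illustration. A conditioning vector (standing in for a tactile embedding) predicts a per-channel scale and shift that are applied to every visual token.

```python
import numpy as np

def film(visual_features, cond_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM: per-channel scale (gamma) and shift (beta) predicted
    from a conditioning vector, applied to every visual token.

    visual_features: (num_tokens, channels)
    cond_embedding:  (cond_dim,)  hypothetical tactile embedding
    """
    gamma = cond_embedding @ w_gamma + b_gamma  # shape (channels,)
    beta = cond_embedding @ w_beta + b_beta     # shape (channels,)
    return gamma * visual_features + beta       # broadcasts over tokens

# Illustrative shapes only; none of these values come from the paper.
rng = np.random.default_rng(0)
num_tokens, channels, cond_dim = 4, 8, 3
visual = rng.normal(size=(num_tokens, channels))
tactile = rng.normal(size=(cond_dim,))

# Zero weights with gamma bias 1 and beta bias 0 make the layer an identity,
# so the visual features pass through unchanged.
identity_out = film(
    visual, tactile,
    np.zeros((cond_dim, channels)), np.ones(channels),
    np.zeros((cond_dim, channels)), np.zeros(channels),
)

# Random projections give a tactile-dependent modulation.
w_g = rng.normal(size=(cond_dim, channels))
w_b = rng.normal(size=(cond_dim, channels))
modulated = film(visual, tactile, w_g, np.ones(channels), w_b, np.zeros(channels))
```

Starting from the identity setting is one common way to add a conditioning path to a pretrained network without disturbing its initial behaviour; whether TacFiLM does this is not stated in the abstract.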

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

Valliappan CA

Sai Rajeswar

Pietro Mazzaglia

Goal-conditioned reinforcement learning (GCRL) requires agents to learn effective state and goal representations, which represents a challenging problem, especially in high-dimensional vision-based environments, as differences in the observations can be uncorrelated with dynamical distances. Classical deep reinforcement learning techniques often fail to capture the alignment between state and goal spaces, requiring additional representation learning techniques. To address this, we propose

2026-03-01

World Models @ International Conference on Learning Representations (published)

Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings

Zhizun Wang

Learning to coordinate many agents in partially observable and highly dynamic environments requires both informative representations and data-efficient training. To address this challenge, we present a novel model-based multi-agent reinforcement learning framework that unifies joint state-action representation learning with imaginative roll-outs. We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding (SALE). SALE is injected into both the imagination module that forecasts plausible future roll-outs and the joint agent network whose individual action values are combined through a mixing network to estimate the joint action-value function. By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes, leading to improved long-term planning and optimization under limited real-environment interactions. Empirical studies on well-established multi-agent benchmarks, including StarCraft II Micro-Management, Multi-Agent MuJoCo, and Level-Based Foraging challenges, demonstrate consistent gains of our method over baseline algorithms and highlight the effectiveness of joint state-action learned embeddings within a multi-agent model-based paradigm.

2026-02-12

arXiv (preprint)

VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion

Simon Li

Anas El Houssaini

Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.

2026-01-31

IEEE Robotics and Automation Letters (published)
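
The VOCALoco abstract above describes choosing among pre-trained policies using a predicted safety (viability) score and a predicted cost of transport. The sketch below shows one plausible selection rule under stated assumptions: the function `select_skill`, the `min_viability` threshold, and the "filter by safety, then minimise cost" rule are hypothetical and are not taken from the paper, which does not specify its rule in the abstract.

```python
from typing import Optional, Sequence

def select_skill(
    viability: Sequence[float],
    cost_of_transport: Sequence[float],
    min_viability: float = 0.9,  # hypothetical safety threshold
) -> Optional[int]:
    """Return the index of the cheapest skill whose predicted viability
    meets the threshold, or None if no skill is considered safe."""
    safe = [i for i, v in enumerate(viability) if v >= min_viability]
    if not safe:
        return None
    return min(safe, key=lambda i: cost_of_transport[i])

# Made-up predictions for three skills over one planning horizon.
# Skill 2 is cheapest but unsafe, so the cheapest safe skill (0) is chosen.
choice = select_skill(
    viability=[0.95, 0.99, 0.40],
    cost_of_transport=[1.8, 2.5, 0.9],
)
print(choice)
```

Treating safety as a hard constraint and energy as the objective keeps the rule easy to inspect; a weighted combination of the two predictions would be an equally reasonable alternative.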

Contractive Diffusion Policies

Charlotte Morissette

Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce **C**ontractive **D**iffusion **P**olicies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity. Project page: https://contractive-diffusion.github.io

2026-01-25

International Conference on Learning Representations (poster)

Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations

Charlotte Morissette

Anas El Houssaini

Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.

2025-12-31

arXiv (preprint)

Large Pre-Trained Models for Bimanual Manipulation in 3D

Hanna Yurchyk

Wei-Di Chang

Gregory Dudek

2025-09-29

IEEE-RAS Conference on Humanoid Robots (published)

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Yash Jhaveri

Harley Wiltzer

Patrick Shafto

Bellemare Marc-Emmanuel

In the pursuit of finding an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart from their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a temperature decoupling gambit. Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy-derived objects: value functions and return distributions. In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated with its interpretable, diversity-preserving optimal policy.

2025-09-17

NeurIPS 2025 (poster)
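
The abstract above mentions a policy that samples all optimal actions uniformly as the regularization temperature vanishes. The sketch below illustrates that limiting behaviour only in the simplest setting, a single-state problem with fixed action values, where the entropy-regularized optimal policy is a softmax over the values. It does not implement the paper's temperature decoupling method; the function `softmax_policy` and the example values are assumptions for illustration.

```python
import numpy as np

def softmax_policy(action_values, temperature):
    """Softmax over action values at a given temperature.

    Subtracting the maximum before exponentiating keeps the computation
    numerically stable at small temperatures.
    """
    scaled = (action_values - np.max(action_values)) / temperature
    weights = np.exp(scaled)
    return weights / weights.sum()

# Made-up action values with two tied optimal actions (indices 1 and 2).
q = np.array([1.0, 2.0, 2.0, 0.5])

for temperature in (1.0, 0.1, 0.01):
    print(temperature, softmax_policy(q, temperature))

# At a very small temperature the probability mass splits evenly across
# the tied optimal actions and vanishes elsewhere.
near_limit = softmax_policy(q, 1e-3)
```

In multi-step problems the limit is more delicate, because the regularized action values themselves change with the temperature; this is the part the abstract says the framework addresses.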

Epistemic Uncertainty Estimation in Regression Ensemble Models with Pairwise Epistemic Estimators

Lucas Berry

This work introduces a novel approach, Pairwise Epistemic Estimators (PairEpEsts), for epistemic uncertainty estimation in ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). By utilizing the pairwise distances between model components, PaiDEs establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PairEpEsts can estimate epistemic uncertainty up to 100 times faster and demonstrate superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data, *Pendulum*, *Hopper*, *Ant*, and *Humanoid*, demonstrating PairEpEsts’ advantage over baselines in high-dimensional regression active learning.

2025-09-17

NeurIPS 2025 (poster)

Generalizable Imitation Learning Through Pre-Trained Representations

Wei-Di Chang

Francois Hogan

Scott Fujimoto

Gregory Dudek

In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in imitation learners.

2025-05-18

2025 IEEE International Conference on Robotics and Automation (ICRA) (published)