David Meger

Associate Academic Member
Associate Professor, McGill University, School of Computer Science
Research Topics
Computer Vision
Reinforcement Learning

Biography

David Meger is an associate professor at McGill University’s School of Computer Science.

He co-directs the Mobile Robotics Lab within the Centre for Intelligent Machines, one of Canada's largest and longest-running robotics research groups. He was the general chair of Canada’s first joint CS-CAN conference in 2023.

Meger's research contributions include visually guided robots powered by active vision and learning, deep reinforcement learning models that are widely cited and used by researchers and industry worldwide, and field robotics that allow for autonomous deployment underwater and on land.

Current Students

Master's Research - McGill University
Collaborating researcher - McGill University
Principal supervisor :
PhD - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
Master's Research - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :
PhD - McGill University
Master's Research - McGill University
Master's Research - McGill University
PhD - McGill University
Co-supervisor :
PhD - McGill University

Publications

Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning
Zhizun Wang
In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model, which enables multiple agents interacting in the same environment to achieve a common goal with reduced sample complexity. Because of the scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to the baselines.
Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators
Lucas Berry
This work introduces a novel, efficient approach to epistemic uncertainty estimation for ensemble models in regression tasks using pairwise-distance estimators (PaiDEs). Using the pairwise distances between model components, these estimators establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PaiDEs can estimate epistemic uncertainty up to 100 times faster while covering a significantly larger number of inputs at once, and they perform better in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data,
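To make the pairwise-distance idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each ensemble member outputs a 1D Gaussian predictive distribution and that members are equally weighted. It applies the general pairwise-distance form −Σᵢ wᵢ ln Σⱼ wⱼ exp(−D(pᵢ‖pⱼ)) to the BALD-style term H(mixture) − Σᵢ wᵢ H(pᵢ). Choosing the Bhattacharyya distance for D gives a lower bound, and choosing KL divergence gives an upper bound. The function names are hypothetical.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """KL(N(mu1, s1^2) || N(mu2, s2^2)) for 1D Gaussians (closed form)."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def bhattacharyya_gauss(mu1, s1, mu2, s2):
    """Bhattacharyya distance between two 1D Gaussians (closed form)."""
    var_sum = s1**2 + s2**2
    return (mu1 - mu2) ** 2 / (4 * var_sum) + 0.5 * math.log(var_sum / (2 * s1 * s2))

def pairwise_epistemic(means, stds, dist):
    """Hypothetical helper: pairwise-distance estimate of
    H(mixture) - sum_i w_i H(p_i) for an equally weighted Gaussian ensemble.
    The component entropies cancel, leaving only the pairwise distances."""
    n = len(means)
    w = 1.0 / n
    total = 0.0
    for i in range(n):
        inner = sum(
            w * math.exp(-dist(means[i], stds[i], means[j], stds[j]))
            for j in range(n)
        )
        total -= w * math.log(inner)
    return total

# Assumed ensemble predictions at a single input x: members disagree on the mean.
means, stds = [0.0, 3.0, -3.0], [1.0, 1.0, 1.0]
lower = pairwise_epistemic(means, stds, bhattacharyya_gauss)  # lower bound
upper = pairwise_epistemic(means, stds, kl_gauss)             # upper bound
print(lower, upper)
```

In this sketch, only closed-form pairwise distances are evaluated and no samples are drawn. When all members agree, every distance is zero and both estimates collapse to zero.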
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
Charlotte Morissette
Francois Hogan
In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objective and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy, and we seek to approximate this mapping with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence
Dmitriy Rivkin
Nikhil Kakodkar
Oliver Limoyo
Francois Hogan
Our work examines the way in which large language models can be used for robotic planning and sampling in the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general-purpose language (LM) and vision-language (VLM) models. Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
Normalizing Flow Ensembles for Rich Aleatoric and Epistemic Uncertainty Modeling
Lucas Berry
NeurIPS 2022 Competition: Driving SMARTS
Amir Hossein Rasouli
R. Goebel
Matthew E. Taylor
Iuliia Kotseruba
Soheil Alizadeh
Tianpei Yang
Montgomery Alban
Florian Shkurti
Yuzheng Zhuang
Adam Ścibior
Kasra Rezaee
Animesh Garg
Jun Luo
Weinan Zhang
Xinyu Wang
Xiangshan Chen
Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
Ofir Nachum
Shixiang Shane Gu
In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
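Both failure modes named in the abstract can be reproduced on a tiny example. The sketch below is an illustrative toy, not the paper's propositions or experiments: it uses state values V(s) for a fixed policy rather than state-action values, and the single-state loop, the two-state dataset, the rewards, and γ = 0.99 are all assumptions chosen for clarity. In the first case, an offset e in V appears on both sides of the Bellman equation and cancels down to a residual of (1 − γ)e. In the second, a dataset that never shows a transition out of state 1 is fit exactly by infinitely many value functions.

```python
import math

def bellman_residual(v, transitions, gamma):
    """Mean absolute Bellman residual |V(s) - (r + gamma * V(s'))| over (s, r, s') tuples."""
    return sum(abs(v[s] - (r + gamma * v[s_next])) for s, r, s_next in transitions) / len(transitions)

def value_error(v, v_true):
    """Largest absolute gap between an estimate and the true values."""
    return max(abs(v[s] - v_true[s]) for s in v_true)

gamma = 0.99

# (1) Cancellation: one state that loops to itself with reward 1.
# True value is 1 / (1 - gamma), about 100. An offset e in V(s) shows up on
# both sides of the Bellman equation, so the residual shrinks to (1 - gamma) * e.
loop = [(0, 1.0, 0)]
v_true = {0: 1.0 / (1.0 - gamma)}
v_far = {0: 0.0}    # value error about 100, Bellman residual 1.0
v_near = {0: 99.0}  # value error about 1, Bellman residual about 0.01
for v in (v_far, v_near):
    print(value_error(v, v_true), bellman_residual(v, loop, gamma))

# (2) Finite data: only the transition 0 -> 1 (reward 0) is observed, so any
# V with V(0) = gamma * V(1) satisfies the Bellman equation on the data exactly.
data = [(0, 0.0, 1)]
for c in (0.0, 5.0, -42.0):
    v = {0: gamma * c, 1: c}
    print(c, bellman_residual(v, data, gamma))  # residual is 0.0 for every c
```

With γ = 0.99, an estimate that is off by about 100 has a Bellman residual of only 1, which is the weak coupling between Bellman error and value error that the abstract describes.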
Adaptive Confidence Calibration
Jonathan W. Pearce
IL-flOw: Imitation Learning from Observation using Normalizing Flows
Wei-Di Chang
Juan Higuera
Continuous MDP Homomorphisms and Homomorphic Policy Gradient
Learning Assisted Identification of Scenarios Where Network Optimization Algorithms Under-Perform
Dmitriy Rivkin
X. T. Chen
We present the Deep Convolutional Failure Generator (DCFG), a generative adversarial method that uses deep learning to identify network load traffic conditions in which network optimization algorithms under-perform other known algorithms. The spatial distribution of network load presents challenges for network operators in tasks such as load balancing, in which a network optimizer attempts to maintain high-quality communication while abiding by capacity constraints. Testing a network optimizer for all possible load distributions is challenging, if not impossible. We propose a novel method that searches for load situations where a target network optimization method under-performs a baseline; these are key test cases that can be used for future refinement and performance optimization. By modeling a realistic network simulator's quality assessments with a deep network and, in parallel, optimizing a load generation network, our method efficiently searches the high-dimensional space of load patterns and reliably finds cases in which a target network optimization method under-performs a baseline by a significant margin.