David Meger

Associate Academic Member

Assistant Professor, McGill University, School of Computer Science

Research Topics

Reinforcement Learning

Computer Vision

Biography

David Meger is an Assistant Professor at McGill University's School of Computer Science. He co-directs the Mobile Robotics Lab within the Centre for Intelligent Machines, one of the largest and longest-running robotics research groups in Canada. Professor Meger's research includes visually guided robots that use active vision and active learning, deep reinforcement learning models that are widely cited and used by researchers and industry around the world, and field robotics, including autonomous deployments underwater and on land. He was General Chair of the first joint CS-CAN conference in Canada in 2023.

Current Students

William Bonilla

PhD - McGill

Github

Valliappan Chidambaram Adaikkappan

PhD - McGill

Github

Google Scholar

Wesley Chung

PhD - McGill

Co-supervisor:

Doina Precup

Farnoosh Faraji

PhD - McGill

Co-supervisor:

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Github

Zina Kamel

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Sahand Rezaei-Shoshtari

PhD - McGill

Principal supervisor:

PhD - McGill

Research Master's - McGill

Github

Steven Wang

Research Master's - McGill

Harley Wiltzer

PhD - McGill

Co-supervisor:

Marc Gendron-Bellemare

PhD - McGill

Publications

Where Off-Policy Deep Reinforcement Learning Fails

Scott Fujimoto

David Meger

Doina Precup

This work examines batch reinforcement learning, the task of maximally exploiting a given batch of off-policy data without further data collection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.

2018-09-26

(published)

openreview.net

Addressing Function Approximation Error in Actor-Critic Methods

Scott Fujimoto

Herke van Hoof

David Meger

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

Deep Reinforcement Learning that Matters

Philip Bachman

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Wei-Di Chang

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org
