Portrait de Marc Gendron-Bellemare n'est pas disponible

Marc Gendron-Bellemare

Membre industriel principal
Chaire en IA Canada-CIFAR
Professeur adjoint, McGill University, École d'informatique
Professeur asssocié, Université de Montréal, Département d'informatique et de recherche opérationnelle
Directeur scientifique, Reliant AI
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Grands modèles de langage (LLM)

Biographie

J'occupe actuellement le poste de directeur scientifique à Reliant AI. Je suis également professeur adjoint à l'École d'informatique de l'Université McGill et professeur adjoint au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal.

Précédemment, j'ai travaillé à Google Brain à Montréal, où je me concentrais sur l'apprentissage par renforcement. De 2013 à 2017, j'ai travaillé chez DeepMind au Royaume-Uni. J'ai obtenu un doctorat de l'Université de l'Alberta en travaillant avec Michael Bowling et Joel Veness.

Ma recherche se situe au carrefour de l'apprentissage par renforcement et de la prédiction probabiliste. Je m'intéresse aussi à l'apprentissage profond, à la modélisation générative, à l'apprentissage en ligne et à la théorie de l'information.

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :

Publications

Variance Double-Down: The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning
Johan Samir Obando Ceron
In deep reinforcement learning, multi-step learning is almost unavoidable to achieve state-of-the-art performance. However, the increased va… (voir plus)riance that multistep learning brings makes it difficult to increase the update horizon beyond relatively small numbers. In this paper, we report the counterintuitive finding that decreasing the batch size parameter improves the performance of many standard deep RL agents that use multi-step learning. It is well-known that gradient variance decreases with increasing batch sizes, so obtaining improved performance by increasing variance on two fronts is a rather surprising finding. We conduct a broad set of experiments to better understand what we call the variance doubledown phenomenon.
I NTRODUCING C OORDINATION IN C ONCURRENT R EIN - FORCEMENT L EARNING
Adrien Ali Taiga
Google Brain
Research on exploration in reinforcement learning has mostly focused on problems with a single agent interacting with an environment. Howeve… (voir plus)r many problems are better addressed by the concurrent reinforcement learning paradigm, where multiple agents operate in a common environment. Recent work has tackled the challenge of exploration in this particular setting (Dimakopoulou & Van Roy, 2018; Dimakopoulou et al., 2018). Nonetheless, they do not completely leverage the characteristics of this framework and agents end up behaving independently from each other. In this work we argue that coordination among concurrent agents is crucial for efficient exploration. We introduce coordination in Thompson Sampling based methods by drawing correlated samples from an agent’s posterior. We apply this idea to extend existing exploration schemes such as randomized least squares value iteration (RLSVI). Empirical results on simple toy tasks emphasize the merits of our approach and call attention to coordination as a key objective for efficient exploration in concurrent reinforcement learning.
Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress
Metrics and continuity in reinforcement learning
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generali… (voir plus)zation, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. M… (voir plus)ost published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
Autonomous navigation of stratospheric balloons using reinforcement learning
S. Candido
Jun Gong
Marlos C. Machado
Subhodeep Moitra
Sameera S. Ponda
Ziyun Wang
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate it… (voir plus)s effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(
On Catastrophic Interference in Atari 2600 Games
William Fedus
Dibya Ghosh
John D. Martin
Model-free deep reinforcement learning is sample inefficient. One hypothesis -- speculated, but not confirmed -- is that catastrophic interf… (voir plus)erence within an environment inhibits learning. We test this hypothesis through a large-scale empirical study in the Arcade Learning Environment (ALE) and, indeed, find supporting evidence. We show that interference causes performance to plateau; the network cannot train on segments beyond the plateau without degrading the policy used to reach there. By synthetically controlling for interference, we demonstrate performance boosts across architectures, learning algorithms and environments. A more refined analysis shows that learning one segment of a game often increases prediction errors elsewhere. Our study provides a clear empirical link between catastrophic interference and sample efficiency in reinforcement learning.
On Bonus Based Exploration Methods In The Arcade Learning Environment
Adrien Ali Taiga
William Fedus
Marlos C. Machado
Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration prob… (voir plus)lems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard of exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to higher score on Montezuma's Revenge they do not provide meaningful gains over the simpler epsilon-greedy scheme. In fact, we find that methods that perform best on that game often underperform epsilon-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such
Vashisht Madhavan
Rosanne Liu
Rui Wang
Yulun Li
Jiale Zhi
Ludwig Schubert
Jeff Clune
Joel Lehman
Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the … (voir plus)Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.