Portrait of Marc Gendron-Bellemare is unavailable

Marc Gendron-Bellemare

Core Industry Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Chief Scientific Officer, Reliant AI
Research Topics
Large Language Models (LLM)
Reinforcement Learning
Representation Learning

Biography

I am Chief Scientific Officer at Reliant AI, an adjunct professor at the School of Computer and Science at McGill University, and an adjunct professor at the Department of Computer Science and Operations Research (DIRO) at Université de Montréal.

Previously, I was a research scientist at Google Brain in Montréal, where my research focused on reinforcement learning effort. From 2013 to 2017, I worked at DeepMind in the U.K. I received my PhD from the University of Alberta under the supervision of Michael Bowling and Joel Veness.

My research lies at the intersection of reinforcement learning and probabilistic prediction. I am also interested in deep learning, generative modelling, online learning and information theory.

Current Students

PhD - Université de Montréal
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :

Publications

Small batch deep reinforcement learning
Johan Samir Obando Ceron
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each … (see more)gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests {\em reducing} the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM
Kevin M Roccapriore
Max Schwarzer
Joshua Greaves
Jesse Farebrother
Colton Bishop
Maxim Ziatdinov
Igor Mordatch
Ekin Dogus Cubuk
Sergei V Kalinin
Bigger, Better, Faster: Human-level Atari with human-level efficiency
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (see more)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
Bootstrapped Representations in Reinforcement Learning
Charline Le Lan
Stephen Tu
Mark Rowland
Anna Harutyunyan
Will Dabney
In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces. While one of the promises of… (see more) deep learning algorithms is to automatically construct features well-tuned for the task they try to solve, such a representation might not emerge from end-to-end training of deep RL agents. To mitigate this issue, auxiliary objectives are often incorporated into the learning process and help shape the learnt state representation. Bootstrapping methods are today's method of choice to make these additional predictions. Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988). Surprisingly, we find that this representation differs from the features learned by Monte Carlo and residual gradient algorithms for most transition structures of the environment in the policy evaluation setting. We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules. We complement our theoretical results with an empirical comparison of these learning rules for different cumulant functions on classic domains such as the four-room domain (Sutton et al, 1999) and Mountain Car (Moore, 1990).
A Novel Stochastic Gradient Descent Algorithm for LearningPrincipal Subspaces
Charline Le Lan
Joshua Greaves
Jesse Farebrother
Mark Rowland
Fabian Pedregosa
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace i… (see more)s represented by a neural network, and hence can bescaled to datasets with an effectively infinite number of rows and columns. Our method consistsin defining a loss function whose minimizer is the desired principal subspace, and constructing agradient estimate of this loss whose bias can be controlled.
Investigating Multi-task Pretraining and Generalization in Reinforcement Learning
Adrien Ali Taiga
Jesse Farebrother
Google Brain
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Jesse Farebrother
Joshua Greaves
Charline Le Lan
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well-und… (see more)erstood; in practice, how-ever, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent’s network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)’s proto-value functions to deep reinforcement learning – accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment’s reward function.
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro
Max Schwarzer
Evgenii Nikishin
Increasing the replay ratio, the number of updates of an agent's parameters per environment interaction, is an appealing strategy for improv… (see more)ing the sample efficiency of deep reinforcement learning algorithms. In this work, we show that fully or partially resetting the parameters of deep reinforcement learning agents causes better replay ratio scaling capabilities to emerge. We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. We then provide an analysis of the design choices required for favorable replay ratio scaling to be possible and discuss inherent limits and tradeoffs.
Bigger, Better, Faster: Human-level Atari with human-level efficiency
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on sca… (see more)ling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available at https://github.com/google-research/google-research/tree/master/bigger_better_faster.
The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning
Johan Samir Obando Ceron
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland
Yunhao Tang
Clare Lyle
Remi Munos
Will Dabney
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distri… (see more)butional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting.
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
Mark Rowland
Yunhao Tang
Clare Lyle
Remi Munos
Will Dabney