Portrait de Marc Gendron-Bellemare n'est pas disponible

Marc Gendron-Bellemare

Membre industriel principal
Chaire en IA Canada-CIFAR
Professeur adjoint, McGill University, École d'informatique
Professeur asssocié, Université de Montréal, Département d'informatique et de recherche opérationnelle
Directeur scientifique, Reliant AI
Sujets de recherche
Apprentissage de représentations
Apprentissage par renforcement
Grands modèles de langage (LLM)

Biographie

J'occupe actuellement le poste de directeur scientifique à Reliant AI. Je suis également professeur adjoint à l'École d'informatique de l'Université McGill et professeur adjoint au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal.

Précédemment, j'ai travaillé à Google Brain à Montréal, où je me concentrais sur l'apprentissage par renforcement. De 2013 à 2017, j'ai travaillé chez DeepMind au Royaume-Uni. J'ai obtenu un doctorat de l'Université de l'Alberta en travaillant avec Michael Bowling et Joel Veness.

Ma recherche se situe au carrefour de l'apprentissage par renforcement et de la prédiction probabiliste. Je m'intéresse aussi à l'apprentissage profond, à la modélisation générative, à l'apprentissage en ligne et à la théorie de l'information.

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :

Publications

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal
Marlos C. Machado
Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generali… (voir plus)zation, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states. PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar. We also present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve generalization on diverse benchmarks, including LQR with spurious correlations, a jumping task from pixels, and Distracting DM Control Suite.
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. M… (voir plus)ost published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
Autonomous navigation of stratospheric balloons using reinforcement learning
S. Candido
Jun Gong
Marlos C. Machado
Subhodeep Moitra
Sameera S. Ponda
Ziyun Wang
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate it… (voir plus)s effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such
Vashisht Madhavan
Rosanne Liu
Rui Wang
Yulun Li
Jiale Zhi
Ludwig Schubert
Jeff Clune
Joel Lehman
Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the … (voir plus)Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results re… (voir plus)lative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
The Value Function Polytope in Reinforcement Learning
Robert Dadashi
Adrien Ali Taiga
Dale Schuurmans
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main… (voir plus) contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
Distributional reinforcement learning with linear function approximation
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.… (voir plus) One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.
Distributional reinforcement learning with linear function approximation
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.… (voir plus) One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.
Distributional reinforcement learning with linear function approximation
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited.… (voir plus) One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined to linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.
A Geometric Perspective on Optimal Representations for Reinforcement Learning
Will Dabney
Robert Dadashi
Adrien Ali Taiga
Dale Eric. Schuurmans
Tor Lattimore
Clare Lyle
We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functi… (voir plus)ons. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.