
Marc Gendron-Bellemare

Core Industry Member
Canada CIFAR AI Chair
Adjunct Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Scientific Director, Reliant AI

Biography

I am currently the Scientific Director at Reliant AI. I am also an adjunct professor at the School of Computer Science at McGill University and an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal.

Previously, I worked at Google Brain in Montréal, where I focused on reinforcement learning. From 2013 to 2017, I worked at DeepMind in the United Kingdom. I obtained my PhD from the University of Alberta, working with Michael Bowling and Joel Veness.

My research lies at the intersection of reinforcement learning and probabilistic prediction. I am also interested in deep learning, generative modelling, online learning, and information theory.

Current Students

PhD - Université de Montréal
Principal supervisor:
PhD - McGill University
Principal supervisor:
PhD - McGill University
Co-supervisor:
PhD - Université de Montréal
Principal supervisor:
PhD - McGill University
Co-supervisor:
PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
Principal supervisor:

Publications

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(…
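As background for this entry, the update being analyzed is the familiar sampling-based temporal-difference rule with a constant step-size. The minimal sketch below (illustrative only, not the paper's code; the toy chain is an invented example) shows the random iterates whose distribution such an analysis studies.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One constant step-size TD(0) update on a tabular value estimate.

    Because alpha stays constant and (r, s_next) are sampled, the sequence of
    value estimates is itself a random process; a distributional analysis
    studies the distribution of these iterates rather than a single limit.
    """
    V = V.copy()
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Toy two-state chain: from either state, move to a uniformly random state;
# reward +1 is received when leaving state 0 (illustrative only).
rng = np.random.default_rng(0)
V = np.zeros(2)
s = 0
for _ in range(1000):
    s_next = rng.integers(2)
    r = 1.0 if s == 0 else 0.0
    V = td0_update(V, s, r, s_next)
    s = s_next
print(V)
```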
An Atari Model Zoo for Analyzing, Visualizing, and Comparing Deep Reinforcement Learning Agents
Felipe Petroski Such
Vashisht Madhavan
Rosanne Liu
Rui Wang
Yulun Li
Jiale Zhi
Ludwig Schubert
Jeff Clune
Joel Lehman
Much human and computational effort has aimed to improve how deep reinforcement learning (DRL) algorithms perform on benchmarks such as the Atari Learning Environment. Comparatively less effort has focused on understanding what has been learned by such methods, and investigating and comparing the representations learned by different families of DRL algorithms. Sources of friction include the onerous computational requirements, and general logistical and architectural complications for running DRL algorithms at scale. We lessen this friction, by (1) training several algorithms at scale and releasing trained models, (2) integrating with a previous DRL model release, and (3) releasing code that makes it easy for anyone to load, visualize, and analyze such models. This paper introduces the Atari Zoo framework, which contains models trained across benchmark Atari games, in an easy-to-use format, as well as code that implements common modes of analysis and connects such models to a popular neural network visualization library. Further, to demonstrate the potential of this dataset and software package, we show initial quantitative and qualitative comparisons between the performance and representations of several DRL algorithms, highlighting interesting and previously unknown distinctions between them.
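As one concrete example of the kind of representation comparison described above, the sketch below computes a simple representational-similarity score between two agents' hidden activations on a shared set of frames. It is only an illustrative analysis one might run on released models; it is not the Atari Zoo API, and the array names are assumptions.

```python
import numpy as np

def representational_similarity(acts_a, acts_b):
    """Compare two agents' representations of the same frames.

    acts_a: (n_frames, d_a) hidden activations of agent A.
    acts_b: (n_frames, d_b) hidden activations of agent B.
    Each representation is summarized by its frame-by-frame correlation
    matrix, and the two matrices are then correlated entrywise, so the
    score is insensitive to each agent's feature dimensionality.
    """
    ca = np.corrcoef(acts_a)               # (n_frames, n_frames)
    cb = np.corrcoef(acts_b)
    iu = np.triu_indices_from(ca, k=1)     # strictly upper-triangular entries
    return float(np.corrcoef(ca[iu], cb[iu])[0, 1])

# Example with random stand-in activations (real usage would record the
# activations of two trained models on the same Atari frames).
rng = np.random.default_rng(0)
print(representational_similarity(rng.normal(size=(64, 512)),
                                  rng.normal(size=(64, 256))))
```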
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from.
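As a reminder of the two objects being compared (standard definitions in generic notation, not necessarily the paper's): expected RL applies the Bellman operator to scalar value functions, while distributional RL works with the random return and a distributional Bellman equation.

```latex
% Expected RL: Bellman operator on scalar value functions.
(T^{\pi} V)(s) \;=\; \mathbb{E}\big[\, R_{t+1} + \gamma\, V(S_{t+1}) \;\big|\; S_t = s \,\big]

% Distributional RL: the random return Z^{\pi} satisfies a distributional
% Bellman equation (equality in distribution).
Z^{\pi}(s) \;\overset{D}{=}\; R_{t+1} + \gamma\, Z^{\pi}(S_{t+1}), \qquad S_t = s
```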
The Value Function Polytope in Reinforcement Learning
Robert Dadashi
Adrien Ali Taiga
Dale Schuurmans
We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
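A minimal numerical sketch of the line theorem mentioned above, on a made-up two-state MDP: policies that agree everywhere except in one state have value functions lying on a common line segment. The MDP, policies, and discount factor below are illustrative choices, not taken from the paper.

```python
import numpy as np

def value_function(P, r, policy, gamma=0.9):
    """Exact policy evaluation: V^pi = (I - gamma * P^pi)^{-1} r^pi."""
    P_pi = np.einsum('sa,sat->st', policy, P)   # (S, S) state-to-state kernel
    r_pi = np.einsum('sa,sa->s', policy, r)     # (S,) expected reward per state
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

# A toy 2-state, 2-action MDP (made up for illustration).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])        # P[s, a, s']
r = np.array([[1.0, 0.0],
              [0.0, 0.5]])                      # r[s, a]

pi0 = np.array([[1.0, 0.0], [1.0, 0.0]])        # deterministic in both states
pi1 = np.array([[0.0, 1.0], [1.0, 0.0]])        # differs from pi0 only in state 0

# Sweeping the action choice in state 0 produces value functions that all lie
# on the line segment between V^{pi0} and V^{pi1} (the line theorem).
for w in np.linspace(0.0, 1.0, 5):
    mixed = (1 - w) * pi0 + w * pi1
    print(round(w, 2), value_function(P, r, mixed))
```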
Distributional reinforcement learning with linear function approximation
Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)'s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51's use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model's prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.
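For context, the Cramér distance referenced above is, in its standard form, the squared L2 distance between cumulative distribution functions (generic notation, not necessarily the paper's); the paper's adaptation then lifts this machinery from normalized probability distributions to arbitrary real vectors.

```latex
% Squared Cramér distance between distributions P and Q with CDFs F_P, F_Q.
\ell_2^2(P, Q) \;=\; \int_{-\infty}^{\infty} \big( F_P(x) - F_Q(x) \big)^2 \, dx
```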
A Geometric Perspective on Optimal Representations for Reinforcement Learning
Will Dabney
Robert Dadashi
Adrien Ali Taiga
Dale Schuurmans
Tor Lattimore
Clare Lyle
We propose a new perspective on representation learning in reinforcement learning based on geometric properties of the space of value functions. We leverage this perspective to provide formal evidence regarding the usefulness of value functions as auxiliary tasks. Our formulation considers adapting the representation to minimize the (linear) approximation of the value function of all stationary policies for a given environment. We show that this optimization reduces to making accurate predictions regarding a special class of value functions which we call adversarial value functions (AVFs). We demonstrate that using value functions as auxiliary tasks corresponds to an expected-error relaxation of our formulation, with AVFs a natural candidate, and identify a close relationship with proto-value functions (Mahadevan, 2005). We highlight characteristics of AVFs and their usefulness as auxiliary tasks in a series of experiments on the four-room domain.
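A brief sketch of the central definition, in notation that may differ from the paper's: for an interest vector δ over states (random sign vectors being a natural choice), an adversarial value function is the value function of a policy that maximizes the δ-weighted state values.

```latex
% Adversarial value function (AVF) induced by an interest vector delta.
\pi_{\delta} \;\in\; \arg\max_{\pi}\; \delta^{\top} V^{\pi},
\qquad
\mathrm{AVF}_{\delta} \;:=\; V^{\pi_{\delta}},
\qquad
\delta \in \mathbb{R}^{|\mathcal{S}|}
```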
Dopamine: A Research Framework for Deep Reinforcement Learning
Subhodeep Moitra
Carles Gelada
Saurabh Kumar
Deep reinforcement learning (deep RL) research has grown significantly in recent years. A number of software offerings now exist that provide stable, comprehensive implementations for benchmarking. At the same time, recent deep RL research has become more diverse in its goals. In this paper we introduce Dopamine, a new research framework for deep RL that aims to support some of that diversity. Dopamine is open-source, TensorFlow-based, and provides compact and reliable implementations of some state-of-the-art deep RL agents. We complement this offering with a taxonomy of the different research objectives in deep RL research. While by no means exhaustive, our analysis highlights the heterogeneity of research in the field, and the value of frameworks such as ours.
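To give a sense of what "compact and reliable implementations" are used for in practice, here is a hypothetical sketch of the agent/environment episode loop that such a framework standardizes. The method names (begin_episode, step, end_episode) are illustrative conventions, not necessarily Dopamine's actual API; consult the repository for the real interface.

```python
def run_episode(agent, env, max_steps=10_000):
    """Run one episode, letting the agent learn online from each transition.

    `agent` and `env` are placeholders for a learning agent and a Gym-style
    environment; the loop itself is the part a framework keeps uniform
    across agents so that experiments remain comparable.
    """
    observation = env.reset()
    action = agent.begin_episode(observation)
    total_reward = 0.0
    for _ in range(max_steps):
        observation, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            agent.end_episode(reward)
            break
        action = agent.step(reward, observation)
    return total_reward
```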