
Marc Gendron-Bellemare

Core Industry Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Chief Scientific Officer, Reliant AI
Research Topics
Large Language Models (LLM)
Reinforcement Learning
Representation Learning

Biography

I am Chief Scientific Officer at Reliant AI, an adjunct professor at the School of Computer Science at McGill University, and an adjunct professor at the Department of Computer Science and Operations Research (DIRO) at Université de Montréal.

Previously, I was a research scientist at Google Brain in Montréal, where my research focused on reinforcement learning. From 2013 to 2017, I worked at DeepMind in the U.K. I received my PhD from the University of Alberta under the supervision of Michael Bowling and Joel Veness.

My research lies at the intersection of reinforcement learning and probabilistic prediction. I am also interested in deep learning, generative modelling, online learning and information theory.

Current Students

Collaborating Alumni, Université de Montréal (Principal supervisor)
PhD, McGill University (Co-supervisor)
PhD, McGill University (Co-supervisor)
PhD, McGill University (Principal supervisor)

Publications

Approximate Exploration through State Abstraction
Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration. Then, we show that a given density model also defines an implicit abstraction, and find a surprising mismatch between pseudo-counts derived either implicitly or explicitly. Finally we derive a new pseudo-count bonus alleviating this issue.
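
To make the pseudo-count idea in this abstract concrete, here is a minimal Python sketch, not the paper's implementation: a pseudo-count is recovered from a density model's probability of an abstract state before and after one hypothetical extra observation, then turned into an MBIE-EB-style bonus beta / sqrt(N̂). The aggregation map phi, the class name, and the smoothing constants are assumptions of this sketch; the paper's analysis covers general density models.

import math
from collections import defaultdict

class PseudoCountBonus:
    """Pseudo-count exploration bonus over abstract (aggregated) states.

    The pseudo-count N-hat(z) is recovered from the model's probability of
    an abstract state z before (rho) and after (rho') one hypothetical
    extra observation of z:

        N-hat(z) = rho(z) * (1 - rho'(z)) / (rho'(z) - rho(z))

    With the smoothed empirical-count model used here, N-hat(z) equals the
    true visit count; richer density models yield genuine pseudo-counts.
    """

    def __init__(self, phi, beta=0.05):
        self.phi = phi            # state aggregation: raw state -> abstract state
        self.beta = beta          # bonus scale, as in MBIE-EB's beta / sqrt(N)
        self.counts = defaultdict(int)
        self.total = 0

    def update_and_bonus(self, state):
        z = self.phi(state)
        n, total = self.counts[z], self.total
        # Empirical density of z before and after one more observation of z
        # (the latter is the "recoding probability"). One phantom observation
        # keeps probabilities strictly below 1, avoiding a degenerate ratio.
        rho = n / (total + 1)
        rho_prime = (n + 1) / (total + 2)
        # Pseudo-count; the max() guards against division by zero.
        pseudo = rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-12)
        self.counts[z] += 1
        self.total += 1
        # Count-based bonus; the 0.01 keeps it bounded at zero counts.
        return self.beta / math.sqrt(pseudo + 0.01)

# Hypothetical usage: coarse grid cells over a 2-D position as the abstraction.
def phi(xy):
    return (int(xy[0] * 10), int(xy[1] * 10))

bonus = PseudoCountBonus(phi, beta=0.05)
print(bonus.update_and_bonus((0.42, 0.17)))  # large bonus for a novel cell

Here the abstraction phi is explicit, matching the warm-up analysis of MBIE-EB with state aggregation; replacing the smoothed count table with a learned density model is exactly the setting where the implicit-abstraction mismatch described in the abstract can arise.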