Marc Gendron-Bellemare

PhD - McGill University

Co-supervisor :

Nate Rahn

PhD - McGill University

Co-supervisor :

Doina Precup

Jesse Silverberg

PhD - Université de Montréal

Principal supervisor :

PhD - McGill University

Principal supervisor :

Publications

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Yash Jhaveri

Patrick Shafto

David Meger

2025-10-09

ArXiv (preprint)

Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning

Yash Jhaveri

Patrick Shafto

David Meger

In the pursuit of finding an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart f… (see more)rom their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a temperature decoupling gambit. Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy derived objects--value functions and return distributions. In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated to its interpretable, diversity-preserving optimal policy.

2025-10-09

ArXiv (preprint)

MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring

Christoph R. Schlegel

Sam Work

2025-07-01

Value in Health (published)

MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring

Christoph R. Schlegel

Sam Work

2025-07-01

Value in Health (published)

MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring

Christoph R. Schlegel

Sam Work

2025-07-01

Value in Health (published)

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

Max Schwarzer

Joshua Greaves

Ekin Dogus Cubuk

Rishabh Agarwal

Aaron Courville

Sergei Kalinin

Igor Mordatch

Pablo Samuel Castro

Kevin M Roccapriore

We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimu… (see more)lated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

2025-05-20

Advanced Materials Interfaces (published)

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

Nicolas Le Roux

Jonathan Lebensold

Arnaud Bergeron

Joshua Greaves

Alex Fr'echette

Carolyne Pelletier

Éric Thibodeau-Laufer

S'andor Toth

Sam Work

2025-03-18

ArXiv (preprint)

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

Nicolas Le Roux

Jonathan Lebensold

Arnaud Bergeron

Joshua Greaves

Alex Fr'echette

Carolyne Pelletier

Éric Thibodeau-Laufer

S'andor Toth

Sam Work

2025-03-18

ArXiv (preprint)

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

David Meger

Patrick Shafto

Yash Jhaveri

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

The Position Dependence of Electron Beam Induced Effects in 2D Materials with Deep Neural Networks

Kevin M Roccapriore

Max Schwarzer

Joshua Greaves

Riccardo Torsi

Rishabh Agarwal

Colton Bishop

Igor Mordatch

Ekin Dogus Cubuk

Aaron Courville

Joshua Robinson

Pablo Samuel Castro

Sergei V Kalinin

2024-07-01

Microscopy and Microanalysis (published)

Controlling Large Language Model Agents with Entropic Activation Steering

Nathan Rahn

Pierluca D'Oro

The generality of pretrained large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. To be… (see more) successful, such agents must form beliefs about how to achieve their goals based on limited interaction with their environment, resulting in uncertainty about the best action to take at each step. In this paper, we study how LLM agents form and act on these beliefs by conducting experiments in controlled sequential decision-making tasks. To begin, we find that LLM agents are overconfident: They draw strong conclusions about what to do based on insufficient evidence, resulting in inadequately explorative behavior. We dig deeper into this phenomenon and show how it emerges from a collapse in the entropy of the action distribution implied by sampling from the LLM. We then demonstrate that existing token-level sampling techniques are by themselves insufficient to make the agent explore more. Motivated by this fact, we introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. EAST computes a steering vector as an entropy-weighted combination of representations, and uses it to manipulate an LLM agent's uncertainty over actions by intervening on its activations during the forward pass. We show that EAST can reliably increase the entropy in an LLM agent's actions, causing more explorative behavior to emerge. Finally, EAST modifies the subjective uncertainty an LLM agent expresses, paving the way to interpreting and controlling how LLM agents represent uncertainty about their decisions.

2024-06-24

ICML.cc/2024/Workshop/MI (poster)

openreview.net

A Distributional Analogue to the Successor Representation

Arthur Gretton

Yunhao Tang

Andre Barreto

Will Dabney

Mark Rowland

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure … (see more)and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.

2024-05-01

ICML.cc/2024/Conference (spotlight)