
Marc Gendron-Bellemare

Core Industry Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Chief Scientific Officer, Reliant AI
Research Topics
Large Language Models (LLM)
Reinforcement Learning
Representation Learning

Biography

I am Chief Scientific Officer at Reliant AI, an adjunct professor at the School of Computer Science at McGill University, and an adjunct professor at the Department of Computer Science and Operations Research (DIRO) at Université de Montréal.

Previously, I was a research scientist at Google Brain in Montréal, where my research focused on reinforcement learning. From 2013 to 2017, I worked at DeepMind in the U.K. I received my PhD from the University of Alberta under the supervision of Michael Bowling and Joel Veness.

My research lies at the intersection of reinforcement learning and probabilistic prediction. I am also interested in deep learning, generative modelling, online learning and information theory.

Current Students

PhD - McGill University (co-supervisor)
PhD - McGill University (co-supervisor)
PhD - Université de Montréal (principal supervisor)
PhD - McGill University (principal supervisor)

Publications

Biomechanical finite element simulation of the pelvic organs under dynamic loading and validation against experimental data from magnetic resonance imaging.
Camille Lafond
Louise Hohnadel
Thomas Brunel
Nicolas Pirró
Dominique Chamoret
Sébastien Roth
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
Yash Jhaveri
Patrick Shafto
In the pursuit of finding an optimal policy, reinforcement learning (RL) methods generally ignore the properties of learned policies apart from their expected return. Thus, even when successful, it is difficult to characterize which policies will be learned and what they will do. In this work, we present a theoretical framework for policy optimization that guarantees convergence to a particular optimal policy, via vanishing entropy regularization and a temperature decoupling gambit. Our approach realizes an interpretable, diversity-preserving optimal policy as the regularization temperature vanishes and ensures the convergence of policy-derived objects: value functions and return distributions. In a particular instance of our method, for example, the realized policy samples all optimal actions uniformly. Leveraging our temperature decoupling gambit, we present an algorithm that estimates, to arbitrary accuracy, the return distribution associated to its interpretable, diversity-preserving optimal policy.
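
To make the limiting behaviour concrete, here is a minimal NumPy sketch (an illustration of the described limit, not the paper's algorithm): as the temperature of a softmax (entropy-regularized) policy over action values vanishes, the policy converges to one that is uniform over the set of optimal actions rather than an arbitrary single maximizer.

import numpy as np

def softmax_policy(q_values, temperature):
    # Entropy-regularized policy: softmax of action values at a given temperature.
    logits = (q_values - q_values.max()) / temperature  # subtract max for stability
    weights = np.exp(logits)
    return weights / weights.sum()

q = np.array([1.0, 2.0, 2.0, 0.5])  # two optimal actions: indices 1 and 2
for tau in (1.0, 0.1, 0.01, 0.001):
    print(f"tau={tau:5.3f} ->", np.round(softmax_policy(q, tau), 4))
# As tau -> 0 the distribution approaches [0, 0.5, 0.5, 0]: uniform over
# the maximizers, matching the diversity-preserving limit described above.
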
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Jonathan Lebensoldt
Joshua Greaves
Alex Fréchette
Sándor Toth
Sam Work
We propose a new algorithm for fine-tuning large language models using reinforcement learning. Tapered Off-Policy REINFORCE (TOPR) uses an asymmetric, tapered variant of importance sampling to speed up learning while maintaining stable learning dynamics, even without the use of KL regularization. TOPR can be applied in a fully offline fashion, allows the handling of positive and negative examples in a unified framework, and benefits from the implementational simplicity that is typical of Monte Carlo algorithms. We demonstrate the effectiveness of our approach with a series of experiments on the GSM8K and MATH reasoning benchmarks, finding performance gains for training both a model for solution generation and as a generative verifier. We show that properly leveraging positive and negative examples alike in the off-policy regime simultaneously increases test-time accuracy and training data efficiency, all the while avoiding the "wasted inference" that comes with discarding negative examples. We find that this advantage persists over multiple iterations of training and can be amplified by dataset curation techniques, enabling us to match 70B-parameter model performance with 8B language models. As a corollary to this work, we find that REINFORCE's baseline parameter plays an important and unexpected role in defining dataset composition in the presence of negative examples, and is consequently critical in driving off-policy performance.
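
The following is a rough PyTorch sketch of the general idea of an asymmetric, tapered importance weight in an off-policy REINFORCE-style loss. The function name and the specific taper (clamp the ratio at 1 for positive examples, drop negative examples whose ratio exceeds 1) are illustrative assumptions, not the exact TOPR estimator from the paper.

import torch

def tapered_reinforce_loss(logp_new, logp_old, returns):
    # Importance ratio between the current policy and the (off-policy)
    # behaviour policy that generated the data.
    ratio = torch.exp(logp_new - logp_old)
    # Asymmetric taper (assumed form): positive examples are never
    # up-weighted beyond 1; negative examples whose ratio has drifted
    # above 1 are dropped rather than amplified.
    pos_w = torch.clamp(ratio, max=1.0)
    neg_w = torch.where(ratio <= 1.0, ratio, torch.zeros_like(ratio))
    weights = torch.where(returns >= 0, pos_w, neg_w).detach()
    # REINFORCE-style objective: gradient flows only through logp_new.
    return -(weights * returns * logp_new).mean()

# Toy usage: three samples, two with positive return and one negative.
logp_new = torch.tensor([-1.2, -0.7, -2.0], requires_grad=True)
logp_old = torch.tensor([-1.0, -1.5, -0.3])
returns = torch.tensor([1.0, 1.0, -1.0])
tapered_reinforce_loss(logp_new, logp_old, returns).backward()
print(logp_new.grad)

Because the weights are detached and bounded, no single stale sample can dominate the gradient, which is one way to read the stability claim above.
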
MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring
Christoph R. Schlegel
Sam Work
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Joshua Greaves
Ekin Dogus Cubuk
Sergei Kalinin
Igor Mordatch
Kevin M Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
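
A minimal, self-contained Python sketch of the pipeline the abstract outlines; the dataset shapes, the softmax regression standing in for the neural network, and the guide_step heuristic are all assumptions for illustration, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symbolic dataset: each row encodes the local lattice
# neighbourhood; each label records which of 3 neighbouring sites the
# silicon atom transitioned to after a beam stimulation.
X = rng.normal(size=(512, 8))
y = rng.integers(0, 3, size=512)

# Softmax regression standing in for the paper's neural network.
W = np.zeros((8, 3))
for _ in range(200):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0        # gradient of mean cross-entropy
    W -= 0.1 * (X.T @ p) / len(y)

def transition_probs(state):
    # Predicted probability of the atom moving to each neighbouring site.
    logits = state @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def guide_step(state, distance_reduction):
    # Greedy one-step control: stimulate so that the most probable
    # transition is also the one that most reduces distance to the target.
    return int(np.argmax(transition_probs(state) * distance_reduction))

print(guide_step(rng.normal(size=8), np.array([0.0, 1.0, 0.5])))
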
Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning
Patrick Shafto
Yash Jhaveri