Benjamin Alsbury-Nealy

PhD - McGill University

Supervisor

Blake Richards

Research Topics

AI for Science

Computational Biology

Computational Neuroscience

Multimodal Learning

Neurotechnology

Reasoning

Reinforcement Learning

Website

Google Scholar

GitHub

Publications

Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

Chen Sun

Wannan Yang

Thomas Jiralerspong

Dane Malenfant

Benjamin Alsbury-Nealy

Yoshua Bengio

Blake Richards

In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on these critical steps. This algorithm, which we call Contrastive Retrospection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of the prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (i) They enable rapid identification of all the critical steps. (ii) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon (while ignoring other states) than it is to prospectively predict reward at every taken step. ConSpec greatly improves learning in a diverse set of RL tasks. The code is available at https://github.com/sunchipsster1/ConSpec
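The prototype-matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it assumes states are already encoded as plain vectors, uses cosine similarity as the matching score, and the names `intrinsic_rewards`, `threshold`, and `scale` are hypothetical choices made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def intrinsic_rewards(states, prototypes, threshold=0.6, scale=1.0):
    """Return one intrinsic reward per state in a trajectory.

    Assumption (not from the paper): a state "matches" when its best cosine
    similarity to any prototype exceeds `threshold`; the reward is then
    `scale` times that similarity, and 0.0 otherwise.
    """
    rewards = []
    for state in states:
        best = max(cosine(state, proto) for proto in prototypes)
        rewards.append(scale * best if best > threshold else 0.0)
    return rewards


# Toy trajectory: only the first state lines up with the single prototype.
trajectory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototypes = [[1.0, 0.0]]
bonus = intrinsic_rewards(trajectory, prototypes, threshold=0.9, scale=0.5)
```

In a full agent, these per-step bonuses would be added to the external rewards before the underlying RL update; how the prototypes themselves are learned (the contrastive loss) is outside the scope of this sketch.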

2023-09-20

Neural Information Processing Systems (poster)

doi.org

openreview.net

Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL

Chen Sun

Mila

Wannan Yang

Benjamin Alsbury-Nealy

Thomas Jiralerspong

Yoshua Bengio

Blake Richards

Reinforcement learning (RL) algorithms have achieved notable success in recent years, but still struggle with fundamental issues in long-term credit assignment. It remains difficult to learn in situations where success is contingent upon multiple critical steps that are distant in time from each other and from a sparse reward, as is often the case in real life. Moreover, how RL algorithms assign credit in these difficult situations is typically not coded in a way that can rapidly generalize to new situations. Here, we present an approach using offline contrastive learning, which we call contrastive introspection (ConSpec), that can be added to any existing RL algorithm and addresses both issues. In ConSpec, a contrastive loss is used during offline replay to identify invariances among successful episodes. This takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. ConSpec stores this knowledge in a collection of prototypes summarizing the intermediate states required for success. During training, arrival at any state that matches these prototypes generates an intrinsic reward that is added to any external rewards. In addition, the reward shaping provided by ConSpec can be made to preserve the optimal policy of the underlying RL agent. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical states. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. In summary, ConSpec is a modular system that can be added to any existing RL algorithm to improve its long-term credit assignment.
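The abstract does not say how the shaping is made policy-preserving. One standard technique with that property is potential-based reward shaping, where each transition receives the extra term gamma * Phi(s') - Phi(s). The sketch below shows only that general technique; treating a prototype-match score as the potential Phi is an assumption made here for illustration, and the function name `shaped_rewards` is hypothetical.

```python
def shaped_rewards(rewards, potentials, gamma=0.99):
    """Add a potential-based shaping term to each external reward.

    rewards[t]    : external reward for the transition s_t -> s_{t+1}
    potentials[t] : potential Phi(s_t); needs len(rewards) + 1 entries
    Assumption: Phi could come from a prototype-match score, which is an
    illustrative choice and not something the abstract specifies.
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("potentials must have one more entry than rewards")
    return [
        r + gamma * potentials[t + 1] - potentials[t]
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards, gamma):
    """Discounted sum of a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Toy episode: sparse reward at the end, potential peaks at an intermediate state.
external = [0.0, 0.0, 1.0]
phi = [0.0, 1.0, 0.0, 0.0]
shaped = shaped_rewards(external, phi, gamma=0.5)
```

Because the shaping terms telescope, the shaped return differs from the original return only by gamma^T * Phi(s_T) - Phi(s_0), which does not depend on the actions taken in between; in the toy episode both boundary potentials are zero, so the two returns coincide.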

2021-12-31

(published)
