Raj Ghugare

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View.

Raj Ghugare

Matthieu Geist

Benjamin Eysenbach

Some reinforcement learning (RL) algorithms have the capability of recombining together pieces of previously seen experience to solve a task… (voir plus) never seen before during training. This oft-sought property is one of the few ways in which dynamic programming based RL algorithms are considered different from supervised learning (SL) based RL algorithms. Yet, recent RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question in the setting of goal-reaching problems. We show that the desirable stitching property corresponds to a form of generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen \emph{together} in the training data. Our analysis shows that this sort of generalization is different from \emph{i.i.d.} generalization. This connection between stitching and generalization reveals why we should not expect existing RL methods based on SL to perform stitching, even in the limit of large datasets and models. We experimentally validate this result on carefully constructed datasets. This connection suggests a simple remedy, the same remedy for improving generalization in supervised learning: data augmentation. We propose a naive \emph{temporal} data augmentation approach and demonstrate that adding it to RL methods based on SL enables them to stitch together experience so that they succeed in navigating between states and goals unseen together during training.

2024-01-16

ICLR.cc/2024/Conference (poster)

openreview.net

Closing the Gap between TD Learning and Supervised Learning - A Generalisation Point of View

Raj Ghugare

Matthieu Geist

Glen Berseth

Benjamin Eysenbach

Some reinforcement learning (RL) algorithms can stitch pieces of experience to solve a task never seen before during training. This oft-soug… (voir plus)ht property is one of the few ways in which RL methods based on dynamic-programming differ from RL methods based on supervised-learning (SL). Yet, certain RL methods based on off-the-shelf SL algorithms achieve excellent results without an explicit mechanism for stitching; it remains unclear whether those methods forgo this important stitching property. This paper studies this question for the problems of achieving a target goal state and achieving a target return value. Our main result is to show that the stitching property corresponds to a form of combinatorial generalization: after training on a distribution of (state, goal) pairs, one would like to evaluate on (state, goal) pairs not seen together in the training data. Our analysis shows that this sort of generalization is different from i.i.d. generalization. This connection between stitching and generalisation reveals why we should not expect SL-based RL methods to perform stitching, even in the limit of large datasets and models. Based on this analysis, we construct new datasets to explicitly test for this property, revealing that SL-based methods lack this stitching property and hence fail to perform combinatorial generalization. Nonetheless, the connection between stitching and combinatorial generalisation also suggests a simple remedy for improving generalisation in SL: data augmentation. We propose a temporal data augmentation and demonstrate that adding it to SL-based methods enables them to successfully complete tasks not seen together during training. On a high level, this connection illustrates the importance of combinatorial generalization for data efficiency in time-series data beyond tasks beyond RL, like audio, video, or text.

2024-01-16

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Raj Ghugare

Santiago Miret

Adriana Hugessen

Mariano Phielipp

Glen Berseth

2024-01-16

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Raj Ghugare

Santiago Miret

Adriana Hugessen

Mariano Phielipp

Glen Berseth

Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However,… (voir plus) RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.

2023-10-04

ArXiv (prépublication)

doi.org

arxiv.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Raj Ghugare

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Raj Ghugare

Publications