Khimya Khetarpal

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson

Adriana Hugessen

Charlotte Cloutier

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (see more)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-07-01

rl-conference.cc/RLC/2025/Workshop/RLBrew (published)

doi.org

openreview.net

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Daniel Lawson

Adriana Hugessen

Charlotte Cloutier

Glen Berseth

Khimya Khetarpal

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (see more)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-06-11

ArXiv (preprint)

arxiv.org

Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps

Matt Schmittle

Rohan Baijal

Nathan Hatch

Rosario Scalise

Mateo Guaman Castro

Sidharth Talia

Khimya Khetarpal

Byron Boots

Siddhartha Srinivasa

2025-05-30

roboticsfoundation.org/RSS/2025/Workshop/ROAR (oral)

doi.org

openreview.net

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-15

ArXiv (preprint)

arxiv.org

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-01

arXiv (published)

doi.org

arxiv.org

Representation Learning via Non-Contrastive Mutual Information

Zhaohan Daniel Guo

Bernardo Avila Pires

Khimya Khetarpal

Dale Schuurmans

Bo Dai

2025-04-23

ArXiv (preprint)

doi.org

arxiv.org

Representation Learning via Non-Contrastive Mutual Information

Zhaohan Daniel Guo

Bernardo Avila Pires

Khimya Khetarpal

Dale Schuurmans

Bo Dai

2025-04-23

ArXiv (preprint)

arxiv.org

Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps

Matt Schmittle

Rohan Baijal

Nathan Hatch

Rosario Scalise

Mateo Guaman Castro

Sidharth Talia

Khimya Khetarpal

Byron Boots

S. Srinivasa

2025-04-17

ArXiv (preprint)

arxiv.org

Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning

Lynn Cherif

Flemming Kondrup

David Venuto

Ankit Anand

Doina Precup

Khimya Khetarpal

Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (see more)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through *intent-based affordances* -- i.e., considering in any situation only the subset of actions that achieve a desired outcome. We propose **Code as Generative Affordances (

2025-03-05

ICLR.cc/2025/Workshop/DL4C (published)

doi.org

openreview.net

Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning

Lynn Cherif

Flemming Kondrup

David Venuto

Ankit Anand

Doina Precup

Khimya Khetarpal

Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (see more)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through intent-based affordances -- i.e., considering in any situation only the subset of actions that achieve a desired outcome. We propose **Code as Generative Affordances**

2025-03-05

ICLR.cc/2025/Workshop/DL4C (published)

openreview.net

Agency Is Frame-Dependent

David Abel

Andre Barreto

Michael Bowling

Will Dabney

Shi Dong

Steven Hansen

A. Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science… (see more), and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.

2025-02-06

ArXiv (preprint)

arxiv.org

Agency Is Frame-Dependent

David Abel

Andre Barreto

Michael Bowling

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science… (see more), and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.

2025-02-06

ArXiv (preprint)

doi.org

arxiv.org

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Khimya Khetarpal

Biography

Current Students

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Khimya Khetarpal

Biography

Current Students

Publications