Khimya Khetarpal

Membre affilié

Chercheuse scientifique, Google DeepMind

Sujets de recherche

Apprentissage de représentations

Apprentissage en ligne

Apprentissage par renforcement

Théorie de l'apprentissage automatique

Biographie

Khimya Khetarpal est chercheuse chez Google Deepmind. Elle a obtenu son doctorat en informatique au Reasoning and Learning Lab de l'Université McGill et à Mila, csupervisée par Doina Precup. Elle s'intéresse de manière générale à l'intelligence artificielle et à l'apprentissage par renforcement. Ses recherches actuelles portent sur la manière dont les agents RL apprennent à représenter efficacement les connaissances du monde, à planifier avec elles et à s'adapter aux changements au fil du temps. Les travaux de Khimya ont été publiés dans les principales revues et conférences sur l'intelligence artificielle, notamment NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR et TMLR. Ses travaux ont également été présentés dans la MIT Technology Review. Elle a été reconnue comme examinatrice experte de TMLR en 2023, l'une des étoiles montantes d'EECS 2020, finaliste du concours Three Minute Thesis (3MT) d'AAAI 2019, sélectionnée pour le consortium doctoral d'AAAI 2019, et a reçu le prix du meilleur article (3e prix) pour un atelier ICML 2018 sur l'apprentissage tout au long de la vie. Tout au long de sa carrière, elle s'est efforcée d'être une mentore active par le biais d'initiatives telles que la cofondation de l'initiative de conseil par les pairs Mila, l'enseignement et l'assistance au AI4Good Lab, le bénévolat à Skype A Scientist et le mentorat à FIRST Robotics.

Ses recherches visent à (1) comprendre le comportement intelligent qui fait le lien entre l'action et la perception en s'appuyant sur les fondements théoriques de l'apprentissage par renforcement, et (2) construire des agents d'intelligence artificielle pour représenter efficacement la connaissance du monde, planifier avec elle et s'adapter aux changements au fil du temps grâce à l'apprentissage et à l'interaction.

Elle aborde actuellement ces questions dans les directions de recherche suivantes :

- Attention sélective pour une adaptation et une robustesse rapides

- Apprentissage des abstractions et des affordances

- Découverte et apprentissage par renforcement continu

Étudiants actuels

Lynn Cherif

Collaborateur·rice alumni - McGill

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Publications

Affordances Enable Partial World Modeling with LLMs

Khimya Khetarpal

Gheorghe Comanici

Jonathan Richens

Jeremy Shar

Fei Xia

Laurent Orseau

Aleksandra Faust

Doina Precup

2026-02-10

ArXiv (prépublication)

arxiv.org

Robust Intervention Learning from Emergency Stop Interventions

Ethan Pronovost

Khimya Khetarpal

Siddhartha Srinivasa

2026-02-02

ArXiv (prépublication)

arxiv.org

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity … (voir plus)is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Under this definition, we find that plasticity is well thought of as the mirror of empowerment: The two concepts are defined using the same measure, with only the direction of influence reversed. Our main result establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency

2025-09-17

NeurIPS.cc/2025/Conference (spotlight)

doi.org

openreview.net

Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps

Matt Schmittle

Rohan Baijal

Nathan Hatch

Rosario Scalise

Mateo Guaman Castro

Sidharth Talia

Khimya Khetarpal

Byron Boots

Siddhartha Srinivasa

2025-08-07

robot-learning.org/CoRL/2025/Conference (poster)

doi.org

proceedings.mlr.press

Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning

Adriana Hugessen

Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,

2025-06-30

rl-conference.cc/RLC/2025/Workshop/RLBrew (publié)

doi.org

openreview.net

Representation Learning via Non-Contrastive Mutual Information

Zhaohan Daniel Guo

Bernardo Avila Pires

Khimya Khetarpal

Dale Schuurmans

Bo Dai

2025-04-22

ArXiv (prépublication)

doi.org

arxiv.org

Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning

David Venuto

Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (voir plus)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through

2025-03-04

ICLR.cc/2025/Workshop/DL4C (publié)

doi.org

openreview.net

Agency Is Frame-Dependent

David Abel

Andre Barreto

Michael Bowling

Will Dabney

Shi Dong

Steven Stenberg Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science… (voir plus), and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.

2025-02-05

ArXiv (prépublication)

doi.org

arxiv.org

Optimizing Return Distributions with Distributional Dynamic Programming

Bernardo Avila Pires

Mark Rowland

Diana Borsa

Zhaohan Daniel Guo

Khimya Khetarpal

Andre Barreto

David Abel

Remi Munos

Will Dabney

We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standar… (voir plus)d reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.

2025-01-21

ArXiv (prépublication)

doi.org

arxiv.org

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Khimya Khetarpal

Zhaohan Daniel Guo

Bernardo Avila Pires

Yunhao Tang

Clare Lyle

Mark Rowland

Nicolas Heess

Diana Borsa

Arthur Guez

Will Dabney

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

doi.org

proceedings.mlr.press

Balancing Context Length and Mixing Times for Reinforcement Learning at Scale

Sarath Chandar

Khimya Khetarpal

Janarthanan Rajendran

Matthew Riemer

É. Montréal

Due to the recent remarkable advances in artiﬁcial intelligence, researchers have begun to consider challenging learning problems such as … (voir plus)learning to generalize behavior from large ofﬂine datasets or learning online in non-Markovian environments. Meanwhile, recent advances in both of these areas have increasingly relied on conditioning policies on large context lengths. A natural question is if there is a limit to the performance beneﬁts of increasing the context length if the computation needed is available. In this work, we establish a novel theoretical result that links the context length of a policy to the time needed to reliably evaluate its performance (i.e., its mixing time) in large scale partially observable reinforcement learning environments that exhibit latent sub-task structure. This analysis underscores a key tradeoff: when we extend the context length, our policy can more effectively model non-Markovian dependencies, but this comes at the cost of potentially slower policy evaluation and as a result slower downstream learning. Moreover, our empirical results highlight the relevance of this analysis when leveraging Transformer based neural networks. This perspective will become increasingly pertinent as the ﬁeld scales towards larger and more realistic environments, opening up a number of potential future directions for improving the way we design learning agents.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Toward Human-AI Alignment in Large-Scale Multi-Player Games

Sugandha Sharma

Guy Davidson

Khimya Khetarpal

Anssi Kanervisto

Udit Arora

Katja Hofmann

Ida Momennejad

Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a … (voir plus)method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.

2024-02-04

ArXiv (prépublication)

doi.org

arxiv.org

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Khimya Khetarpal

Biographie

Étudiants actuels

Publications

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Khimya Khetarpal

Biographie

Étudiants actuels

Publications