Khimya Khetarpal

Affiliate Member
Research Scientist, Google DeepMind
Research Topics
Machine Learning Theory
Online Learning
Reinforcement Learning
Representation Learning

Biography

Khimya Khetarpal is a Research Scientist at Google DeepMind. She earned her PhD in Computer Science from the Reasoning and Learning Lab at McGill University and Mila, advised by Doina Precup. She is broadly interested in artificial intelligence and reinforcement learning. Her current research focuses on how RL agents learn to efficiently represent the world's knowledge, plan with it, and adapt to changes over time. Khimya’s work has appeared in leading AI journals and conferences including NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR and TMLR. Her work has also been featured in MIT Technology Review. She was recognized as a TMLR expert reviewer in 2023, named one of the Rising Stars in EECS 2020, was a finalist in the Three Minute Thesis (3MT) competition at AAAI 2019, was selected for the Doctoral Consortium at AAAI 2019, and received a Best Paper Award (3rd Prize) at an ICML 2018 workshop on lifelong learning. Throughout her career, she has actively mentored others through initiatives such as co-founding the Mila peer advising initiative, teaching and assisting at the AI4Good Lab, volunteering at Skype A Scientist, and mentoring at FIRST Robotics.

Her research aims to (1) understand intelligent behavior that bridges both action and perception grounded in theoretical foundations of reinforcement learning, and (2) build AI agents to efficiently represent the world's knowledge, plan with it, and adapt to changes over time through learning and interaction.

She currently approaches this with the following research directions:

- Selective Attention for Fast Adaptation and Robustness

- Learning Abstractions and Affordances

- Discovery and Continual Reinforcement Learning

Current Students

Collaborating Alumni - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :

Publications

Affordances Enable Partial World Modeling with LLMs
Gheorghe Comanici
Jonathan Richens
Jeremy Shar
Fei Xia
Laurent Orseau
Aleksandra Faust
Robust Intervention Learning from Emergency Stop Interventions
Ethan Pronovost
Siddhartha Srinivasa
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
Andre Barreto
Will Dabney
Shi Dong
Steven Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Under this definition, we find that plasticity is well thought of as the mirror of empowerment: The two concepts are defined using the same measure, with only the direction of influence reversed. Our main result establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency.
Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
Matt Schmittle
Rohan Baijal
Nathan Hatch
Rosario Scalise
Mateo Guaman Castro
Sidharth Talia
Byron Boots
Siddhartha Srinivasa
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Representation Learning via Non-Contrastive Mutual Information
Zhaohan Daniel Guo
Bernardo Avila Pires
Dale Schuurmans
Bo Dai
Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning
Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboard actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through
Agency Is Frame-Dependent
David Abel
Andre Barreto
Michael Bowling
Will Dabney
Shi Dong
Steven Stenberg Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.
Optimizing Return Distributions with Distributional Dynamic Programming
Bernardo Avila Pires
Mark Rowland
Diana Borsa
Zhaohan Daniel Guo
Andre Barreto
David Abel
Remi Munos
Will Dabney
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Zhaohan Daniel Guo
Bernardo Avila Pires
Yunhao Tang
Clare Lyle
Mark Rowland
Nicolas Heess
Diana Borsa
Arthur Guez
Will Dabney
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
Due to the recent remarkable advances in artificial intelligence, researchers have begun to consider challenging learning problems such as learning to generalize behavior from large offline datasets or learning online in non-Markovian environments. Meanwhile, recent advances in both of these areas have increasingly relied on conditioning policies on large context lengths. A natural question is if there is a limit to the performance benefits of increasing the context length if the computation needed is available. In this work, we establish a novel theoretical result that links the context length of a policy to the time needed to reliably evaluate its performance (i.e., its mixing time) in large scale partially observable reinforcement learning environments that exhibit latent sub-task structure. This analysis underscores a key tradeoff: when we extend the context length, our policy can more effectively model non-Markovian dependencies, but this comes at the cost of potentially slower policy evaluation and as a result slower downstream learning. Moreover, our empirical results highlight the relevance of this analysis when leveraging Transformer based neural networks. This perspective will become increasingly pertinent as the field scales towards larger and more realistic environments, opening up a number of potential future directions for improving the way we design learning agents.
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma
Guy Davidson
Anssi Kanervisto
Udit Arora
Katja Hofmann
Ida Momennejad
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.