
Khimya Khetarpal

Affiliate Member
Research Scientist, Google DeepMind
Research Topics
Machine Learning Theory
Online Learning
Reinforcement Learning
Representation Learning

Biography

Khimya Khetarpal is a Research Scientist at Google DeepMind. She earned her PhD in Computer Science from the Reasoning and Learning Lab at McGill University and Mila, advised by Doina Precup. She is broadly interested in artificial intelligence and reinforcement learning (RL). Her current research focuses on how RL agents learn to efficiently represent the world's knowledge, plan with it, and adapt to changes over time. Khimya's work has appeared in leading AI journals and conferences, including NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR and TMLR, and has been featured in MIT Technology Review. She was recognized as a TMLR Expert Reviewer in 2023, named one of the Rising Stars in EECS in 2020, selected as a finalist in the Three Minute Thesis (3MT) competition at AAAI 2019, selected for the AAAI 2019 Doctoral Consortium, and received the Best Paper Award (3rd Prize) at an ICML 2018 workshop on lifelong learning. Throughout her career, she has actively mentored others through initiatives such as co-founding the Mila peer advising initiative, teaching and assisting at the AI4Good Lab, volunteering with Skype A Scientist, and mentoring at FIRST Robotics.

Her research aims to (1) understand intelligent behavior that bridges action and perception, grounded in the theoretical foundations of reinforcement learning, and (2) build AI agents that efficiently represent the world's knowledge, plan with it, and adapt to changes over time through learning and interaction.

She currently approaches this with the following research directions:

- Selective Attention for Fast Adaptation and Robustness

- Learning Abstractions and Affordances

- Discovery and Continual Reinforcement Learning

Current Students

PhD - Université de Montréal
Principal supervisor:

Publications

Difference-Aware Retrieval Policies for Imitation Learning
Quinn Pfeifer
Ethan Pronovost
Paarth Shah
Siddhartha Srinivasa
Abhishek Gupta
Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a semi-parametric retrieval-based imitation learning approach that addresses this limitation by reparameterizing the imitation learning problem in terms of local neighborhood structure rather than direct state-to-action mappings. Instead of learning a global policy, DARP trains a model to predict actions based on
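The abstract describes reusing the training data at inference time by predicting actions from a local neighborhood of demonstrations. A minimal sketch of a generic retrieval-based policy follows; it is not DARP (the abstract is truncated before describing the model), and the names `retrieval_policy` and `correction` are hypothetical. `correction` stands in for a learned model that adjusts a retrieved action using the difference between the query state and the neighbor's state; Euclidean k-nearest-neighbor lookup is an assumption.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def retrieval_policy(
    query: Sequence[float],
    demos: List[Tuple[Vector, Vector]],
    k: int = 3,
    correction: Optional[Callable[[Vector, Vector], Vector]] = None,
) -> Vector:
    """Average the actions of the k demonstration states nearest to `query`.

    `correction(diff, action)` is a hypothetical stand-in for a learned model that
    returns an additive adjustment given the state difference (query - neighbor).
    """
    if not demos:
        raise ValueError("need at least one demonstration")
    q = tuple(query)
    # Retrieve the k nearest (state, action) pairs under Euclidean distance.
    neighbors = sorted(demos, key=lambda sa: math.dist(q, sa[0]))[:k]
    total = [0.0] * len(neighbors[0][1])
    for state, action in neighbors:
        if correction is not None:
            diff = tuple(a - b for a, b in zip(q, state))
            delta = correction(diff, action)
            action = tuple(a + d for a, d in zip(action, delta))
        for i, a in enumerate(action):
            total[i] += a
    return tuple(t / len(neighbors) for t in total)
```

With `correction=None` this reduces to plain k-NN behavior cloning; conditioning the adjustment on the state difference is what lets a neighbor's action be adapted to a query it does not exactly match.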
Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
Michael Beukman
Zeyu Zheng
Will Dabney
Jakob Foerster
Michael Dennis
Clare Lyle
Plateaus, where an agent's performance stagnates at a suboptimal level, are a common problem in deep on-policy RL. Focusing on PPO due to its widespread adoption, we show that plateaus in certain regimes arise not because of known exploration, capacity, or optimization challenges, but because sample-based estimates of the loss eventually become poor proxies for the true objective over the course of training. As a recap, PPO switches between sampling rollouts from several parallel environments online using the current policy (which we call the outer loop) and performing repeated minibatch SGD steps against this offline dataset (the inner loop). In our work we consider only the outer loop, and conceptually model it as stochastic optimization. The step size is then controlled by the regularization strength towards the previous policy and the gradient noise by the number of samples collected between policy update steps. This model predicts that performance will plateau at a suboptimal level if the outer step size is too large relative to the noise. Recasting PPO in this light makes it clear that there are two ways to address this particular type of learning stagnation: either reduce the step size or increase the number of samples collected between updates. We first validate the predictions of our model and investigate how hyperparameter choices influence the step size and update noise, concluding that increasing the number of parallel environments is a simple and robust way to reduce both factors. Next, we propose a recipe for how to co-scale the other hyperparameters when increasing parallelization, and show that incorrectly doing so can lead to severe performance degradation. Finally, we vastly outperform prior baselines in a complex open-ended domain by scaling PPO to more than 1M parallel environments, thereby enabling monotonic performance improvement up to one trillion transitions.
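The abstract's stochastic-optimization view of the outer loop predicts a plateau whose height depends on step size relative to gradient noise. A toy sketch of that prediction, assuming noisy gradient descent on a one-dimensional quadratic rather than PPO itself, is below; `plateau_loss` and its parameters are hypothetical names. Both remedies named in the abstract (a smaller step size, or more samples per update) lower the plateau.

```python
import math
import random
import statistics


def plateau_loss(step_size: float, num_samples: int, noise_std: float = 1.0,
                 steps: int = 5000, burn_in: int = 1000, seed: int = 0) -> float:
    """Mean loss of noisy gradient descent on f(x) = 0.5 * x**2 after a burn-in period.

    Toy stand-in for the outer loop: `step_size` plays the role of the update size,
    `num_samples` the number of samples collected per update.
    """
    if not 0.0 < step_size < 2.0:
        raise ValueError("step_size must lie in (0, 2) for this quadratic to be stable")
    rng = random.Random(seed)
    # Averaging num_samples i.i.d. noise terms shrinks their std by sqrt(num_samples),
    # so the averaged noise is sampled directly.
    noise_scale = noise_std / math.sqrt(num_samples)
    x = 5.0  # start away from the optimum at x = 0
    losses = []
    for t in range(steps):
        grad = x + rng.gauss(0.0, noise_scale)  # true gradient plus estimation noise
        x -= step_size * grad
        if t >= burn_in:
            losses.append(0.5 * x * x)
    # Analytic plateau: step_size * noise_std**2 / (2 * num_samples * (2 - step_size)).
    return statistics.fmean(losses)
```

For example, `plateau_loss(0.5, 1)` should land near the analytic value of about 0.167, while `plateau_loss(0.5, 1000)` is roughly a thousand times lower, mirroring the abstract's argument for scaling up the number of parallel environments.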
Affordances Enable Partial World Modeling with LLMs
Gheorghe Comanici
Jonathan Richens
Jeremy Shar
Fei Xia
Laurent Orseau
Aleksandra Faust
Robust Intervention Learning from Emergency Stop Interventions
Ethan Pronovost
Siddhartha Srinivasa
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
Andre Barreto
Will Dabney
Shi Dong
Steven Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Under this definition, we find that plasticity is well thought of as the mirror of empowerment: The two concepts are defined using the same measure, with only the direction of influence reversed. Our main result establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency.
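For reference, the directed information of Massey (1990), which the abstract says the generalized quantity strictly extends, is shown below. This is only Massey's original definition, not the paper's generalized version; here $X^i = (X_1, \dots, X_i)$. Unlike the mutual information $I(X^n; Y^n) = \sum_{i=1}^{n} I(X^n; Y_i \mid Y^{i-1})$, each term conditions only on past and present $X$, so the measure is asymmetric, and swapping the roles of $X$ and $Y$ measures influence in the reverse direction.

```latex
I(X^n \to Y^n) \;=\; \sum_{i=1}^{n} I\left(X^i ;\, Y_i \,\middle|\, Y^{i-1}\right)
```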
Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
Matt Schmittle
Rohan Baijal
Nathan Hatch
Rosario Scalise
Mateo Guaman Castro
Sidharth Talia
Byron Boots
Siddhartha Srinivasa
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Representation Learning via Non-Contrastive Mutual Information
Zhaohan Daniel Guo
Bernardo Avila Pires
Dale Schuurmans
Bo Dai
Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning
Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboard actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through
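The abstract motivates constraining a large action space to the few actions relevant in a given situation. One generic way to apply such a constraint to a stochastic policy is a masked softmax, sketched below under the assumption that a boolean affordance mask is already available. The mask is a hypothetical input here and is not the paper's generative affordance model; `masked_softmax` is a hypothetical name.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], afforded: Sequence[bool]) -> List[float]:
    """Softmax restricted to afforded actions; non-afforded actions get probability 0."""
    if len(logits) != len(afforded):
        raise ValueError("logits and mask must have the same length")
    if not any(afforded):
        raise ValueError("at least one action must be afforded")
    # Subtract the max afforded logit for numerical stability.
    m = max(l for l, ok in zip(logits, afforded) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, afforded)]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, `masked_softmax([1.0, 5.0, 1.0], [True, False, True])` returns `[0.5, 0.0, 0.5]`: the highest-scoring action is excluded, so exploration only spends samples on afforded actions.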
Agency Is Frame-Dependent
David Abel
Andre Barreto
Michael Bowling
Will Dabney
Shi Dong
Steven Stenberg Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possesses agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) is itself frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.
Optimizing Return Distributions with Distributional Dynamic Programming
Bernardo Avila Pires
Mark Rowland
Diana Borsa
Zhaohan Daniel Guo
Andre Barreto
David Abel
Remi Munos
Will Dabney
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.
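Stock augmentation, as the abstract describes it, appends a statistic of the rewards obtained so far to the MDP state. A minimal sketch is an environment wrapper plus an empirical conditional value-at-risk (CVaR) estimate, one of the objectives the abstract mentions. This illustrates the setup only, not the paper's distributional DP algorithm. Assumptions: the stock is the undiscounted running sum of rewards, CVaR is estimated from samples as the mean of the worst fraction of returns, and `StockAugmentedEnv`, `step_fn`, and `cvar` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

State = Any
AugState = Tuple[State, float]


@dataclass
class StockAugmentedEnv:
    """Wraps a step function so each observation also carries the reward stock.

    Assumption for illustration: the stock is the undiscounted sum of rewards so far.
    """
    step_fn: Callable[[State, int], Tuple[State, float]]
    stock: float = 0.0

    def reset(self, state: State) -> AugState:
        self.stock = 0.0
        return (state, self.stock)

    def step(self, aug_state: AugState, action: int) -> Tuple[AugState, float]:
        state, _ = aug_state
        next_state, reward = self.step_fn(state, action)
        self.stock += reward  # statistic of rewards obtained since the first step
        return (next_state, self.stock), reward


def cvar(returns: List[float], alpha: float) -> float:
    """Empirical lower-tail CVaR: mean of the worst ceil(alpha * n) sampled returns."""
    if not returns or not 0.0 < alpha <= 1.0:
        raise ValueError("need at least one sample and alpha in (0, 1]")
    k = max(1, math.ceil(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return sum(worst) / k
```

The design choice the wrapper captures: for objectives that are not expectations, such as CVaR, the best action in general depends on how much reward has already been collected, so exposing the stock in the state lets a policy condition on it.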
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Zhaohan Daniel Guo
Bernardo Avila Pires
Yunhao Tang
Clare Lyle
Mark Rowland
Nicolas Heess
Diana Borsa
Arthur Guez
Will Dabney