Portrait of Khimya Khetarpal

Khimya Khetarpal

Affiliate Member
Research Scientist, Google DeepMind


Khimya Khetarpal is a Research Scientist at Google Deepmind. She earned her PhD in Computer Science from the Reasoning and Learning Lab at McGill University and Mila, advised by Doina Precup. She is broadly interested in artificial intelligence and reinforcement learning. Her current research interests focus on how RL agents learn to efficiently represent the world's knowledge, plan with it, and adapt to changes over time. Khimya’s work has appeared in leading AI journals and conferences including NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR and TMLR. Her work has also been featured in MIT Technology Review. She was recognized as a TMLR expert reviewer in 2023, one of the Rising Stars in EECS 2020, a finalist for Three Minute Thesis (3MT) competition in AAAI 2019, selected for the Doctoral Consortium at AAAI 2019, and awarded Best Paper Award (3rd Price) for an ICML 2018 workshop on lifelong learning. Throughout her career, she has sought to actively mentor through initiatives such as co-founding the Mila peer advising initiative, teaching and assisting AI4Good Lab, volunteering at Skype A Scientist, and mentoring at FIRST Robotics.

Her research aims to (1) understand intelligent behavior that bridges both action and perception grounded in theoretical foundations of reinforcement learning, and (2) build AI agents to efficiently represent the world's knowledge, plan with it, and adapt to changes over time through learning and interaction.

She currently approaches this with the following research directions:

- Selective Attention for Fast Adaptation and Robustness

- Learning Abstractions and Affordances

- Discovery and Continual Reinforcement Learning

Current Students

Master's Research - McGill University
Principal supervisor :


Disentangling the Causes of Plasticity Loss in Neural Networks
Clare Lyle
Zeyu Zheng
Hado van Hasselt
Razvan Pascanu
James Martens
Will Dabney
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma
Guy Davidson
Anssi Kanervisto
Udit Arora
Katja Hofmann
Ida Momennejad
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a … (see more)method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
Thomas Jiralerspong
Flemming Kondrup
The ability to plan at many different levels of abstraction enables agents to envision the long-term repercussions of their decisions and th… (see more)us enables sample-efficient learning. This becomes particularly beneficial in complex environments from high-dimensional state space such as pixels, where the goal is distant and the reward sparse. We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals leveraging a temporally abstract world model. Forecaster learns an abstract model of its environment by modelling the transitions dynamics at an abstract level and training a world model on such transition. It then uses this world model to choose optimal high-level goals through a tree-search planning procedure. It additionally trains a low-level policy that learns to reach those goals. Our method not only captures building world models with longer horizons, but also, planning with such models in downstream tasks. We empirically demonstrate Forecaster's potential in both single-task learning and generalization to new tasks in the AntMaze domain.
POMRL: No-Regret Learning-to-Plan with Increasing Horizons
Claire Vernade
Brendan O'Donoghue
Satinder Singh
Tom Zahavy
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented w… (see more)ith a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks are more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within task. We generalize this finding to meta-RL and study this dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate its significance empirically.
Discovering Object-Centric Generalized Value Functions From Pixels
Somjit Nath
Gopeshh Raaj Subbaraj
Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using ha… (see more)nd-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent"question"functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also, centered around objects that are invariant to changes across tasks facilitating fast adaptation.
Towards Continual Reinforcement Learning: A Review and Perspectives
The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning
Andrei Cristian Nica
Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to … (see more)having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
Sequoia: A Software Framework to Unify Continual Learning Research
Fabrice Normandin
Florian Golemo
Oleksiy Ostapenko
Pau Rodriguez
Matthew D Riemer
J. Hurtado
Lucas Cecchi
Dominic Zhao
Ryan Lindeborg
Timothee LESORT
David Vazquez
Massimo Caccia
The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non… (see more)-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a taxonomy of settings, where each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, where more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, as developing a method for a given setting also makes it directly applicable onto any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a wide variety of settings from both the Continual Supervised Learning (CSL) and Continual Reinforcement Learning (CRL) domains. Sequoia also includes a growing suite of methods which are easy to extend and customize, in addition to more specialized methods from external libraries. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting (this GitHub URL).
Self-Supervised Attention-Aware Reinforcement Learning
Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the exi… (see more)sting research uses it as an analyzing tool rather than an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach which can 1. learn to select regions of interest without explicit annotations, and 2. act as a plug for existing deep RL methods to improve the learning performance. We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in the context of both the rate of convergence and performance. Furthermore, the proposed self-supervised attention is not tied with specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and empirically validate the generalization ability of the proposed method. Finally, we show that our method learns meaningful object keypoints highlighting improvements both qualitatively and quantitatively.
Variance Penalized On-Policy and Off-Policy Actor-Critic
Arushi Jain
Gandharv Patil
Ayush Jain
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Amy Zhang
Shagun Sodhani
Temporally Abstract Partial Models
Zafarali Ahmed
Gheorghe Comanici
Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement l… (see more)earning, option models (Sutton, Precup \& Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models, that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we empirically demonstrate the ability to learn both affordances and partial option models online resulting in improved sample efficiency and planning time in the Taxi domain.