
Khimya Khetarpal

Affiliate Member
Research Scientist, Google DeepMind

Biography

Khimya Khetarpal is a Research Scientist at Google DeepMind. She earned her PhD in computer science at McGill University's Reasoning and Learning Lab and Mila, co-supervised by Doina Precup. She is broadly interested in artificial intelligence and reinforcement learning. Her current research focuses on how RL agents learn to efficiently represent knowledge of the world, plan with it, and adapt to changes over time. Khimya's work has been published in leading AI journals and conferences, including NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR, and TMLR, and has been featured in MIT Technology Review. She was recognized as a TMLR expert reviewer in 2023, named one of the Rising Stars in EECS 2020, was a finalist in the AAAI 2019 Three Minute Thesis (3MT) competition, was selected for the AAAI 2019 Doctoral Consortium, and received a Best Paper Award (3rd prize) at the ICML 2018 workshop on Lifelong Learning. Throughout her career, she has strived to be an active mentor through initiatives such as co-founding the Mila peer advising initiative, teaching and assisting at the AI4Good Lab, volunteering with Skype A Scientist, and mentoring in FIRST Robotics.

Her research aims to (1) understand intelligent behavior that bridges action and perception, grounded in the theoretical foundations of reinforcement learning, and (2) build AI agents that efficiently represent knowledge of the world, plan with it, and adapt to changes over time through learning and interaction.

She currently pursues these questions along the following research directions:

- Selective attention for fast adaptation and robustness

- Learning abstractions and affordances

- Discovery and continual reinforcement learning

Current Students

Master's Research - McGill University
Principal supervisor:

Publications

Disentangling the Causes of Plasticity Loss in Neural Networks
Clare Lyle
Zeyu Zheng
Hado van Hasselt
Razvan Pascanu
James Martens
Will Dabney
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma
Guy Davidson
Anssi Kanervisto
Udit Arora
Katja Hofmann
Ida Momennejad
Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.
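As a rough, hypothetical illustration of the task-sets idea above (not the paper's actual pipeline), the sketch below projects per-player task-frequency vectors onto a low-dimensional behavior manifold with PCA-style axes and compares how variable human versus AI play is along each axis; the data, axis names, and dimensions are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-player frequencies over high-level behavioral tasks
# (rows = players, columns = tasks); real data would come from game telemetry.
human_tasks = rng.dirichlet(np.ones(12), size=200)      # varied behavior
ai_tasks = rng.dirichlet(np.full(12, 50.0), size=200)   # more uniform behavior

def project_to_manifold(reference, samples, n_axes=3):
    """Fit principal axes on reference data and project samples onto them."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Principal axes of the reference (human) task space
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_axes]
    return (samples - mean) @ axes.T

human_coords = project_to_manifold(human_tasks, human_tasks)
ai_coords = project_to_manifold(human_tasks, ai_tasks)

# Compare variability along each interpretable axis (illustrative names only).
for i, name in enumerate(["fight-flight", "explore-exploit", "solo-multi"]):
    print(f"{name}: human std={human_coords[:, i].std():.3f} "
          f"ai std={ai_coords[:, i].std():.3f}")
```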
Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels
Thomas Jiralerspong
Flemming Kondrup
The ability to plan at many different levels of abstraction enables agents to envision the long-term repercussions of their decisions and thus enables sample-efficient learning. This becomes particularly beneficial in complex environments with high-dimensional state spaces such as pixels, where the goal is distant and the reward sparse. We introduce Forecaster, a deep hierarchical reinforcement learning approach which plans over high-level goals by leveraging a temporally abstract world model. Forecaster learns an abstract model of its environment by modelling the transition dynamics at an abstract level and training a world model on such transitions. It then uses this world model to choose optimal high-level goals through a tree-search planning procedure. It additionally trains a low-level policy that learns to reach those goals. Our method addresses not only building world models with longer horizons, but also planning with such models in downstream tasks. We empirically demonstrate Forecaster's potential in both single-task learning and generalization to new tasks in the AntMaze domain.
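The following is a minimal sketch of the planning step described above, assuming a placeholder abstract world model rather than Forecaster's learned one: an exhaustive tree search over short sequences of high-level goals, returning the first goal of the best sequence for a low-level policy to pursue.

```python
import itertools
import random

# A minimal sketch of tree-search planning over high-level goals with an
# abstract (temporally extended) world model. The world model here is a
# hypothetical black box: given an abstract state and a goal, it predicts the
# next abstract state and the return accumulated while pursuing that goal.

GOALS = ["go_north", "go_south", "go_east", "go_west"]

def abstract_model(state, goal):
    """Placeholder learned model: predicts (next_state, predicted_return)."""
    random.seed(hash((state, goal)) % (2**32))
    return (state + 1, random.uniform(-1.0, 1.0))

def plan(state, depth=3):
    """Exhaustive tree search over goal sequences up to a fixed depth."""
    best_value, best_goal = float("-inf"), None
    for sequence in itertools.product(GOALS, repeat=depth):
        s, value = state, 0.0
        for goal in sequence:
            s, r = abstract_model(s, goal)
            value += r
        if value > best_value:
            best_value, best_goal = value, sequence[0]
    return best_goal  # first high-level goal; a low-level policy would reach it

print(plan(state=0))
```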
POMRL: No-Regret Learning-to-Plan with Increasing Horizons
Claire Vernade
Brendan O'Donoghue
Satinder Singh
Tom Zahavy
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks are more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within the task. We generalize this finding to meta-RL and study this dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.
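A small numerical sketch of the intuition above, not the paper's algorithm or regret bound: transition counts are pooled across related tasks to form a meta-estimate, a new task's model is shrunk toward it, and the discount factor (planning horizon) is grown heuristically with the number of within-task samples. All quantities and the shrinkage weight are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 4, 2

def estimate_model(counts):
    """Empirical transition model from visit counts, with uniform smoothing."""
    return (counts + 1e-3) / (counts + 1e-3).sum(axis=-1, keepdims=True)

# Hypothetical per-task transition counts from limited interactions.
task_counts = [rng.integers(0, 5, size=(n_states, n_actions, n_states))
               for _ in range(10)]

# Meta-estimate: pool counts across related tasks to reduce model uncertainty.
pooled = estimate_model(sum(task_counts))

# Within a new task, blend the task-specific estimate with the meta-estimate,
# trusting the task data more as the sample count n grows.
new_counts = rng.integers(0, 3, size=(n_states, n_actions, n_states))
n = new_counts.sum()
weight = n / (n + 20.0)  # illustrative shrinkage weight
blended = weight * estimate_model(new_counts) + (1 - weight) * pooled

# Heuristic in the spirit of the abstract: plan with a longer horizon (larger
# discount factor) when the model is estimated from more samples.
gamma = min(0.99, 1.0 - 1.0 / (1.0 + n))
print(f"samples={n}, discount={gamma:.3f}")
```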
Discovering Object-Centric Generalized Value Functions From Pixels
Somjit Nath
Gopeshh Raaj Subbaraj
Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs, albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent "question" functions and leveraging the subsequently learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.
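As a hedged illustration of the general idea (tabular and synthetic, unlike the pixel-based method described above), the sketch below treats hypothetical per-object features as cumulants and learns one general value function per feature with TD(0).

```python
import numpy as np

rng = np.random.default_rng(2)

# A minimal sketch of general value functions (GVFs) whose cumulants are
# hypothetical per-object features (e.g. keypoint activations) rather than
# the environment reward. Features and dynamics here are synthetic.

n_features, n_states = 3, 10
object_features = rng.random((n_states, n_features))  # stand-in for learned features

gamma, alpha = 0.9, 0.1
# One tabular GVF per object feature: v[f, s] ~ expected discounted sum of feature f.
v = np.zeros((n_features, n_states))

state = 0
for _ in range(5000):
    next_state = rng.integers(n_states)          # random behavior policy / dynamics
    cumulant = object_features[next_state]       # per-object "question" signal
    td_error = cumulant + gamma * v[:, next_state] - v[:, state]
    v[:, state] += alpha * td_error
    state = next_state

print(np.round(v, 2))  # learned object-centric predictions, one row per feature
```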
Towards Continual Reinforcement Learning: A Review and Perspectives
The Paradox of Choice: On the Role of Attention in Hierarchical Reinforcement Learning
Andrei Cristian Nica
Decision-making AI agents are often faced with two important challenges: the depth of the planning horizon, and the branching factor due to having many choices. Hierarchical reinforcement learning methods aim to solve the first problem, by providing shortcuts that skip over multiple time steps. To cope with the breadth, it is desirable to restrict the agent's attention at each step to a reasonable number of possible choices. The concept of affordances (Gibson, 1977) suggests that only certain actions are feasible in certain states. In this work, we first characterize "affordances" as a "hard" attention mechanism that strictly limits the available choices of temporally extended options. We then investigate the role of hard versus soft attention in training data collection, abstract value learning in long-horizon tasks, and handling a growing number of choices. To this end, we present an online, model-free algorithm to learn affordances that can be used to further learn subgoal options. Finally, we identify and empirically demonstrate the settings in which the "paradox of choice" arises, i.e. when having fewer but more meaningful choices improves the learning speed and performance of a reinforcement learning agent.
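A minimal sketch of affordances as hard attention over options, with a hand-specified affordance function standing in for the online learning procedure described above; states, options, and Q-values are hypothetical.

```python
# A minimal sketch of affordances as a hard attention mask over temporally
# extended options: in each state only a small, state-dependent subset of
# options is made available to the agent. The mask here is hand-specified;
# the paper learns it online.

OPTIONS = ["open_door", "pick_key", "go_to_goal", "explore_left", "explore_right"]

def afforded_options(state):
    """Hypothetical affordance function: which options are feasible in `state`."""
    if state == "at_door":
        return ["open_door", "explore_left"]
    if state == "near_key":
        return ["pick_key"]
    return ["explore_left", "explore_right"]

def act(state, q_values):
    """Pick the best option, but only among the afforded (hard-attended) ones."""
    candidates = afforded_options(state)
    return max(candidates, key=lambda o: q_values.get((state, o), 0.0))

q = {("at_door", "open_door"): 1.0, ("at_door", "explore_left"): 0.2}
print(act("at_door", q))  # fewer but more meaningful choices -> faster learning
```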
Sequoia: A Software Framework to Unify Continual Learning Research
Fabrice Normandin
Florian Golemo
Oleksiy Ostapenko
Pau Rodriguez
Matthew D Riemer
J. Hurtado
Lucas Cecchi
Dominic Zhao
Ryan Lindeborg
Timothee LESORT
David Vazquez
Massimo Caccia
The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a taxonomy of settings, where each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, where more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, as developing a method for a given setting also makes it directly applicable onto any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a wide variety of settings from both the Continual Supervised Learning (CSL) and Continual Reinforcement Learning (CRL) domains. Sequoia also includes a growing suite of methods which are easy to extend and customize, in addition to more specialized methods from external libraries. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting (this GitHub URL).
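The core idea of a tree of settings can be sketched with plain class inheritance, as below; the class names and the applicable_to check are illustrative and are not Sequoia's actual API.

```python
# A minimal sketch of the idea described above: settings form a tree where
# children add assumptions, so a method written for a general setting is
# automatically applicable to its more restrictive descendants.
# Class and method names here are illustrative, not Sequoia's actual API.

class Setting:
    assumptions: set = set()

class ContinualRLSetting(Setting):
    assumptions = {"non-stationary environment"}

class TaskIncrementalRLSetting(ContinualRLSetting):
    assumptions = ContinualRLSetting.assumptions | {"task labels at train time"}

class Method:
    """A method declares the most general setting it can handle."""
    target_setting = ContinualRLSetting

    def applicable_to(self, setting_cls) -> bool:
        return issubclass(setting_cls, self.target_setting)

method = Method()
print(method.applicable_to(TaskIncrementalRLSetting))  # True: child of its target
print(method.applicable_to(Setting))                   # False: too general
```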
Self-Supervised Attention-Aware Reinforcement Learning
Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analyzing tool rather than an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach which can 1. learn to select regions of interest without explicit annotations, and 2. act as a plug-in for existing deep RL methods to improve the learning performance. We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in terms of both the rate of convergence and performance. Furthermore, the proposed self-supervised attention is not tied to specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and empirically validate the generalization ability of the proposed method. Finally, we show that our method learns meaningful object keypoints, highlighting improvements both qualitatively and quantitatively.
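As a rough sketch of using spatial attention as an inductive bias (with a random, untrained scoring vector standing in for the self-supervised attention module described above), the code below pools convolutional features with a softmax over spatial locations before they would reach a policy head.

```python
import numpy as np

rng = np.random.default_rng(3)

# A minimal sketch of an attention bottleneck: the policy only sees features
# pooled from attended spatial regions. In the paper the attention is learned
# self-supervised; here it is a random linear scoring, purely for illustration.

def spatial_attention(features, w_score):
    """features: (H, W, C) conv features; returns an attended (C,) summary."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    scores = flat @ w_score                       # one logit per location
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over H*W locations
    return weights @ flat                         # attention-weighted pooling

features = rng.random((7, 7, 16))                 # stand-in for a conv feature map
w_score = rng.standard_normal(16)
policy_input = spatial_attention(features, w_score)
print(policy_input.shape)                         # (16,) -> fed to the policy head
```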
Variance Penalized On-Policy and Off-Policy Actor-Critic
Arushi Jain
Gandharv Patil
Ayush Jain
Safe option-critic: learning safety in the option-critic architecture
Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour safe if it avoids regions of state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach achieves a reduction in the variance of return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.
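The trade-off described above can be illustrated with a toy penalized objective, expected return minus a penalty on return variability; the synthetic returns and penalty coefficient below are purely illustrative, and the paper optimizes a related constrained objective with policy gradients.

```python
import numpy as np

rng = np.random.default_rng(4)

# A toy numerical illustration of the trade-off: score a policy by expected
# return minus a penalty on return variability, so options that pass through
# high-uncertainty regions are discouraged. The sampled returns are synthetic.

def penalized_objective(returns, c=0.5):
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - c * returns.var()

risky_returns = rng.normal(loc=10.0, scale=6.0, size=1000)   # high mean, high variance
safe_returns = rng.normal(loc=8.0, scale=1.0, size=1000)     # lower mean, consistent

print(f"risky: {penalized_objective(risky_returns):.2f}")
print(f"safe:  {penalized_objective(safe_returns):.2f}")     # safer option scores higher
```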
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Amy Zhang
Shagun Sodhani