Portrait de Khimya Khetarpal

Khimya Khetarpal

Membre affilié
Chercheuse scientifique, Google DeepMind
Sujets de recherche
Apprentissage de représentations
Apprentissage en ligne
Apprentissage par renforcement
Théorie de l'apprentissage automatique

Biographie

Khimya Khetarpal est chercheuse chez Google Deepmind. Elle a obtenu son doctorat en informatique au Reasoning and Learning Lab de l'Université McGill et à Mila, csupervisée par Doina Precup. Elle s'intéresse de manière générale à l'intelligence artificielle et à l'apprentissage par renforcement. Ses recherches actuelles portent sur la manière dont les agents RL apprennent à représenter efficacement les connaissances du monde, à planifier avec elles et à s'adapter aux changements au fil du temps. Les travaux de Khimya ont été publiés dans les principales revues et conférences sur l'intelligence artificielle, notamment NeurIPS, ICML, AAAI, AISTATS, ICLR, The Knowledge Engineering Review, ACM, JAIR et TMLR. Ses travaux ont également été présentés dans la MIT Technology Review. Elle a été reconnue comme examinatrice experte de TMLR en 2023, l'une des étoiles montantes d'EECS 2020, finaliste du concours Three Minute Thesis (3MT) d'AAAI 2019, sélectionnée pour le consortium doctoral d'AAAI 2019, et a reçu le prix du meilleur article (3e prix) pour un atelier ICML 2018 sur l'apprentissage tout au long de la vie. Tout au long de sa carrière, elle s'est efforcée d'être une mentore active par le biais d'initiatives telles que la cofondation de l'initiative de conseil par les pairs Mila, l'enseignement et l'assistance au AI4Good Lab, le bénévolat à Skype A Scientist et le mentorat à FIRST Robotics.

Ses recherches visent à (1) comprendre le comportement intelligent qui fait le lien entre l'action et la perception en s'appuyant sur les fondements théoriques de l'apprentissage par renforcement, et (2) construire des agents d'intelligence artificielle pour représenter efficacement la connaissance du monde, planifier avec elle et s'adapter aux changements au fil du temps grâce à l'apprentissage et à l'interaction.

Elle aborde actuellement ces questions dans les directions de recherche suivantes :

- Attention sélective pour une adaptation et une robustesse rapides

- Apprentissage des abstractions et des affordances

- Découverte et apprentissage par renforcement continu

Étudiants actuels

Doctorat - UdeM
Superviseur⋅e principal⋅e :

Publications

Difference-Aware Retrieval Policies for Imitation Learning
Quinn Pfeifer
Ethan Pronovost
Paarth Shah
Siddhartha Srinivasa
Abhishek Gupta
Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding erro… (voir plus)rs during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a semi-parametric retrieval-based imitation learning approach that addresses this limitation by reparameterizing the imitation learning problem in terms of local neighborhood structure rather than direct state-to-action mappings. Instead of learning a global policy, DARP trains a model to predict actions based on
Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
Michael Beukman
Zeyu Zheng
Will Dabney
Jakob Foerster
Michael Dennis
Clare Lyle
Plateaus, where an agent's performance stagnates at a suboptimal level, are a common problem in deep on-policy RL. Focusing on PPO due to it… (voir plus)s widespread adoption, we show that plateaus in certain regimes arise not because of known exploration, capacity, or optimization challenges, but because sample-based estimates of the loss eventually become poor proxies for the true objective over the course of training. As a recap, PPO switches between sampling rollouts from several parallel environments online using the current policy (which we call the outer loop) and performing repeated minibatch SGD steps against this offline dataset (the inner loop). In our work we consider only the outer loop, and conceptually model it as stochastic optimization. The step size is then controlled by the regularization strength towards the previous policy and the gradient noise by the number of samples collected between policy update steps. This model predicts that performance will plateau at a suboptimal level if the outer step size is too large relative to the noise. Recasting PPO in this light makes it clear that there are two ways to address this particular type of learning stagnation: either reduce the step size or increase the number of samples collected between updates. We first validate the predictions of our model and investigate how hyperparameter choices influence the step size and update noise, concluding that increasing the number of parallel environments is a simple and robust way to reduce both factors. Next, we propose a recipe for how to co-scale the other hyperparameters when increasing parallelization, and show that incorrectly doing so can lead to severe performance degradation. Finally, we vastly outperform prior baselines in a complex open-ended domain by scaling PPO to more than 1M parallel environments, thereby enabling monotonic performance improvement up to one trillion transitions.
Affordances Enable Partial World Modeling with LLMs
Gheorghe Comanici
Jonathan Richens
Jeremy Shar
Fei Xia
Laurent Orseau
Aleksandra Faust
Robust Intervention Learning from Emergency Stop Interventions
Ethan Pronovost
Siddhartha Srinivasa
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
Andre Barreto
Will Dabney
Shi Dong
Steven Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity … (voir plus)is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Under this definition, we find that plasticity is well thought of as the mirror of empowerment: The two concepts are defined using the same measure, with only the direction of influence reversed. Our main result establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency
Long Range Navigator (LRN): Extending robot planning horizons beyond metric maps
Matt Schmittle
Rohan Baijal
Nathan Hatch
Rosario Scalise
Mateo Guaman Castro
Sidharth Talia
Byron Boots
Siddhartha Srinivasa
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in do… (voir plus)mains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e. combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective,
Representation Learning via Non-Contrastive Mutual Information
Zhaohan Daniel Guo
Bernardo Avila Pires
Dale Schuurmans
Bo Dai
Cracking the Code of Action: A Generative Approach to Affordances for Reinforcement Learning
Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboar… (voir plus)d actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through
Agency Is Frame-Dependent
David Abel
Andre Barreto
Michael Bowling
Will Dabney
Shi Dong
Steven Stenberg Hansen
Anna Harutyunyan
Clare Lyle
Georgios Piliouras
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science… (voir plus), and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.
Optimizing Return Distributions with Distributional Dynamic Programming
Bernardo Avila Pires
Mark Rowland
Diana Borsa
Zhaohan Daniel Guo
Andre Barreto
David Abel
Remi Munos
Will Dabney
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standar… (voir plus)d reinforcement learning as a special case. Previous distributional DP methods could optimize the same class of expected utilities as classic DP. To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL, where the MDP state is augmented with a statistic of the rewards obtained so far (since the first time step). We find that a number of recently studied problems can be formulated as stock-augmented return distribution optimization, and we show that we can use distributional DP to solve them. We analyze distributional value and policy iteration, with bounds and a study of what objectives these distributional DP methods can or cannot optimize. We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems, for example maximizing conditional value-at-risk, and homeostatic regulation. To highlight the practical potential of stock-augmented return distribution optimization and distributional DP, we combine the core ideas of distributional value iteration with the deep RL agent DQN, and empirically evaluate it for solving instances of the applications discussed.
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Zhaohan Daniel Guo
Bernardo Avila Pires
Yunhao Tang
Clare Lyle
Mark Rowland
Nicolas Heess
Diana Borsa
Arthur Guez
Will Dabney