Khimya Khetarpal

Temporally Abstract Partial Models

Zafarali Ahmed

Gheorghe Comanici

Humans and animals have the ability to reason and make predictions about different courses of action at many time scales. In reinforcement l… (see more)earning, option models (Sutton, Precup \& Singh, 1999; Precup, 2000) provide the framework for this kind of temporally abstract prediction and reasoning. Natural intelligent agents are also able to focus their attention on courses of action that are relevant or feasible in a given situation, sometimes termed affordable actions. In this paper, we define a notion of affordances for options, and develop temporally abstract partial option models, that take into account the fact that an option might be affordable only in certain situations. We analyze the trade-offs between estimation and approximation error in planning and learning when using such models, and identify some interesting special cases. Additionally, we empirically demonstrate the ability to learn both affordances and partial option models online resulting in improved sample efficiency and planning time in the Taxi domain.

openreview.net

What can I do here? A Theory of Affordances in Reinforcement Learning

Khimya Khetarpal

Zafarali Ahmed

Gheorghe Comanici

David Abel

Doina Precup

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Amy Zhang

Shagun Sodhani

Khimya Khetarpal

Joelle Pineau

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (see more)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

2020-07-14

ArXiv (preprint)

arxiv.org

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

Value Preserving State-Action Abstractions

David Abel

Nathan Umbanhowar

Khimya Khetarpal

Dilip Arumugam

Doina Precup

Michael L. Littman

Abstraction can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information… (see more), potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.ion can improve the sample efficiency of reinforcement learning. However, the process of abstraction inherently discards information, potentially compromising an agent’s ability to represent high-value policies. To mitigate this, we here introduce combinations of state abstractions and options that are guaranteed to preserve the representation of near-optimal policies. We first define φ-relative options, a general formalism for analyzing the value loss of options paired with a state abstraction, and present necessary and sufficient conditions for φ-relative options to preserve near-optimal behavior in any finite Markov Decision Process. We further show that, under appropriate assumptions, φ-relative options can be composed to induce hierarchical abstractions that are also guaranteed to represent high-value policies.

2020-06-03

International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Options of Interest: Temporal Abstraction with Interest Functions

Khimya Khetarpal

Martin Klissarov

Maxime Chevalier-Boisvert

Pierre-Luc Bacon

Doina Precup

Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. Th… (see more)e options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture. We investigate how interest functions can be leveraged to learn interpretable and reusable temporal abstractions. We demonstrate the efficacy of the proposed approach through quantitative and qualitative results, in both discrete and continuous environments.

2020-04-03

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Learning Options with Interest Functions

Khimya Khetarpal

Doina Precup

Learning temporal abstractions which are partial solutions to a task and could be reused for solving other tasks is an ingredient that can h… (see more)elp agents to plan and learn efficiently. In this work, we tackle this problem in the options framework. We aim to autonomously learn options which are specialized in different state space regions by proposing a notion of interest functions, which generalizes initiation sets from the options framework for function approximation. We build on the option-critic framework to derive policy gradient theorems for interest functions, leading to a new interest-option-critic architecture.

2019-07-17

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

Environments for Lifelong Reinforcement Learning

Khimya Khetarpal

Shagun Sodhani

Sarath Chandar Anbil Parthipan

Doina Precup

To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific ta… (see more)sk but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.

2018-11-26

ArXiv (preprint)

arxiv.org

Attend Before you Act: Leveraging human visual attention for continual learning

Khimya Khetarpal

Doina Precup

When humans perform a task, such as playing a game, they selectively pay attention to certain parts of the visual input, gathering relevant … (see more)information and sequentially combining it to build a representation from the sensory data. In this work, we explore leveraging where humans look in an image as an implicit indication of what is salient for decision making. We build on top of the UNREAL architecture in DeepMind Lab's 3D navigation maze environment. We train the agent both with original images and foveated images, which were generated by overlaying the original images with saliency maps generated using a real-time spectral residual technique. We investigate the effectiveness of this approach in transfer learning by measuring performance in the context of noise in the environment.

2018-07-25

ArXiv (preprint)

arxiv.org

RE-EVALUATE: Reproducibility in Evaluating Reinforcement Learning Algorithms

Khimya Khetarpal

Zafarali Ahmed

Andre Cianflone

Riashat Islam

Joelle Pineau

Reinforcement learning (RL) has recently achieved tremendous success in solving complex tasks. Careful considerations are made towards repro… (see more)ducible research in machine learning. Reproducibility in RL often becomes more difficult, due to the lack of standard evaluation method and detailed methodology for algorithms and comparisons with existing work. In this work, we highlight key differences in evaluation in RL compared to supervised learning, and discuss specific issues that are often non-intuitive for newcomers. We study the importance of reproducibility in evaluation in RL, and propose an evaluation pipeline that can be decoupled from the algorithm code. We hope such an evaluation pipeline can be standardized, as a step towards robust and reproducible research in RL.

2018-06-27

ICML.cc/2018/RML (poster)

openreview.net

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Khimya Khetarpal

Biography

Current Students

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Khimya Khetarpal

Biography

Current Students

Publications