Shagun Sodhani

Angelos Filos

Marta Kwiatkowska

Joelle Pineau

Yarin Gal

Doina Precup

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. … (see more)In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

2020-11-20

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better perform… (see more)ance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

2020-07-13

ArXiv (preprint)

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives

Anirudh Goyal

Jonathan Binas

Xue Bin Peng

Sergey Levine

Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavi… (see more)or. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for themselves whether they wish to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.

2019-12-31

ICLR.cc/2020/Conference (poster)

openreview.net

Toward Training Recurrent Neural Networks for Lifelong Learning.

Sarath Chandar

Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we stud… (see more)y these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for lifelong learning setting.

2019-12-31

Neural Computation (published)

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

Miles Brundage

Shahar Avin

Jasmine Wang

Haydn Belfield

Gretchen Krueger

Gillian Hadfield

Heidy Khlaaf

Jingying Yang

Helen Toner

Ruth Fong

Tegan Maharaj

Pang Wei Koh

Sara Hooker

Jade Leung

Andrew Trask

Emma Bluemke

Jonathan Lebensold

Cullen O'Keefe

Mark Koren

Théo Ryffel … (see 39 more)

JB Rubinovitz

Tamay Besiroglu

Federica Carugati

Jack Clark

Peter Eckersley

Sarah de Haas

Maritza Johnson

Ben Laurie

Alex Ingerman

Igor Krawczuk

Amanda Askell

Rosario Cammarota

Andrew Lohn

David Krueger

Charlotte Stix

Peter Henderson

Logan Graham

Carina Prunkl

Bianca Martin

Elizabeth Seger

Noa Zilberman

Seán Ó hÉigeartaigh

Frens Kroeger

Girish Sastry

Rebecca Kagan

Adrian Weller

Brian Tse

Elizabeth Barnes

Allan Dafoe

Paul Scharre

Ariel Herbert-Voss

Martijn Rasser

Carrick Flynn

Thomas Krendl Gilbert

Lisa Dyer

Saif Khan

Markus Anderljung

With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and … (see more)recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

2019-12-31

arXiv (preprint)

Learning Powerful Policies by Using Consistent Dynamics Model

Sergey Levine

Model-based Reinforcement Learning approaches have the promise of being sample efficient. Much of the progress in learning dynamics models i… (see more)n RL has been made by learning models via supervised learning. But traditional model-based approaches lead to `compounding errors' when the model is unrolled step by step. Essentially, the state transitions that the learner predicts (by unrolling the model for multiple steps) and the state transitions that the learner experiences (by acting in the environment) may not be consistent. There is enough evidence that humans build a model of the environment, not only by observing the environment but also by interacting with the environment. Interaction with the environment allows humans to carry out experiments: taking actions that help uncover true causal relationships which can be used for building better dynamics models. Analogously, we would expect such interactions to be helpful for a learning agent while learning to model the environment dynamics. In this paper, we build upon this intuition by using an auxiliary cost function to ensure consistency between what the agent observes (by acting in the real world) and what it imagines (by acting in the `learned' world). We consider several tasks - Mujoco based control tasks and Atari games - and show that the proposed approach helps to train powerful policies and better dynamics models.

2019-06-10

ArXiv (preprint)

Environments for Lifelong Reinforcement Learning

To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific ta… (see more)sk but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.

2018-11-25

ArXiv (preprint)

On Training Recurrent Neural Networks for Lifelong Learning

A. Chandar

Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we stud… (see more)y these challenges in the context of sequential supervised learning with emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step towards developing true lifelong learning systems, we unify Gradient Episodic Memory (a catastrophic forgetting alleviation approach) and Net2Net(a capacity expansion approach). Both these models are proposed in the context of feedforward networks and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for lifelong learning setting.

2018-11-15

ArXiv (preprint)