
Shagun Sodhani

Alumni

Publications

Robust Policy Learning over Multiple Uncertainty Sets
Annie Xie
Chelsea Finn
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments. While system identification methods provide a way to infer the variation from online experience, they can fail in settings where fast identification is not possible. Another dominant approach is robust RL which produces a policy that can handle worst-case scenarios, but these methods are generally designed to achieve robustness to a single uncertainty set that must be specified at train time. Towards a more general solution, we formulate the multi-set robustness problem to learn a policy robust to different perturbation sets. We then design an algorithm that enjoys the benefits of both system identification and robust RL: it reduces uncertainty where possible given a few interactions, but can still act robustly with respect to the remaining uncertainty. On a diverse set of control tasks, our approach demonstrates improved worst-case performance on new environments compared to prior methods based on system identification and on robust RL alone.
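To make the abstract's key idea concrete, here is a minimal, self-contained Python sketch of combining a short identification phase with worst-case action selection over the uncertainty sets that remain plausible. The toy linear dynamics, the value computation, and all names are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch (assumptions, not the paper's code): each "uncertainty set"
# is a list of candidate linear dynamics matrices. A few observed transitions rule
# out sets that cannot explain them; the agent then acts worst-case over the rest.
import numpy as np

def step(A, s, a):
    return A @ s + a  # toy linear dynamics

def consistent(candidate_set, transitions, tol=1e-6):
    """A set stays plausible if at least one of its members explains the data."""
    return any(all(np.allclose(step(A, s, a), s_next, atol=tol)
                   for s, a, s_next in transitions)
               for A in candidate_set)

def robust_action(state, goal, actions, candidate_sets):
    """Best worst-case action: maximize the minimum (negative distance-to-goal)
    over every dynamics matrix in every remaining uncertainty set."""
    def worst_case_value(a):
        return min(-np.linalg.norm(step(A, state, a) - goal)
                   for cs in candidate_sets for A in cs)
    return max(actions, key=worst_case_value)

# Two uncertainty sets, a short identification phase, then robust action choice.
sets = [[np.eye(2), 1.1 * np.eye(2)], [-np.eye(2)]]
true_A = 1.1 * np.eye(2)
s = np.array([1.0, 0.0])
a = np.array([0.0, 0.1])
transitions = [(s, a, step(true_A, s, a))]                  # online experience
plausible = [cs for cs in sets if consistent(cs, transitions)] or sets
act = robust_action(s, np.zeros(2), [a, -a], plausible)
print("plausible sets:", len(plausible), "chosen action:", act)
```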
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
A Simple and Effective Model for Multi-Hop Question Generation
Previous research on automated question generation has almost exclusively focused on generating factoid questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation (QG), where answering the question requires reasoning over multiple documents. In this work, we propose a simple and effective approach based on the transformer model for multi-hop QG. Our approach consists of specialized input representations, a supporting sentence classification objective, and training data weighting. Prior work on multi-hop QG considers the simplified setting of shorter documents and also advocates the use of entity-based graph structures as essential ingredients in model design. In contrast, we showcase that our model can scale to the challenging setting of longer documents as input, does not rely on graph structures, and substantially outperforms the state-of-the-art approaches as measured by automated metrics and human evaluation.
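As an illustration of the training objective the abstract describes, the following hedged PyTorch sketch combines a question-generation loss with an auxiliary supporting-sentence classification loss and per-example data weights. Shapes, names, and the weighting scheme are assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): one way to combine a generation loss with
# an auxiliary supporting-sentence objective and per-example training-data weights.
import torch
import torch.nn.functional as F

def multi_hop_qg_loss(gen_logits, gen_targets,        # (B, T, V), (B, T)
                      support_logits, support_labels, # (B, S), (B, S) in {0, 1}
                      example_weights,                # (B,)
                      aux_weight=0.5):
    B = gen_logits.size(0)
    # Token-level cross-entropy for question generation, averaged per example.
    gen_loss = F.cross_entropy(gen_logits.reshape(-1, gen_logits.size(-1)),
                               gen_targets.reshape(-1),
                               reduction="none").view(B, -1).mean(dim=1)
    # Auxiliary objective: which input sentences support the answer.
    sup_loss = F.binary_cross_entropy_with_logits(support_logits,
                                                  support_labels.float(),
                                                  reduction="none").mean(dim=1)
    # Per-example weighting (e.g. up-weighting harder multi-hop examples).
    per_example = gen_loss + aux_weight * sup_loss
    return (example_weights * per_example).mean()

# Toy shapes to show the call signature.
B, T, V, S = 2, 5, 11, 4
loss = multi_hop_qg_loss(torch.randn(B, T, V), torch.randint(V, (B, T)),
                         torch.randn(B, S), torch.randint(2, (B, S)),
                         torch.tensor([1.0, 2.0]))
print(float(loss))
```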
Invariant Causal Prediction for Block MDPs
Clare Lyle
Angelos Filos
Marta Z. Kwiatkowska
Yarin Gal
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.
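A toy sketch of the invariant-prediction intuition behind MISA: keep only the feature subsets whose return predictors look the same across environments, so spuriously correlated features are rejected. This is a simplified, assumed setup for intuition, not the paper's method verbatim.

```python
# Illustrative only: feature subsets are "invariant" if a linear predictor of the
# return fit in each environment has (nearly) the same coefficients everywhere.
import itertools
import numpy as np

def fit(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def invariant_subsets(envs, n_features, tol=0.1):
    """envs: list of (states, returns) arrays. Returns candidate causal subsets."""
    keep = []
    for k in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), k):
            coefs = [fit(X[:, list(subset)], y) for X, y in envs]
            spread = max(np.linalg.norm(c - coefs[0]) for c in coefs)
            if spread < tol:
                keep.append(subset)
    return keep

# Two environments: feature 0 causes the return everywhere, feature 1 is only
# spuriously correlated with it, and with a different sign in each environment.
rng = np.random.default_rng(0)
def make_env(spurious_scale):
    x0 = rng.normal(size=500)
    y = 2.0 * x0 + 0.1 * rng.normal(size=500)
    x1 = spurious_scale * y + rng.normal(size=500)
    return np.stack([x0, x1], axis=1), y

envs = [make_env(1.0), make_env(-1.0)]
print(invariant_subsets(envs, n_features=2))   # typically [(0,)]
```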
Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP
Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and an approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
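The sketch below illustrates, under assumed interfaces, what a shared state encoder plus a universal dynamics model conditioned on a per-task hidden parameter could look like; it is a hypothetical reading of the setup, not the HiP-BMDP reference implementation.

```python
# Minimal sketch (assumptions, not the reference code): a shared encoder and one
# dynamics model conditioned on a learned per-task parameter theta_k, so adapting
# to a new task only requires inferring its theta.
import torch
import torch.nn as nn

class HiPDynamics(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=16, theta_dim=4, n_tasks=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.task_theta = nn.Embedding(n_tasks, theta_dim)   # hidden parameters
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + act_dim + theta_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim))

    def forward(self, obs, act, task_id):
        z = self.encoder(obs)
        theta = self.task_theta(task_id)
        return self.dynamics(torch.cat([z, act, theta], dim=-1))

# One gradient step of an assumed objective: predict the next latent state.
model = HiPDynamics(obs_dim=10, act_dim=3)
obs, act, next_obs = torch.randn(32, 10), torch.randn(32, 3), torch.randn(32, 10)
task_id = torch.randint(8, (32,))
loss = ((model(obs, act, task_id) - model.encoder(next_obs)) ** 2).mean()
loss.backward()
print(float(loss))
```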
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
Miles Brundage
Shahar Avin
Haydn Belfield
Gretchen Krueger
Gillian K. Hadfield
Heidy Khlaaf
Jingying Yang
H. Toner
Ruth Catherine Fong
Pang Wei Koh
Sara Hooker
Jade Leung
Andrew John Trask
Emma Bluemke
Jonathan Lebensold
Cullen C. O'Keefe
Mark Koren
Théo Ryffel
JB Rubinovitz
Tamay Besiroglu
Federica Carugati
Jack Clark
Peter Eckersley
Sarah de Haas
Maritza L. Johnson
Ben Laurie
Alex Ingerman
Igor Krawczuk
Amanda Askell
Rosario Cammarota
A. Lohn
Charlotte Stix
Peter Mark Henderson
Logan Graham
Carina E. A. Prunkl
Bianca Martin
Elizabeth Seger
Noa Zilberman
Seán Ó hÉigeartaigh
Frens Kroeger
Girish Sastry
R. Kagan
Adrian Weller
Brian Shek-kam Tse
Elizabeth Barnes
Allan Dafoe
Paul D. Scharre
Ariel Herbert-Voss
Martijn Rasser
Carrick Flynn
Thomas Krendl Gilbert
Lisa Dyer
Saif M. Khan
Markus Anderljung
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
Reinforcement learning agents that operate in diverse and complex environments can benefit from the structured decomposition of their behavior. Often, this is addressed in the context of hierarchical reinforcement learning, where the aim is to decompose a policy into lower-level primitives or options, and a higher-level meta-policy that triggers the appropriate behaviors for a given situation. However, the meta-policy must still produce appropriate decisions in all states. In this work, we propose a policy design that decomposes into primitives, similarly to hierarchical reinforcement learning, but without a high-level meta-policy. Instead, each primitive can decide for itself whether it wishes to act in the current state. We use an information-theoretic mechanism for enabling this decentralized decision: each primitive chooses how much information it needs about the current state to make a decision, and the primitive that requests the most information about the current state acts in the world. The primitives are regularized to use as little information as possible, which leads to natural competition and specialization. We experimentally demonstrate that this policy architecture improves over both flat and hierarchical policies in terms of generalization.
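A hedged sketch of the selection mechanism described above: each primitive encodes the state as a Gaussian, its KL divergence to a standard normal acts as its information request, the primitive requesting the most information takes control, and the total KL is penalized. Shapes and module names are assumptions, not the authors' code.

```python
# Sketch only: per-primitive stochastic encoders, an information "bid" (KL to a
# standard normal), winner-takes-control selection, and an information penalty.
import torch
import torch.nn as nn

class Primitive(nn.Module):
    def __init__(self, obs_dim, act_dim, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(obs_dim, 2 * z_dim)   # mean and log-variance
        self.pi = nn.Linear(z_dim, act_dim)

    def forward(self, obs):
        mu, logvar = self.enc(obs).chunk(2, dim=-1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1)  # vs N(0, I)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.pi(z), kl

primitives = nn.ModuleList(Primitive(obs_dim=6, act_dim=2) for _ in range(3))
obs = torch.randn(6)
actions, kls = zip(*(p(obs) for p in primitives))
winner = int(torch.stack(kls).argmax())      # most information wins control
action = actions[winner]
info_penalty = torch.stack(kls).sum()        # regularize information usage
print(winner, action.detach(), float(info_penalty))
```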
Toward Training Recurrent Neural Networks for Lifelong Learning
Learning Powerful Policies by Using Consistent Dynamics Model
Model-based reinforcement learning approaches have the promise of being sample efficient. Much of the progress in learning dynamics models in RL has been made by learning models via supervised learning. But traditional model-based approaches lead to 'compounding errors' when the model is unrolled step by step. Essentially, the state transitions that the learner predicts (by unrolling the model for multiple steps) and the state transitions that the learner experiences (by acting in the environment) may not be consistent. There is enough evidence that humans build a model of the environment, not only by observing the environment but also by interacting with the environment. Interaction with the environment allows humans to carry out experiments: taking actions that help uncover true causal relationships which can be used for building better dynamics models. Analogously, we would expect such interactions to be helpful for a learning agent while learning to model the environment dynamics. In this paper, we build upon this intuition by using an auxiliary cost function to ensure consistency between what the agent observes (by acting in the real world) and what it imagines (by acting in the 'learned' world). We consider several tasks - MuJoCo-based control tasks and Atari games - and show that the proposed approach helps to train powerful policies and better dynamics models.
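A small sketch of the auxiliary consistency idea, under assumed interfaces: unroll the learned model along the actions the agent actually took and penalize disagreement with the states it actually observed. The model, shapes, and names are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: multi-step open-loop rollout of a learned model, compared against
# the observed trajectory, used as an auxiliary cost.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4 + 1, 32), nn.Tanh(), nn.Linear(32, 4))

def consistency_loss(model, states, actions):
    """states: (T+1, state_dim); actions: (T, act_dim). Unroll from states[0]
    and compare each imagined state with the corresponding observed state."""
    pred, loss = states[0], 0.0
    for t in range(actions.shape[0]):
        pred = model(torch.cat([pred, actions[t]], dim=-1))  # imagine one step
        loss = loss + ((pred - states[t + 1]) ** 2).mean()   # match experience
    return loss / actions.shape[0]

states, actions = torch.randn(6, 4), torch.randn(5, 1)
aux = consistency_loss(model, states, actions)
aux.backward()   # would be added to the usual policy / model-learning losses
print(float(aux))
```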
Environments for Lifelong Reinforcement Learning
To achieve general artificial intelligence, reinforcement learning (RL) agents should learn not only to optimize returns for one specific task but also to constantly build more complex skills and scaffold their knowledge about the world, without forgetting what has already been learned. In this paper, we discuss the desired characteristics of environments that can support the training and evaluation of lifelong reinforcement learning agents, review existing environments from this perspective, and propose recommendations for devising suitable environments in the future.
On Training Recurrent Neural Networks for Lifelong Learning
Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step towards developing true lifelong learning systems, we unify Gradient Episodic Memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both these models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for the lifelong learning setting.
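For intuition, the sketch below reduces the two ingredients being unified to their simplest forms: a single-constraint (A-GEM-style) gradient projection against an episodic-memory gradient, and a Net2Net-style function-preserving widening of one linear layer. Both are deliberate simplifications of the paper's setup, with assumed names throughout.

```python
# Sketch only, heavily simplified: (1) keep updates from increasing loss on stored
# memories via gradient projection; (2) grow capacity by duplicating units.
import numpy as np

def project_gradient(g, g_memory):
    """If the new gradient conflicts with the memory gradient, project it so the
    update no longer increases loss on the stored episodic memories."""
    dot = g @ g_memory
    if dot < 0:
        g = g - (dot / (g_memory @ g_memory)) * g_memory
    return g

def widen_layer(W, b, new_width, rng):
    """Net2Net-style widening: duplicate random units; outputs stay unchanged once
    the duplicated units' outgoing weights are halved in the next layer (not shown)."""
    old_width = W.shape[0]
    idx = rng.integers(0, old_width, size=new_width - old_width)
    return np.vstack([W, W[idx]]), np.concatenate([b, b[idx]]), idx

rng = np.random.default_rng(0)
g = project_gradient(np.array([1.0, -3.0]), np.array([1.0, 1.0]))
W, b, idx = widen_layer(rng.normal(size=(3, 2)), np.zeros(3), 5, rng)
print(g, W.shape, idx)
```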