Joelle Pineau

2021-01-01

ICLR (published)

openreview.net

A Simple and Effective Model for Multi-Hop Question Generation

Jimmy Lei Ba

Jamie Ryan Kiros

Geoffrey E. Hinton

Peter W. Battaglia

Jessica Blake

Chandler Hamrick

Victor Bapst

Alvaro Sanchez

Vinicius Zambaldi

M. Malinowski

Andrea Tacchetti

David Raposo

Tom B. Brown

Benjamin Mann

Nick Ryder

Melanie Subbiah

Jared Kaplan

Prafulla Dhariwal

Arvind Neelakantan

Pranav Shyam

Girish Sastry

Koustuv Sinha

Shagun Sodhani

Jin Dong

William L. Hamilton

Nitish Srivastava

Geoffrey Hinton

Alex Krizhevsky

Ilya Sutskever

Ruslan Salakhutdinov

Gabriel Stanovsky

Julian Michael

Luke Zettlemoyer

Dan Su

Yan Xu

Wenliang Dai

Ziwei Ji

Tiezheng Yu

Minghao Tu

Kevin Huang

Guangtao Wang

Jing Huang

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Jakob Uszkoreit

Llion Jones

Aidan N. Gomez

Łukasz Kaiser

Illia Polosukhin

Petar Veličković

Guillem Cucurull

Arantxa Casanova

Adriana Romero Soriano

Pietro Liò

Yoshua Bengio

Johannes Welbl

Pontus Stenetorp

Yonghui Wu

Mike Schuster

Zhifeng Chen

Quoc Le

Mohammad Norouzi

Wolfgang Macherey

M. Krikun

Yuan Cao

Qin Gao

William W. Cohen

Jianxing Yu

Xiaojun Quan

Qinliang Su

Jian Yin

Yuyu Zhang

Hanjun Dai

Zornitsa Kozareva

Chen Zhao

Chenyan Xiong

Corby Rosset

Xia Song

Paul Bennett

Saurabh Tiwary

Yao Zhao

Xiaochuan Ni

Yuanyuan Ding

Qingyu Zhou

Nan Yang

Furu Wei

Chuanqi Tan

Previous research on automated question generation has almost exclusively focused on generating factoid questions whose answers can be extracted from a single document. However, there is an increasing interest in developing systems that are capable of more complex multi-hop question generation (QG), where answering the question requires reasoning over multiple documents. In this work, we propose a simple and effective approach based on the transformer model for multi-hop QG. Our approach consists of specialized input representations, a supporting sentence classification objective, and training data weighting. Prior work on multi-hop QG considers the simplified setting of shorter documents and also advocates the use of entity-based graph structures as essential ingredients in model design. On the contrary, we showcase that our model can scale to the challenging setting of longer documents as input, does not rely on graph structures, and substantially outperforms the state-of-the-art approaches as measured by automated metrics and human evaluation.

2021-01-01

(published)

www.semanticscholar.org

Interference and Generalization in Temporal Difference Learning

Emmanuel Bengio

Doina Precup

We study the link between generalization and interference in temporal-difference (TD) learning. Interference is defined as the inner product of two different gradients, representing their alignment. This quantity emerges as being of interest from a variety of observations about neural networks, parameter sharing and the dynamics of learning. We find that TD easily leads to low-interference, under-generalizing parameters, while the effect seems reversed in supervised learning. We hypothesize that the cause can be traced back to the interplay between the dynamics of interference and bootstrapping. This is supported empirically by several observations: the negative relationship between the generalization gap and interference in TD, the negative effect of bootstrapping on interference and the local coherence of targets, and the contrast between the propagation rate of information in TD(0) versus TD(
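
The definition of interference given in this abstract (the inner product of two gradients taken at the same parameters) can be illustrated with a minimal sketch. The linear model, squared-error loss, and function names below are assumptions chosen for illustration; this is a supervised toy example, not the paper's TD setup or code.

```python
import numpy as np


def grad_squared_error(w, x, y):
    """Gradient of 0.5 * (w @ x - y)**2 with respect to w (linear model, assumed)."""
    return (w @ x - y) * x


def interference(w, sample_a, sample_b):
    """Inner product of the two per-sample gradients evaluated at the same w."""
    g_a = grad_squared_error(w, *sample_a)
    g_b = grad_squared_error(w, *sample_b)
    return float(g_a @ g_b)


w = np.zeros(2)

# Two samples that share an input feature: their gradients point the same way.
aligned = interference(w, (np.array([1.0, 0.0]), 1.0), (np.array([2.0, 0.0]), 1.0))

# Two samples with disjoint input features: their gradients are orthogonal.
orthogonal = interference(w, (np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 1.0))
```

In this sketch a positive value means a gradient step on one sample also reduces the loss on the other, while a value of zero means the update on one sample leaves the other unaffected to first order.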

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

Invariant Causal Prediction for Block MDPs

Amy Zhang

Clare Lyle

Shagun Sodhani

Angelos Filos

Marta Z. Kwiatkowska

Yarin Gal

Doina Precup

Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges. In this paper, we consider the problem of learning abstractions that generalize in block MDPs, families of environments with a shared latent state space and dynamics structure over that latent space, but varying observations. We leverage tools from causal inference to propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting. We prove that for certain classes of environments, this approach outputs with high probability a state abstraction corresponding to the causal feature set with respect to the return. We further provide more general bounds on model error and generalization error in the multi-environment setting, in the process showing a connection between causal variable selection and the state abstraction framework for MDPs. We give empirical evidence that our methods work in both linear and nonlinear settings, attaining improved generalization over single- and multi-task baselines.

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

Iulian V. Serban

Chinnadhurai Sankar

Michael Pieper

Yoshua Bengio

Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle towards applying such methods to real-world problems is their lack of data-efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method which combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance, beating competing approaches.

2020-10-27

Journal of Artificial Intelligence Research (published)

Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Amy Zhang

Shagun Sodhani

Khimya Khetarpal

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

2020-07-14

ArXiv (preprint)

Deep interpretability for GWAS

Deepak Sharma

Audrey Durand

Marc-André Legault

Louis-Philippe Lemieux Perreault

Audrey Lemaçon

Marie-Pierre Dubé

Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In these studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient-based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.

2020-07-03

ArXiv (preprint)

Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

Maxime Wabartha

Audrey Durand

Vincent Francois-Lavet

By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.

2020-07-01

International Joint Conference on Artificial Intelligence (published)

On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)

Vincent Francois-Lavet

Guillaume Rabusseau

Damien Ernst

Raphael Fonteneau

When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.
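
The two-term decomposition described in this abstract can be written as a telescoping identity. The notation here is assumed for illustration and may differ from the paper's: $\pi^*$ denotes an optimal policy, $\pi_{D}$ the policy learned from a finite dataset $D$, and $\pi_{D_\infty}$ the policy that would be learned from unlimited data under the same state representation.

```latex
\underbrace{V^{\pi^*} - V^{\pi_{D}}}_{\text{suboptimality}}
  \;=\;
\underbrace{V^{\pi^*} - V^{\pi_{D_\infty}}}_{\text{asymptotic bias}}
  \;+\;
\underbrace{V^{\pi_{D_\infty}} - V^{\pi_{D}}}_{\text{overfitting}}
```

Under this reading, a smaller state representation enlarges the first term, because even unlimited data cannot recover the optimal policy, and shrinks the second, because fewer samples are needed to approach $\pi_{D_\infty}$.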

2020-07-01

International Joint Conference on Artificial Intelligence (published)

A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

Iulian V. Serban

Varun Gupta

Ekaterina Kochmar

Dung D. Vu

Robert Belfer

2020-06-10

Artificial Intelligence in Education (published)

Leveraging exploration in off-policy algorithms via normalizing flows

Thang Doan

Exploration is a crucial component for discovering approximately optimal policies in most high-dimensional reinforcement learning (RL) settings with sparse rewards. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been instrumental in recent advances. Soft actor-critic (SAC) is a method for improving exploration that aims to combine off-policy updates while maximizing the policy entropy. We extend SAC to a richer class of probability distributions through normalizing flows, which we show improves performance in exploration, sample complexity, and convergence. Finally, we show that not only does the normalizing flow policy outperform SAC on MuJoCo domains, it is also significantly lighter, using as little as 5.6% of the original network's parameters for similar performance.

2020-05-12

Proceedings of the Conference on Robot Learning (published)

proceedings.mlr.press

Literature Mining for Incorporating Inductive Bias in Biomedical Prediction Tasks (Student Abstract)

Qizhen Zhang

Audrey Durand

2020-04-03

AAAI Conference on Artificial Intelligence (published)