Saizheng Zhang

William W Cohen

Russ Salakhutdinov

Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in natural language. Recent works show that such models can also produce the reasoning steps (i.e., the proof graph) that emulate the model’s logical reasoning process. But these models behave as a black-box unit that emulates the reasoning process without any causal constraints on the reasoning steps, which calls their faithfulness into question. In this work, we frame the deductive logical reasoning task as a causal process by defining three modular components: rule selection, fact selection, and knowledge composition. The rule and fact selection steps select the candidate rule and facts to be used, and the knowledge composition step then combines them to generate new inferences. This ensures model faithfulness through an assured causal relation from the proof steps to the inference reasoning. To test our causal reasoning framework, we propose CAUSALR, where the above three components are independently modeled by transformers. We observe that CAUSALR is robust to novel language perturbations and is competitive with previous works on existing reasoning datasets. Furthermore, owing to its multi-modular approach, the errors made by CAUSALR are more interpretable than those of black-box generative models.
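The three-stage loop described in this abstract can be sketched as follows. This is a minimal illustration, not CAUSALR itself: the paper models each component with a transformer, whereas here `select_rule`, `select_facts`, and `compose` are hypothetical hand-written stand-ins operating on propositional strings. What the sketch does preserve is the causal constraint: `compose` only ever sees the rule and facts chosen by the two selection steps, so every new inference is traceable to the proof step that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premises: frozenset
    conclusion: str


def select_rule(rules, facts):
    # Stand-in for the learned rule selector (hypothetical): take the first
    # rule whose premises are all known and whose conclusion is new.
    for rule in rules:
        if rule.premises <= facts and rule.conclusion not in facts:
            return rule
    return None


def select_facts(rule, facts):
    # Stand-in for the learned fact selector: keep only the facts the rule needs.
    return frozenset(f for f in facts if f in rule.premises)


def compose(rule, selected_facts):
    # Composition sees ONLY the selected rule and facts, never the full
    # rulebase, so the inference is causally tied to this proof step.
    assert rule.premises <= selected_facts
    return rule.conclusion


def prove(rules, facts, query, max_steps=10):
    known = set(facts)
    proof = []  # list of (facts used, inference) pairs, i.e. the proof graph
    for _ in range(max_steps):
        if query in known:
            break
        rule = select_rule(rules, known)
        if rule is None:
            break
        used = select_facts(rule, known)
        inferred = compose(rule, used)
        proof.append((sorted(used), inferred))
        known.add(inferred)
    return query in known, proof
```

For example, with the rules "bob is big -> bob is strong" and "bob is strong and bob is kind -> bob is nice", `prove` records two proof steps before reaching the query "bob is nice".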

2020-12-31

(published)

www.semanticscholar.org

Explaining by Analogy: Case-based Abductive Natural Language Inference

Ruben Cartuyvels

Graham Spinks

Marie Francine

Peter Clark

Isaac Cowhey

Oren Etzioni

Tushar Khot

Rajarshi Das

Ameya Godbole

Shehzaad Dhuliawala

Manzil Zaheer

Andrew McCallum

Dung Ngoc Thai

Ameya Godbole

Ethan Perez

Jay-Yoon Lee

Lizhen

Ramón López De Mántaras

David McSherry

David Bridge

Barry Leake

Susan Smyth

Craw.

Boi

Maryalice Faltings

Michael T Maher

Ken Cox

Dorottya Demszky

Kelvin Guu

Percy Liang

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Daniel Fried

Peter Jansen

Gus Hahn-Powell

Rebecca Emilie Sharp

M. Surdeanu

Zhengnan Xie

Sebastian Thiem

Jaycie Ryrholm Martin

Elizabeth Wainwright

Steven Marmorstein

Wenhan Xiong

Xiang Lorraine Li

Srini Iyer

Jingfei Du

Vikas Yadav

Steven Bethard

Zhilin Yang

Peng Qi

William W Cohen

Russ Salakhutdinov

Existing accounts of explanation emphasise the role of prior experience and analogy in the solution of new problems. However, most of the contemporary models for multi-hop textual inference construct explanations considering each test case in isolation. This paradigm is known to suffer from semantic drift, which causes the construction of spurious explanations leading to wrong predictions. In contrast, we propose an abductive framework for multi-hop inference that adopts the retrieve-reuse-revise paradigm largely studied in case-based reasoning. Specifically, we present ETNA (Explanation by Analogy), a novel model that addresses unseen inference problems by retrieving and adapting prior explanations from similar training examples. We empirically evaluate the case-based abductive framework on downstream commonsense and scientific reasoning tasks. Our experiments demonstrate that ETNA can be effectively integrated with sparse and dense encoding mechanisms or downstream transformers, achieving strong performance when compared to existing explainable approaches. Moreover, we study the impact of the retrieve-reuse-revise paradigm on explainability and semantic drift, showing that it boosts the quality of the constructed explanations, resulting in improved downstream inference performance.
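A toy version of the retrieve-reuse-revise loop might look like the following. It assumes a sparse bag-of-words encoder with cosine similarity (one of the encoder families the abstract mentions); the helpers `retrieve`, `reuse`, `revise`, `explain`, and the `min_sim` threshold are hypothetical, and ETNA's actual retrieval and revision models are not reproduced here.

```python
import math
from collections import Counter


def bow(text):
    # Sparse bag-of-words encoding (assumption: lowercase whitespace tokens).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(hypothesis, cases, k=2):
    # Retrieve: rank training cases by similarity to the new hypothesis.
    q = bow(hypothesis)
    return sorted(cases, key=lambda c: cosine(q, bow(c["hypothesis"])), reverse=True)[:k]


def reuse(similar_cases):
    # Reuse: pool the explanation facts of the retrieved cases (deduplicated, in order).
    pooled = []
    for case in similar_cases:
        for fact in case["explanation"]:
            if fact not in pooled:
                pooled.append(fact)
    return pooled


def revise(hypothesis, facts, min_sim=0.1):
    # Revise: keep only facts relevant to the new hypothesis, most relevant first.
    q = bow(hypothesis)
    scored = sorted(((cosine(q, bow(f)), f) for f in facts), reverse=True)
    return [f for score, f in scored if score >= min_sim]


def explain(hypothesis, cases, k=2, min_sim=0.1):
    return revise(hypothesis, reuse(retrieve(hypothesis, cases, k)), min_sim)
```

Reusing explanations from nearby training cases and then filtering them against the new hypothesis is what distinguishes this loop from building each explanation in isolation.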

2020-12-31

(published)

www.semanticscholar.org

What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types

Susan Bartlett

Grzegorz Kondrak

Max Bartolo

Alastair Roberts

Johannes Welbl

Steven Bird

Ewan Klein

Edward Loper

Samuel R. Bowman

George Dahl

Chao Pang

Junyuan Shang

Jiaxiang Liu

Xuyi Chen

Yanbin Zhao

Yuxiang Lu

Weixin Liu

Zhihua Wu

Weibao Gong

Jianzhong Liang

Zhizhou Shang

Peng Sun

Ouyang Xuan

Dianhai

Houwen Tian

Hua Wu

Haifeng Wang

Adam Trischler

Tong Wang

Xingdi Yuan

Justin Harris

Alessandro Sordoni

Philip Bachman

Adina Williams

Nikita Nangia

Zhilin Yang

Peng Qi

For a natural language understanding benchmark to be useful in research, it has to consist of examples that are diverse and difficult enough to discriminate among current and near-future state-of-the-art systems. However, we do not yet know how best to select passages to collect a variety of challenging examples. In this study, we crowdsource multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources, analyzing what attributes of passages contribute to the difficulty and question types of the collected examples. To our surprise, we find that passage source, length, and readability measures do not significantly affect question difficulty. Through our manual annotation of seven reasoning types, we observe several trends between passage sources and reasoning types, e.g., logical reasoning is more often required in questions written for technical passages. These results suggest that when creating a new benchmark dataset, selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.

2020-12-31

(published)

www.semanticscholar.org

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

Zhilin Yang

Peng Qi

William W. Cohen

Russ Salakhutdinov

Christopher D Manning

Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
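Because HotpotQA annotates supporting facts at the sentence level, a prediction can be scored as a set of `(title, sentence_index)` pairs. The function below is a sketch of that kind of set-based precision, recall, F1 and exact-match scoring; the name `supporting_fact_scores` is hypothetical, it is written for illustration only, and it is not the dataset's official evaluation script, which is what reported numbers should come from.

```python
def supporting_fact_scores(predicted, gold):
    """Set-based precision, recall, F1 and exact match over (title, sentence_index) pairs."""
    pred = {tuple(p) for p in predicted}
    true = {tuple(g) for g in gold}
    hits = len(pred & true)  # supporting sentences predicted correctly
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "em": float(pred == true)}
```

Predicting one extra sentence on top of two correct ones, for instance, keeps recall at 1.0 but lowers precision to 2/3 and F1 to 0.8, and exact match drops to 0.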

2018-09-30

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (published)

Neural Models for Key Phrase Extraction and Question Generation

Sandeep Subramanian

Tong Wang

Xingdi Yuan

Adam Trischler

We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.

2018-06-30

QA@ACL (published)

A Deep Reinforcement Learning Chatbot (Short Version)

Iulian V. Serban

Mathieu Germain

Michael Pieper

Nan Rosemary Ke

Sai Rajeswar

Alexandre De Brébisson

Jose M. R. Sotelo

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including neural network and template-based models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than other systems. The results highlight the potential of coupling ensemble systems with deep reinforcement learning as a fruitful path for developing real-world, open-domain conversational agents.

2017-12-31

arXiv (preprint)

A Deep Reinforcement Learning Chatbot

Iulian V. Serban

Mathieu Germain

Michael Pieper

Nan Rosemary Ke

Sai Mudumba

Alexandre De Brébisson

Jose M. R. Sotelo

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. MILABOT is capable of conversing with humans on popular small talk topics through both speech and text. The system consists of an ensemble of natural language generation and retrieval models, including template-based models, bag-of-words models, sequence-to-sequence neural network and latent variable neural network models. By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble. The system has been evaluated through A/B testing with real-world users, where it performed significantly better than many competing systems. Due to its machine learning architecture, the system is likely to improve with additional data.
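The response-selection step can be pictured as a policy that scores each candidate proposed by the ensemble and picks one. The sketch below assumes a linear softmax policy over hypothetical hand-made features (`features`, `policy`, `select`, `reinforce_step` are illustrative names) and updates it with a single REINFORCE step; MILABOT's actual features, models, and training procedure are far richer and are not reproduced here.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def features(context, response):
    # Hypothetical features: bias, word overlap with the user turn, length, is-question.
    ctx = set(context.lower().split())
    words = response.lower().split()
    overlap = len(ctx & set(words)) / max(len(words), 1)
    return [1.0, overlap, len(words) / 20.0, 1.0 if response.endswith("?") else 0.0]


def policy(weights, context, candidates):
    # Linear softmax policy over the candidate responses from the ensemble.
    scores = [sum(w * x for w, x in zip(weights, features(context, c))) for c in candidates]
    return softmax(scores)


def select(weights, context, candidates):
    # Greedy selection: index of the most probable candidate.
    probs = policy(weights, context, candidates)
    return max(range(len(candidates)), key=probs.__getitem__)


def reinforce_step(weights, context, candidates, chosen, reward, lr=0.1):
    # REINFORCE for a linear softmax policy:
    # grad log pi(chosen) = phi(chosen) - sum_c pi(c) * phi(c)
    probs = policy(weights, context, candidates)
    feats = [features(context, c) for c in candidates]
    expected = [sum(p * f[i] for p, f in zip(probs, feats)) for i in range(len(weights))]
    return [w + lr * reward * (feats[chosen][i] - expected[i]) for i, w in enumerate(weights)]
```

A positive reward for a chosen response shifts the weights toward that response's features, so the same candidate becomes more likely the next time a similar context appears.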

2017-09-06

ArXiv (preprint)

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved state-of-the-art results on various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an end-to-end speech recognition system instead of a hybrid one. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling that combines hierarchical CNNs with CTC directly, without recurrent connections. By evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient but also competitive with the existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.
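With CTC, the network emits a label distribution per frame (including a special blank symbol), and a label sequence is read off by collapsing the frame-level path. The sketch below shows standard best-path (greedy) CTC decoding over precomputed frame probabilities; it assumes blank index 0, and `ctc_greedy_decode` is an illustrative name, not the decoder used in the paper, which may rely on beam search.

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Best-path CTC decoding: per-frame argmax, merge repeats, then drop blanks."""
    path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for label in path:
        # A label is emitted only when it differs from the previous frame's label,
        # so "a a" collapses to "a" while "a blank a" yields "a a".
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

For a frame-level path of `1 1 blank 1 2 2 blank`, the decoder returns `[1, 1, 2]`: the blank between the two runs of label 1 is what allows a repeated phoneme to survive the merge step.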

2016-09-07

Interspeech 2016 (published)

Professor Forcing: A New Algorithm for Training Recurrent Networks

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network’s own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.
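The mismatch that Professor Forcing targets can be seen even in a toy scalar recurrence. Under teacher forcing the cell receives the observed sequence as input, while in free-running (sampling) mode it receives its own previous prediction. The `cell` below is a hypothetical `tanh` recurrence with made-up weights; the sketch only exposes the gap between the two hidden-state trajectories and does not implement the adversarial discriminator the paper uses to close it.

```python
import math


def cell(h, x, w=0.5, u=1.0):
    # Hypothetical scalar recurrent cell; its hidden state doubles as the
    # prediction of the next input value.
    return math.tanh(w * h + u * x)


def teacher_forced_states(observed, h0=0.0):
    # Training mode: every step is fed the observed value, not the model's guess.
    h, states = h0, []
    for x in observed[:-1]:
        h = cell(h, x)
        states.append(h)
    return states


def free_running_states(first, n_steps, h0=0.0):
    # Sampling mode: after the first input, each step is fed the model's own prediction.
    h, x, states = h0, first, []
    for _ in range(n_steps):
        h = cell(h, x)
        x = h
        states.append(h)
    return states


def mean_abs_gap(a, b):
    # Average distance between the two hidden-state trajectories.
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)
```

Both modes agree on the first step, since they share the first observed input, and drift apart as soon as the model's prediction differs from the observed value. Professor Forcing trains the network so that a discriminator cannot tell these two kinds of trajectories apart.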

2015-12-31

Advances in Neural Information Processing Systems 29 (NIPS 2016) (published)

Nicolas Boulanger-Lewandowski

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-Rfou

Guillaume Alain

Amjad Almahairi

Christof Angermueller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin Bayer

Anatoly Belikov

Alexander Belopolsky

Josh Bleecher Snyder

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux

Pierre-Luc Carrier

Paul Christiano

Myriam Côté

Yann N. Dauphin

Julien Demouth

Sander Dieleman

Samira Ebrahimi Kahou

Ziye Fan

Mathieu Germain

Matt Graham

Balázs Hidasi

Arjun Jain

Kai Jia

Mikhail Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

Sean Lee

Simon Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

Jesse A. Livezey

Cory Lorenz

Jeremiah Lowin

Qianli Ma

Pierre-Antoine Manzagol

Robert T. McGibbon

Mehdi Mirza

Alberto Orlandi

Christopher Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew Rocklin

Adriana Romero

Markus Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John Schulman

Gabriel Schwartz

Iulian Vlad Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Ramana Subramanyam

Gijs van Tulder

Sebastian Urban

Dustin J. Webb

Matthew Willson

Lijun Xue

Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008, multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
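The define, optimize, evaluate workflow described above can be illustrated with a tiny expression graph. This is a toy written for illustration, not Theano's API: it works on scalars rather than multi-dimensional arrays, all names (`const`, `var`, `optimize`, `compile_expression`, ...) are hypothetical, and its only rewrites are constant folding, `x * 1 -> x`, and `x + 0 -> x`. Theano applies a much larger set of graph optimizations and compiles the result for CPU or GPU.

```python
def const(value):
    return ("const", value)


def var(name):
    return ("var", name)


def add(a, b):
    return ("add", a, b)


def mul(a, b):
    return ("mul", a, b)


def optimize(node):
    """Bottom-up rewrite pass: constant folding, x * 1 -> x, x + 0 -> x."""
    if node[0] in ("const", "var"):
        return node
    op, a, b = node[0], optimize(node[1]), optimize(node[2])
    if a[0] == "const" and b[0] == "const":
        return const(a[1] + b[1] if op == "add" else a[1] * b[1])
    identity = 0 if op == "add" else 1  # neutral element of the operation
    if a == const(identity):
        return b
    if b == const(identity):
        return a
    return (op, a, b)


def evaluate(node, env):
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    a, b = evaluate(node[1], env), evaluate(node[2], env)
    return a + b if kind == "add" else a * b


def compile_expression(input_names, expr):
    # "Compile" once: optimize the symbolic graph, then return a callable that evaluates it.
    graph = optimize(expr)
    return lambda *values: evaluate(graph, dict(zip(input_names, values)))
```

Separating graph construction from evaluation is what makes the optimization pass possible: the expression `x * (0.5 + 0.5) + 0` is simplified to just `x` once, before any input values are supplied.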

2015-12-31

arXiv (preprint)