Siva Reddy

Arkil Patel

PhD - McGill University

Principal supervisor:

Dzmitry Bahdanau

arkil.patel@mila.quebec

PhD - McGill University

benno.krojer@mila.quebec

gaurav.kamath@mila.quebec

Gaurav Kamath

PhD - McGill University

Karolina Ewa Stańczak

Postdoctorate - McGill University

karolina.stanczak@mila.quebec

PhD - McGill University

laurestine.bradford@mila.quebec

Marius Mosbach

Postdoctorate - McGill University

marius.mosbach@mila.quebec

nicholas.meade@mila.quebec

Nicholas Meade

PhD - McGill University

GitHub

parishad.behnamghader@mila.quebec

Parishad BehnamGhader

Research Master's - McGill University

Research collaborator

spandana.gella@mila.quebec

vaibhav.adlakha@mila.quebec

Vaibhav Adlakha

PhD - McGill University

Xing Han Lu

PhD - McGill University

PhD

zdenek.kasner@mila.quebec

Zichao Li

PhD - McGill University

Co-supervisor:

Jackie Cheung

zichao.li@mila.quebec

Publications

Words Aren’t Enough, Their Order Matters: On the Robustness of Grounding Visual Referring Expressions

Arjun Reddy Akula

Spandana Gella

Yaser Al-Onaizan

Song-Chun Zhu

Visual referring expression recognition is a challenging task that requires natural language understanding in the context of an image. We critically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure, i.e., words are enough to identify the target object, the word order doesn’t matter. To measure the true progress of existing models, we split the test set into two sets, one which requires reasoning on linguistic structure and the other which doesn’t. Additionally, we create an out-of-distribution dataset Ref-Adv by asking crowdworkers to perturb in-domain examples such that the target object changes. Using these datasets, we empirically show that existing methods fail to exploit linguistic structure and are 12% to 23% lower in performance than the established progress for this task. We also propose two methods, one based on contrastive learning and the other based on multi-task learning, to increase the robustness of ViLBERT, the current state-of-the-art model for this task. Our datasets are publicly available at https://github.com/aws/aws-refcocog-adv.

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)

Measuring Systematic Generalization in Neural Proof Generation with Transformers

Nicolas Gontier

Koustuv Sinha

Chris Pal

We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate their systematic generalization abilities on a logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference. We observe length-generalization issues when evaluated on longer-than-trained sequences. However, we observe TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we discover that TLMs are able to generalize better using backward-chaining proofs compared to their forward-chaining counterparts, while they find it easier to generate forward chaining proofs. We observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This suggests that Transformers have efficient internal reasoning strategies that are harder to interpret. These results highlight the systematic generalization behavior of TLMs in the context of logical reasoning, and we believe this work motivates deeper inspection of their underlying reasoning strategies.

You could have said that instead: Improving Chatbots with Natural Language Feedback

Makesh Narsimhan Sreedhar

Kun Ni

The ubiquitous nature of dialogue systems and their interaction with users generate an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking for natural language feedback when a user is dissatisfied with its response and uses this feedback as an additional training sample. However, user feedback in most cases contains extraneous sequences hindering their usefulness as a training sample. In this work, we propose a generative adversarial model that converts noisy feedback into a plausible natural response in a conversation. The generator’s goal is to convert the feedback into a response that answers the user’s previous utterance and to fool the discriminator which distinguishes feedback from natural responses. We show that augmenting original training data with these modified feedback responses improves the original chatbot performance from 69.94% to 75.96% in ranking correct responses on the PERSONACHAT dataset, a large improvement given that the original model is already trained on 131k samples.

2020-01-01

Conference on Empirical Methods in Natural Language Processing (published)

CoQA: A Conversational Question Answering Challenge

Danqi Chen

Christopher D. Manning

Humans gather information through conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning). We evaluate strong dialogue and reading comprehension models on CoQA. The best system obtains an F1 score of 65.4%, which is 23.4 points behind human performance (88.8%), indicating that there is ample room for improvement. We present CoQA as a challenge to the community at https://stanfordnlp.github.io/coqa.

2019-11-01

Transactions of the Association for Computational Linguistics (published)

Building a Neural Semantic Parser from a Domain Ontology

Jianpeng Cheng

Mirella Lapata

Semantic parsing is the task of converting natural language utterances into machine interpretable meaning representations which can be executed against a real-world environment such as a database. Scaling semantic parsing to arbitrary domains faces two interrelated challenges: obtaining broad coverage training data effectively and cheaply; and developing a model that generalizes to compositional utterances and complex intentions. We address these challenges with a framework which allows us to elicit training data from a domain ontology and bootstrap a neural parser which recursively builds derivations of logical forms. In our framework meaning representations are described by sequences of natural language templates, where each template corresponds to a decomposed fragment of the underlying meaning representation. Although artificial, templates can be understood and paraphrased by humans to create natural utterances, resulting in parallel triples of utterances, meaning representations, and their decompositions. These allow us to train a neural semantic parser which learns to compose rules in deriving meaning representations. We crowdsource training data on six domains, covering both single-turn utterances which exhibit rich compositionality, and sequential utterances where a complex task is procedurally performed in steps. We then develop neural semantic parsers which perform such compositional tasks. In general, our approach allows neural semantic parsers to be deployed quickly and cheaply from a given domain ontology.

2018-12-25

arXiv (preprint)

Learning Typed Entailment Graphs with Global Soft Constraints

Mohammad Javad Hosseini

Nathanael Chambers

Xavier R. Holt

Shay B. Cohen

Mark Johnson

Mark Steedman

This paper presents a new method for learning typed entailment graphs from text. We extract predicate-argument structures from multiple-source news corpora, and compute local distributional similarity scores to learn entailments between predicates with typed arguments (e.g., person contracted disease). Previous work has used transitivity constraints to improve local decisions, but these constraints are intractable on large graphs. We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph. Learning takes only a few hours to run over 100K predicates and our results show large improvements over local similarity scores on two entailment data sets. We further show improvements over paraphrases and entailments from the Paraphrase Database, and prior state-of-the-art entailment graphs. We show that the entailment graphs improve performance in a downstream task.

2018-12-01

Transactions of the Association for Computational Linguistics (published)