Jackie Cheung

antoine.dangeard@mila.quebec

Antoine Dangeard

Research Intern - McGill University

cesare.spinoso@mila.quebec

Aylin Erman

Master's Research - McGill University

Co-supervisor:

Dan Poenaru

aylin.erman@mila.quebec

Caleb Moses

PhD - McGill University

caleb.moses@mila.quebec

Cesare Spinoso-Di Piano

PhD - McGill University

Bonaventure Dossou

PhD - McGill University

bonaventure.dossou@mila.quebec

Google Scholar

Haowei Qiu

Professional Master's - McGill University

haowei.qiu@mila.quebec

Ian Porada

PhD - McGill University

Ines Arous

Postdoctorate - McGill University

ines.arous@mila.quebec

Google Scholar

Jules Gagnon-Marchand

Master's Research - McGill University

gagnonju@mila.quebec

maxime.darrin@mila.quebec

Kushal Arora

PhD - McGill University

Co-supervisor:

Master's Research - McGill University

martin.pomsl@mila.quebec

PhD - McGill University

Co-supervisor:

Pablo Piantanida

PhD - McGill University

PhD - McGill University

michael.runningwolf@mila.quebec

Nathan Zeweniuk

Research Intern - McGill University

nathan.zeweniuk@mila.quebec

Ori Ernst

Postdoctorate - McGill University

ori.ernst@mila.quebec

Postdoctorate - McGill University

rahul.aralikatte@mila.quebec

Research Intern - McGill University

steven.koniaev@mila.quebec

Xiyuan Zou

Research Intern - McGill University

xiyuan.zou@mila.quebec

Yu Bai

Research Intern - McGill University

Yu Lu Liu

Master's Research - McGill University

yu-lu.liu@mila.quebec

Zichao Li

PhD - McGill University

Principal supervisor:

Siva Reddy

zichao.li@mila.quebec

Ziling Cheng

Research Intern - McGill University

ziling.cheng@mila.quebec

Publications

The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources

Akshatha Arodi

Martin Pömsl

Kaheer Suleman

Adam Trischler

Alexandra Olteanu

Many state-of-the-art natural language understanding (NLU) models are based on pretrained neural language models. These models often make inferences using information from multiple sources. An important class of such inferences are those that require both background knowledge, presumably contained in a model’s pretrained parameters, and instance-specific information that is supplied at inference time. However, the integration and reasoning abilities of NLU models in the presence of multiple knowledge sources have been largely understudied. In this work, we propose a test suite of coreference resolution subtasks that require reasoning over multiple facts. These subtasks differ in terms of which knowledge sources contain the relevant facts. We also introduce subtasks where knowledge is present only at inference time using fictional knowledge. We evaluate state-of-the-art coreference resolution models on our dataset. Our results indicate that several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time. However, with task-specific training, a subset of models demonstrates the ability to integrate certain knowledge types from multiple sources. Still, even the best performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.

2022-12-15

ArXiv (preprint)

A Multifaceted Framework to Evaluate Evasion, Content Preservation, and Misattribution in Authorship Obfuscation Techniques

Malik H. Altakrori

Thomas Scialom

Benjamin Fung

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

Ian Porada

Alessandro Sordoni

2022-07-01

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (published)

MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification

Yu Lu Liu

Rachel Bawden

Thomas Scialom

Benoît Sagot

2022-05-24

ArXiv (preprint)

Characterizing Idioms: Conventionality and Contingency

Michaela Socolof

Michael Wagner

Timothy O'Donnell

Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

Meng Cao

Yue Dong

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Zichao Li

Prakhar Sharma

Xing Han Lu

Siva Reddy

2022-05-01

Findings of the Association for Computational Linguistics: ACL 2022 (published)

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Zichao Li

Prakhar Sharma

Xing Han Lu

Siva Reddy

Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system’s performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also other stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers.

2022-04-06

ArXiv (preprint)

Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation

Kushal Arora

Layla El Asri

Hareesh Bahuleyan

Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis for this brittleness of generation models is that it is caused by the training and the generation procedure mismatch, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show that exposure bias leads to an accumulation of errors during generation, analyze why perplexity fails to capture this accumulation of errors, and empirically show that this accumulation results in poor generation quality.

2022-04-03

ArXiv (preprint)

Investigating the Performance of Transformer-Based NLI Models on Presuppositional Inferences

Jad Kabbara

Presuppositions are assumptions that are taken for granted by an utterance, and identifying them is key to a pragmatic interpretation of language. In this paper, we investigate the capabilities of transformer models to perform NLI on cases involving presupposition. First, we present simple heuristics to create alternative “contrastive” test cases based on the ImpPres dataset and investigate the model performance on those test cases. Second, to better understand how the model is making its predictions, we analyze samples from sub-datasets of ImpPres and examine model performance on them. Overall, our findings suggest that NLI-trained transformer models seem to be exploiting specific structural and lexical cues as opposed to performing some kind of pragmatic reasoning.

2022-01-01

COLING (published)

dblp.uni-trier.de

Learning with Rejection for Abstractive Text Summarization

Meng Cao

Yue Dong

Jingyi He

2022-01-01

EMNLP (published)

Question Personalization in an Intelligent Tutoring System

Sabina Elkins

Robert Belfer

Ekaterina Kochmar

Iulian V. Serban

2022-01-01

AIED (2) (published)