Jackie Cheung

Nadhem Benhadjali

Collaborating researcher

Meng (Caden) Cao

Collaborating Alumni - McGill University

Google Scholar

Aishik Chakraborty

PhD - McGill University

Khaoula Chehbouni

PhD - McGill University

Principal supervisor :

Master's Research - McGill University

Nelson Filipe Costa

Collaborating researcher - Concordia University University

Maxime Darrin

PhD - McGill University

Co-supervisor :

PhD - McGill University

Google Scholar

Aylin Erman

PhD - McGill University

Co-supervisor :

Dan Poenaru

Ori Ernst

Postdoctorate - McGill University

Master's Research - McGill University

Google Scholar

Jules Gagnon-marchand

Master's Research - McGill University

Zichao Li

PhD - McGill University

Principal supervisor :

Siva Reddy

Caleb Moses

PhD - McGill University

Ian Porada

PhD - McGill University

PhD - McGill University

Sina Salmannia

Undergraduate - McGill University

Cesare Spinoso-Di Piano

PhD - McGill University

Sihui Wei

Undergraduate - McGill University

Xiyuan Zou

Master's Research - McGill University

Publications

ADEPT: An Adjective-Dependent Plausibility Task

Ali Emami

Ian Porada

Alexandra Olteanu

Kaheer Suleman

Adam Trischler

2021-01-01

Annual Meeting of the Association for Computational Linguistics (published)

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong

State-of-the-art abstractive summarization systems often generate hallucinations ; i.e., content that is not directly inferable from the sou… (see more)rce text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneﬁcial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and ﬁnetuned masked language models, respectively. Empirical re-sults suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classiﬁcation tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or ﬁne-tuning step. 1

2021-01-01

arXiv.org (preprint)

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong

2021-01-01

arXiv.org (preprint)

On-the-Fly Attention Modularization for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from de generation: generated… (see more) text is repetitive, generic, self-inconsistent, and lacking commonsense. The empirical analyses on sentence-level attention patterns reveal that neural text degeneration may be associated with insufﬁcient learning of inductive biases by the attention mechanism. Our ﬁndings motivate on-the-ﬂy attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining ﬂuency and coherence.

2021-01-01

arXiv.org (preprint)

Post-Editing Extractive Summaries by Definiteness Prediction

Jad Kabbara

Extractive summarization has been the main-stay of automatic summarization for decades. Despite all the progress, extractive summarizers sti… (see more)ll suffer from shortcomings including coreference issues arising from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight postediting step for extractive summaries that centers around a single linguistic decision: the definiteness of noun phrases. We conduct human evaluation studies that show that human expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, based on an automatic evaluation study, we provide evidence for our system’s ability to generate linguistic decisions that lead to improved extractive summaries. We also draw insights about how the automatic system is exploiting some local cues related to the writing style of the main article texts or summary texts to make the decisions, rather than reasoning about the contexts pragmatically.

2021-01-01

Conference on Empirical Methods in Natural Language Processing (published)

Textual Time Travel: A Temporally Informed Approach to Theory of Mind

Akshatha Arodi

Natural language processing systems such as dialogue agents should be able to reason about other people’s beliefs, intentions and desires.… (see more) This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memoryaugmented neural models through questionanswering. These models perform poorly on false belief tasks where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities’ minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel—i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.

2021-01-01

Conference on Empirical Methods in Natural Language Processing (published)

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori

Benjamin Fung

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researc… (see more)hers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic conﬁg-uration between training and testing set. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n - grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n -gram features.

2021-01-01

arXiv.org (preprint)

An Analysis of Dataset Overlap on Winograd-Style Tasks

Ali Emami

Adam Trischler

Kaheer Suleman

The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model per… (see more)formance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlaps that occur between these corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the pretraining corpora on which state-of-the-art models are trained, and that a significant drop in classification accuracy occurs when models are evaluated on instances with minimal overlap. Based on these results, we provide the WSC-Web dataset, consisting of over 60k pronoun disambiguation problems scraped from web data, being both the largest corpus to date, and having a significantly lower proportion of overlaps with current pretraining corpora.

2020-12-01

Proceedings of the 28th International Conference on Computational Linguistics (published)

arxiv.org

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

Jingyi He

Kc Tsiolis

Kian Kenyon-Dean

Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, se… (see more)mantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP problems. Meta-embeddings combine multiple sets of differently trained word embeddings, and have been shown to successfully improve intrinsic and extrinsic performance over equivalent models which use just one set of source embeddings. We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand. Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time. We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements in performance on all tasks.

2020-12-01

Proceedings of the 28th International Conference on Computational Linguistics (published)

arxiv.org

Learning Lexical Subspaces in a Distributional Vector Space

Kushal Arora

Aishik Chakraborty

Abstract In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about… (see more) lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model.1 Code available at https://github.com/aishikchakraborty/LexSub.

2020-12-01

Transactions of the Association for Computational Linguistics (published)

On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs.

Teng Long

Yanshuai Cao

Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic … (see more)properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. In this paper, we argue that posterior collapse is in part caused by the lack of dispersion in encoder features. We provide empirical evidence to verify this hypothesis, and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing model to achieve significantly better data log-likelihood than standard sequence VAEs. Comparing to existing work, our proposed method is able to achieve comparable or superior performances while being more computationally efficient.

Deconstructing Word Embedding Algorithms

Kian Kenyon-Dean

Edward Daniel Newell

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)