Jackie Cheung

Nadhem Benhadjali

Collaborateur·rice de recherche

Doctorat - McGill

Doctorat - McGill

Doctorat - McGill

Superviseur⋅e principal⋅e :

Maîtrise recherche - McGill

Site web

Maxime Darrin

Doctorat - McGill

Co-superviseur⋅e :

Doctorat - McGill

Aylin Erman

Doctorat - McGill

Co-superviseur⋅e :

Dan Poenaru

Ori Ernst

Postdoctorat - McGill

Maîtrise recherche - McGill

Site web

Google Scholar

Jules Gagnon-marchand

Maîtrise recherche - McGill

Sienna Hsu

Stagiaire de recherche - McGill University

Zichao Li

Doctorat - McGill

Superviseur⋅e principal⋅e :

Siva Reddy

Caleb Moses

Doctorat - McGill

Ian Porada

Doctorat - McGill

Doctorat - McGill

Baccalauréat - McGill

Cesare Spinoso-Di Piano

Doctorat - McGill

Sihui Wei

Baccalauréat - McGill

Xiyuan Zou

Maîtrise recherche - McGill

Publications

On-the-Fly Attention Modularization for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from de generation: generated… (voir plus) text is repetitive, generic, self-inconsistent, and lacking commonsense. The empirical analyses on sentence-level attention patterns reveal that neural text degeneration may be associated with insufﬁcient learning of inductive biases by the attention mechanism. Our ﬁndings motivate on-the-ﬂy attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining ﬂuency and coherence.

2021-01-01

arXiv.org (prépublication)

dblp.uni-trier.de

Post-Editing Extractive Summaries by Definiteness Prediction

Jad Kabbara

Extractive summarization has been the main-stay of automatic summarization for decades. Despite all the progress, extractive summarizers sti… (voir plus)ll suffer from shortcomings including coreference issues arising from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight postediting step for extractive summaries that centers around a single linguistic decision: the definiteness of noun phrases. We conduct human evaluation studies that show that human expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, based on an automatic evaluation study, we provide evidence for our system’s ability to generate linguistic decisions that lead to improved extractive summaries. We also draw insights about how the automatic system is exploiting some local cues related to the writing style of the main article texts or summary texts to make the decisions, rather than reasoning about the contexts pragmatically.

2021-01-01

Conference on Empirical Methods in Natural Language Processing (publié)

Textual Time Travel: A Temporally Informed Approach to Theory of Mind

Akshatha Arodi

Natural language processing systems such as dialogue agents should be able to reason about other people’s beliefs, intentions and desires.… (voir plus) This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memoryaugmented neural models through questionanswering. These models perform poorly on false belief tasks where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities’ minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel—i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.

2021-01-01

Conference on Empirical Methods in Natural Language Processing (publié)

The Topic Confusion Task: A Novel Scenario for Authorship Attribution

Malik H. Altakrori

Benjamin Fung

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researc… (voir plus)hers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by failure to capture authorship style, by the topic shift or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic conﬁg-uration between training and testing set. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models’ confusion by the switch because the features capture the topics, and one caused by the features’ inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n - grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task, and are outperformed by simple n -gram features.

2021-01-01

arXiv.org (prépublication)

dblp.uni-trier.de

An Analysis of Dataset Overlap on Winograd-Style Tasks

Ali Emami

Adam Trischler

Kaheer Suleman

The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model per… (voir plus)formance on the WSC has quickly progressed from chance-level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlaps that occur between these corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the pretraining corpora on which state-of-the-art models are trained, and that a significant drop in classification accuracy occurs when models are evaluated on instances with minimal overlap. Based on these results, we provide the WSC-Web dataset, consisting of over 60k pronoun disambiguation problems scraped from web data, being both the largest corpus to date, and having a significantly lower proportion of overlaps with current pretraining corpora.

2020-12-01

Proceedings of the 28th International Conference on Computational Linguistics (publié)

Learning Efficient Task-Specific Meta-Embeddings with Word Prisms

Jingyi He

Kc Tsiolis

Kian Kenyon-Dean

Word embeddings are trained to predict word cooccurrence statistics, which leads them to possess different lexical properties (syntactic, se… (voir plus)mantic, etc.) depending on the notion of context defined at training time. These properties manifest when querying the embedding space for the most similar vectors, and when used at the input layer of deep neural networks trained to solve downstream NLP problems. Meta-embeddings combine multiple sets of differently trained word embeddings, and have been shown to successfully improve intrinsic and extrinsic performance over equivalent models which use just one set of source embeddings. We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand. Word prisms learn orthogonal transformations to linearly combine the input source embeddings, which allows them to be very efficient at inference time. We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements in performance on all tasks.

2020-12-01

Proceedings of the 28th International Conference on Computational Linguistics (publié)

Learning Lexical Subspaces in a Distributional Vector Space

Kushal Arora

Aishik Chakraborty

Abstract In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about… (voir plus) lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model.1 Code available at https://github.com/aishikchakraborty/LexSub.

2020-12-01

Transactions of the Association for Computational Linguistics (publié)

On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs.

Teng Long

Yanshuai Cao

Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic … (voir plus)properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. In this paper, we argue that posterior collapse is in part caused by the lack of dispersion in encoder features. We provide empirical evidence to verify this hypothesis, and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing model to achieve significantly better data log-likelihood than standard sequence VAEs. Comparing to existing work, our proposed method is able to achieve comparable or superior performances while being more computationally efficient.

Deconstructing Word Embedding Algorithms

Kian Kenyon-Dean

Edward Daniel Newell

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (publié)

Factual Error Correction for Abstractive Summarization Models

Meng Cao

Yue Dong

Jiapeng Wu

Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre… (voir plus)-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting factual errors in generated summaries. The neural corrector model is pre-trained on artificial examples that are created by applying a series of heuristic transformations on reference summaries. These transformations are inspired by an error analysis of state-of-the-art summarization model outputs. Experimental results show that our model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. We also find that transferring from artificial error correction to downstream settings is still very challenging.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (publié)

Multi-Fact Correction in Abstractive Text Summarization

Yue Dong

Shuohang Wang

Zhe Gan

Yu Cheng

Jingjing Liu

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (publié)

TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion

Jiapeng Wu

Meng Cao

William L. Hamilton

Inferring missing facts in temporal knowledge graphs (TKGs) is a fundamental and challenging task. Previous works have approached this probl… (voir plus)em by augmenting methods for static knowledge graphs to leverage time-dependent representations. However, these methods do not explicitly leverage multi-hop structural information and temporal facts from recent time steps to enhance their predictions. Additionally, prior work does not explicitly address the temporal sparsity and variability of entity distributions in TKGs. We propose the Temporal Message Passing (TeMP) framework to address these challenges by combining graph neural networks, temporal dynamics models, data imputation and frequency-based gating techniques. Experiments on standard TKG tasks show that our approach provides substantial gains compared to the previous state of the art, achieving a 10.7% average relative improvement in Hits@10 across three standard benchmarks. Our analysis also reveals important sources of variability both within and across TKG datasets, and we introduce several simple but strong baselines that outperform the prior state of the art in certain settings.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (publié)