Jackie Cheung

Nadhem Benhadjali

Collaborating researcher

Meng (Caden) Cao

PhD - McGill University

Google Scholar

Aishik Chakraborty

PhD - McGill University

Khaoula Chehbouni

PhD - McGill University

Principal supervisor:

Master's Research - McGill University

Maxime Darrin

PhD - McGill University

Co-supervisor:

PhD - McGill University

Google Scholar

Aylin Erman

PhD - McGill University

Co-supervisor:

Dan Poenaru

Ori Ernst

Postdoctorate - McGill University

Master's Research - McGill University

Google Scholar

Jules Gagnon-Marchand

Master's Research - McGill University

Sienna Hsu

Research Intern - McGill University

Zichao Li

PhD - McGill University

Principal supervisor:

Siva Reddy

Caleb Moses

PhD - McGill University

Ian Porada

PhD - McGill University

PhD - McGill University

Sina Salmannia

Undergraduate - McGill University

Cesare Spinoso-Di Piano

PhD - McGill University

Sihui Wei

Undergraduate - McGill University

Xiyuan Zou

Master's Research - McGill University

Publications

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

Ian Porada

Alessandro Sordoni

Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.

2021-12-16

ArXiv (preprint)

The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Malik H. Altakrori

Benjamin Fung

2021-11-01

Findings of the Association for Computational Linguistics: EMNLP 2021 (published)

On-the-Fly Attention Modulation for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: the generated text is repetitive, generic, self-contradictory, and often lacks commonsense. Our analyses on sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the attention mechanism. This finding motivates on-the-fly attention modulation -- a simple but effective method that enables the injection of priors into attention computation during inference. Automatic and human evaluation results on three text generation benchmarks demonstrate that attention modulation helps LMs generate text with enhanced fluency, creativity, and commonsense reasoning, in addition to significantly reducing sentence-level repetition.

2021-08-01

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

Optimizing Deeper Transformers on Small Datasets

Peng Xu

Dhruv Kumar

Wei Yang

Wenjie Zi

Keyi Tang

Chenyang Huang

S. Prince

Yanshuai Cao

It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 layers of transformers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state-of-the-art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.

2021-08-01

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (published)

TIE: A Framework for Embedding-based Incremental Temporal Knowledge Graph Completion

Jiapeng Wu

Yishi Xu

Yingxue Zhang

Chen Ma

Mark Coates

Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challenging when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder framework with a time-aware encoding function. However, naively fine-tuning the model at every time step using these methods does not address the problems of 1) catastrophic forgetting, 2) the model's inability to identify the change of facts (e.g., the change of the political affiliation and end of a marriage), and 3) the lack of training efficiency. To address these challenges, we present the Time-aware Incremental Embedding (TIE) framework, which combines TKG representation learning, experience replay, and temporal regularization. We introduce a set of metrics that characterizes the intransigence of the model and propose a constraint that associates the deleted facts with negative labels. Experimental results on Wikidata12k and YAGO11k datasets demonstrate that the proposed TIE framework reduces training time by about ten times and improves on the proposed metrics compared to vanilla full-batch training. It comes without a significant loss in performance for any traditional measures. Extensive ablation studies reveal performance trade-offs among different evaluation metrics, which is essential for decision-making around real-world TKG applications.

2021-07-11

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (published)

Modeling Event Plausibility with Consistent Conceptual Abstraction

Ian Porada

Kaheer Suleman

Adam Trischler

2021-06-01

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (published)

Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems

Matt Grenander

Robert Belfer

Ekaterina Kochmar

Iulian V. Serban

François St-Hilaire

We explore creating automated, personalized feedback in an intelligent tutoring system (ITS). Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains. Although automatic methods for providing personalized feedback exist, they do not explicitly inform students about which concepts in their answers are correct or incorrect. Our approach involves decomposing students' answers using neural discourse segmentation and classification techniques. This decomposition yields a relational graph over all discourse units covered by the reference solutions and student answers. We use this inferred relational graph structure and a neural classifier to match student answers with reference solutions and generate personalized feedback. Although the process is completely automated and data-driven, the personalized feedback generated is highly contextual, domain-aware and effectively targets each student's misconceptions and knowledge gaps. We test our method in a dialogue-based ITS and demonstrate that our approach results in high-quality feedback and significantly improved student learning gains.

2021-05-18

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Characterizing Idioms: Conventionality and Contingency

Michaela Socolof

Michael Wagner

Timothy O'Donnell

Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.

2021-04-17

ArXiv (preprint)

Discourse-Aware Unsupervised Summarization for Long Scientific Documents

Yue Dong

Andrei Mircea

2021-04-01

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (published)

ADEPT: An Adjective-Dependent Plausibility Task

Ali Emami

Ian Porada

Alexandra Olteanu

Kaheer Suleman

Adam Trischler

2021-01-01

Annual Meeting of the Association for Computational Linguistics (published)

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong

State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical results suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step.

2021-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization

Meng Cao

Yue Dong