
Jackie Cheung

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila; Associate Professor, McGill University, School of Computer Science
Consulting Researcher, Microsoft Research
Research Topics
Medical Machine Learning
Deep Learning
Reasoning
Natural Language Processing

Biography

I am an associate professor in the School of Computer Science at McGill University and a consulting researcher at Microsoft Research.

My group conducts research in natural language processing (NLP), a subfield of artificial intelligence concerned with building computational models of human languages such as English or French. The goal of our research is to develop computational methods for understanding text and speech in order to generate language that is fluent and appropriate to the context.

In our lab, we study statistical machine learning techniques for analyzing and making predictions about language. Current projects include summarizing fiction, extracting events from text, and adapting language to different genres.

Current Students

PhD - McGill
Co-supervisor:
Postdoctorate - McGill
Research Intern - McGill
PhD - McGill
PhD - McGill
Principal supervisor:
Master's research - McGill
PhD - McGill
Research Intern - McGill
PhD - McGill
Co-supervisor:
Master's research - McGill
PhD - McGill
Co-supervisor:
Postdoctorate - McGill
Master's research - McGill
Master's research - McGill
Research Intern - McGill University
Research Intern - McGill
PhD - McGill
Principal supervisor:
Master's research - McGill
PhD - McGill
Master's research - McGill
PhD - McGill
Undergraduate - McGill
PhD - McGill
Research Intern - McGill University
Research Intern - McGill

Publications

Modeling Event Plausibility with Consistent Conceptual Abstraction
Ian Porada
Kaheer Suleman
Adam Trischler
Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems
Matt Grenander
Robert Belfer
Ekaterina Kochmar
Iulian V. Serban
François St-Hilaire
We explore creating automated, personalized feedback in an intelligent tutoring system (ITS). Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains. Although automatic methods for providing personalized feedback exist, they do not explicitly inform students about which concepts in their answers are correct or incorrect. Our approach involves decomposing students' answers using neural discourse segmentation and classification techniques. This decomposition yields a relational graph over all discourse units covered by the reference solutions and student answers. We use this inferred relational graph structure and a neural classifier to match student answers with reference solutions and generate personalized feedback. Although the process is completely automated and data-driven, the personalized feedback generated is highly contextual, domain-aware and effectively targets each student's misconceptions and knowledge gaps. We test our method in a dialogue-based ITS and demonstrate that our approach results in high-quality feedback and significantly improved student learning gains.
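As a rough illustration of the matching step described above, the sketch below labels each unit of a student answer as covered or not by a reference solution. It is only a stand-in: naive sentence splitting replaces the paper's neural discourse segmenter, TF-IDF cosine similarity replaces the neural classifier, and names such as `match_units` and the threshold value are hypothetical.

```python
# Minimal sketch of the matching step. The paper uses neural discourse
# segmentation and a neural classifier; sentence splitting and TF-IDF
# cosine similarity stand in as simple placeholders here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def segment(text):
    """Placeholder discourse segmenter: naive sentence split."""
    return [s.strip() for s in text.split(".") if s.strip()]


def match_units(student_answer, reference_solution, threshold=0.4):
    """Label each student discourse unit as covered or not by the reference."""
    student_units = segment(student_answer)
    reference_units = segment(reference_solution)
    vectorizer = TfidfVectorizer().fit(student_units + reference_units)
    s_vecs = vectorizer.transform(student_units)
    r_vecs = vectorizer.transform(reference_units)
    sims = cosine_similarity(s_vecs, r_vecs)  # (n_student, n_reference)
    feedback = []
    for unit, row in zip(student_units, sims):
        status = "correct" if row.max() >= threshold else "check this concept"
        feedback.append((unit, status))
    return feedback


print(match_units(
    "Force equals mass times acceleration. Heavier objects fall faster",
    "Force equals mass times acceleration. All objects fall at the same rate",
))
```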
Characterizing Idioms: Conventionality and Contingency
Michaela Socolof
Michael Wagner
Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.
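The abstract does not spell out the two measures, so the snippet below should be read as a loose analogy rather than the paper's method: it uses pointwise mutual information as a toy proxy for contingency, i.e., how strongly the words of a candidate idiom depend on co-occurring. All counts are invented for illustration.

```python
# Illustrative only: the paper's actual measures are not reproduced here.
# PMI over bigram counts serves as a toy proxy for "contingency".
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: an idiom-like pair co-occurs far more often than
# its unigram frequencies alone would predict.
print(pmi(count_xy=50, count_x=2_000, count_y=300, total=1_000_000))     # ~6.4
print(pmi(count_xy=5, count_x=20_000, count_y=30_000, total=1_000_000))  # ~-6.9
```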
Discourse-Aware Unsupervised Summarization for Long Scientific Documents
Yue Dong
Andrei Mircea
ADEPT: An Adjective-Dependent Plausibility Task
Ali Emami
Ian Porada
Kaheer Suleman
Adam Trischler
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
Meng Cao
Yue Dong
State-of-the-art abstractive summarization systems often generate hallucinations; i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations in a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity's prior and posterior probabilities according to pre-trained and fine-tuned masked language models, respectively. Empirical results suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer's pre-training or fine-tuning step.
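A minimal sketch of the prior/posterior contrast is shown below, assuming HuggingFace's fill-mask pipeline. The paper computes the posterior with a fine-tuned conditional masked LM; this sketch reuses a single off-the-shelf model for both quantities, with and without the source as context, purely to make the idea concrete.

```python
# Sketch of the prior/posterior probability contrast for an entity.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
entity = "paris"  # must be a single wordpiece for this simple sketch

summary = "The conference was held in [MASK] last year."
source = "The annual meeting took place in the French capital."

# Prior: probability of the entity given the summary context alone.
prior = fill(summary, targets=[entity])[0]["score"]

# Posterior: probability given the source document as extra context.
posterior = fill(source + " " + summary, targets=[entity])[0]["score"]

# A posterior much higher than the prior suggests the entity is grounded
# in the source; a low posterior flags a potential non-factual hallucination.
print(f"prior={prior:.4f} posterior={posterior:.4f}")
```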
On-the-Fly Attention Modularization for Neural Generation
Yue Dong
Chandra Bhagavatula
Ximing Lu
Jena D. Hwang
Antoine Bosselut
Yejin Choi
Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: generated text is repetitive, generic, self-inconsistent, and lacking commonsense. Empirical analyses of sentence-level attention patterns reveal that neural text degeneration may be associated with insufficient learning of inductive biases by the attention mechanism. Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining fluency and coherence.
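The paper's exact modularization is not reproduced here, but the following toy example shows the general mechanism of intervening in attention at inference time: an additive bias (here, a hypothetical recency bias) is injected into the attention scores before the softmax.

```python
# Toy illustration of modifying attention at inference time. Not the paper's
# method; it only shows how an inductive bias can be added to attention
# scores before the softmax, with no retraining.
import torch
import torch.nn.functional as F


def biased_attention(q, k, v, recency_weight=0.5):
    """Scaled dot-product attention with an additive recency bias."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (..., T, T)
    T = scores.size(-1)
    # Penalize attention to distant positions, biasing toward nearby tokens.
    pos = torch.arange(T)
    distance = (pos[:, None] - pos[None, :]).abs().float()
    scores = scores - recency_weight * distance
    return F.softmax(scores, dim=-1) @ v


q = k = v = torch.randn(1, 8, 16)  # (batch, seq_len, dim)
print(biased_attention(q, k, v).shape)  # torch.Size([1, 8, 16])
```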
Post-Editing Extractive Summaries by Definiteness Prediction
Jad Kabbara
Extractive summarization has been the mainstay of automatic summarization for decades. Despite all the progress, extractive summarizers still suffer from shortcomings, including coreference issues arising from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight post-editing step for extractive summaries that centers around a single linguistic decision: the definiteness of noun phrases. We conduct human evaluation studies that show that human expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, based on an automatic evaluation study, we provide evidence for our system's ability to generate linguistic decisions that lead to improved extractive summaries. We also draw insights about how the automatic system exploits local cues related to the writing style of the main article texts or summary texts to make its decisions, rather than reasoning about the contexts pragmatically.
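To make the single linguistic decision concrete, here is a sketch that walks over the noun phrases of an extractive summary and predicts an article for each. The paper uses a trained neural classifier; the first-mention heuristic below (indefinite on first mention, definite afterwards) is only a hypothetical stand-in, assuming spaCy for noun-phrase extraction.

```python
# Sketch of the post-editing decision: predict definiteness per noun phrase.
# A first-mention heuristic stands in for the paper's neural classifier.
import spacy

nlp = spacy.load("en_core_web_sm")


def predict_definiteness(summary_sentences):
    seen_heads = set()
    decisions = []
    for sent in summary_sentences:
        doc = nlp(sent)
        for chunk in doc.noun_chunks:
            head = chunk.root.lemma_.lower()
            # Definite if the head noun was already mentioned, else indefinite.
            article = "the" if head in seen_heads else "a/an"
            decisions.append((chunk.text, article))
            seen_heads.add(head)
    return decisions


print(predict_definiteness([
    "A committee reviewed a proposal.",
    "The committee rejected the proposal.",
]))
```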
Textual Time Travel: A Temporally Informed Approach to Theory of Mind
Akshatha Arodi
Natural language processing systems such as dialogue agents should be able to reason about other people's beliefs, intentions and desires. This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memory-augmented neural models through question answering. These models perform poorly on false-belief tasks, where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities' minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel, i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.
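The toy class below illustrates the time-travel idea in plain Python rather than a memory-augmented neural network: belief states are snapshotted at every time step, and a query about an agent's belief can read a snapshot from an earlier step instead of the latest world state. All names are illustrative.

```python
# Toy version of "textual time travel" over belief states.
from copy import deepcopy


class TemporalMemory:
    def __init__(self):
        self.snapshots = []  # one per-agent belief-state dict per time step
        self.state = {}

    def observe(self, entity, location, observers):
        # Only agents who witness the event update their belief.
        for agent in observers:
            self.state.setdefault(agent, {})[entity] = location
        self.snapshots.append(deepcopy(self.state))

    def believed_location(self, agent, entity, time_step=-1):
        """Query an agent's belief at a given (possibly earlier) time step."""
        return self.snapshots[time_step].get(agent, {}).get(entity)


mem = TemporalMemory()
mem.observe("ball", "basket", observers=["Sally", "Anne"])  # t=0
mem.observe("ball", "box", observers=["Anne"])              # t=1: Sally absent
print(mem.believed_location("Sally", "ball"))               # basket (false belief)
print(mem.believed_location("Anne", "ball", time_step=0))   # basket (time travel)
print(mem.believed_location("Anne", "ball"))                # box
```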
The Topic Confusion Task: A Novel Scenario for Authorship Attribution
Malik H. Altakrori
Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship style, by the topic shift, or by other factors. Motivated by this, we propose the topic confusion task, in which we switch the author-topic configuration between the training and testing sets. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one caused by the models being confused by the switch because the features capture the topics, and one caused by the features' inability to capture the writing styles, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task and are outperformed by simple n-gram features.
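As a sketch of the feature side of this result, the snippet below maps a text to its part-of-speech sequence and trains an n-gram classifier on top, which abstracts away topic words. The training texts, labels, and model choice are placeholders, assuming spaCy for tagging and scikit-learn for classification.

```python
# Sketch of topic-robust stylometric features: POS n-grams instead of words.
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

nlp = spacy.load("en_core_web_sm")


def pos_sequence(text):
    """Replace each word with its POS tag to abstract away topic words."""
    return " ".join(tok.pos_ for tok in nlp(text))


train_texts = [
    "I reckon the match was splendid, truly splendid indeed.",
    "The match was fine. Nothing more to say about it.",
]
train_labels = ["author_A", "author_B"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),  # POS n-grams up to trigrams
    LogisticRegression(),
)
model.fit([pos_sequence(t) for t in train_texts], train_labels)

# At test time the author-topic pairing is switched, probing whether the
# features track style rather than topic.
print(model.predict([pos_sequence("The game was splendid, truly splendid.")]))
```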
An Analysis of Dataset Overlap on Winograd-Style Tasks
Ali Emami
Adam Trischler
Kaheer Suleman
The Winograd Schema Challenge (WSC) and variants inspired by it have become important benchmarks for common-sense reasoning (CSR). Model performance on the WSC has quickly progressed from chance level to near-human using neural language models trained on massive corpora. In this paper, we analyze the effects of varying degrees of overlap that occur between these corpora and the test instances in WSC-style tasks. We find that a large number of test instances overlap considerably with the pretraining corpora on which state-of-the-art models are trained, and that a significant drop in classification accuracy occurs when models are evaluated on instances with minimal overlap. Based on these results, we provide the WSC-Web dataset, consisting of over 60k pronoun disambiguation problems scraped from web data; it is both the largest such corpus to date and has a significantly lower proportion of overlaps with current pretraining corpora.
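A minimal sketch of the overlap measurement, under the assumption that overlap is quantified as shared word n-grams (the function names and the n=3 choice are illustrative):

```python
# Sketch: score a test instance by the fraction of its word n-grams that
# also appear in a pretraining corpus sample. Texts here are placeholders.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(instance, corpus_text, n=3):
    """Fraction of the instance's n-grams found in the corpus."""
    inst = ngrams(instance.lower().split(), n)
    corp = ngrams(corpus_text.lower().split(), n)
    return len(inst & corp) / max(len(inst), 1)


corpus = "the trophy would not fit in the brown suitcase because it was too big"
test_instance = "the trophy does not fit in the suitcase because it is too big"
print(f"{overlap_score(test_instance, corpus):.2f}")
```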