Portrait de Jackie Cheung

Jackie Cheung

Membre académique principal
Chaire en IA Canada-CIFAR
Directeur scientifique adjoint, Mila, Professeur agrégé, McGill University, École d'informatique
Chercheur consultant, Microsoft Research
Sujets de recherche
Apprentissage automatique médical
Apprentissage profond
Raisonnement
Traitement du langage naturel

Biographie

Je suis professeur agrégé à l'École d’informatique de l’Université McGill et chercheur consultant à Microsoft Research.

Mon groupe mène des recherches sur le traitement du langage naturel (NLP), un domaine de l'intelligence artificielle qui implique la construction de modèles informatiques de langages humains tels que l'anglais ou le français. Le but de nos recherches est de développer des méthodes informatiques de compréhension du texte et de la parole, afin de générer un langage fluide et adapté au contexte.

Dans notre laboratoire, nous étudions des techniques statistiques d’apprentissage automatique pour analyser et faire des prédictions sur la langue. Plusieurs projets en cours incluent la synthèse de fiction, l'extraction d'événements à partir d’un texte et l'adaptation de la langue à différents genres.

Étudiants actuels

Doctorat - McGill
Collaborateur·rice alumni - McGill
Collaborateur·rice de recherche
Collaborateur·rice de recherche
Collaborateur·rice alumni - McGill
Doctorat - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Collaborateur·rice de recherche - Concordia University
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Collaborateur·rice de recherche - McGill University
Co-superviseur⋅e :
Maîtrise recherche - Paris-Saclay University
Superviseur⋅e principal⋅e :
Postdoctorat - École de technologie suprérieure
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Baccalauréat - McGill
Doctorat - McGill
Baccalauréat - McGill
Doctorat - McGill
Baccalauréat - McGill
Collaborateur·rice de recherche - McGill University
Co-superviseur⋅e :
Maîtrise recherche - McGill

Publications

Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
S. Prince
Yanshuai Cao
It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually… (voir plus) use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 layers of transformers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state of the art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.
TIE: A Framework for Embedding-based Incremental Temporal Knowledge Graph Completion
Jiapeng Wu
Yingxue Zhang
Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challengi… (voir plus)ng when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder framework with a time-aware encoding function. However, naively fine-tuning the model at every time step using these methods does not address the problems of 1) catastrophic forgetting, 2) the model's inability to identify the change of facts (e.g., the change of the political affiliation and end of a marriage), and 3) the lack of training efficiency. To address these challenges, we present the Time-aware Incremental Embedding (TIE) framework, which combines TKG representation learning, experience replay, and temporal regularization. We introduce a set of metrics that characterizes the intransigence of the model and propose a constraint that associates the deleted facts with negative labels. Experimental results on Wikidata12k and YAGO11k datasets demonstrate that the proposed TIE framework reduces training time by about ten times and improves on the proposed metrics compared to vanilla full-batch training. It comes without a significant loss in performance for any traditional measures. Extensive ablation studies reveal performance trade-offs among different evaluation metrics, which is essential for decision-making around real-world TKG applications.
Modeling Event Plausibility with Consistent Conceptual Abstraction
Kaheer Suleman
Adam Trischler
Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems
Matt Grenander
Robert Belfer
Ekaterina Kochmar
Iulian V. Serban
Franccois St-Hilaire
We explore creating automated, personalized feedback in an intelligent tutoring system (ITS). Our goal is to pinpoint correct and incorrect … (voir plus)concepts in student answers in order to achieve better student learning gains. Although automatic methods for providing personalized feedback exist, they do not explicitly inform students about which concepts in their answers are correct or incorrect. Our approach involves decomposing students answers using neural discourse segmentation and classification techniques. This decomposition yields a relational graph over all discourse units covered by the reference solutions and student answers. We use this inferred relational graph structure and a neural classifier to match student answers with reference solutions and generate personalized feedback. Although the process is completely automated and data-driven, the personalized feedback generated is highly contextual, domain-aware and effectively targets each student's misconceptions and knowledge gaps. We test our method in a dialogue-based ITS and demonstrate that our approach results in high-quality feedback and significantly improved student learning gains.
Characterizing Idioms: Conventionality and Contingency
Michaela Socolof
Michael Wagner
Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanin… (voir plus)gs of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.
Discourse-Aware Unsupervised Summarization for Long Scientific Documents
ADEPT: An Adjective-Dependent Plausibility Task
Kaheer Suleman
Adam Trischler
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
State-of-the-art abstractive summarization systems often generate hallucinations ; i.e., content that is not directly inferable from the sou… (voir plus)rce text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical re-sults suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step. 1
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
State-of-the-art abstractive summarization systems often generate hallucinations ; i.e., content that is not directly inferable from the sou… (voir plus)rce text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations into a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical re-sults suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer’s pre-training or fine-tuning step. 1
On-the-Fly Attention Modularization for Neural Generation
Chandra Bhagavatula
Ximing Lu
Jena D. Hwang
Antoine Bosselut
Yejin Choi
Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from de generation: generated… (voir plus) text is repetitive, generic, self-inconsistent, and lacking commonsense. The empirical analyses on sentence-level attention patterns reveal that neural text degeneration may be associated with insufficient learning of inductive biases by the attention mechanism. Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention computation during inference. The resulting text produced by the language model with attention modularization can yield enhanced diversity and commonsense reasoning while maintaining fluency and coherence.
Post-Editing Extractive Summaries by Definiteness Prediction
Extractive summarization has been the main-stay of automatic summarization for decades. Despite all the progress, extractive summarizers sti… (voir plus)ll suffer from shortcomings including coreference issues arising from extracting sentences away from their original context in the source document. This affects the coherence and readability of extractive summaries. In this work, we propose a lightweight postediting step for extractive summaries that centers around a single linguistic decision: the definiteness of noun phrases. We conduct human evaluation studies that show that human expert judges substantially prefer the output of our proposed system over the original summaries. Moreover, based on an automatic evaluation study, we provide evidence for our system’s ability to generate linguistic decisions that lead to improved extractive summaries. We also draw insights about how the automatic system is exploiting some local cues related to the writing style of the main article texts or summary texts to make the decisions, rather than reasoning about the contexts pragmatically.
Textual Time Travel: A Temporally Informed Approach to Theory of Mind
Akshatha Arodi
Natural language processing systems such as dialogue agents should be able to reason about other people’s beliefs, intentions and desires.… (voir plus) This capability, called theory of mind (ToM), is crucial, as it allows a model to predict and interpret the needs of users based on their mental states. A recent line of research evaluates the ToM capability of existing memoryaugmented neural models through questionanswering. These models perform poorly on false belief tasks where beliefs differ from reality, especially when the dataset contains distracting sentences. In this paper, we propose a new temporally informed approach for improving the ToM capability of memory-augmented neural models. Our model incorporates priors about the entities’ minds and tracks their mental states as they evolve over time through an extended passage. It then responds to queries through textual time travel—i.e., by accessing the stored memory of an earlier time step. We evaluate our model on ToM datasets and find that this approach improves performance, particularly by correcting the predicted mental states to match the false belief.