
Jackie Cheung

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila; Associate Professor, School of Computer Science, McGill University
Consulting Researcher, Microsoft Research
Research Topics
Medical Machine Learning
Deep Learning
Reasoning
Natural Language Processing

Biography

I am an Associate Professor in the School of Computer Science at McGill University and a Consulting Researcher at Microsoft Research.

My group conducts research in natural language processing (NLP), a subfield of artificial intelligence concerned with building computational models of human languages such as English or French. The goal of our research is to develop computational methods for understanding text and speech in order to generate language that is fluent and appropriate to its context.

In our lab, we study statistical machine learning techniques for analyzing and making predictions about language. Ongoing projects include the summarization of fiction, event extraction from text, and adapting language to different genres.

Current Students

PhD - McGill
Co-supervisor:
Postdoctorate - McGill
Research Intern - McGill
PhD - McGill
PhD - McGill
Principal supervisor:
Master's Research - McGill
PhD - McGill
Research Intern - McGill
PhD - McGill
Co-supervisor:
Master's Research - McGill
PhD - McGill
Co-supervisor:
Postdoctorate - McGill
Master's Research - McGill
Master's Research - McGill
Research Intern - McGill
Research Intern - McGill
PhD - McGill
Principal supervisor:
Master's Research - McGill
PhD - McGill
Master's Research - McGill
PhD - McGill
Undergraduate - McGill
PhD - McGill
Research Intern - McGill
Research Intern - McGill

Publications

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment
Zichao Li
Prakhar Sharma
Xing Han Lu
Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data improves the accuracy not only of the deployed QA system but also of other, stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers.
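To make the re-scoring step concrete, here is a minimal sketch of how feedback-derived ratings could be blended with a deployed retriever's scores. The Candidate class, the feedback_scores input, and the mixing weight alpha are illustrative assumptions, not the authors' released code (the paper instead trains a neural model that re-scores candidates and generates explanations).

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    base_score: float  # score from the deployed retrieval-based QA system

def rerank(candidates, feedback_scores, alpha=0.5):
    """Blend the base retriever score with a feedback-trained rating.

    feedback_scores[i] stands in for the rating a model trained on user
    feedback (structured ratings + explanations) assigns to candidate i.
    """
    scored = [
        (alpha * c.base_score + (1 - alpha) * f, c)
        for c, f in zip(candidates, feedback_scores)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]

cands = [Candidate("Answer A", 0.9), Candidate("Answer B", 0.8)]
print([c.text for c in rerank(cands, feedback_scores=[0.2, 0.95])])
# -> ['Answer B', 'Answer A']: the feedback signal overturns the base ranking
```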
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Kushal Arora
Layla El Asri
Hareesh Bahuleyan
Current language generation models suffer from issues such as repetition, incoherence, and hallucinations. An often-repeated hypothesis for this brittleness of generation models is that it is caused by the mismatch between the training and generation procedures, also referred to as exposure bias. In this paper, we verify this hypothesis by analyzing exposure bias from an imitation learning perspective. We show that exposure bias leads to an accumulation of errors during generation, analyze why perplexity fails to capture this accumulation of errors, and empirically show that this accumulation results in poor generation quality.
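A toy calculation illustrates the accumulation argument: even a small, fixed per-token error rate compounds over the length of an autoregressive generation, because each prediction conditions on previously generated (possibly erroneous) tokens. The numbers below are illustrative, not the paper's results.

```python
# Toy illustration of error accumulation in autoregressive generation.
# With per-token error rate eps, the chance a length-T sequence is
# error-free is (1 - eps)^T; once an error occurs, the model conditions
# on its own mistake, which is the exposure-bias failure mode.
def p_any_error(eps: float, length: int) -> float:
    return 1.0 - (1.0 - eps) ** length

for T in (10, 50, 200):
    print(T, round(p_any_error(0.02, T), 3))
# A 2% per-token error rate leaves ~98% of 200-token sequences with at
# least one error for later steps to condition on.
```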
Investigating the Performance of Transformer-Based NLI Models on Presuppositional Inferences
Jad Kabbara
Presuppositions are assumptions that are taken for granted by an utterance, and identifying them is key to a pragmatic interpretation of language. In this paper, we investigate the capabilities of transformer models to perform NLI on cases involving presupposition. First, we present simple heuristics to create alternative "contrastive" test cases based on the ImpPres dataset and investigate model performance on those test cases. Second, to better understand how the model makes its predictions, we analyze samples from sub-datasets of ImpPres and examine model performance on them. Overall, our findings suggest that NLI-trained transformer models exploit specific structural and lexical cues rather than performing genuinely pragmatic reasoning.
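A hypothetical instance of the kind of contrastive case such heuristics might produce; the paper defines the actual rules, so this example is an assumption.

```python
# A presupposition trigger ("stopped") and a contrastive variant built by
# a simple string-level heuristic. The presupposition survives negation
# (it "projects"), which is what makes these cases hard for NLI models.
premise = "John stopped smoking."
presupposition = "John used to smoke."  # entailed via presupposition
contrastive = premise.replace("stopped", "did not stop")
print(contrastive, "->", presupposition)  # still entailed
```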
Learning with Rejection for Abstractive Text Summarization
Meng Cao
Yue Dong
Jingyi He
Question Personalization in an Intelligent Tutoring System
Sabina Elkins
Robert Belfer
Ekaterina Kochmar
Iulian V. Serban
Source-summary Entity Aggregation in Abstractive Summarization.
José-Ángel González
Annie Priyadarshini Louis
Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge
Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level co-occurrence patterns rather than induced, systematic reasoning.
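The injection setup can be sketched as follows; the verbalization function, toy facts, and injection rate are illustrative assumptions rather than the paper's actual corpus or procedure.

```python
import random

# Verbalized facts to mix into pre-training minibatches; the supported
# inference is held out and only used for post-training evaluation.
facts = [("A robin", "is a", "bird"), ("A bird", "can", "fly")]
held_out_inference = "A robin can fly."  # never injected verbatim

def verbalize(triple):
    return " ".join(triple) + "."

def build_minibatch(corpus_sentences, injection_rate=0.1):
    batch = list(corpus_sentences)
    for fact in facts:
        if random.random() < injection_rate:
            batch.append(verbalize(fact))  # inject the fact as plain text
    random.shuffle(batch)
    return batch

print(build_minibatch(["The cat sat on the mat."], injection_rate=1.0))
```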
The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution
Malik H. Altakrori
On-the-Fly Attention Modulation for Neural Generation
Yue Dong
Chandra Bhagavatula
Ximing Lu
Jena D. Hwang
Antoine Bosselut
Yejin Choi
Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: the generated text is repetitive, generic, self-contradictory, and often lacks commonsense. Our analyses of sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the attention mechanism. This finding motivates on-the-fly attention modulation, a simple but effective method that enables the injection of priors into attention computation during inference. Automatic and human evaluation results on three text generation benchmarks demonstrate that attention modulation helps LMs generate text with enhanced fluency, creativity, and commonsense reasoning, in addition to significantly reducing sentence-level repetition.
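In spirit, the method adds a prior to the attention logits at inference time. The sketch below shows that general shape; the prior values and the scaling factor beta are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def modulated_attention(scores: np.ndarray, prior: np.ndarray, beta: float = 1.0):
    """Add a scaled prior to raw attention logits before the softmax.

    This biases which positions each query attends to at inference time,
    without retraining the model.
    """
    logits = scores + beta * prior
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.zeros((1, 4))                 # uniform raw logits
prior = np.array([[0.0, 0.0, 2.0, 0.0]])  # e.g., promote one key position
print(modulated_attention(scores, prior))  # attention mass shifts to index 2
```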
Optimizing Deeper Transformers on Small Datasets
Peng Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
S. Prince
Yanshuai Cao
It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 layers of transformers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain state-of-the-art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme (DT-Fixup), inspired by the prior T-Fixup work. Further error analysis shows that increasing depth can help improve generalization on small datasets for hard cases that require reasoning and structural understanding.
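As a rough schematic of what a depth-dependent initialization looks like, here is a T-Fixup-style rescaling; the actual DT-Fixup factor is data-dependent and derived in the paper, so the constant below is purely illustrative.

```python
import torch
import torch.nn as nn

def rescale_for_depth(layers, num_layers: int):
    # T-Fixup-style depth-dependent factor; DT-Fixup's data-dependent
    # derivation replaces this constant in the actual method.
    scale = (9 * num_layers) ** -0.25
    for layer in layers:
        for name, param in layer.named_parameters():
            if param.dim() > 1:  # rescale weight matrices, not biases
                with torch.no_grad():
                    param.mul_(scale)

blocks = nn.ModuleList(nn.Linear(16, 16) for _ in range(24))
rescale_for_depth(blocks, num_layers=24)
print(blocks[0].weight.abs().mean())  # weights shrunk toward zero
```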
TIE: A Framework for Embedding-based Incremental Temporal Knowledge Graph Completion
Jiapeng Wu
Yishi Xu
Yingxue Zhang
Chen Ma
Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challenging when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder framework with a time-aware encoding function. However, naively fine-tuning the model at every time step using these methods does not address the problems of 1) catastrophic forgetting, 2) the model's inability to identify changes of facts (e.g., a change of political affiliation or the end of a marriage), and 3) the lack of training efficiency. To address these challenges, we present the Time-aware Incremental Embedding (TIE) framework, which combines TKG representation learning, experience replay, and temporal regularization. We introduce a set of metrics that characterizes the intransigence of the model and propose a constraint that associates deleted facts with negative labels. Experimental results on the Wikidata12k and YAGO11k datasets demonstrate that the proposed TIE framework reduces training time by about a factor of ten and improves on the proposed metrics compared to vanilla full-batch training, without a significant loss in performance on any traditional measure. Extensive ablation studies reveal performance trade-offs among different evaluation metrics, which is essential for decision-making around real-world TKG applications.
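The overall shape of such an incremental objective, combining a current-step loss, an experience-replay term, and a temporal regularizer, can be sketched as below; the weights and tensors are illustrative assumptions, not TIE's exact formulation.

```python
import torch

def incremental_tkg_loss(loss_new, loss_replay, emb_now, emb_prev,
                         replay_weight=1.0, reg_weight=0.1):
    """Current-step loss + replayed-past loss + temporal regularization.

    The regularizer discourages entity embeddings from drifting too far
    from their values at the previous time step, countering forgetting.
    """
    temporal_reg = ((emb_now - emb_prev) ** 2).sum()
    return loss_new + replay_weight * loss_replay + reg_weight * temporal_reg

emb_prev = torch.randn(100, 32)  # entity embeddings at step t-1
emb_now = emb_prev + 0.01 * torch.randn_like(emb_prev)
print(incremental_tkg_loss(torch.tensor(0.5), torch.tensor(0.3),
                           emb_now, emb_prev))
```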