Jackie Cheung

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila; Associate Professor, McGill University, School of Computer Science
Consultant Researcher, Microsoft Research
Research Topics
Deep Learning
Medical Machine Learning
Natural Language Processing
Reasoning

Biography

I am an associate professor in the School of Computer Science at McGill University and a consultant researcher at Microsoft Research.

My group investigates natural language processing, an area of AI research that builds computational models of human languages, such as English or French. The goal of our research is to develop computational methods for understanding text and speech in order to generate language that is fluent and context appropriate.

In our lab, we investigate statistical machine learning techniques for analyzing and making predictions about language. Some of my current projects focus on summarizing fiction, extracting events from text, and adapting language across genres.

Current Students

Postdoctorate - McGill University
PhD - McGill University
Co-supervisor :
Postdoctorate - McGill University
Research Intern - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
PhD - McGill University
Research Intern - McGill University
PhD - McGill University
Co-supervisor :
Master's Research - McGill University
PhD - McGill University
Co-supervisor :
Postdoctorate - McGill University
Master's Research - McGill University
Master's Research - McGill University
Research Intern - McGill University
Research Intern - McGill University
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
Undergraduate - McGill University
PhD - McGill University
Research Intern - McGill University
Research Intern - McGill University

Publications

Deconstructing and reconstructing word embedding algorithms
Edward Daniel Newell
Kian Kenyon-Dean
Uncontextualized word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the necessary and sufficient conditions required for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and, (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions to construct a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings obtain equivalent or better performance against other algorithms across 17 intrinsic and extrinsic datasets.
Preventing Posterior Collapse in Sequence VAEs with Pooling
Teng Long
Yanshuai Cao
Variational Autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. Previous works attempt to solve this problem with complex architectural changes or costly optimization schemes. In this paper, we argue that posterior collapse is caused in part by the encoder network failing to capture the input variabilities. We verify this hypothesis empirically and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to the previous SOTA on preventing posterior collapse, we are able to achieve comparable performances while being significantly faster.
Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text
Ian Porada
Kaheer Suleman
Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses
Matt Grenander
Yue Dong
Annie Priyadarshini Louis
Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique employs ‘unbiased’ data; i.e., randomly shuffled sentences of the source document, to pretrain the model. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement learning based extractive system, with the auxiliary loss being more powerful than pretraining.
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG
Paul Trichelair
Ali Emami
Adam Trischler
Kaheer Suleman
Recent studies have significantly improved the state-of-the-art on common-sense reasoning (CSR) benchmarks like the Winograd Schema Challenge (WSC) and SWAG. The question we ask in this paper is whether improved performance on these benchmarks represents genuine progress towards common-sense-enabled systems. We make case studies of both benchmarks and design protocols that clarify and qualify the results of previous work by analyzing threats to the validity of previous experimental designs. Our protocols account for several properties prevalent in common-sense benchmarks including size limitations, structural regularities, and variable instance difficulty.
Referring Expression Generation Using Entity Profiles
Meng Cao
Contextualized Non-local Neural Networks for Sequence Learning
Pengfei Liu
Shuaichen Chang
Xuanjing Huang
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which self-attention, as exemplified by the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood. Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
Generating Character Descriptions for Automatic Summarization of Fiction
Weiwei Zhang
J. Oren
Summaries of fictional stories allow readers to quickly decide whether or not a story catches their interest. A major challenge in automatic summarization of fiction is the lack of standardized evaluation methodology or high-quality datasets for experimentation. In this work, we take a bottom-up approach to this problem by assuming that story authors are uniquely qualified to inform such decisions. We collect a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform. We identify commonly occurring summary components, of which a description of the main characters is the most frequent, and elicit descriptions of main characters directly from the authors for a sample of the stories. We propose two approaches to generate character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes. We find that the classification-based approach performs the best in predicting character descriptions.
Learning Multi-Task Communication with Message Passing for Sequence Learning
Pengfei Liu
Jie Fu
Yue Dong
Xipeng Qiu
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different tasks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
A Cross-Domain Transferable Neural Coherence Model
Peng Xu
H. Saghir
Jin Sung Kang
Teng Long
Avishek Joey Bose
Yanshuai Cao
Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-art methods on a standard benchmark dataset on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
Yue Dong
Zichao Li
Mehdi Rezagholizadeh
We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.
Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples
Krtin Kumar
Neural abstractive summarizers generate summary texts using a language model conditioned on the input source text, and have recently achieved high ROUGE scores on benchmark summarization datasets. We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. We analyze their performance on different datasets and find that these systems fail to understand the source text in a majority of the cases.