Portrait of Jackie Cheung

Jackie Cheung

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila, Associate Professor, McGill University, School of Computer Science
Consultant Researcher, Microsoft Research
Research Topics
Deep Learning
Medical Machine Learning
Natural Language Processing
Reasoning

Biography

I am an associate professor in the School of Computer Science at McGill University and a consultant researcher at Microsoft Research.

My group investigates natural language processing, an area of AI research that builds computational models of human languages, such as English or French. The goal of our research is to develop computational methods for understanding text and speech in order to generate language that is fluent and context appropriate.

In our lab, we investigate statistical machine learning techniques for analyzing and making predictions about language. Some of my current projects focus on summarizing fiction, extracting events from text, and adapting language across genres.

Current Students

PhD - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Collaborating researcher
Collaborating researcher
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
Principal supervisor :
Master's Research - McGill University
Collaborating researcher - Concordia University University
PhD - McGill University
Co-supervisor :
PhD - McGill University
Co-supervisor :
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
Principal supervisor :
PhD - McGill University
PhD - McGill University
PhD - McGill University
Undergraduate - McGill University
PhD - McGill University
Undergraduate - McGill University
Master's Research - McGill University

Publications

Can a Gorilla Ride a Camel? Learning Semantic Plausibility from Text
Kaheer Suleman
Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses
Matt Grenander
Yue Dong
Annie Priyadarshini Louis
Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article… (see more). In this paper, we show that recent neural systems excessively exploit this trend, which although powerful for many inputs, is also detrimental when summarizing documents where important content should be extracted from later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique employs ‘unbiased’ data; i.e., randomly shuffled sentences of the source document, to pretrain the model. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement learning based extractive system, with the auxiliary loss being more powerful than pretraining.
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG
Paul Trichelair
Ali Emami
Adam Trischler
Kaheer Suleman
Recent studies have significantly improved the state-of-the-art on common-sense reasoning (CSR) benchmarks like the Winograd Schema Challeng… (see more)e (WSC) and SWAG. The question we ask in this paper is whether improved performance on these benchmarks represents genuine progress towards common-sense-enabled systems. We make case studies of both benchmarks and design protocols that clarify and qualify the results of previous work by analyzing threats to the validity of previous experimental designs. Our protocols account for several properties prevalent in common-sense benchmarks including size limitations, structural regularities, and variable instance difficulty.
Referring Expression Generation Using Entity Profiles
Meng Cao
Contextualized Non-local Neural Networks for Sequence Learning
Pengfei Liu
Shuaichen Chang
Xuanjing Huang
Recently, a large number of neural mechanisms and models have been proposed for sequence learning, of which selfattention, as exemplified by… (see more) the Transformer model, and graph neural networks (GNNs) have attracted much attention. In this paper, we propose an approach that combines and draws on the complementary strengths of these two methods. Specifically, we propose contextualized non-local neural networks (CN3), which can both dynamically construct a task-specific structure of a sentence and leverage rich local dependencies within a particular neighbourhood.Experimental results on ten NLP tasks in text classification, semantic matching, and sequence labelling show that our proposed model outperforms competitive baselines and discovers task-specific dependency structures, thus providing better interpretability to users.
Generating Character Descriptions for Automatic Summarization of Fiction
Weiwei Zhang
J. Oren
Summaries of fictional stories allow readers to quickly decide whether or not a story catches their interest. A major challenge in automatic… (see more) summarization of fiction is the lack of standardized evaluation methodology or high-quality datasets for experimentation. In this work, we take a bottomup approach to this problem by assuming that story authors are uniquely qualified to inform such decisions. We collect a dataset of one million fiction stories with accompanying author-written summaries from Wattpad, an online story sharing platform. We identify commonly occurring summary components, of which a description of the main characters is the most frequent, and elicit descriptions of main characters directly from the authors for a sample of the stories. We propose two approaches to generate character descriptions, one based on ranking attributes found in the story text, the other based on classifying into a list of pre-defined attributes. We find that the classification-based approach performs the best in predicting character descriptions.
Learning Multi-Task Communication with Message Passing for Sequence Learning
Pengfei Liu
Jie Fu
Yue Dong
Xipeng Qiu
We present two architectures for multi-task learning with neural sequence models. Our approach allows the relationships between different ta… (see more)sks to be learned dynamically, rather than using an ad-hoc pre-defined structure as in previous work. We adopt the idea from message-passing graph neural networks, and propose a general graph multi-task learning framework in which different tasks can communicate with each other in an effective and interpretable way. We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
A Cross-Domain Transferable Neural Coherence Model
Peng Xu
H. Saghir
Jin Sung Kang
Teng Long
Yanshuai Cao
Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence… (see more) models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-art methods on a standard benchmark dataset on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
Yue Dong
Mehdi Rezagholizadeh
We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-inte… (see more)rpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.
Understanding the Behaviour of Neural Abstractive Summarizers using Contrastive Examples
Krtin Kumar
Neural abstractive summarizers generate summary texts using a language model conditioned on the input source text, and have recently achieve… (see more)d high ROUGE scores on benchmark summarization datasets. We investigate how they achieve this performance with respect to human-written gold-standard abstracts, and whether the systems are able to understand deeper syntactic and semantic structures. We generate a set of contrastive summaries which are perturbed, deficient versions of human-written summaries, and test whether existing neural summarizers score them more highly than the human-written summaries. We analyze their performance on different datasets and find that these systems fail to understand the source text, in a majority of the cases.
Unsupervised Controllable Text Generation with Global Variation Discovery and Disentanglement
Peng Xu
Yanshuai Cao
Existing controllable text generation systems rely on annotated attributes, which greatly limits their capabilities and applications. In thi… (see more)s work, we make the first successful attempt to use VAEs to achieve controllable text generation without supervision. We do so by decomposing the latent space of the VAE into two parts: one incorporates structural constraints to capture dominant global variations implicitly present in the data, e.g., sentiment or topic; the other is unstructured and is used for the reconstruction of the source sentences. With the enforced structural constraint, the underlying global variations will be discovered and disentangled during the training of the VAE. The structural constraint also provides a natural recipe for mitigating posterior collapse for the structured part, which cannot be fully resolved by the existing techniques. On the task of text style transfer, our unsupervised approach achieves significantly better performance than previous supervised approaches. By showcasing generation with finer-grained control including Cards-Against-Humanity-style topic transitions within a sentence, we demonstrate that our model can perform controlled text generation in a more flexible way than existing methods.
What comes next? Extractive summarization by next-sentence prediction
Jingyun Liu
Annie Priyadarshini Louis
Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimiz… (see more)ation problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far. We show that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance, in addition to automatically predicting how long the target summary should be. We perform experiments on the New York Times Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model summarization baselines by significant margins. We also show that the lengths of summaries produced by our system correlates with the lengths of the human-written gold standards.