
Jackie Cheung

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila
Associate Professor, School of Computer Science, McGill University
Consultant Researcher, Microsoft Research
Research Topics
Deep Learning
Medical Machine Learning
Natural Language Processing
Reasoning

Biography

I am an associate professor in the School of Computer Science at McGill University and a consultant researcher at Microsoft Research.

My group investigates natural language processing, an area of AI research that builds computational models of human languages, such as English or French. The goal of our research is to develop computational methods for understanding text and speech in order to generate language that is fluent and context-appropriate.

In our lab, we investigate statistical machine learning techniques for analyzing and making predictions about language. Some of my current projects focus on summarizing fiction, extracting events from text, and adapting language across genres.

Current Students

PhD - McGill University
Collaborating Alumni - McGill University
PhD - McGill University
Collaborating researcher
Collaborating researcher
Collaborating Alumni - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
Collaborating researcher - Concordia University
PhD - McGill University
PhD - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Undergraduate - McGill University
PhD - McGill University
Undergraduate - McGill University
Master's Research - McGill University

Publications

On Posterior Collapse and Encoder Feature Dispersion in Sequence VAEs.
Teng Long
Yanshuai Cao
Variational autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. In this paper, we argue that posterior collapse is in part caused by the lack of dispersion in encoder features. We provide empirical evidence to verify this hypothesis, and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to existing work, our proposed method achieves comparable or superior performance while being more computationally efficient.
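A minimal PyTorch sketch of the pooling idea, for illustration only: the encoder's hidden states are mean-pooled over time before predicting the posterior parameters, rather than relying on the final hidden state alone. The module name, dimensions, and GRU encoder are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PooledVAEEncoder(nn.Module):
    """Sequence VAE encoder that mean-pools hidden states before
    predicting the posterior parameters (illustrative sketch)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, tokens, lengths):
        # tokens: (batch, seq_len) token ids; lengths: (batch,) true lengths
        hidden, _ = self.rnn(self.embed(tokens))  # (batch, seq_len, hidden_dim)
        # Mask out padding, then mean-pool over time instead of using only
        # the final hidden state, which encourages more dispersed features.
        positions = torch.arange(tokens.size(1), device=tokens.device)
        mask = (positions[None, :] < lengths[:, None]).float()
        pooled = (hidden * mask.unsqueeze(-1)).sum(1) / lengths[:, None].float()
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar
```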
Deconstructing Word Embedding Algorithms
Kian Kenyon-Dean
Edward Daniel Newell
Factual Error Correction for Abstractive Summarization Models
Meng Cao
Yue Dong
Jiapeng Wu
Neural abstractive summarization systems have achieved promising progress, thanks to the availability of large-scale datasets and models pre-trained with self-supervised methods. However, ensuring the factual consistency of the generated summaries for abstractive summarization systems is a challenge. We propose a post-editing corrector module to address this issue by identifying and correcting factual errors in generated summaries. The neural corrector model is pre-trained on artificial examples that are created by applying a series of heuristic transformations on reference summaries. These transformations are inspired by an error analysis of state-of-the-art summarization model outputs. Experimental results show that our model is able to correct factual errors in summaries generated by other neural summarization models and outperforms previous models on factual consistency evaluation on the CNN/DailyMail dataset. We also find that transferring from artificial error correction to downstream settings is still very challenging.
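To illustrate the artificial-example idea, here is a toy sketch of one possible heuristic transformation: swapping an entity in the reference summary to create a factually inconsistent training input. The function name, entity pool, and the specific transformation are illustrative assumptions; the paper's actual transformations are derived from its error analysis.

```python
import random

def corrupt_summary(reference, entity_pool, seed=None):
    """Create an artificial 'factually wrong' summary by swapping one
    entity in the reference for another from the same document
    (a toy version of the heuristic-transformation idea)."""
    rng = random.Random(seed)
    present = [e for e in entity_pool if e in reference]
    if not present:
        return reference  # nothing to corrupt
    old = rng.choice(present)
    new = rng.choice([e for e in entity_pool if e != old] or [old])
    return reference.replace(old, new, 1)

# The corrector would then be trained to map (source document, corrupted
# summary) back to the reference summary, e.g. with a seq2seq model.
reference = "Apple acquired the startup in 2019 for $200 million."
entities = ["Apple", "Google", "2019", "2018", "$200 million", "$50 million"]
print(corrupt_summary(reference, entities, seed=0))
```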
Multi-Fact Correction in Abstractive Text Summarization
Yue Dong
Shuohang Wang
Zhe Gan
Yu Cheng
Jingjing Liu
TeMP: Temporal Message Passing for Temporal Knowledge Graph Completion
Jiapeng Wu
Meng Cao
William L. Hamilton
Inferring missing facts in temporal knowledge graphs (TKGs) is a fundamental and challenging task. Previous works have approached this problem by augmenting methods for static knowledge graphs to leverage time-dependent representations. However, these methods do not explicitly leverage multi-hop structural information and temporal facts from recent time steps to enhance their predictions. Additionally, prior work does not explicitly address the temporal sparsity and variability of entity distributions in TKGs. We propose the Temporal Message Passing (TeMP) framework to address these challenges by combining graph neural networks, temporal dynamics models, data imputation and frequency-based gating techniques. Experiments on standard TKG tasks show that our approach provides substantial gains compared to the previous state of the art, achieving a 10.7% average relative improvement in Hits@10 across three standard benchmarks. Our analysis also reveals important sources of variability both within and across TKG datasets, and we introduce several simple but strong baselines that outperform the prior state of the art in certain settings.
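One ingredient named above, frequency-based gating, can be sketched roughly as follows: a gate driven by how often an entity appears in recent time steps decides how much to trust its temporal representation versus its static, structural one. This is an illustrative simplification under assumed names and shapes, not the TeMP architecture.

```python
import torch
import torch.nn as nn

class FrequencyGate(nn.Module):
    """Blend a structural entity embedding with a temporal one, with the
    mixing weight driven by the entity's recent appearance frequency
    (illustrative sketch of a frequency-based gate)."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(1, dim)

    def forward(self, structural_emb, temporal_emb, recent_freq):
        # recent_freq: (batch, 1) float count of appearances in the last k steps
        alpha = torch.sigmoid(self.gate(recent_freq))  # (batch, dim), values in (0, 1)
        # Rarely-seen entities fall back on structural information;
        # frequently-seen entities rely more on their temporal trajectory.
        return alpha * temporal_emb + (1.0 - alpha) * structural_emb
```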
TESA: A Task in Entity Semantic Aggregation for Abstractive Summarization
Clément Jumel
Annie Priyadarshini Louis
Human-written texts contain frequent generalizations and semantic aggregation of content. In a document, they may refer to a pair of named entities such as ‘London’ and ‘Paris’ with different expressions: “the major cities”, “the capital cities” and “two European cities”. Yet generation systems, and abstractive summarization systems in particular, have so far focused heavily on paraphrasing and simplifying the source content, to the exclusion of such semantic abstraction capabilities. In this paper, we present a new dataset and task aimed at the semantic aggregation of entities. TESA contains 5.3K crowd-sourced entity aggregations of Person, Organization, and Location named entities. The aggregations are document-appropriate, meaning that they are produced by annotators to match the situational context of a given news article from the New York Times. We then build baseline models for generating aggregations given a tuple of entities and document context. We fine-tune an encoder-decoder language model on TESA and compare it with simpler classification methods based on linguistically informed features. Our quantitative and qualitative evaluations show reasonable performance in making a choice from a given list of expressions, but free-form expressions are understandably harder to generate and evaluate.
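A hedged sketch of the baseline setup described above: an entity tuple and its document context are linearized into a single input string for a pretrained encoder-decoder. The model choice, separators, and prompt format are assumptions for illustration; the dataset's official input format may differ.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

entities = ["London", "Paris"]
context = "Officials in London and Paris announced a joint transport initiative."

# Linearize the (entity tuple, document context) pair into one input string.
source = "aggregate: " + " ; ".join(entities) + " context: " + context
inputs = tokenizer(source, return_tensors="pt", truncation=True)

# An untuned model will not produce good aggregations; after fine-tuning on
# TESA the target would be an expression like "the two European capitals".
output_ids = model.generate(**inputs, max_length=16, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```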
HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization
We propose a novel graph-based ranking model for unsupervised extractive summarization of long documents. Graph-based ranking models typically represent documents as undirected fully-connected graphs, where a node is a sentence, an edge is weighted based on sentence-pair similarity, and sentence importance is measured via node centrality. Our method leverages positional and hierarchical information grounded in discourse structure to augment a document's graph representation with hierarchy and directionality. Experimental results on the PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins and performs comparably to some state-of-the-art supervised models that are trained on hundreds of thousands of examples. In addition, we find that our method provides comparable improvements with various distributional sentence representations, including BERT and RoBERTa models fine-tuned on sentence similarity.
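The directionality idea can be sketched with a toy NumPy example: edges in the sentence graph are re-weighted so that sentences near document boundaries receive more incoming mass, and sentences are ranked by that incoming weight. The boundary function below is a simplified stand-in for the paper's discourse-based weighting, not its actual scheme.

```python
import numpy as np

def directed_centrality(similarity, boundary_weight=1.5):
    """Score sentences by incoming edge weight in a directed sentence graph.

    `similarity` is an (n, n) matrix of sentence-pair similarities. Edges
    pointing toward sentences near the document boundaries are up-weighted,
    a simplified stand-in for positional/hierarchical cues.
    """
    n = similarity.shape[0]
    weights = np.zeros_like(similarity)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Normalized distance of target sentence j from the nearest
            # boundary; closer to a boundary => larger boost.
            d = min(j, n - 1 - j) / max(n - 1, 1)
            weights[i, j] = similarity[i, j] * (boundary_weight - d)
    return weights.sum(axis=0)  # centrality of each sentence = incoming mass

sim = np.array([[1.0, 0.4, 0.2],
                [0.4, 1.0, 0.5],
                [0.2, 0.5, 1.0]])
scores = directed_centrality(sim)
print(scores.argsort()[::-1])  # sentence indices ranked for extraction
```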
Investigating the Influence of Selected Linguistic Features on Authorship Attribution using German News Articles
Manuel Sage
Pietro Cruciata
Raed Abdo
Yaoyao Fiona Zhao
In this work, we perform authorship attribution on a new dataset of German news articles. We seek to classify over 3,700 articles to their five corresponding authors, using four conventional machine learning approaches (naïve Bayes, logistic regression, SVM and kNN) and a convolutional neural network. We analyze the effect of character and word n-grams on the prediction accuracy, as well as the influence of stop words, punctuation, numbers, and lowercasing when preprocessing raw text. The experiments show that higher-order character n-grams (n = 5, 6) perform better than lower orders and word n-grams slightly outperform those with characters. Combining both in fusion models further improves results up to 92% for SVM. A multilayer convolutional structure allows the CNN to achieve 90.5% accuracy. We found stop words and punctuation to be important features for author identification; removing them leads to a measurable decrease in performance. Finally, we evaluate the topic dependency of the algorithms by gradually replacing named entities, nouns, verbs and eventually all tokens in the dataset according to their POS tags.
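A minimal scikit-learn sketch of the character n-gram plus SVM setup, on toy stand-in data; the actual study classifies roughly 3,700 German news articles among five authors with tuned preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in corpus (the real dataset is much larger).
texts = ["Der Minister erklärte am Montag ...",
         "Wie schon im Vorjahr stiegen die Preise ...",
         "Der Minister kündigte neue Maßnahmen an ...",
         "Die Preise fielen erstmals seit Jahren ..."]
authors = ["A", "B", "A", "B"]

# Character n-grams of order 5-6, the best-performing order reported above.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(5, 6), lowercase=False),
    LinearSVC(),
)
clf.fit(texts, authors)
print(clf.predict(["Der Minister sprach erneut über Maßnahmen ..."]))
```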
On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT.
Abhilasha Ravichander
Eduard Hovy
Kaheer Suleman
Adam Trischler
On Variational Learning of Controllable Representations for Text without Supervision
Peng Xu
Yanshuai Cao
The variational autoencoder (VAE) can learn the manifold of natural images on certain datasets, as evidenced by meaningful interpolation or extrapolation in the continuous latent space. However, on discrete data such as text, it is unclear whether unsupervised learning can discover a similar latent space that allows controllable manipulation. In this work, we find that sequence VAEs trained on text fail to properly decode when the latent codes are manipulated, because the modified codes often land in holes or vacant regions of the aggregated posterior latent space, where the decoding network fails to generalize. Both as a validation of this explanation and as a fix to the problem, we propose to constrain the posterior mean to a learned probability simplex and to perform manipulation within this simplex. Our proposed method mitigates the latent vacancy problem and achieves the first success in unsupervised learning of controllable representations for text. Empirically, our method outperforms unsupervised baselines and strong supervised approaches on text style transfer, and is capable of performing more flexible fine-grained control over text generation than existing methods.
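A rough PyTorch sketch of the constraint described above: the posterior mean is produced as a softmax-weighted convex combination of learned basis vectors, so that manipulated codes remain inside a region covered by the aggregated posterior. Names and dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SimplexConstrainedPosterior(nn.Module):
    """Posterior mean constrained to the convex hull of learned basis
    vectors, so manipulated codes stay in a region the decoder has seen
    during training (illustrative sketch)."""

    def __init__(self, feature_dim, latent_dim, num_basis=10):
        super().__init__()
        self.basis = nn.Parameter(torch.randn(num_basis, latent_dim))
        self.to_logits = nn.Linear(feature_dim, num_basis)
        self.to_logvar = nn.Linear(feature_dim, latent_dim)

    def forward(self, features):
        # Convex combination weights over the basis vectors.
        probs = torch.softmax(self.to_logits(features), dim=-1)  # (batch, num_basis)
        mu = probs @ self.basis          # mean lies inside the simplex hull
        logvar = self.to_logvar(features)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar
```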
Deconstructing and Reconstructing Word Embedding Algorithms
Edward Daniel Newell
Kian Kenyon-Dean
Uncontextualized word embeddings are reliable feature representations of words used to obtain high-quality results for various NLP applications. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the necessary and sufficient conditions required for making performant word embeddings. We find that each algorithm: (1) fits vector-covector dot products to approximate pointwise mutual information (PMI); and (2) modulates the loss gradient to balance weak and strong signals. We demonstrate that these two algorithmic features are sufficient conditions for constructing a novel word embedding algorithm, Hilbert-MLE. We find that its embeddings obtain equivalent or better performance than other algorithms across 17 intrinsic and extrinsic datasets.
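The shared core identified above can be illustrated with a small NumPy sketch that fits word-vector/context-vector dot products to a PMI matrix computed from co-occurrence counts. The plain squared-error loss is only for illustration; the individual algorithms, and Hilbert-MLE itself, differ in how they weight and modulate this fit.

```python
import numpy as np

def fit_pmi_embeddings(cooc, dim=16, lr=0.02, epochs=500, eps=1e-8):
    """Fit vectors so that w_i . c_j approximates PMI(i, j).

    `cooc` is a dense (V, V) co-occurrence count matrix. The unweighted
    squared-error objective here is illustrative only.
    """
    total = cooc.sum()
    p_ij = cooc / total
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    pmi = np.log((p_ij + eps) / (p_i * p_j + eps))

    V = cooc.shape[0]
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(V, dim))  # word vectors
    C = rng.normal(scale=0.1, size=(V, dim))  # context (co-)vectors
    for _ in range(epochs):
        err = W @ C.T - pmi                   # (V, V) residual of the fit
        W -= lr * err @ C / V                 # gradient step on word vectors
        C -= lr * err.T @ W / V               # gradient step on covectors
    return W, C
```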
Preventing Posterior Collapse in Sequence VAEs with Pooling
Teng Long
Yanshuai Cao
Variational Autoencoders (VAEs) hold great potential for modelling text, as they could in theory separate high-level semantic and syntactic properties from local regularities of natural language. Practically, however, VAEs with autoregressive decoders often suffer from posterior collapse, a phenomenon where the model learns to ignore the latent variables, causing the sequence VAE to degenerate into a language model. Previous works attempt to solve this problem with complex architectural changes or costly optimization schemes. In this paper, we argue that posterior collapse is caused in part by the encoder network failing to capture the input variability. We verify this hypothesis empirically and propose a straightforward fix using pooling. This simple technique effectively prevents posterior collapse, allowing the model to achieve significantly better data log-likelihood than standard sequence VAEs. Compared to the previous state of the art on preventing posterior collapse, we achieve comparable performance while being significantly faster.