Portrait de Jackie Cheung

Jackie Cheung

Membre académique principal
Chaire en IA Canada-CIFAR
Directeur scientifique adjoint, Mila, Professeur agrégé, McGill University, École d'informatique
Chercheur consultant, Microsoft Research
Sujets de recherche
Apprentissage automatique médical
Apprentissage profond
Raisonnement
Traitement du langage naturel

Biographie

Je suis professeur agrégé à l'École d’informatique de l’Université McGill et chercheur consultant à Microsoft Research.

Mon groupe mène des recherches sur le traitement du langage naturel (NLP), un domaine de l'intelligence artificielle qui implique la construction de modèles informatiques de langages humains tels que l'anglais ou le français. Le but de nos recherches est de développer des méthodes informatiques de compréhension du texte et de la parole, afin de générer un langage fluide et adapté au contexte.

Dans notre laboratoire, nous étudions des techniques statistiques d’apprentissage automatique pour analyser et faire des prédictions sur la langue. Plusieurs projets en cours incluent la synthèse de fiction, l'extraction d'événements à partir d’un texte et l'adaptation de la langue à différents genres.

Étudiants actuels

Collaborateur·rice alumni - McGill
Maîtrise recherche - McGill
Collaborateur·rice de recherche
Collaborateur·rice de recherche
Doctorat - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - McGill
Stagiaire de recherche - McGill
Doctorat - McGill
Co-superviseur⋅e :
Doctorat - McGill
Co-superviseur⋅e :
Postdoctorat - McGill
Maîtrise recherche - McGill
Maîtrise recherche - McGill
Stagiaire de recherche - McGill University
Stagiaire de recherche - McGill
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - McGill
Doctorat - McGill
Baccalauréat - McGill
Doctorat - McGill
Stagiaire de recherche - McGill University
Maîtrise recherche - McGill

Publications

BanditSum: Extractive Summarization as a Contextual Bandit
Yue Dong
Yikang Shen
Eric Crawford
Herke van Hoof
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristical… (voir plus)ly-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.
A Knowledge Hunting Framework for Common Sense Reasoning
Ali Emami
Noelia De La Cruz
Adam Trischler
Kaheer Suleman
We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning tas… (voir plus)k that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine, then extracts and classifies knowledge from the returned results and weighs them to make a resolution. Our approach improves F1 performance on the full WSC by 0.21 over the previous best and represents the first system to exceed 0.5 F1. We further demonstrate that the approach is competitive on the Choice of Plausible Alternatives (COPA) task, which suggests that it is generally applicable.
Let’s do it “again”: A First Computational Approach to Detecting Adverbial Presupposition Triggers
Andre Cianflone
Yulan Feng
Jad Kabbara
We introduce the novel task of predicting adverbial presupposition triggers, which is useful for natural language generation tasks such as s… (voir plus)ummarization and dialogue systems. We introduce two new corpora, derived from the Penn Treebank and the Annotated English Gigaword dataset and investigate the use of a novel attention mechanism tailored to this task. Our attention mechanism augments a baseline recurrent neural network without the need for additional trainable parameters, minimizing the added computational cost of our mechanism. We demonstrate that this model statistically outperforms our baselines.
Coarse Lexical Frame Acquisition at the Syntax–Semantics Interface Using a Latent-Variable PCFG Model
Laura Kallmeyer
Behrang QasemiZadeh
We present a method for unsupervised lexical frame acquisition at the syntax–semantics interface. Given a set of input strings derived fro… (voir plus)m dependency parses, our method generates a set of clusters that resemble lexical frame structures. Our work is motivated not only by its practical applications (e.g., to build, or expand the coverage of lexical frame databases), but also to gain linguistic insight into frame structures with respect to lexical distributions in relation to grammatical structures. We model our task using a hierarchical Bayesian network and employ tools and methods from latent variable probabilistic context free grammars (L-PCFGs) for statistical inference and parameter fitting, for which we propose a new split and merge procedure. We show that our model outperforms several baselines on a portion of the Wall Street Journal sentences that we have newly annotated for evaluation purposes.
Commonsense mining as knowledge base completion? A study on the impact of novelty
Stanisław Jastrzębski
Seyedarian Hosseini
Michael Noukhovitch
Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by recent work by Li et al., … (voir plus)we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the difficulty of mining novel commonsense knowledge, and show that a simple baseline method that outperforms the previous state of the art on predicting more novel triples.
A Generalized Knowledge Hunting Framework for the Winograd Schema Challenge
Ali Emami
Adam Trischler
Kaheer Suleman
We introduce an automatic system that performs well on two common-sense reasoning tasks, the Winograd Schema Challenge (WSC) and the Choice … (voir plus)of Plausible Alternatives (COPA). Problem instances from these tasks require diverse, complex forms of inference and knowledge to solve. Our method uses a knowledge-hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine. It extracts and classifies knowledge from the returned results and weighs it to make a resolution. Our approach improves F1 performance on the WSC by 0.16 over the previous best and is competitive with the state-of-the-art on COPA, demonstrating its general applicability.
Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization
Kian Kenyon-Dean
We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation … (voir plus)learning. We propose a neural network architecture with novel Clustering-Oriented Regularization (CORE) terms in the objective function. These terms encourage the model to create embeddings of event mentions that are amenable to clustering. We then use agglomerative clustering on these embeddings to build event coreference chains. For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information. This work provides insight and motivating results for a new general approach to solving coreference and clustering problems with representation learning.
Advances in Artificial Intelligence
Ebrahim Bagheri
Advances in Artificial Intelligence
Ebrahim Bagheri
A Hierarchical Neural Attention-based Text Classifier
Koustuv Sinha
Yue Dong
Derek Ruths
Deep neural networks have been displaying superior performance over traditional supervised classifiers in text classification. They learn to… (voir plus) extract useful features automatically when sufficient amount of data is presented. However, along with the growth in the number of documents comes the increase in the number of categories, which often results in poor performance of the multiclass classifiers. In this work, we use external knowledge in the form of topic category taxonomies to aide the classification by introducing a deep hierarchical neural attention-based classifier. Our model performs better than or comparable to state-of-the-art hierarchical models at significantly lower computational cost while maintaining high interpretability.
World Knowledge for Reading Comprehension: Rare Entity Prediction with Hierarchical LSTMs Using External Descriptions
Humans interpret texts with respect to some background information, or world knowledge, and we would like to develop automatic reading compr… (voir plus)ehension systems that can do the same. In this paper, we introduce a task and several models to drive progress towards this goal. In particular, we propose the task of rare entity prediction: given a web document with several entities removed, models are tasked with predicting the correct missing entities conditioned on the document context and the lexical resources. This task is challenging due to the diversity of language styles and the extremely large number of rare entities. We propose two recurrent neural network architectures which make use of external knowledge in the form of entity descriptions. Our experiments show that our hierarchical LSTM model performs significantly better at the rare entity prediction task than those that do not make use of external resources.
Predicting Success in Goal-Driven Human-Human Dialogues
Michael Noseworthy
In goal-driven dialogue systems, success is often defined based on a structured definition of the goal. This requires that the dialogue syst… (voir plus)em be constrained to handle a specific class of goals and that there be a mechanism to measure success with respect to that goal. However, in many human-human dialogues the diversity of goals makes it infeasible to define success in such a way. To address this scenario, we consider the task of automatically predicting success in goal-driven human-human dialogues using only the information communicated between participants in the form of text. We build a dataset from stackoverflow.com which consists of exchanges between two users in the technical domain where ground-truth success labels are available. We then propose a turn-based hierarchical neural network model that can be used to predict success without requiring a structured goal definition. We show this model outperforms rule-based heuristics and other baselines as it is able to detect patterns over the course of a dialogue and capture notions such as gratitude.