Portrait de Chris Pal

Chris Pal

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur titulaire, Polytechnique Montréal, Département de génie informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage profond

Biographie

Christopher Pal est titulaire d'une chaire en IA Canada-CIFAR, professeur titulaire à Polytechnique Montréal et professeur adjoint au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal. Il est également chercheur émérite à ServiceNow Research. Il est engagé dans la recherche sur l'intelligence artificielle et l'apprentissage automatique depuis plus de 25 ans, publiant souvent des travaux sur les méthodes de modélisation du langage à grande échelle et les techniques de modélisation générative. Il a obtenu un doctorat en informatique à l'Université de Waterloo.

Étudiants actuels

Collaborateur·rice de recherche - Formerly McGill (but ending)
Collaborateur·rice de recherche - McGill
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Doctorat - Polytechnique
Collaborateur·rice alumni - McGill
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Collaborateur·rice alumni - Polytechnique
Doctorat - Polytechnique
Maîtrise recherche - Polytechnique
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Doctorat - UdeM
Doctorat - Polytechnique
Doctorat - Polytechnique
Doctorat - École de technologie suprérieure
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Postdoctorat - HEC
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Superviseur⋅e principal⋅e :
Doctorat - McGill
Superviseur⋅e principal⋅e :
Doctorat - Polytechnique
Co-superviseur⋅e :
Doctorat - UdeM

Publications

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data
Amine El hattami
Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MT… (voir plus)L must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer based Hypernetwork Adapter consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction, we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets.
Structural Inductive Biases in Emergent Communication
Agnieszka M Slowik
William L. Hamilton
M. Jamnik
S. Holden
In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investig… (voir plus)ate the impact of representation learning in artificial agents by developing graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models, which allows them to systematically generalize to new combinations of familiar features.
Bijective-Contrastive Estimation
In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We gener… (voir plus)ate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, like noise contrastive estimation. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high dimensional cases, and propose potential directions to explore for future work.
AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation
On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarizati… (voir plus)on. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. We provide extensive comparisons with strong baseline methods, prior state of the art work as well as multiple variants of our approach including those using only transformers, only extractive techniques and combinations of the two. We examine these models using four different summarization tasks and datasets: arXiv papers, PubMed papers, the Newsroom and BigPatent datasets. We find that transformer based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human generated abstracts. We include a human evaluation, finding that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. We hope that these architectures and experiments may serve as strong points of comparison for future work. Note: The abstract above was collaboratively written by the authors and one of the models presented in this paper based on an earlier draft of this paper.
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing
Prateek Gupta
Nasim Rahaman
Hannah Alsdurf
Abhinav Sharma
Nanor Minoyan
Soren Harnois-Leblanc
Andrew Robert Williams
Akshay Patel
gaetan caron
satya ortiz gagne
Yang Zhang
Bernhard Schölkopf
Joanna Merckx
Learning to Summarize Long Texts with Memory Compression and Transfer
Preface
Ismail Ben Ayed
Marleen de Bruijne
Maxime Descoteaux
Robust motion in-betweening
Félix Harvey
Mike Yurick
In this work we present a novel, robust transition generation technique that can serve as a new tool for 3D animators, based on adversarial … (voir plus)recurrent neural networks. The system synthesises high-quality motions that use temporally-sparse keyframes as animation constraints. This is reminiscent of the job of in-betweening in traditional animation pipelines, in which an animator draws motion frames between provided keyframes. We first show that a state-of-the-art motion prediction model cannot be easily converted into a robust transition generator when only adding conditioning information about future keyframes. To solve this problem, we then propose two novel additive embedding modifiers that are applied at each timestep to latent representations encoded inside the network's architecture. One modifier is a time-to-arrival embedding that allows variations of the transition length with a single model. The other is a scheduled target noise vector that allows the system to be robust to target distortions and to sample different transitions given fixed keyframes. To qualitatively evaluate our method, we present a custom MotionBuilder plugin that uses our trained model to perform in-betweening in production scenarios. To quantitatively evaluate performance on transitions and generalizations to longer time horizons, we present well-defined in-betweening benchmarks on a subset of the widely used Human3.6M dataset and on LaFAN1, a novel high quality motion capture dataset that is more appropriate for transition generation. We are releasing this new dataset along with this work, with accompanying code for reproducing our baseline results.
Towards an Unsupervised Method for Model Selection in Few-Shot Learning
The study of generalization of neural networks in gradient-based meta-learning has recently great research interest. Previous work on the st… (voir plus)udy of the objective landscapes within the scope of few-shot classification empirically demonstrated that generalization to new tasks might be linked to the average inner product between their respective gradients vectors (Guiroy et al., 2019). Following that work, we study the effect that meta-training has on the learned space of representation of the network. Notably, we demonstrate that the global similarity in the space of representation, measured by the average inner product between the embeddings of meta-test examples, also correlates to generalization. Based on these observations, we propose a novel model-selection criterion for gradient-based meta-learning and experimentally validate its effectiveness.
Interactive Machine Comprehension with Information Seeking Agents
Xingdi Yuan
Yi Tay
Adam Trischler
Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval… (voir plus) and question answering (QA). We argue that this stems from the nature of MRC datasets: most of these are static environments wherein the supporting documents and all necessary information are fully observed. In this paper, we propose a simple method that reframes existing MRC datasets as interactive, partially observable environments. Specifically, we “occlude” the majority of a document’s text and add context-sensitive commands that reveal “glimpses” of the hidden text to a model. We repurpose SQuAD and NewsQA as an initial case study, and then show how the interactive corpora can be used to train a model that seeks relevant information through sequential decision making. We believe that this setting can contribute in scaling models to web-level QA scenarios.
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences
Yi Tay
Donovan Ong
Alvin Chan
Nancy Chen
Anh Tuan Luu