Chris Pal

Biography

Christopher Pal is a Canada CIFAR AI Chair, full professor at Polytechnique Montréal and adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Mai Ababneh

Collaborating researcher - Formerly McGill University (but ending)

Paul Barde

Collaborating researcher - McGill University

Principal supervisor :

Master's Research - Université de Montréal

Chris Beckham

PhD - Polytechnique Montréal

Can (Sam) Chen

Collaborating Alumni - McGill University

Principal supervisor :

Xue (Steve) Liu

Léa Demeule

PhD - Université de Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Chris Emezue

Master's Research - Université de Montréal

Co-supervisor :

Collaborating Alumni - Polytechnique Montréal

Roger Girgis

PhD - Polytechnique Montréal

Florian Golemo

Postdoctorate - McGill University

Master's Research - Polytechnique Montréal

PhD - Université de Montréal

Co-supervisor :

Yousef Kotp

Master's Research - Concordia University

Co-supervisor :

Master's Research - Université de Montréal

Olga Luo

PhD - Université de Montréal

Aristides Milios

PhD - Université de Montréal

Joel Moniz

PhD - Polytechnique Montréal

PhD - Polytechnique Montréal

Juan Rodriguez

PhD - École de technologie suprérieure

Luke Rowe

PhD - Université de Montréal

Principal supervisor :

Gaurav Sahu

Postdoctorate - HEC Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Principal supervisor :

PhD - McGill University

Principal supervisor :

Postdoctorate - Polytechnique Montréal

Co-supervisor :

PhD - Université de Montréal

Direct Behavior Specification via Constrained Reinforcement Learning

Blog Posts

August 31, 2022

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Read the article

Publications

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Amine El hattami

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MT… (see more)L must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer based Hypernetwork Adapter consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction, we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets.

2021-01-01

ICLR (published)

Structural Inductive Biases in Emergent Communication

Agnieszka M Slowik

Abhinav Gupta

William L. Hamilton

M. Jamnik

S. Holden

In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investig… (see more)ate the impact of representation learning in artificial agents by developing graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models, which allows them to systematically generalize to new combinations of familiar features.

2021-01-01

CogSci (published)

Bijective-Contrastive Estimation

In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We gener… (see more)ate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, like noise contrastive estimation. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high dimensional cases, and propose potential directions to explore for future work.

2020-12-21

approximateinference.org/AABI/2021/Symposium (accepted)

AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

Raymond Li

Sandeep Subramanian

We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarizati… (see more)on. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. We provide extensive comparisons with strong baseline methods, prior state of the art work as well as multiple variants of our approach including those using only transformers, only extractive techniques and combinations of the two. We examine these models using four different summarization tasks and datasets: arXiv papers, PubMed papers, the Newsroom and BigPatent datasets. We find that transformer based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human generated abstracts. We include a human evaluation, finding that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. We hope that these architectures and experiments may serve as strong points of comparison for future work. Note: The abstract above was collaboratively written by the authors and one of the models presented in this paper based on an earlier draft of this paper.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Robert Williams

Akshay Patel

gaetan caron

satya ortiz gagne

David Buckeridge … (see 9 more)

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Joanna Merckx

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

Learning to Summarize Long Texts with Memory Compression and Transfer

Jaehong Park

2020-09-28

ArXiv (preprint)

Preface

Tal Arbel

Ismail Ben Ayed

Marleen de Bruijne

Maxime Descoteaux

Hervé Lombaert

2020-09-21

Proceedings of the Third Conference on Medical Imaging with Deep Learning (published)

proceedings.mlr.press

Robust motion in-betweening

Félix Harvey

Mike Yurick

Derek Nowrouzezahrai

In this work we present a novel, robust transition generation technique that can serve as a new tool for 3D animators, based on adversarial … (see more)recurrent neural networks. The system synthesises high-quality motions that use temporally-sparse keyframes as animation constraints. This is reminiscent of the job of in-betweening in traditional animation pipelines, in which an animator draws motion frames between provided keyframes. We first show that a state-of-the-art motion prediction model cannot be easily converted into a robust transition generator when only adding conditioning information about future keyframes. To solve this problem, we then propose two novel additive embedding modifiers that are applied at each timestep to latent representations encoded inside the network's architecture. One modifier is a time-to-arrival embedding that allows variations of the transition length with a single model. The other is a scheduled target noise vector that allows the system to be robust to target distortions and to sample different transitions given fixed keyframes. To qualitatively evaluate our method, we present a custom MotionBuilder plugin that uses our trained model to perform in-betweening in production scenarios. To quantitatively evaluate performance on transitions and generalizations to longer time horizons, we present well-defined in-betweening benchmarks on a subset of the widely used Human3.6M dataset and on LaFAN1, a novel high quality motion capture dataset that is more appropriate for transition generation. We are releasing this new dataset along with this work, with accompanying code for reproducing our baseline results.

2020-08-12

ACM Transactions on Graphics (published)

Towards an Unsupervised Method for Model Selection in Few-Shot Learning

Simon Guiroy

Vikas Verma

The study of generalization of neural networks in gradient-based meta-learning has recently great research interest. Previous work on the st… (see more)udy of the objective landscapes within the scope of few-shot classiﬁcation empirically demonstrated that generalization to new tasks might be linked to the average inner product between their respective gradients vectors (Guiroy et al., 2019). Following that work, we study the effect that meta-training has on the learned space of representation of the network. Notably, we demonstrate that the global similarity in the space of representation, measured by the average inner product between the embeddings of meta-test examples, also correlates to generalization. Based on these observations, we propose a novel model-selection criterion for gradient-based meta-learning and experimentally validate its effectiveness.

2020-07-13

ICML.cc/2020/Workshop/LifelongML (unknown)

Interactive Machine Comprehension with Information Seeking Agents

Xingdi Yuan

Jie Fu

Marc-Alexandre Côté

Yi Tay

Adam Trischler

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval… (see more) and question answering (QA). We argue that this stems from the nature of MRC datasets: most of these are static environments wherein the supporting documents and all necessary information are fully observed. In this paper, we propose a simple method that reframes existing MRC datasets as interactive, partially observable environments. Specifically, we “occlude” the majority of a document’s text and add context-sensitive commands that reveal “glimpses” of the hidden text to a model. We repurpose SQuAD and NewsQA as an initial case study, and then show how the interactive corpora can be used to train a model that seeks relevant information through sequential decision making. We believe that this setting can contribute in scaling models to web-level QA scenarios.

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)

Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences

Yi Tay

Donovan Ong

Jie Fu

Alvin Chan

Nancy Chen

Anh Tuan Luu

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)