Chris Pal

Biography

Christopher Pal is a Canada CIFAR AI Chair, a full professor at Polytechnique Montréal, and an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research. He has been involved in artificial intelligence and machine learning research for more than 25 years, and has published frequently on large-scale language modelling methods and generative modelling techniques. He holds a PhD in computer science from the University of Waterloo.

Current Students

Mai Ababneh

Research Collaborator - Formerly McGill (but ending)

Paul Barde

Research Collaborator - McGill

Principal supervisor:

Research Master's - UdeM

Can (Sam) Chen

Alumni Collaborator - McGill

Principal supervisor:

Xue (Steve) Liu

Léa Demeule

PhD - UdeM

Principal supervisor:

PhD - Polytechnique

Chris Emezue

Research Master's - UdeM

Co-supervisor:

PhD - Polytechnique

Simon Guiroy

PhD - UdeM

Co-supervisor:

Yousef Kotp

Research Master's - Concordia

Co-supervisor:

PhD - Polytechnique

Co-supervisor:

Research Master's - UdeM

Olga Luo

PhD - UdeM

PhD - UdeM

Joel Moniz

PhD - Polytechnique

PhD - Polytechnique

Juan Rodriguez

PhD - École de technologie supérieure

Direct Behavior Specification via Constrained Reinforcement Learning

Luke Rowe

PhD - UdeM

Principal supervisor:

Gaurav Sahu

Postdoctorate - HEC

Principal supervisor:

PhD - Polytechnique

Principal supervisor:

Research Collaborator - McGill

Principal supervisor:

Postdoctorate - Polytechnique

Co-supervisor:

PhD - UdeM

Research Collaborator

Blog Posts

Direct Behavior Specification via Constrained Reinforcement Learning

August 31, 2022

by

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Publications

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Amine El hattami

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer based Hypernetwork Adapter consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction, we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single task fine-tuning methods while being parameter and data efficient (using around 66% of the data). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms by 0.7-1.0% models that use MTL and single task fine-tuning. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets.

2021-01-01

ICLR (published)

Structural Inductive Biases in Emergent Communication

Agnieszka M Slowik

Abhinav Gupta

William L. Hamilton

M. Jamnik

S. Holden

In order to communicate, humans flatten a complex representation of ideas and their attributes into a single word or a sentence. We investigate the impact of representation learning in artificial agents by developing graph referential games. We empirically show that agents parametrized by graph neural networks develop a more compositional language compared to bag-of-words and sequence models, which allows them to systematically generalize to new combinations of familiar features.

2021-01-01

CogSci (published)

Bijective-Contrastive Estimation

In this work, we propose Bijective-Contrastive Estimation (BCE), a classification-based learning criterion for energy-based models. We generate a collection of contrasting distributions using bijections, and solve all the classification problems between the original data distribution and the distributions induced by the bijections using a classifier parameterized by an energy model. We show that if the classification objective is minimized, the energy function will uniquely recover the data density up to a normalizing constant. This has the benefit of not having to explicitly specify a contrasting distribution, like noise contrastive estimation. Experimentally, we demonstrate that the proposed method works well on 2D synthetic datasets. We discuss the difficulty in high dimensional cases, and propose potential directions to explore for future work.

2020-12-21

approximateinference.org/AABI/2021/Symposium (accepted)

AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (published)

proceedings.mlr.press

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

Raymond Li

Sandeep Subramanian

We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. We provide extensive comparisons with strong baseline methods, prior state of the art work as well as multiple variants of our approach including those using only transformers, only extractive techniques and combinations of the two. We examine these models using four different summarization tasks and datasets: arXiv papers, PubMed papers, the Newsroom and BigPatent datasets. We find that transformer based methods produce summaries with fewer n-gram copies, leading to n-gram copying statistics that are more similar to human generated abstracts. We include a human evaluation, finding that transformers are ranked highly for coherence and fluency, but purely extractive methods score higher for informativeness and relevance. We hope that these architectures and experiments may serve as strong points of comparison for future work. Note: The abstract above was collaboratively written by the authors and one of the models presented in this paper based on an earlier draft of this paper.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Robert Williams

Akshay Patel

Gaetan Caron

Satya Ortiz Gagne

David Buckeridge

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Joanna Merckx

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

Learning to Summarize Long Texts with Memory Compression and Transfer

Jaehong Park

2020-09-28

ArXiv (preprint)

Preface

Tal Arbel

Ismail Ben Ayed

Marleen de Bruijne

Maxime Descoteaux

Hervé Lombaert

2020-09-21

Proceedings of the Third Conference on Medical Imaging with Deep Learning (published)

proceedings.mlr.press

Robust motion in-betweening

Félix Harvey

Mike Yurick

Derek Nowrouzezahrai

In this work we present a novel, robust transition generation technique that can serve as a new tool for 3D animators, based on adversarial recurrent neural networks. The system synthesises high-quality motions that use temporally-sparse keyframes as animation constraints. This is reminiscent of the job of in-betweening in traditional animation pipelines, in which an animator draws motion frames between provided keyframes. We first show that a state-of-the-art motion prediction model cannot be easily converted into a robust transition generator when only adding conditioning information about future keyframes. To solve this problem, we then propose two novel additive embedding modifiers that are applied at each timestep to latent representations encoded inside the network's architecture. One modifier is a time-to-arrival embedding that allows variations of the transition length with a single model. The other is a scheduled target noise vector that allows the system to be robust to target distortions and to sample different transitions given fixed keyframes. To qualitatively evaluate our method, we present a custom MotionBuilder plugin that uses our trained model to perform in-betweening in production scenarios. To quantitatively evaluate performance on transitions and generalizations to longer time horizons, we present well-defined in-betweening benchmarks on a subset of the widely used Human3.6M dataset and on LaFAN1, a novel high quality motion capture dataset that is more appropriate for transition generation. We are releasing this new dataset along with this work, with accompanying code for reproducing our baseline results.

2020-08-12

ACM Transactions on Graphics (published)

Towards an Unsupervised Method for Model Selection in Few-Shot Learning

Simon Guiroy

Vikas Verma

The study of generalization of neural networks in gradient-based meta-learning has recently attracted great research interest. Previous work on the study of the objective landscapes within the scope of few-shot classification empirically demonstrated that generalization to new tasks might be linked to the average inner product between their respective gradient vectors (Guiroy et al., 2019). Following that work, we study the effect that meta-training has on the learned space of representation of the network. Notably, we demonstrate that the global similarity in the space of representation, measured by the average inner product between the embeddings of meta-test examples, also correlates to generalization. Based on these observations, we propose a novel model-selection criterion for gradient-based meta-learning and experimentally validate its effectiveness.

2020-07-13

ICML.cc/2020/Workshop/LifelongML (unknown)

Interactive Machine Comprehension with Information Seeking Agents

Xingdi Yuan

Jie Fu

Marc-Alexandre Côté

Yi Tay

Adam Trischler

Existing machine reading comprehension (MRC) models do not scale effectively to real-world applications like web-level information retrieval and question answering (QA). We argue that this stems from the nature of MRC datasets: most of these are static environments wherein the supporting documents and all necessary information are fully observed. In this paper, we propose a simple method that reframes existing MRC datasets as interactive, partially observable environments. Specifically, we “occlude” the majority of a document’s text and add context-sensitive commands that reveal “glimpses” of the hidden text to a model. We repurpose SQuAD and NewsQA as an initial case study, and then show how the interactive corpora can be used to train a model that seeks relevant information through sequential decision making. We believe that this setting can contribute in scaling models to web-level QA scenarios.

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)

Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences

Yi Tay

Donovan Ong

Jie Fu

Alvin Chan

Nancy Chen

Anh Tuan Luu

2020-07-01

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)