Chris Pal

Biography

Christopher Pal is a Canada CIFAR AI Chair, a full professor at Polytechnique Montréal, and an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Mai Ababneh

Collaborating researcher - formerly McGill University (collaboration ending)

Paul Barde

Collaborating researcher - McGill University

Principal supervisor :

Master's Research - Université de Montréal

Can (Sam) Chen

Collaborating Alumni - McGill University

Principal supervisor :

Xue (Steve) Liu

Léa Demeule

PhD - Université de Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Chris Emezue

Master's Research - Université de Montréal

Co-supervisor :

PhD - Polytechnique Montréal

Simon Guiroy

PhD - Université de Montréal

Co-supervisor :

Yousef Kotp

Master's Research - Concordia University

Co-supervisor :

PhD - Polytechnique Montréal

Co-supervisor :

Master's Research - Université de Montréal

Olga Luo

PhD - Université de Montréal

Aristides Milios

PhD - Université de Montréal

Joel Moniz

PhD - Polytechnique Montréal

Jonathan Pilault

PhD - Polytechnique Montréal

Juan Rodriguez

PhD - École de technologie supérieure

Luke Rowe

PhD - Université de Montréal

Principal supervisor :

Gaurav Sahu

Postdoctorate - HEC Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Principal supervisor :

Collaborating researcher - McGill University

Principal supervisor :

Postdoctorate - Polytechnique Montréal

Co-supervisor :

PhD - Université de Montréal

Joanna Wolski

Collaborating researcher

Blog Posts

Direct Behavior Specification via Constrained Reinforcement Learning

August 31, 2022

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Publications

Does Entity Abstraction Help Generative Transformers Reason?

Nicolas Gontier

Siva Reddy

We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). We propose and empirically explore three ways to add such abstraction: (i) as additional input embeddings, (ii) as a separate sequence to encode, and (iii) as an auxiliary prediction task for the model. Overall, our analysis demonstrates that models with abstract entity knowledge perform better than those without it. The best abstraction-aware models achieved overall accuracies of 88.8% and 91.8%, compared to 62.9% and 89.8% for the baseline model, on CLUTRR and ProofWriter respectively. However, for HotpotQA and CoQA, we find that F1 scores improve by only 0.5% on average. Our results suggest that the benefit of explicit abstraction is significant in formally defined logical reasoning settings requiring many reasoning hops, but indicate that it is less beneficial for NLP tasks with less formal logical structure.

2022-11-20

TMLR (accepted)
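Option (i) above, adding abstraction as additional input embeddings, can be pictured with a small sketch. It assumes, purely for illustration, that each token carries exactly one entity-type id and that the type embedding is summed with the token embedding (in the style of BERT segment embeddings); the table sizes, the type labels, and the function `embed_with_abstraction` are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical sizes chosen for illustration only.
VOCAB_SIZE = 100
NUM_ENTITY_TYPES = 4  # assumed labels: 0 = not an entity, 1 = PERSON, 2 = LOCATION, 3 = ORG
D_MODEL = 8

rng = random.Random(0)


def random_table(rows, cols):
    """A rows x cols table of Gaussian-initialised vectors (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


token_table = random_table(VOCAB_SIZE, D_MODEL)
type_table = random_table(NUM_ENTITY_TYPES, D_MODEL)


def embed_with_abstraction(token_ids, type_ids):
    """Return one input vector per token: token embedding + entity-type embedding."""
    if len(token_ids) != len(type_ids):
        raise ValueError("each token needs exactly one entity-type id")
    return [
        [t + e for t, e in zip(token_table[tok], type_table[typ])]
        for tok, typ in zip(token_ids, type_ids)
    ]


tokens = [5, 17, 42, 9]  # hypothetical token ids
types = [1, 0, 0, 2]  # hypothetical entity-type id for each token
inputs = embed_with_abstraction(tokens, types)
print(len(inputs), len(inputs[0]))  # 4 8
```

Summation keeps the Transformer's input width unchanged, so a pre-trained model can consume the enriched inputs without architectural changes; concatenation would be an equally plausible alternative under this sketch's assumptions.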

The Liver Tumor Segmentation Benchmark (LiTS)

Patrick Bilic

Patrick Christ

Eugene Vorontsov

Hongwei Bran Li

Grzegorz Chlebus

Hao Chen

Qi Dou

Chi-Wing Fu

Xu Han

Gabriel Efrain Humpire Mamani

Pheng Ann Heng

Jürgen Hesser

Samuel Kadoury

Julian Walter Holch

Tomasz Konopczynski

Miao Yue

Chunming Li

X. Li

Jana Lipková

John Lowengrub

Michal Marianne Amitai

Hans Meine

J. Moltz

Marie Piraud

Ivan Ezhov

Xiaojuan Qi

Fernando Navarro

Jin Qi

Florian Kofler

Markus Rempfler

Johannes C. Paetzold

Karsten Roth

Suprosanna Shit

Andrea Schenk

Xiaobin Hu

Anjany Sekuboyina

Ping Zhou

Christian Hülsemeyer

Marcel Beetz

Jan Kirschke

Florian Ettlinger

Felix Gruen

Benedikt Wiestler

Zhiheng Zhang

Georgios Kaissis

Fabian Lohöfer

Rickmer Braren

J. Holch

Michela Antonelli

Felix Hofmann

Woong Bae

Wieland Sommer

Míriam Bellver

Volker Heinemann

Lei Bi

Colin Jacobs

G. Mamani

Bram van Ginneken

Erik B. Dam

Gabriel Chartrand

An Tang

Michal Drozdzal

Bogdan Georgescu

Avi Ben-Cohen

Xavier Giró-i-Nieto

Eyal Klang

M. Amitai

E. Konen

Hayit Greenspan

Johan Moreau

Jan Hendrik Moltz

Alexandre Hostettler

Christian Igel

Luc Soler

Fabian Isensee

Refael Vivanti

Paul Jäger

Adi Szeskin

Fucang Jia

Naama Lev-Cohain

Krishna Chaitanya Kaluva

Jacob Sosna

Mahendra Khened

Leo Joskowicz

Ildoo Kim

Bjoern Menze

Jae-Hun Kim

Zengming Shen

Sungwoong Kim

Simon Kohl

Avinash Kori

Ganapathy Krishnamurthi

Fan Li

Hongchao Li

Junbo Li

Xiaomeng Li

Jun Ma

Klaus Maier-Hein

Kevis-Kokitsi Maninis

Dorit Merhof

Akshay Pai

Mathias Perslev

Jens Petersen

Jordi Pont-Tuset

Oliver Rippel

Ignacio Sarasua

Jordi Torres

Christian Wachinger

Chunliang Wang

Leon Weninger

Jianrong Wu

Daguang Xu

Xiaoping Yang

Simon Chun-Ho Yu

Yading Yuan

Liping Zhang

Jorge Cardoso

Spyridon Bakas

2022-11-17

Medical Image Analysis (published)

arxiv.org

A General-Purpose Neural Architecture for Geospatial Systems

Nasim Rahaman

Martin Weiss

Frederik Träuble

Francesco Locatello

Alexandre Lacoste

Yoshua Bengio

Li Erran Li

Bernhard Schölkopf

2022-11-02

OpenReview.net/Anonymous_Preprint (unknown)

Using Graph Algorithms to Pretrain Graph Completion Transformers

Bahare Fatemi

David Vasquez

Recent work on Graph Neural Networks has demonstrated that self-supervised pretraining can further enhance performance on downstream graph, link, and node classification tasks. However, the efficacy of pretraining tasks has not been fully investigated for downstream large knowledge graph completion tasks. Using a contextualized knowledge graph embedding approach, we investigate five different pretraining signals, constructed using several graph algorithms and no external data, as well as their combination. We leverage the versatility of our Transformer-based model to explore graph structure generation pretraining tasks (i.e., path and k-hop neighborhood generation), which are inapplicable to most graph embedding methods. We further propose a new path-finding algorithm guided by information gain and find that it is the best-performing pretraining task across three downstream knowledge graph completion datasets. While using our new path-finding algorithm as a pretraining signal provides 2-3% MRR improvements, we show that pretraining on all signals together gives the best knowledge graph completion results. In a multitask setting that combines all pretraining tasks, our method surpasses the latest strong-performing knowledge graph embedding methods on all metrics for FB15K-237, on MRR and Hit@1 for WN18RR, and on MRR and Hit@10 for JF17K (a knowledge hypergraph dataset).

2022-10-14

arXiv (preprint)

arxiv.org
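MRR and Hit@k, the metrics quoted above, are computed from the rank of the correct entity among all candidates for each test query. The sketch below assumes those ranks are already available (for example, from the commonly used filtered evaluation setting); the function name `mrr_and_hits` and the example ranks are hypothetical.

```python
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hit@k from the 1-based rank of the correct entity per query."""
    if not ranks or any(r < 1 for r in ranks):
        raise ValueError("ranks must be a non-empty sequence of integers >= 1")
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    # Hit@k is the fraction of queries whose correct entity is ranked in the top k.
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits


ranks = [1, 3, 2, 12]  # hypothetical ranks for four test queries
mrr, hits = mrr_and_hits(ranks)
print(round(mrr, 3), hits)  # 0.479 {1: 0.25, 3: 0.75, 10: 0.75}
```

For these ranks, MRR is (1 + 1/3 + 1/2 + 1/12) / 4 = 23/48, roughly 0.479; a single poorly ranked query (rank 12) barely moves MRR but drops out of every Hit@k below 12.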

Direct Behavior Specification via Constrained Reinforcement Learning

Chris J Pal

The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the constrained Markov decision process (CMDP) framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for non-player character (NPC) design in video games.

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org
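Using Lagrangian methods to weigh behavioral constraints can be sketched as follows: the policy is trained on the reward minus a multiplier-weighted sum of constraint costs, and each multiplier grows while its constraint is violated. This is plain projected dual ascent, an assumption of this sketch; the exact multiplier parameterization and update used in the paper may differ, and the function names, thresholds, and learning rate below are hypothetical.

```python
def lagrangian_reward(reward, costs, multipliers):
    """Scalarized reward for the policy update: r - sum_k lambda_k * c_k."""
    return reward - sum(lam * c for lam, c in zip(multipliers, costs))


def update_multipliers(multipliers, avg_costs, thresholds, lr=0.1):
    """One projected dual-ascent step: lambda_k rises while the average cost c_k
    exceeds its threshold and falls (never below zero) once the constraint holds."""
    return [
        max(0.0, lam + lr * (avg_c - thr))
        for lam, avg_c, thr in zip(multipliers, avg_costs, thresholds)
    ]


# Two hypothetical behavioral constraints, e.g. "at most 25% of steps in a forbidden
# zone" and "at most 50% of steps spent jumping"; costs are per-step indicators.
thresholds = [0.25, 0.5]
multipliers = [0.0, 0.0]
avg_costs = [0.5, 0.0]  # constraint 0 is violated, constraint 1 is satisfied

multipliers = update_multipliers(multipliers, avg_costs, thresholds, lr=1.0)
print(multipliers)  # [0.25, 0.0]
print(lagrangian_reward(1.0, [0.5, 0.0], multipliers))  # 0.875
```

Because the multipliers adapt automatically, the practitioner states a threshold per behavior instead of hand-tuning a fixed penalty weight for each term, which is the kind of reward-engineering effort the abstract aims to reduce.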

A Probabilistic Perspective on Reinforcement Learning via Supervised Learning

Alexandre Piché

Rafael Pardinas

David Vázquez

2022-04-27

ICLR.cc/2022/Workshop/GPL (poster)

Learning to Guide and to Be Guided in the Architect-Builder Problem

Tristan Karch

Clément Moulin-Frier

We are interested in interactive agents that learn to coordinate, namely, a …

2022-01-28

ICLR.cc/2022/Conference (poster)

Attention-based Neural Cellular Automata

Mattie Tesfaldet

Derek Nowrouzezahrai

Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of attention-based NCAs formed using a spatially localized, yet globally organized, self-attention scheme. We introduce an instance of this class named Vision Transformer Cellular Automata (ViTCA). We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation of its inductive biases. Finally, we examine its learned representations via linear probes on its converged cell state hidden representations, yielding, on average, superior results when compared to our U-Net, ViT, and UNetCA baselines.

Challenges in leveraging GANs for few-shot data augmentation

Christopher Beckham

Issam Hadj Laradji

Pau Rodriguez

David Vázquez

Derek Nowrouzezahrai

2022-01-01

arXiv.org (preprint)

Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction

Jim Aldon D'Souza

Samira Ebrahimi Kahou

Felix Heide

Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers, which are encoder-decoder architectures that generate scene-consistent multi-agent trajectories. We refer to these architectures as “AutoBots”. The encoder is a stack of interleaved temporal and social multi-head self-attention (MHSA) modules which alternately perform equivariant processing across the temporal and social dimensions. The decoder employs learnable seed parameters in combination with temporal and social MHSA modules, allowing it to perform inference over the entire future scene efficiently in a single forward pass. AutoBots can produce either the trajectory of one ego-agent or a distribution over the future trajectories for all agents in the scene. For the single-agent prediction case, our model achieves top results on the global nuScenes vehicle motion prediction leaderboard and produces strong results on the Argoverse vehicle prediction challenge. In the multi-agent setting, we evaluate on the synthetic partition of the TrajNet++ dataset to showcase the model’s socially consistent predictions. We also demonstrate our model on general sequences of sets and provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. A distinguishing feature of AutoBots is that all models are trainable on a single desktop GPU (1080 Ti) in under 48 hours.

2022-01-01

ICLR (published)

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Vikram Voleti

Alexia Jolicoeur-Martineau

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction, when only future/past frames are masked; unconditional generation, when both past and future frames are masked; and interpolation, when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using
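The masking scheme described above, in which all past frames and all future frames are each masked at random and independently, can be sketched as follows. The masking probabilities, the zero-filling of masked frames, the list-of-pixels frame representation, and the helper names are assumptions made for illustration, not the MCVD implementation.

```python
import random
from collections import Counter


def sample_masks(rng, p_mask_past=0.5, p_mask_future=0.5):
    """Independently decide whether to mask all past frames and/or all future frames."""
    return rng.random() < p_mask_past, rng.random() < p_mask_future


def task_for(mask_past, mask_future):
    """Name the video task that a given mask combination trains (per the abstract)."""
    if mask_past and mask_future:
        return "unconditional generation"
    if mask_future:
        return "future prediction"  # conditioned on past frames only
    if mask_past:
        return "past prediction"  # conditioned on future frames only
    return "interpolation"  # conditioned on both past and future frames


def build_conditioning(past, future, mask_past, mask_future):
    """Replace each masked block of conditioning frames with all-zero frames."""
    def zeros_like(frames):
        return [[0.0] * len(frame) for frame in frames]

    past_in = zeros_like(past) if mask_past else past
    future_in = zeros_like(future) if mask_future else future
    return past_in, future_in


rng = random.Random(0)
counts = Counter(task_for(*sample_masks(rng)) for _ in range(10_000))
print(counts)  # with p = 0.5 each, every task appears roughly a quarter of the time
```

Because one network sees all four mask combinations during training, the same model can afterwards be used for prediction, generation, or interpolation simply by choosing which conditioning blocks to mask at sampling time.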

Neural Attentive Circuits

Nasim Rahaman

Martin Weiss

Francesco Locatello

Yoshua Bengio

Bernhard Schölkopf

Li Erran Li

Nicolas Ballas

Recent work has seen the development of general-purpose neural architectures that can be trained to perform tasks across diverse data modalities. General-purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general-purpose, yet modular, neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on the CIFAR and CUBs datasets by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing, and text classification from ASCII bytes, thereby confirming their general-purpose nature.