Chris Pal

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning

Biography

Christopher Pal is a Canada CIFAR AI Chair, full professor at Polytechnique Montréal and adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Research Intern - McGill University
Postdoctorate - HEC Montréal
Principal supervisor :
Collaborating researcher - McGill University
Principal supervisor :
Master's Research - Université de Montréal
PhD - Polytechnique Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Polytechnique Montréal
Master's Research - Université de Montréal
Co-supervisor :
Collaborating Alumni - Polytechnique Montréal
PhD - Polytechnique Montréal
Postdoctorate - McGill University
Co-supervisor :
Master's Research - Polytechnique Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - École de technologie supérieure
PhD - Université de Montréal
Principal supervisor :
PhD - Polytechnique Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - Polytechnique Montréal

Publications

Does Entity Abstraction Help Generative Transformers Reason?
Nicolas Gontier
We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). We propose and empirically explore three ways to add such abstraction: (i) as additional input embeddings, (ii) as a separate sequence to encode, and (iii) as an auxiliary prediction task for the model. Overall, our analysis demonstrates that models with abstract entity knowledge perform better than those without it. The best abstraction-aware models achieved an overall accuracy of 88.8% and 91.8%, compared to the baseline model achieving 62.9% and 89.8%, on CLUTRR and ProofWriter respectively. However, for HotpotQA and CoQA, we find that F1 scores improve by only 0.5% on average. Our results suggest that the benefit of explicit abstraction is significant in formally defined logical reasoning settings requiring many reasoning hops, but point to the notion that it is less beneficial for NLP tasks having less formal logical structure.
A General-Purpose Neural Architecture for Geospatial Systems
Nasim Rahaman
Martin Weiss
Frederik Träuble
Francesco Locatello
Alexandre Lacoste
Li Erran Li
Bernhard Schölkopf
Direct Behavior Specification via Constrained Reinforcement Learning
Julien Roy
Roger Girgis
Joshua Romoff
Chris J Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent for reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
Vikram Voleti
Alexia Jolicoeur-Martineau
Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using
Neural Attentive Circuits
Nasim Rahaman
Martin Weiss
Francesco Locatello
Bernhard Schölkopf
Li Erran Li
Nicolas Ballas
Recent work has seen the development of general-purpose neural architectures that can be trained to perform tasks across diverse data modalities. General-purpose models typically make few assumptions about the underlying data structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on the CIFAR and CUBs datasets by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text classification from ASCII bytes, thereby confirming their general-purpose nature.
From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence
Nicholas Roy
Ingmar Posner
T. Barfoot
Philippe Beaudoin
Jeannette Bohg
Oliver Brock
Isabelle Depatie
Dieter Fox
D. Koditschek
Tomás Lozano-Pérez
Vikash K. Mansinghka
Dorsa Sadigh
Stefan Schaal
G. Sukhatme
Denis Therien
Marc Emile Toussaint
Michiel van de Panne
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning
Nan Rosemary Ke
Aniket Rajiv Didolkar
Sarthak Mittal
Anirudh Goyal
Stefan Bauer
Danilo Jimenez Rezende
Michael Curtis Mozer
Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.
Predicting Infectiousness for Proactive Contact Tracing
Prateek Gupta
Nasim Rahaman
Martin Weiss
Tristan Deleu
Meng Qu
Victor Schmidt
Pierre-Luc St-Charles
Hannah Alsdurf
Olexa Bilaniuk
Gaetan Caron
Pierre Luc Carrier
Joumana Ghosn
Satya Ortiz Gagne
Bernhard Schölkopf
Abhinav Sharma
Andrew Williams
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning-based PCT method that improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier
Pierre Delaunay
Mirko Bronzi
Assya Trofimov
Brennan Nichyporuk
Justin Szeto
Naz Sepah
Edward Raff
Kanika Madan
Vikram Voleti
Vincent Michalski
Dmitriy Serdyuk
Gael Varoquaux
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result: adding more sources of variation to an imperfect estimator better approximates the ideal estimator, at a 51-fold reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing
Prateek Gupta
Martin Weiss
Nasim Rahaman
Hannah Alsdurf
Abhinav Sharma
Nanor Minoyan
Soren Harnois-Leblanc
Victor Schmidt
Pierre-Luc St-Charles
Tristan Deleu
Andrew Williams
Akshay Patel
Meng Qu
Olexa Bilaniuk
Gaetan Caron
Pierre Luc Carrier
Satya Ortiz Gagne
Marc-Andre Rousseau
Joumana Ghosn
Yang Zhang
Bernhard Schölkopf
Joanna Merckx
Medical Imaging with Deep Learning: MIDL 2020 - Short Paper Track
Ismail Ben Ayed
Marleen de Bruijne
Maxime Descoteaux
This compendium gathers all the accepted extended abstracts from the Third International Conference on Medical Imaging with Deep Learning (MIDL 2020), held in Montreal, Canada, 6-9 July 2020. Note that only accepted extended abstracts are listed here; the Proceedings of the MIDL 2020 Full Paper Track are published in the Proceedings of Machine Learning Research (PMLR).
Measuring Systematic Generalization in Neural Proof Generation with Transformers
Nicolas Gontier
Koustuv Sinha
We are interested in understanding how well Transformer language models (TLMs) can perform reasoning tasks when trained on knowledge encoded in the form of natural language. We investigate their systematic generalization abilities on a logical reasoning task in natural language, which involves reasoning over relationships between entities grounded in first-order logical proofs. Specifically, we perform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference. We observe length-generalization issues when evaluated on longer-than-trained sequences. However, we observe TLMs improve their generalization performance after being exposed to longer, exhaustive proofs. In addition, we discover that TLMs are able to generalize better using backward-chaining proofs compared to their forward-chaining counterparts, while they find it easier to generate forward chaining proofs. We observe that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs. This suggests that Transformers have efficient internal reasoning strategies that are harder to interpret. These results highlight the systematic generalization behavior of TLMs in the context of logical reasoning, and we believe this work motivates deeper inspection of their underlying reasoning strategies.