
Chris Pal

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning

Biography

Christopher Pal is a Canada CIFAR AI Chair, full professor at Polytechnique Montréal and adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Research Intern - McGill University
Postdoctorate - HEC Montréal
Principal supervisor :
Collaborating researcher - McGill University
Principal supervisor :
Master's Research - Université de Montréal
PhD - Polytechnique Montréal
PhD - McGill University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Polytechnique Montréal
Master's Research - Université de Montréal
Co-supervisor :
Collaborating Alumni - Polytechnique Montréal
PhD - Polytechnique Montréal
Postdoctorate - McGill University
Co-supervisor :
Master's Research - Polytechnique Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
Collaborating researcher - Université de Montréal
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Polytechnique Montréal
PhD - Polytechnique Montréal
PhD - École de technologie supérieure
PhD - Université de Montréal
Principal supervisor :
Postdoctorate - HEC Montréal
Principal supervisor :
PhD - Polytechnique Montréal
Principal supervisor :
PhD - McGill University
Principal supervisor :
PhD - Polytechnique Montréal

Publications

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance
Mats Leon Richter
Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer) can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experiments. By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model. This allows us to optimize the parameter-efficiency of a given architecture at low cost. Our method is computationally simple and can be done in an automated manner or even manually with minimal effort for most common architectures. We demonstrate the effectiveness of this approach by increasing parameter efficiency across past and current top-performing CNN architectures. Specifically, our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes, including: VGG Nets, MobileNetV1, MobileNetV3, NASNet A (mobile), MnasNet, EfficientNet, and ConvNeXt, leading to a new SOTA result for each model class.
SMPL-IK: Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows
Vikram Voleti
Boris Oreshkin
Florent Bocquelet
Félix Harvey
Louis-Simon Ménard
Does Entity Abstraction Help Generative Transformers Reason?
Nicolas Gontier
We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). We propose and empirically explore three ways to add such abstraction: (i) as additional input embeddings, (ii) as a separate sequence to encode, and (iii) as an auxiliary prediction task for the model. Overall, our analysis demonstrates that models with abstract entity knowledge perform better than those without it. The best abstraction-aware models achieved an overall accuracy of 88.8% and 91.8% compared to the baseline model achieving 62.9% and 89.8% on CLUTRR and ProofWriter respectively. However, for HotpotQA and CoQA, we find that F1 scores improve by only 0.5% on average. Our results suggest that the benefit of explicit abstraction is significant in formally defined logical reasoning settings requiring many reasoning hops, but point to the notion that it is less beneficial for NLP tasks having less formal logical structure.
The Liver Tumor Segmentation Benchmark (LiTS)
Patrick Bilic
Patrick Christ
Eugene Vorontsov
Hongwei Bran Li
Grzegorz Chlebus
Hao Chen
Qi Dou
Chi-Wing Fu
Xu Han
Gabriel Efrain Humpire Mamani
Pheng Ann Heng
Jürgen Hesser
Samuel Kadoury
Julian Walter Holch
Tomasz Konopczynski
Miao Yue
Chunming Li
X. Li
Jana Lipková
John Lowengrub
Michal Marianne Amitai
Hans Meine
J. Moltz
Marie Piraud
Ivan Ezhov
Xiaojuan Qi
Fernando Navarro
Jin Qi
Florian Kofler
Markus Rempfler
Johannes C. Paetzold
Karsten Roth
Suprosanna Shit
Andrea Schenk
Xiaobin Hu
Anjany Sekuboyina
Ping Zhou
Christian Hülsemeyer
Marcel Beetz
Jan Kirschke
Florian Ettlinger
Felix Gruen
Benedikt Wiestler
Zhiheng Zhang
Georgios Kaissis
Fabian Lohöfer
Rickmer Braren
J. Holch
Michela Antonelli
Felix Hofmann
Woong Bae
Wieland Sommer
Míriam Bellver
Volker Heinemann
Lei Bi
Colin Jacobs
G. Mamani
Bram van Ginneken
Erik B. Dam
Gabriel Chartrand
An Tang
Michal Drozdzal
Bogdan Georgescu
Avi Ben-Cohen
Xavier Giró-i-Nieto
Eyal Klang
M. Amitai
E. Konen
Hayit Greenspan
Johan Moreau
Jan Hendrik Moltz
Alexandre Hostettler
Christian Igel
Luc Soler
Fabian Isensee
Refael Vivanti
Paul Jäger
Adi Szeskin
Fucang Jia
Naama Lev-Cohain
Krishna Chaitanya Kaluva
Jacob Sosna
Mahendra Khened
Leo Joskowicz
Ildoo Kim
Bjoern Menze
Jae-Hun Kim
Zengming Shen
Sungwoong Kim
Simon Kohl
Avinash Kori
Ganapathy Krishnamurthi
Fan Li
Hongchao Li
Junbo Li
Xiaomeng Li
Jun Ma
Klaus Maier-Hein
Kevis-Kokitsi Maninis
Dorit Merhof
Akshay Pai
Mathias Perslev
Jens Petersen
Jordi Pont-Tuset
Oliver Rippel
Ignacio Sarasua
Jordi Torres
Christian Wachinger
Chunliang Wang
Leon Weninger
Jianrong Wu
Daguang Xu
Xiaoping Yang
Simon Chun-Ho Yu
Yading Yuan
Liping Zhang
Jorge Cardoso
Spyridon Bakas
A General-Purpose Neural Architecture for Geospatial Systems
Nasim Rahaman
Martin Weiss
Frederik Träuble
Francesco Locatello
Alexandre Lacoste
Li Erran Li
Bernhard Schölkopf
Using Graph Algorithms to Pretrain Graph Completion Transformers
Jonathan Pilault
Mikhail Galkin
Bahare Fatemi
Perouz Taslakian
David Vazquez
Recent work on Graph Neural Networks has demonstrated that self-supervised pretraining can further enhance performance on downstream graph, link, and node classification tasks. However, the efficacy of pretraining tasks has not been fully investigated for downstream large knowledge graph completion tasks. Using a contextualized knowledge graph embedding approach, we investigate five different pretraining signals, constructed using several graph algorithms and no external data, as well as their combination. We leverage the versatility of our Transformer-based model to explore graph structure generation pretraining tasks (i.e. path and k-hop neighborhood generation), typically inapplicable to most graph embedding methods. We further propose a new path-finding algorithm guided by information gain and find that it is the best-performing pretraining task across three downstream knowledge graph completion datasets. While using our new path-finding algorithm as a pretraining signal provides 2-3% MRR improvements, we show that pretraining on all signals together gives the best knowledge graph completion results. In a multitask setting that combines all pretraining tasks, our method surpasses the latest strong-performing knowledge graph embedding methods on all metrics for FB15K-237, on MRR and Hit@1 for WN18RR, and on MRR and Hit@10 for JF17K (a knowledge hypergraph dataset).
Direct Behavior Specification via Constrained Reinforcement Learning
Julien Roy
Roger Girgis
Joshua Romoff
Chris J Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
A Probabilistic Perspective on Reinforcement Learning via Supervised Learning
Alexandre Piché
Rafael Pardinas
David Vazquez
Learning to Guide and to Be Guided in the Architect-Builder Problem
Paul Barde
Tristan Karch
Clément Moulin-Frier
Pierre-Yves Oudeyer
We are interested in interactive agents that learn to coordinate, namely, a …
Attention-based Neural Cellular Automata
Mattie Tesfaldet
Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of _attention-based_ NCAs formed using a spatially localized, yet globally organized, self-attention scheme. We introduce an instance of this class named _Vision Transformer Cellular Automata (ViTCA)_. We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation of its inductive biases. Finally, we examine its learned representations via linear probes on its converged cell state hidden representations, yielding, on average, superior results when compared to our U-Net, ViT, and UNetCA baselines.
Challenges in leveraging GANs for few-shot data augmentation
Christopher Beckham
Issam Hadj Laradji
Pau Rodriguez
David Vazquez
Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction
Roger Girgis
Florian Golemo
Felipe Codevilla
Martin Weiss
Jim Aldon D'Souza
Felix Heide
Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers, which are encoder-decoder architectures that generate scene-consistent multi-agent trajectories. We refer to these architectures as “AutoBots”. The encoder is a stack of interleaved temporal and social multi-head self-attention (MHSA) modules which alternately perform equivariant processing across the temporal and social dimensions. The decoder employs learnable seed parameters in combination with temporal and social MHSA modules, allowing it to efficiently perform inference over the entire future scene in a single forward pass. AutoBots can produce either the trajectory of one ego-agent or a distribution over the future trajectories for all agents in the scene. For the single-agent prediction case, our model achieves top results on the global nuScenes vehicle motion prediction leaderboard, and produces strong results on the Argoverse vehicle prediction challenge. In the multi-agent setting, we evaluate on the synthetic partition of the TrajNet++ dataset to showcase the model’s socially-consistent predictions. We also demonstrate our model on general sequences of sets and provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. A distinguishing feature of AutoBots is that all models are trainable on a single desktop GPU (1080 Ti) in under 48h.
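
As a rough illustration of the interleaved temporal and social self-attention described in the AutoBots abstract above, the sketch below alternates attention across time (per agent) and across agents (per time step). It is not the authors' implementation: it assumes a single head with no learned projections, no positional encodings, and no residual connections, and the names `self_attention`, `temporal_block`, `social_block`, and `encoder` are hypothetical.

```python
import math


def self_attention(seq):
    """Scaled dot-product self-attention over a list of vectors.

    Assumption: queries, keys and values are the raw inputs (no learned weights).
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out


def temporal_block(x):
    """Attend across time steps, independently for each agent. x is indexed x[agent][time]."""
    return [self_attention(track) for track in x]


def social_block(x):
    """Attend across agents, independently at each time step."""
    n_agents, n_steps = len(x), len(x[0])
    out = [[None] * n_steps for _ in range(n_agents)]
    for t in range(n_steps):
        attended = self_attention([x[a][t] for a in range(n_agents)])
        for a in range(n_agents):
            out[a][t] = attended[a]
    return out


def encoder(x, n_layers=2):
    """Stack of interleaved temporal and social attention blocks."""
    for _ in range(n_layers):
        x = temporal_block(x)
        x = social_block(x)
    return x
```

Because the social block treats the agents at each time step as an unordered set, reordering the agents in the input reorders the output in the same way. This is one simple reading of the equivariance the abstract mentions.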