Chris Pal

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Polytechnique Montréal, Department of Computer Engineering and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning

Biography

Christopher Pal is a Canada CIFAR AI Chair, full professor at Polytechnique Montréal and adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Supervises and co-supervises PhD students, research master's students and postdoctoral researchers, and works with collaborating researchers and alumni, across Polytechnique Montréal, Université de Montréal, McGill University, HEC Montréal, Concordia University and École de technologie supérieure.

Publications

Are Diffusion Models Vision-And-Language Reasoners?
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction, bringing discriminative and generative model evaluation closer together. We will release code and benchmark setup soon.
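The core idea here — repurposing a text-to-image diffusion model as a discriminative image-text scorer — can be illustrated with a short sketch. The snippet below scores an image-caption pair by the model's average denoising error when conditioned on the caption (lower error suggesting a better match). It is a minimal illustration of the general approach using the Hugging Face diffusers API, not the paper's implementation; the function name `itm_score`, the sample count, and the model ID are assumptions, and the published method adds refinements such as score normalization.

```python
# Minimal sketch (not the paper's code): score an image-text pair by the
# diffusion model's denoising error when conditioned on the caption.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def itm_score(image, caption, n_samples=8):
    """image: (1, 3, 512, 512) tensor in [-1, 1], fp16, on cuda.
    Returns a score; higher means a better image-text match."""
    # Encode the image into the VAE latent space.
    latents = pipe.vae.encode(image).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

    # Encode the caption.
    tokens = pipe.tokenizer(
        caption, padding="max_length", truncation=True,
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt"
    ).to("cuda")
    text_emb = pipe.text_encoder(tokens.input_ids)[0]

    errors = []
    for _ in range(n_samples):
        # Noise the latents at a random timestep.
        t = torch.randint(0, pipe.scheduler.config.num_train_timesteps,
                          (1,), device="cuda")
        noise = torch.randn_like(latents)
        noisy = pipe.scheduler.add_noise(latents, noise, t)
        # Predict the noise, conditioned on the caption.
        pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
        errors.append(torch.nn.functional.mse_loss(pred, noise))
    return -torch.stack(errors).mean()
```

To rank candidate captions for one image, one would compute `itm_score` for each caption and keep the highest-scoring one.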
Parallel-mentoring for Offline Model-based Optimization
We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots, DNA sequences, and proteins. A common approach trains a proxy on the static dataset and performs gradient ascent to obtain new designs. However, this often results in poor designs due to proxy inaccuracies on out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring.
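The ensemble baseline behind observation (a) — gradient ascent on the mean prediction of several independently trained proxies — is easy to sketch. The toy version below uses assumed architectures and hyperparameters; it is not the paper's parallel-mentoring method itself, which goes beyond a plain mean ensemble.

```python
# Toy sketch of the ensemble baseline: fit several proxies to the static
# (design, score) dataset, then ascend their mean prediction.
import torch
import torch.nn as nn

def make_proxy(dim):
    return nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, 1))

def train_ensemble(X, y, n_proxies=3, epochs=200, lr=1e-3):
    proxies = [make_proxy(X.shape[1]) for _ in range(n_proxies)]
    for proxy in proxies:  # independent fits give a crude ensemble
        opt = torch.optim.Adam(proxy.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            nn.functional.mse_loss(proxy(X).squeeze(-1), y).backward()
            opt.step()
    return proxies

def gradient_ascent(proxies, x_init, steps=100, lr=1e-2):
    # Start from an existing design and ascend the mean ensemble score.
    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = torch.stack([p(x).squeeze(-1) for p in proxies]).mean()
        (-score).backward()  # Adam minimizes, so negate to maximize
        opt.step()
    return x.detach()
```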
Neural Causal Structure Discovery from Interventions
Nan Rosemary Ke
Bernhard Schölkopf
Michael Curtis Mozer
Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened-upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.
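One way to picture such a framework is a soft, learnable adjacency matrix gating per-variable neural predictors, with the fitting loss skipping any variable that was intervened on, since its mechanism has been overridden. The sketch below is a simplified illustration under that reading; the class name, the squared-error objective and the sparsity penalty are assumptions, and handling unknown intervention targets requires extra machinery not shown here.

```python
# Simplified sketch: soft adjacency logits gate the inputs of per-variable
# predictors; intervened-on variables are dropped from the loss.
import torch
import torch.nn as nn

class NeuralCausalModel(nn.Module):
    def __init__(self, n_vars, hidden=64):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(n_vars, n_vars))
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(n_vars, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_vars))

    def loss(self, X, intervened=()):
        # X: (batch, n_vars); intervened: indices of intervened variables.
        n = X.shape[1]
        mask = torch.sigmoid(self.adj_logits) * (1 - torch.eye(n))
        total = 0.0
        for j in range(n):
            if j in intervened:
                continue  # mechanism replaced by the intervention
            parents = X * mask[:, j]          # soft-select parents of X_j
            pred = self.nets[j](parents).squeeze(-1)
            total = total + ((pred - X[:, j]) ** 2).mean()
        # Sparsity regularizer discourages spurious edges.
        return total + 1e-2 * torch.sigmoid(self.adj_logits).sum()
```

Training would alternate gradient steps on this loss over batches from each observational or interventional regime, with edges read off by thresholding `torch.sigmoid(adj_logits)`.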
Bridging the Gap Between Target Networks and Functional Regularization
Valentin Thomas
Joseph Marino
Gian Maria Marconi
Rafael Pardinas
Mohammad Emtiyaz Khan
Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design
In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work, we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.
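The failure mode of scalarization on concave Pareto fronts, and the appeal of goal conditioning, can be made concrete with a toy two-objective example. The reward definitions and numbers below are illustrative assumptions, not the paper's formulation.

```python
# Toy two-objective example (objectives normalized to [0, 1]).
import numpy as np

def scalarized_reward(objectives, weights):
    # Preference-conditioned weighted sum.
    return float(np.dot(weights, objectives))

def goal_conditioned_reward(objectives, goal):
    # High only when every objective reaches its goal level.
    return float(np.min(np.minimum(objectives / goal, 1.0)))

extreme  = np.array([1.0, 0.0])   # extreme point of a concave front
balanced = np.array([0.4, 0.4])   # balanced point below the chord

w = np.array([0.5, 0.5])
print(scalarized_reward(extreme, w))    # 0.5  <- weighted sum prefers this
print(scalarized_reward(balanced, w))   # 0.4

g = np.array([0.4, 0.4])
print(goal_conditioned_reward(extreme, g))   # 0.0
print(goal_conditioned_reward(balanced, g))  # 1.0 <- goal picks the balance
```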
Block-State Transformers
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies, and they scale efficiently to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks in vision and audio; however, SSMs still lag behind Transformer performance on language modeling tasks. In this work, we propose a hybrid layer named the Block-State Transformer (BST) that internally combines an SSM sublayer for long-range contextualization with a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates a more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
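A schematic of such a hybrid layer is sketched below. It is a loose illustration rather than the BST architecture itself: a depthwise long causal convolution stands in for the SSM sublayer (a true SSM would be used in practice), the short-term sublayer is plain self-attention within fixed-size blocks, and all dimensions are assumed.

```python
# Schematic sketch (not the published BST): a long-range sequence-mixing
# sublayer followed by block-wise attention over fixed-size chunks.
import torch
import torch.nn as nn

class BlockStateLayer(nn.Module):
    def __init__(self, d_model=256, block_len=128, n_heads=4, kernel=255):
        super().__init__()
        # Stand-in for the SSM: a depthwise causal convolution with a
        # large kernel provides cheap long-range contextualization.
        self.long_conv = nn.Conv1d(d_model, d_model, kernel_size=kernel,
                                   padding=kernel - 1, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.block_len = block_len

    def forward(self, x):                      # x: (batch, seq, d_model)
        B, T, D = x.shape
        # Long-range sublayer; trimming the tail keeps the conv causal.
        h = self.long_conv(self.norm1(x).transpose(1, 2))[..., :T]
        x = x + h.transpose(1, 2)
        # Block-wise attention: blocks attend within themselves only,
        # so they can all be processed in parallel.
        pad = (-T) % self.block_len
        xp = nn.functional.pad(x, (0, 0, 0, pad))
        blocks = xp.reshape(-1, self.block_len, D)
        nb = self.norm2(blocks)
        a, _ = self.attn(nb, nb, nb, need_weights=False)
        return x + a.reshape(B, -1, D)[:, :T]
```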