Chris Pal

Biography

Christopher Pal is a Canada CIFAR AI Chair, full professor at Polytechnique Montréal and adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Distinguished Scientist at ServiceNow Research.

Pal has been involved in AI and machine learning research for over twenty-five years and has published extensively on large-scale language modelling methods and generative modelling techniques. He has a PhD in computer science from the University of Waterloo.

Current Students

Mai Ababneh

Research Intern - McGill University

ababneh.mai@gmail.com

Shubham Agarwal

Postdoctorate - HEC Montréal

Principal supervisor :

Paul Barde

Collaborating researcher - McGill University

Principal supervisor :

Derek Nowrouzezahrai

paul.b.barde@gmail.com

Master's Research - Université de Montréal

Chris Beckham

PhD - Polytechnique Montréal

Can (Sam) Chen

PhD - McGill University

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Chris Emezue

Master's Research - Université de Montréal

Co-supervisor :

Collaborating Alumni - Polytechnique Montréal

Roger Girgis

PhD - Polytechnique Montréal

Florian Golemo

Postdoctorate - McGill University

Co-supervisor :

Master's Research - Polytechnique Montréal

PhD - Université de Montréal

Co-supervisor :

Yousef Kotp

Master's Research - Concordia University

Co-supervisor :

Collaborating researcher - Université de Montréal

Master's Research - Université de Montréal

Olga Luo

PhD - Université de Montréal

Joel Moniz

PhD - Polytechnique Montréal

Jonathan Pilault

PhD - Polytechnique Montréal

Juan Rodriguez

PhD - École de technologie supérieure

Luke Rowe

PhD - Université de Montréal

Principal supervisor :

Gaurav Sahu

Postdoctorate - HEC Montréal

Principal supervisor :

PhD - Polytechnique Montréal

Principal supervisor :

PhD - McGill University

Principal supervisor :

PhD - Polytechnique Montréal

Direct Behavior Specification via Constrained Reinforcement Learning

Blog Posts

August 31, 2022

Julien Roy

Roger Girgis

Joshua Romoff

Pierre-Luc Bacon

Chris Pal

Publications

Neural Causal Structure Discovery from Interventions

Nan Rosemary Ke

Olexa Bilaniuk

Anirudh Goyal

Stefan Bauer

Hugo Larochelle

Bernhard Schölkopf

Michael Curtis Mozer

Yoshua Bengio

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

2023-09-10

TMLR (accepted)

Bridging the Gap Between Target Networks and Functional Regularization

Alexandre Piché

Valentin Thomas

Joseph Marino

Gian Maria Marconi

Rafael Pardinas

Mohammad Emtiyaz Khan

2023-09-06

TMLR (accepted)

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

Julien Roy

Emmanuel Bengio

In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.

2023-06-23

ICML.cc/2023/Workshop/DeployableGenerativeAI (published)

Block-State Transformers

Mahan Fathi

Jonathan Pilault

Orhan Firat

State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.

2023-06-15

ArXiv (preprint)

Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans

Stefania Raimondo

Xiaotian Liu

David Vazquez

Hector Palacios

2023-06-02

ArXiv (preprint)

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

Pablo Pernias

Dominic Rampas

Mats Leon Richter

Marc Aubreville

We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language and this significantly reduces the computational requirements to achieve state-of-the-art results. Our approach also improves the quality of text-conditioned image generation based on our user preference study. The training requirements of our approach consist of 24,602 A100-GPU hours, compared to Stable Diffusion 2.1's 200,000 GPU hours. Our approach also requires less training data to achieve these results. Furthermore, our compact latent representations allow us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance. In a broader comparison against SOTA models our approach is substantially more efficient and compares favorably in terms of image quality. We believe that this work motivates more emphasis on the prioritization of both performance and computational accessibility.

2023-06-01

ArXiv (preprint)

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Qiuyuan Huang

J. Park

Abhinav Gupta

Pan Lu

Paul N. Bennett

Ran Gong

Subhojit Som

Baolin Peng

Owais Khan Mohammed

Yejin Choi

Jianfeng Gao

Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g. GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge-memory to generate scenes in unseen physical world and virtual reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated as the observation learns i) micro-action of cross-modality: in multi-modality models to collect a large amount of relevant knowledge memory data for each interaction task (e.g., unseen scene understanding) from the physical reality; and ii) macro-behavior of reality-agnostic: in mix-reality environments to improve interactions that tailor to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on the scene generation and editing tasks. We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes, compared to baselines, demonstrating the potential benefit of incorporating ArK in generative AI for applications such as metaverse and gaming simulation.

2023-05-01

ArXiv (preprint)

Controllable Image Generation via Collage Representations

Arantxa Casanova

Marlene Careil

Adriana Romero Soriano

Jakob Verbeek

Michal Drozdzal

2023-04-26

ArXiv (preprint)