Laurent Charlin

Biography

Laurent Charlin is a Canada CIFAR AI Chair at Mila and an associate professor at HEC Montréal, the business school affiliated with the Université de Montréal. He is also a core member of Mila - Quebec Institute for Artificial Intelligence.

Charlin’s research focuses on developing novel machine learning models to aid in decision-making. Recent work has focused on learning from data that changes over time, and on applications in fields such as recommender systems and optimization.

He has a number of highly cited publications on dialogue systems (chatbots). He co-developed the Toronto Paper Matching System (TPMS), which has been widely used by computer science conferences to match reviewers to papers. He has also taught MOOCs and given introductory talks and media interviews to contribute to knowledge transfer and improve AI literacy.

Current Students

Neda Adl

Master's Research - HEC Montréal

Anirudh Buvanesh

PhD - Université de Montréal

Co-supervisor :

Aaron Courville

Github

Félix Gauthier

Master's Research - HEC Montréal

Soraya Ghassemlou

Master's Research - McGill University

Website

Github

Nicolas Goulet

PhD - HEC Montréal

Principal supervisor :

Eva Portelance

Shubham Gupta

PhD - Université Laval

Principal supervisor :

Cem Subakan

Ben Hudson

PhD - Université de Montréal

Co-supervisor :

Mizu Nishikawa-Toomey

PhD - Université de Montréal

Co-supervisor :

PhD - Concordia University

Principal supervisor :

Collaborating Alumni - Université de Montréal

Emiliano Penaloza

PhD - Université de Montréal

Website

Github

Gaurav Sahu

Postdoctorate - HEC Montréal

Co-supervisor :

PhD - Université de Montréal

Yipeng Zhang

PhD - Université de Montréal

Publications

Should We Feed the Trolls? Using Marketer-Generated Content to Explain Average Toxicity and Product Usage

Marcelo Vinhal Nepomuceno

Hooman Rahemi

Tolga Cenesizoglu

2023-06-28

Journal of Interactive Marketing (published)

Towards Compute-Optimal Transfer Learning

Massimo Caccia

Alexandre Galashov

Arthur Douillard

Amal Rannen-Triki

Dushyant Rao

Michela Paganini

Marc'Aurelio Ranzato

Razvan Pascanu

2023-04-24

ArXiv (preprint)

From IID to the Independent Mechanisms assumption in continual learning

Oleksiy Ostapenko

Pau Rodríguez

Alexandre Lacoste

2023-01-10

AAAI.org/2023/Bridge/CCBridge (accepted)

proceedings.mlr.press

IORL: Inductive-Offline-Reinforcement-Learning for Traffic Signal Control Warmstarting

François-Xavier Devailly

Denis Larocque

2022-12-31

Social Science Research Network (published)

Price forecasting in the Ontario electricity market via TriConvGRU hybrid model: Univariate vs. multivariate frameworks

Behdad Ehsani

Pierre-Olivier Pineau

Electricity price forecasting is a challenging task for decision-makers in deregulated power markets due to the inherent characteristics of electricity prices, e.g., high frequency and volatility. Therefore, accurate forecasting of electricity prices can assist market participants in maximizing their profit. Accordingly, we proposed a novel hybrid Deep Learning model to forecast one-step, two-step, and three-step ahead Ontario electricity prices based on a Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU). Our model consists of three consecutive CNN-GRU models combined in parallel with different input data. We downsampled input data via pooling layers at the beginning of two streams of the model to capture different frequencies of price patterns concurrently. Also, a set of external variables, including previous prices, electricity load, generation, import and export, and weather data, were considered in our forecasting models to test whether these features improve the efficiency of the models. Finally, three experiments in various weeks of 2022 were carried out in the Ontario electricity market to assess the model. The results indicate that the proposed model reduced the forecasting error significantly by 63.3% in the first experiment, 41.8% in the second, and 28.2% in the third, on average, with respect to Root Mean Square Error (RMSE). Also, the proposed model was compared with, and outperformed, several baseline models, including statistical time-series, Machine Learning, and Deep Learning models. Furthermore, the comparison of results in univariate and multivariate settings indicated that adding variables to forecasting models did not help reduce forecasting errors.

2022-12-31

SSRN Electronic Journal (unknown)

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Oleksiy Ostapenko

Timothee Lesort

Pau Rodríguez

Md Rifat Arefin

Arthur Douillard

Irina Rish

Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, as well as of the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity and learning are dependent on the input data characteristics and not necessarily on the CL algorithms. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show how models pre-trained on broader data result in better performance for various replay sizes. We explain this with representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution as compared to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. The codebase is available under https://github.com/oleksost/latent_CL.

2022-11-27

Proceedings of The 1st Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes

Mizu Nishikawa-Toomey

Tristan Deleu

Jithendaraa Subramanian

Yoshua Bengio

Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.

2022-11-03

ArXiv (preprint)

Attention for Compositional Modularity

Oleksiy Ostapenko

Pau Rodríguez

Alexandre Lacoste

Modularity and compositionality are promising inductive biases for addressing longstanding problems in machine learning such as better systematic generalization, as well as better transfer and lower forgetting in the context of continual learning. Here we study how attention-based module selection can help achieve compositional modularity – i.e. decomposition of tasks into meaningful sub-tasks which are tackled by independent architectural entities that we call modules. These sub-tasks must be reusable and the system should be able to learn them without additional supervision. We design a simple experimental setup in which the model is trained to solve mathematical equations with multiple math operations applied sequentially. We study different attention-based module selection strategies, inspired by the principles introduced in the recent literature. We evaluate the method’s ability to learn modules that can recover the underlying sub-tasks (operations) used for data generation, as well as the ability to generalize compositionally. We find that meaningful module selection (i.e. routing) is the key to compositional generalization. Further, without access to the privileged information about which part of the input should be used for module selection, the routing component performs poorly for samples that are compositionally out of training distribution. We find that the main reason for this lies in the routing component, since many of the tested methods perform well OOD if we report the performance of the best performing path at test time. Additionally, we study the role of the number of primitives, the number of training points and bottlenecks for modular specialization.

2022-10-19

NeurIPS.cc/2022/Workshop/Attention (poster)

openreview.net

IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control

François-Xavier Devailly

Denis Larocque

Scaling adaptive traffic signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-network architectures (dominating in the multi-agent setting) do not offer the flexibility to handle an arbitrary number of entities which changes both between road networks, and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL) based on graph-convolutional networks which adapts to the structure of any road network, to learn detailed representations of traffic signal controllers and their surroundings. Our decentralized approach enables learning of a transferable-adaptive-traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks and traffic distributions, with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane level and the vehicle level. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and in a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.

2022-06-30

IEEE Transactions on Intelligent Transportation Systems (published)

Learning to Cut by Looking Ahead: Cutting Plane Selection via Imitation Learning

Max B. Paulus

Giulia Zarpellon

Andreas Krause

Chris J. Maddison

Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection - but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions

Francois St-Hilaire

Dung D. Vu

Antoine Frau

Nathan J. Burns

Farid Faraji

Joseph Potochny

Stephane Robert

Arnaud Roussel

Selene Zheng

Taylor Glazier

Junfel Vincent Romano

Robert Belfer

Muhammad Shayan

Ariella Smofsky

Tommy Delarosbil

Seulmin Ahn

Simon Eden-Walker

Kritika Sony

Ansona Onyi Ching

Sabina Elkins

A. Stepanyan

Adela Matajova

Victor Chen

Hossein Sahraei

Robert Larson

N. Markova

Andrew Barkett

Yoshua Bengio

Iulian V. Serban

Ekaterina Kochmar

2022-03-02

ArXiv (preprint)

COIL: A Deep Architecture for Column Generation

Behrouz Babaki