
Laurent Charlin

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, HEC Montréal, Department of Decision Sciences
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Representation Learning
Reinforcement Learning
Deep Learning
Data Mining
AI for Science
Generative Models
Probabilistic Models
Information Retrieval
Graph Neural Networks
Recommender Systems
Natural Language Processing

Biography

Laurent Charlin is the Interim Scientific Director of Mila – Quebec Artificial Intelligence Institute, holder of a Canada CIFAR AI Chair, and Associate Professor at HEC Montréal. He is also a Core Academic Member of Mila.

His research focuses on developing new machine learning models to support decision-making. His recent work concerns learning from data that evolves over time. He also works on applications in areas such as recommender systems and optimization.

He has authored highly cited publications on dialogue systems (chatbots). Laurent Charlin co-developed the Toronto Paper Matching System (TPMS), which has been widely used by computer science conferences to match reviewers to papers. He has also contributed to several recent MOOCs and has given introductory talks and media interviews to support knowledge transfer and improve AI literacy.

Current Students

Master's Research - HEC
Postdoctorate - HEC
Co-supervisor:
Master's Research - HEC
PhD - UdeM
PhD - UdeM
Co-supervisor:
Master's Research - HEC
PhD - HEC
Principal supervisor:
PhD - Université Laval
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - Concordia
Principal supervisor:
Alumni collaborator - UdeM
Postdoctorate - HEC
Co-supervisor:
PhD - UdeM
PhD - UdeM

Publications

Predictive inference for travel time on transportation networks
Mohamad Elmasri
Aurélie Labbe
Denis Larocque
Challenging Common Assumptions about Catastrophic Forgetting and Knowledge Accumulation
Timothée Lesort
Pau Rodriguez
Md Rifat Arefin
Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges
Massimo Caccia
Jonas Mueller
Taesup Kim
Rasool Fakoor
A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts
We study the applicability of mixture of parameter-efficient experts (MoPEs) for instruction-tuning large decoder-only language models. Recent literature indicates that MoPEs might enhance performance in specific multi-task instruction-following datasets. In this paper, we extend such previous results and study applicability of MoPEs in settings previously overlooked: a) with open-domain instruction-following datasets; b) with recent decoder-only models and c) with downstream out-of-distribution test sets. We build on top of LLaMA1-13B/-7B and LLaMA2-13B. We study different variants of learned routing, namely per-example routing ([PE]), and a more expensive per-token ([PT]) routing. Overall, we are unable to substantiate strong performance gains observed in related studies in our setting. We observe occasional enhancements of LLaMA2 fine-tuned on the Open Platypus dataset in 0-shot SNI evaluation and TruthfulQA evaluation after fine-tuning on a subset of Flan. We shed some light on the inner workings of MoPEs by comparing different routing strategies. We find that [PE] routing tends to collapse at downstream evaluation time, reducing the importance of the router's application. We plan to publicly release our code.
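To make the routing variants concrete, here is a minimal PyTorch-style sketch of per-example ([PE]) routing over low-rank experts attached to a frozen linear layer. The module names, dimensions, softmax router, and LoRA-style experts are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MoPELinear(nn.Module):
        """Frozen base linear layer plus a mixture of low-rank experts.

        Per-example routing: the router scores the experts once per example
        (from a pooled representation), and every token of that example uses
        the same mixture weights.
        """

        def __init__(self, d_in, d_out, n_experts=4, rank=8):
            super().__init__()
            self.base = nn.Linear(d_in, d_out)
            self.base.weight.requires_grad_(False)   # pretrained weight stays frozen
            self.base.bias.requires_grad_(False)
            # Low-rank expert factors: delta_e(x) = B_e @ (A_e @ x), LoRA-style
            self.A = nn.Parameter(torch.randn(n_experts, rank, d_in) * 0.01)
            self.B = nn.Parameter(torch.zeros(n_experts, d_out, rank))
            self.router = nn.Linear(d_in, n_experts)  # per-example expert scores

        def forward(self, x):                              # x: (batch, seq, d_in)
            pooled = x.mean(dim=1)                         # one summary vector per example
            gates = F.softmax(self.router(pooled), dim=-1) # (batch, n_experts)
            low = torch.einsum("erd,bsd->bser", self.A, x)        # (batch, seq, experts, rank)
            delta = torch.einsum("eor,bser->bseo", self.B, low)   # (batch, seq, experts, d_out)
            mixed = torch.einsum("be,bseo->bso", gates, delta)    # weighted sum over experts
            return self.base(x) + mixed

Per-token ([PT]) routing would instead compute the gates from each token's hidden state, which is the more expensive variant the abstract contrasts with.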
Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network
Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied … (voir plus)to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.
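The two-phase sampling policy can be pictured with the following schematic Python sketch: edges are added to the DAG one at a time until a terminal "stop" action, and the parameters of the conditional distributions are drawn only once the structure is complete. The edge_policy and param_policy callables and the NumPy scaffolding are placeholders standing in for the single learned GFlowNet policy, not the JSP-GFN implementation.

    import numpy as np

    def sample_dag_and_params(n_nodes, edge_policy, param_policy, rng):
        """Two-phase GFlowNet-style forward pass (schematic).

        Phase 1: start from the empty graph and add one directed edge per step
        until the policy picks the terminal 'stop' action; edge_policy is
        assumed to mask actions that would create a cycle.
        Phase 2: once the structure is fixed, draw the parameters of the
        conditional distributions from param_policy(adj).
        """
        adj = np.zeros((n_nodes, n_nodes), dtype=int)
        while True:
            actions, probs = edge_policy(adj)              # valid edges + 'stop'
            choice = actions[rng.choice(len(actions), p=probs)]
            if choice == "stop":
                break
            i, j = choice                                  # add edge i -> j
            adj[i, j] = 1
        theta = param_policy(adj)                          # e.g. weights of a neural CPD
        return adj, theta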
Should We Feed the Trolls? Using Marketer-Generated Content to Explain Average Toxicity and Product Usage
Marcelo Vinhal Nepomuceno
Hooman Rahemi
Tolga Cenesizoglu
Towards Compute-Optimal Transfer Learning
Massimo Caccia
Alexandre Galashov
Arthur Douillard
Amal Rannen-Triki
Dushyant Rao
Michela Paganini
Marc'aurelio Ranzato
From IID to the Independent Mechanisms assumption in continual learning
Pau Rodriguez
Alexandre Lacoste
Current machine learning algorithms are successful in learning clearly defined tasks from large i.i.d. data. Continual learning (CL) requires learning without iid-ness and developing algorithms capable of knowledge retention and transfer; the latter can be boosted through systematic generalization. Dropping the i.i.d. assumption requires replacing it with another hypothesis. While there are several candidates, here we advocate that the independent mechanism assumption (IM) (Schölkopf et al., 2012) is a useful hypothesis for representing knowledge in a form that makes it easy to adapt to new tasks in CL. Specifically, we review several types of distribution shifts that are common in CL and point out in which way a system that represents knowledge in the form of causal modules may outperform monolithic counterparts in CL. Intuitively, the efficacy of the IM solution emerges since (i) causal modules learn mechanisms invariant across domains; (ii) if causal mechanisms must be updated, modularity can enable efficient and sparse updates.
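As a toy illustration of point (ii), not taken from the paper, the sketch below factors a model into one mechanism module per variable, so that a shift affecting a single conditional can be absorbed by updating only the corresponding module while the others stay frozen.

    import torch
    import torch.nn as nn

    class ModularModel(nn.Module):
        """Toy model factored into independent per-variable mechanisms.

        Each module predicts one variable from its (assumed) causes, so a
        distribution shift that changes a single mechanism can be handled by
        a sparse, local update of the matching module.
        """

        def __init__(self, n_vars, d_in, d_hidden=32):
            super().__init__()
            self.mechanisms = nn.ModuleList(
                nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, 1))
                for _ in range(n_vars)
            )

        def forward(self, parents):                 # parents: (batch, n_vars, d_in)
            return torch.cat([m(parents[:, k]) for k, m in enumerate(self.mechanisms)], dim=1)

    def adapt_to_shift(model, shifted_var, lr=1e-3):
        """Freeze every mechanism except the one whose conditional changed."""
        for k, m in enumerate(model.mechanisms):
            for p in m.parameters():
                p.requires_grad_(k == shifted_var)
        trainable = [p for p in model.parameters() if p.requires_grad]
        return torch.optim.Adam(trainable, lr=lr)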
IORL: Inductive-Offline-Reinforcement-Learning for Traffic Signal Control Warmstarting
François-Xavier Devailly
Denis Larocque
Price Forecasting in the Ontario Electricity Market via TriConvGRU Hybrid Model: Univariate vs. Multivariate Frameworks
Behdad Ehsani
Pierre-Olivier Pineau
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
Timothée Lesort
Pau Rodriguez
Md Rifat Arefin
Arthur Douillard
Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, as well as of the resulting latent space affect CL performance. For this, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity and learning are dependent on the input data characteristics and not necessarily on the CL algorithms. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute. We then show how models pre-trained on broader data result in better performance for various replay sizes. We explain this with representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution as compared to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. We will publish the code.
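The latent-replay setting compared in this study can be sketched roughly as follows: a frozen pre-trained encoder maps raw inputs to features, a bounded buffer keeps features from earlier tasks, and only a lightweight classifier head is trained on a mix of current and replayed latents. The buffer policy, the linear head, and the class interface below are illustrative assumptions rather than the benchmark's exact protocol.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentReplayLearner:
        """Continual learning with replay in the latent space of a frozen encoder."""

        def __init__(self, encoder, n_features, n_classes, buffer_size=2000, lr=1e-3):
            self.encoder = encoder.eval()                  # pre-trained, kept frozen
            for p in self.encoder.parameters():
                p.requires_grad_(False)
            self.head = nn.Linear(n_features, n_classes)   # only this part is trained
            self.opt = torch.optim.Adam(self.head.parameters(), lr=lr)
            self.buffer = []                               # stored (latent, label) pairs
            self.buffer_size = buffer_size

        @torch.no_grad()
        def encode(self, x):
            return self.encoder(x)

        def observe(self, x, y, replay_batch=32):
            # Assumes all tensors live on the same device.
            z = self.encode(x)
            # Mix the current batch with latents replayed from earlier tasks.
            if self.buffer:
                zs, ys = zip(*random.sample(self.buffer, min(replay_batch, len(self.buffer))))
                z = torch.cat([z, torch.stack(zs)])
                y = torch.cat([y, torch.stack(ys)])
            loss = F.cross_entropy(self.head(z), y)
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()
            # Simple bounded buffer: append until full, then replace a random slot.
            for zi, yi in zip(self.encode(x), y[: len(x)]):
                if len(self.buffer) < self.buffer_size:
                    self.buffer.append((zi, yi))
                else:
                    self.buffer[random.randrange(self.buffer_size)] = (zi, yi)
            return loss.item()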