Laurent Charlin

Biography

Laurent Charlin is a Canada CIFAR AI Chair at Mila and an associate professor at HEC Montréal, the business school affiliated with the Université de Montréal. He is also a core member of Mila (Quebec Institute for Artificial Intelligence).

Charlin’s research focuses on developing novel machine learning models to aid in decision-making. Recent work has focused on learning from data that changes over time, and on applications in fields such as recommender systems and optimization.

He has a number of highly cited publications on dialogue systems (chatbots). He co-developed the Toronto Paper Matching System (TPMS), which has been widely used by computer science conferences for matching reviewers to papers. He has also taught MOOCs and given introductory talks and media interviews to contribute to knowledge transfer and improve AI literacy.

Current Students

Neda Adl

Master's Research - HEC Montréal

Anirudh Buvanesh

PhD - Université de Montréal

Co-supervisor:

Aaron Courville

Félix Gauthier

Master's Research - HEC Montréal

Soraya Ghassemlou

Master's Research - McGill University

Nicolas Goulet

PhD - HEC Montréal

Principal supervisor:

Eva Portelance

Shubham Gupta

PhD - Université Laval

Principal supervisor:

Cem Subakan

Ben Hudson

PhD - Université de Montréal

Co-supervisor:

Mizu Nishikawa-Toomey

PhD - Université de Montréal

Co-supervisor:

PhD - Concordia University

Principal supervisor:

Collaborating Alumni - Université de Montréal

Emiliano Penaloza

PhD - Université de Montréal

Gaurav Sahu

Postdoctorate - HEC Montréal

Co-supervisor:

PhD - Université de Montréal

Yipeng Zhang

PhD - Université de Montréal

Publications

Session-Based Social Recommendation via Dynamic Graph Attention Networks

Weiping Song

Zhiping Xiao

Yifan Wang

Ming Zhang

Jian Tang

Online communities such as Facebook and Twitter are enormously popular and have become an essential part of the daily life of many of their users. Through these platforms, users can discover and create information that others will then consume. In that context, recommending relevant information to users becomes critical for viability. However, recommendation in online communities is a challenging problem: 1) users' interests are dynamic, and 2) users are influenced by their friends. Moreover, the influencers may be context-dependent. That is, different friends may be relied upon for different topics. Modeling both signals is therefore essential for recommendations. We propose a recommender system for online communities based on a dynamic-graph-attention neural network. We model dynamic user behaviors with a recurrent neural network, and context-dependent social influence with a graph-attention neural network, which dynamically infers the influencers based on users' current interests. The whole model can be efficiently fit on large-scale data. Experimental results on several real-world data sets demonstrate the effectiveness of our proposed approach over several competitive baselines including state-of-the-art models.

2019-01-30

Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (published)

Exact Combinatorial Optimization with Graph Convolutional Neural Networks

Maxime Gasse

Didier Chételat

Nicola Ferroni

Andrea Lodi

Combinatorial optimization problems are typically tackled by the branch-and-bound paradigm. We propose a new graph convolutional neural network model for learning branch-and-bound variable selection policies, which leverages the natural variable-constraint bipartite graph representation of mixed-integer linear programs. We train our model via imitation learning from the strong branching expert rule, and demonstrate on a series of hard problems that our approach produces policies that improve upon state-of-the-art machine-learning methods for branching and generalize to instances significantly larger than seen during training. Moreover, we improve for the first time over expert-designed branching rules implemented in a state-of-the-art solver on large problems. Code for reproducing all the experiments can be found at this https URL.

2019-01-01

NeurIPS (published)

The Deconfounded Recommender: A Causal Inference Approach to Recommendation

Yixin Wang

Dawen Liang

David Blei

The goal of a recommender system is to show its users items that they will like. In forming its prediction, the recommender system tries to answer: "what would the rating be if we 'forced' the user to watch the movie?" This is a question about an intervention in the world, a causal question, and so traditional recommender systems are doing causal inference from observational data. This paper develops a causal inference approach to recommendation. Traditional recommenders are likely biased by unobserved confounders, variables that affect both the "treatment assignments" (which movies the users watch) and the "outcomes" (how they rate them). We develop the deconfounded recommender, a strategy to leverage classical recommendation models for causal predictions. The deconfounded recommender uses Poisson factorization on which movies users watched to infer latent confounders in the data; it then augments common recommendation models to correct for potential confounding bias. The deconfounded recommender improves recommendation and it enjoys stable performance against interventions on test sets.

2018-08-20

ArXiv (preprint)

Focused Hierarchical RNNs for Conditional Sequence Processing

Nan Rosemary Ke

Adam Trischler

Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks which allow us to evaluate the model for its generalization ability and probe the behavior of the gates in more controlled settings. We then evaluate this approach on large-scale Question Answering tasks including the challenging MS MARCO and SearchQA tasks. Our models show consistent improvements for both tasks over prior work and our baselines. They have also been shown to generalize significantly better on synthetic tasks as compared to the baselines.

2018-07-03

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press

Towards Deep Conversational Recommendations

Raymond Li

Samira Ebrahimi Kahou

Hannes Schulz

Vincent Michalski

Chris Pal

There has been growing interest in using neural networks and deep learning techniques to create dialogue systems. Conversational recommendation is an interesting setting for the scientific exploration of dialogue with natural language as the associated discourse involves goal-driven dialogue that often transforms naturally into more free-form chat. This paper provides two contributions. First, until now there has been no publicly available large-scale data set consisting of real-world dialogues centered around recommendations. To address this issue and to facilitate our exploration here, we have collected ReDial, a data set consisting of over 10,000 conversations centered around the theme of providing movie recommendations. We make this data available to the community for further research. Second, we use this dataset to explore multiple facets of conversational recommendations. In particular we explore new neural architectures, mechanisms and methods suitable for composing conversational recommendation systems. Our dataset allows us to systematically probe model sub-components addressing different parts of the overall problem domain ranging from: sentiment analysis and cold-start recommendation generation to detailed aspects of how natural language is used in this setting in the real world. We combine such sub-components into a full-blown dialogue system and examine its behavior.

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

Nan Rosemary Ke

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic. However, this usually leads to biased estimates of the gradient in which longer term dependencies are ignored. Addressing this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which might also be related to principles used by brains to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through paths with high attention weights. This allows the model to learn long term dependencies while only backtracking for a small number of time steps, not just from the recent past but also from attended relevant past states.

2017-11-07

ArXiv (preprint)

Learnable Explicit Density for Continuous Latent Space and Variational Inference

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as universal approximation to any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to use factorial Gaussians in the latent real space.

2017-10-06

ArXiv (preprint)

A Sparse Probabilistic Model of User Preference Data

Matthew J. A. Smith

Joelle Pineau

2017-04-11

Advances in Artificial Intelligence (published)

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian V. Serban

Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterances in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.

2017-02-12

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Training End-to-End Dialogue Systems with the Ubuntu Dialogue Corpus

Ryan Thomas Lowe

Nissan Pow

Iulian V. Serban

Chia-Wei Liu

Joelle Pineau

In this paper, we construct and train end-to-end neural network-based dialogue systems using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering, which can be both time consuming and expensive. We provide baselines in two different environments: one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation, and one where models are trained to select the correct next response from a list of candidate responses. These are both evaluated on a recall task that we call Next Utterance Classification (NUC), as well as other generation-specific metrics. Finally, we provide a qualitative error analysis to help determine the most promising directions for future research on the Ubuntu Dialogue Corpus, and for end-to-end dialogue systems in general.

2017-01-20

Dialogue & Discourse (published)