Portrait de Simon Lacoste-Julien

Simon Lacoste-Julien

Membre académique principal
Chaire en IA Canada-CIFAR
Directeur scientifique adjoint, Mila, Professeur agrégé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Vice-président et directeur de laboratoire, Samsung Advanced Institute of Technology (SAIT) AI Lab, Montréal
Sujets de recherche
Apprentissage profond
Causalité
Modèles génératifs
Modèles probabilistes
Optimisation
Théorie de l'apprentissage automatique
Traitement du langage naturel
Vision par ordinateur

Biographie

Simon Lacoste-Julien est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre cofondateur de Mila – Institut québécois d’intelligence artificielle et titulaire d'une chaire en IA Canada-CIFAR. Il dirige également à temps partiel le SAIT AI Lab Montréal.

Ses recherches portent sur l'apprentissage automatique et les mathématiques appliquées, et intègrent des applications à la vision artificielle et au traitement du langage naturel. Il a obtenu une licence en mathématiques, physique et informatique à l’Université McGill, un doctorat en informatique à l’Université de Californie à Berkeley et un postdoctorat à l'Université de Cambridge.

Il a passé quelques années à l'Institut national de recherche en sciences et technologies du numérique (INRIA) et à l'École normale supérieure de Paris en tant que professeur de recherche avant de revenir à Montréal, en 2016, pour répondre à l'appel de Yoshua Bengio et contribuer à la croissance de l'écosystème de l'IA à Montréal.

Étudiants actuels

Visiteur de recherche indépendant - Samsung SAIT
Visiteur de recherche indépendant - Samsung SAIT
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Visiteur de recherche indépendant - Samsung SAIT
Visiteur de recherche indépendant - Samsung SAIT
Collaborateur·rice alumni - UdeM
Visiteur de recherche indépendant - Samsung SAIT
Collaborateur·rice de recherche - UdeM
Visiteur de recherche indépendant - Samsung SAIT
Visiteur de recherche indépendant - Seoul National University, Korea
Visiteur de recherche indépendant - UdeM
Visiteur de recherche indépendant - Pohang University of Science and Technology in Pohang, Korea
Collaborateur·rice de recherche
Maîtrise recherche - UdeM
Visiteur de recherche indépendant - Samsung SAIT
Collaborateur·rice de recherche - UdeM
Doctorat - UdeM
Visiteur de recherche indépendant - Samsung SAIT

Publications

Differentiable Causal Discovery from Interventional Data
Philippe Brouillard
Sébastien Lachapelle
Alexandre Lacoste
Discovering causal relationships in data is a challenging task that involves solving a combinatorial problem for which the solution is not a… (voir plus)lways identifiable. A new line of work reformulates the combinatorial problem as a continuous constrained optimization one, enabling the use of different powerful optimization techniques. However, methods based on this idea do not yet make use of interventional data, which can significantly alleviate identifiability issues. In this work, we propose a neural network-based method for this task that can leverage interventional data. We illustrate the flexibility of the continuous-constrained framework by taking advantage of expressive neural architectures such as normalizing flows. We show that our approach compares favorably to the state of the art in a variety of settings, including perfect and imperfect interventions for which the targeted nodes may even be unknown.
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
Si Yi Meng
Sharan Vaswani
Issam Hadj Laradji
Mark Schmidt
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied b… (voir plus)y over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
GAIT: A Geometric Approach to Information Theory
Jose Gallego-Posada
Ankit Vani
Max Schwarzer
We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities … (voir plus)between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.
How to make your optimizer generalize better
Sharan Vaswani
Reza Babenzhad
Sait AI Lab
Montreal.
Jose Gallego
Aaron Mishkin
We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and… (voir plus) over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum (cid:96) 2 -norm solution with favourable generalization properties. For under-parameterized linear clas-sification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms (cid:107)·(cid:107) P such that the classifier’s direction is the same as that of the maximum P -margin solution. We argue that analyzing convergence to the standard maximum (cid:96) 2 -margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.
G RADIENT -B ASED N EURAL DAG L EARNING WITH I NTERVENTIONS
Philippe Brouillard
Sébastien Lachapelle
Alexandre Lacoste
Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely o… (voir plus)n causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and inter-ventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures
Stochastic Hamiltonian Gradient Methods for Smooth Games
Nicolas Loizou
Hugo Berard
Alexia Jolicoeur-Martineau
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the c… (voir plus)lass of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games
We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using v… (voir plus)ariants of the gradient method ( GD ). Prime examples are extragradient ( EG ), the optimistic gradient method ( OG ) and consensus optimization ( CO ), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full bene-fits of theses relatively new methods are not known as there is no unified analysis for both strongly monotone and bilinear games. We provide new analyses of the EG ’s local and global convergence properties and use is to get a tighter global convergence rate for OG and CO . Our analysis covers the whole range of settings between bilinear and strongly monotone games. It reveals that these methods converges via different mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then prove that EG achieves the optimal rate for a wide class of algorithms with any number of extrapolations. Our tight analysis of EG ’s convergence rate in games shows that, unlike in convex minimization, EG may be much faster than GD .
A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games.
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
S. Meng
Sharan Vaswani
Issam Hadj Laradji
Mark Schmidt
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied b… (voir plus)y over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
GAIT: A Geometric Approach to Information Theory
Jose Gallego-Posada
Ankit Vani
Max Schwarzer
We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities … (voir plus)between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.
Negative Momentum for Improved Game Dynamics
Reyhane Askari Hemmat
Mohammad Pezeshki
Gabriel Huang
Rémi LE PRIOL
Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiab… (voir plus)le games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieves convergence in a difficult toy adversarial problem, but also on the notoriously difficult to train saturating GANs.
Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information
Eric P. Larsen
Sébastien Lachapelle
Andrea Lodi
This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a method… (voir plus)ology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.