Simon Lacoste-Julien

Core Academic Member

Canada CIFAR AI Chair

Associate Scientific Director, Mila, Associate Professor, Université de Montréal, Department of Computer Science and Operations Research

Vice President and Lab Director, Samsung Advanced Institute of Technology (SAIT) AI Lab, Montréal

Website

Google Scholar

Biography

Simon Lacoste-Julien is an associate professor at Mila – Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Canada CIFAR AI Chair and heads (part time) the SAIT AI Lab Montréal.

Lacoste-Julien’s research interests are machine learning and applied mathematics, along with their applications to computer vision and natural language processing. He completed a BSc in mathematics, physics and computer science at McGill University, a PhD in computer science at UC Berkeley and a postdoc at the University of Cambridge.

After spending several years as a researcher at INRIA and the École normale supérieure in Paris, he returned to his home city of Montréal in 2016 to answer Yoshua Bengio’s call to help grow the Montréal AI ecosystem.

Current Students

Alexia Jolicoeur-Martineau

Independent visiting researcher - Samsung SAIT

PhD - Université de Montréal

antonio-miguel.gois@mila.quebec

Independent visiting researcher - Samsung SAIT

Independent visiting researcher - Université de Montréal

Independent visiting researcher - Samsung SAIT

PhD - McGill University

Principal supervisor:

Adam M. Oberman

george.orfanides@mila.quebec

Jaewoo Lee

Independent visiting researcher - Pohang University of Science and Technology in Pohang, Korea

jaewoo.lee@mila.quebec

Jose Gallego Posada

PhD - Université de Montréal

PhD - Université de Montréal

juan.ramirez@mila.quebec

Website

Github

Google Scholar

Kiho Cho

Independent visiting researcher - Samsung SAIT

kiho.cho@mila.quebec

Kwon Kisoo

Independent visiting researcher - Seoul National University, Korea

kwon.kisoo@mila.quebec

Lucas Maes

PhD - Université de Montréal

lucas.maes@mila.quebec

Website

Github

Mansi Rankawat

PhD - Université de Montréal

mansi.rankawat@mila.quebec

Independent visiting researcher - Samsung SAIT

marwa.el-halabi@mila.quebec

Collaborating researcher - Université de Montréal

merajhse@mila.quebec

Github

Michelle Liu

Collaborating researcher

liumiche@mila.quebec

Motahareh Sohrabi

Master's Research - Université de Montréal

motahareh.sohrabi@mila.quebec

Website

Pedram Khorsandi

PhD - Université de Montréal

pedram.khorsandi@mila.quebec

Github

Google Scholar

Quentin Bertrand

Postdoctorate - Université de Montréal

Principal supervisor:

Gauthier Gidel

quentin.bertrand@mila.quebec

Website

Github

Google Scholar

Reza Babanezhad Harikandeh

Independent visiting researcher - Samsung SAIT

babanezr@mila.quebec

Rozhin Nobahari

Master's Research - Université de Montréal

rozhin.nobahari@mila.quebec

Sébastien Lachapelle

PhD - Université de Montréal

lachaseb@mila.quebec

Website

Google Scholar

Helen Zhang

PhD - Université de Montréal

tianyue.zhang@mila.quebec

Website

Github

Vitoria Barin Pacela

PhD - Université de Montréal

vitoria.barin-pacela@mila.quebec

Website

Github

Google Scholar

Yan Zhang

Independent visiting researcher - Samsung SAIT

yan.zhang@mila.quebec

Website

Github

Google Scholar

Yash Goyal

Independent visiting researcher - Samsung SAIT

yash.goyal@mila.quebec

Website

Blog Posts

March 18, 2024

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

Read the article

Publications

Gradient-Based Neural DAG Learning with Interventions

Philippe Brouillard

Alexandre Drouin

Sébastien Lachapelle

Alexandre Lacoste

Simon Lacoste-Julien

Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely on causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and interventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures.

2020-01-01

(published)

www.semanticscholar.org

Stochastic Hamiltonian Gradient Methods for Smooth Games

Nicolas Loizou

Hugo Berard

Alexia Jolicoeur-Martineau

Pascal Vincent

Simon Lacoste-Julien

Ioannis Mitliagkas

The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.

2020-01-01

ICML (published)

proceedings.mlr.press

arxiv.org

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

Waiss Azizian

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games

Waiss Azizian

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where the standard GD fails. The full benefits of these relatively new methods are not known as there is no unified analysis for both strongly monotone and bilinear games. We provide new analyses of EG’s local and global convergence properties and use it to get a tighter global convergence rate for OG and CO. Our analysis covers the whole range of settings between bilinear and strongly monotone games. It reveals that these methods converge via different mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then prove that EG achieves the optimal rate for a wide class of algorithms with any number of extrapolations. Our tight analysis of EG’s convergence rate in games shows that, unlike in convex minimization, EG may be much faster than GD.

2020-01-01

International Conference on Artificial Intelligence and Statistics (published)

dblp.uni-trier.de

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

S. Meng

Sharan Vaswani

Issam Hadj Laradji

Mark Schmidt

Simon Lacoste-Julien

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.

2019-10-11

ArXiv (preprint)

arxiv.org

GAIT: A Geometric Approach to Information Theory

Jose Gallego-Posada

Ankit Vani

Max Schwarzer

Simon Lacoste-Julien

We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.

2019-06-19

ArXiv (preprint)

arxiv.org

Negative Momentum for Improved Game Dynamics

Gauthier Gidel

Reyhane Askari Hemmat

Mohammad Pezeshki

Gabriel Huang

Rémi Le Priol

Simon Lacoste-Julien

Ioannis Mitliagkas

Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only in a difficult toy adversarial problem, but also on the notoriously difficult-to-train saturating GANs.

2019-04-11

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric P. Larsen

Sébastien Lachapelle

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.

2018-07-31

ArXiv (preprint)

doi.org

arxiv.org

Negative Momentum for Improved Game Dynamics

Gauthier Gidel

Reyhane Askari Hemmat

Mohammad Pezeshki

Gabriel Huang

Rémi Le Priol

Simon Lacoste-Julien

Ioannis Mitliagkas

2018-07-12

ArXiv (preprint)

arxiv.org

Frank-Wolfe Splitting via Augmented Lagrangian Method

Gauthier Gidel

Fabian Pedregosa

Simon Lacoste-Julien

Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of Frank-Wolfe and the alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate for polytopes.

2018-03-31

ArXiv (preprint)

arxiv.org

Frank-Wolfe Splitting via Augmented Lagrangian Method

Gauthier Gidel

Fabian Pedregosa

Simon Lacoste-Julien

2018-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org
