Simon Lacoste-Julien

Core Academic Member

Canada CIFAR AI Chair

Associate Scientific Director, Mila, Associate Professor, Université de Montréal, Department of Computer Science and Operations Research

Vice President and Lab Director, Samsung Advanced Institute of Technology (SAIT) AI Lab, Montréal

Website

Google Scholar

Biography

Simon Lacoste-Julien is an associate professor at Mila – Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Canada CIFAR AI Chair and heads (part time) the SAIT AI Lab Montréal.

Lacoste-Julien’s research interests are machine learning and applied mathematics, along with their applications to computer vision and natural language processing. He completed a BSc in mathematics, physics and computer science at McGill University, a PhD in computer science at UC Berkeley and a postdoc at the University of Cambridge.

After spending several years as a researcher at INRIA and the École normale supérieure in Paris, he returned to his home city of Montréal in 2016 to answer Yoshua Bengio’s call to help grow the Montréal AI ecosystem.

Current Students

Alexia Jolicoeur-Martineau

Independent visiting researcher - Samsung SAIT

PhD - Université de Montréal

antonio-miguel.gois@mila.quebec

Independent visiting researcher - Samsung SAIT

Independent visiting researcher - Université de Montréal

Independent visiting researcher - Samsung SAIT

PhD - McGill University

Principal supervisor:

Adam M. Oberman

george.orfanides@mila.quebec

Jaewoo Lee

Independent visiting researcher - Pohang University of Science and Technology in Pohang, Korea

jaewoo.lee@mila.quebec

Jose Gallego Posada

PhD - Université de Montréal

PhD - Université de Montréal

juan.ramirez@mila.quebec

Website

Github

Google Scholar

Kiho Cho

Independent visiting researcher - Samsung SAIT

kiho.cho@mila.quebec

Kwon Kisoo

Independent visiting researcher - Seoul National University, Korea

kwon.kisoo@mila.quebec

Lucas Maes

PhD - Université de Montréal

lucas.maes@mila.quebec

Website

Github

Mansi Rankawat

PhD - Université de Montréal

mansi.rankawat@mila.quebec

Independent visiting researcher - Samsung SAIT

marwa.el-halabi@mila.quebec

Collaborating researcher - Université de Montréal

merajhse@mila.quebec

Github

Michelle Liu

Collaborating researcher

liumiche@mila.quebec

Motahareh Sohrabi

Master's Research - Université de Montréal

motahareh.sohrabi@mila.quebec

Website

Pedram Khorsandi

PhD - Université de Montréal

pedram.khorsandi@mila.quebec

Github

Google Scholar

Quentin Bertrand

Postdoctorate - Université de Montréal

Principal supervisor:

Gauthier Gidel

quentin.bertrand@mila.quebec

Website

Github

Google Scholar

Reza Babanezhad Harikandeh

Independent visiting researcher - Samsung SAIT

babanezr@mila.quebec

Rozhin Nobahari

Master's Research - Université de Montréal

rozhin.nobahari@mila.quebec

Sébastien Lachapelle

PhD - Université de Montréal

lachaseb@mila.quebec

Website

Google Scholar

Helen Zhang

PhD - Université de Montréal

tianyue.zhang@mila.quebec

Website

Github

Vitoria Barin Pacela

PhD - Université de Montréal

vitoria.barin-pacela@mila.quebec

Website

Github

Google Scholar

Yan Zhang

Independent visiting researcher - Samsung SAIT

yan.zhang@mila.quebec

Website

Github

Google Scholar

Yash Goyal

Independent visiting researcher - Samsung SAIT

yash.goyal@mila.quebec

Website

Blog Posts

March 18, 2024

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

Read the article

Publications

An Analysis of the Adaptation Speed of Causal Models

Rémi LE PRIOL

Reza Babanezhad Harikandeh

Yoshua Bengio

Simon Lacoste-Julien

2021-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

Implicit Regularization in Deep Learning: A View from Function Space

Aristide Baratin

Thomas George

César Laurent

We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a possible regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. By extrapolating a new analysis of Rademacher complexity bounds in linear models, we propose and study a new heuristic complexity measure for neural networks which captures this phenomenon, in terms of sequences of tangent kernel classes along the learning trajectories.

2020-08-03

ArXiv (preprint)

arxiv.org

To Each Optimizer a Norm, To Each Norm its Generalization

Sharan Vaswani

Reza Babanezhad Harikandeh

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.

2020-06-11

ArXiv (preprint)

arxiv.org
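The role of projections in the abstract above can be illustrated with a standard linear-algebra fact: in over-parameterized linear regression, any two interpolating solutions differ by a vector orthogonal to the data rows, so projecting either one onto the span of the rows yields the same minimum l2-norm interpolant. The following is a minimal sketch of that fact only, not the paper's construction; the data `X`, `y` and the helper names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(rows, tol=1e-12):
    """Gram-Schmidt basis for the span of the data rows."""
    basis = []
    for r in rows:
        v = list(r)
        for q in basis:
            coeff = dot(v, q)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent
            basis.append([vi / norm for vi in v])
    return basis

def project_onto_span(w, rows):
    """Orthogonal projection of w onto the span of the data rows."""
    proj = [0.0] * len(w)
    for q in orthonormal_basis(rows):
        coeff = dot(w, q)
        proj = [p + coeff * qi for p, qi in zip(proj, q)]
    return proj

# Hypothetical data: 2 samples, 3 features (over-parameterized).
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [2.0, 3.0]

# Two different solutions that both interpolate the data exactly.
w_a = [2.0, 3.0, 0.0]
w_b = [0.0, 1.0, 2.0]

# Both project to the same minimum-norm interpolant.
p_a = project_onto_span(w_a, X)
p_b = project_onto_span(w_b, X)
```

Projection removes only the component orthogonal to the data, which does not affect the predictions on the training set, so the projected vectors still interpolate.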

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

An Analysis of the Adaptation Speed of Causal Models

Rémi LE PRIOL

Reza Babanezhad Harikandeh

Yoshua Bengio

Simon Lacoste-Julien

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM)

2020-05-18

ArXiv (preprint)

arxiv.org

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Nicolas Loizou

Sharan Vaswani

Issam Hadj Laradji

Simon Lacoste-Julien

We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.

2020-02-24

ArXiv (preprint)

arxiv.org
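The step-size idea in the abstract above can be sketched on a small interpolating least-squares problem. This is a minimal illustration, assuming the commonly stated form step = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max) with f_i* = 0 because the model can fit every sample exactly. The constant `c`, the cap `gamma_max`, the function name and the data are illustrative assumptions, not values taken from the paper.

```python
import random

def sps_sgd(A, b, steps=2000, c=0.5, gamma_max=10.0, seed=0):
    """SGD on f_i(x) = 0.5 * (a_i . x - b_i)^2 with a Polyak-style step size."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(steps):
        i = rng.randrange(len(A))
        residual = sum(a * xi for a, xi in zip(A[i], x)) - b[i]
        loss = 0.5 * residual ** 2          # f_i(x); f_i* = 0 under interpolation
        grad = [residual * a for a in A[i]]
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:                  # sample already fit exactly
            continue
        gamma = min(loss / (c * grad_sq), gamma_max)
        x = [xi - gamma * g for xi, g in zip(x, grad)]
    return x

# Hypothetical consistent system: 2 equations, 3 unknowns.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [3.0, 2.0]
x = sps_sgd(A, b)
max_residual = max(
    abs(sum(a * xi for a, xi in zip(row, x)) - bi) for row, bi in zip(A, b)
)
```

With `c = 0.5` on this squared loss the step reduces to 1 / ||a_i||^2, so each update projects the iterate onto the solution set of the sampled equation; no learning rate has to be tuned by hand.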

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges in solving smooth games. In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape. Using this perspective, we propose an optimal algorithm for bilinear games. For smooth and strongly monotone operators, we identify a continuum between convex minimization, where acceleration is possible using Polyak's momentum, and the worst case where gradient descent is optimal. Finally, going beyond first-order methods, we propose an accelerated version of consensus optimization.

2020-01-02

ArXiv (preprint)

arxiv.org
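The difficulty the abstract above attributes to complex spectra can be seen on the simplest bilinear game, min over x and max over y of x*y, whose gradient dynamics have a Jacobian with purely imaginary eigenvalues. The sketch below compares simultaneous gradient descent-ascent with the extragradient method on that toy game. It is an illustrative example only; the function names, step size and iteration count are assumptions, and it does not implement the optimal algorithm proposed in the paper.

```python
def gradient_step(x, y, lr):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    return x - lr * y, y + lr * x

def extragradient_step(x, y, lr):
    """Extragradient: take a look-ahead step, then update using its gradient."""
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half

def distance_to_equilibrium(step, iters=2000, lr=0.1):
    """Run `step` from (1, 1) and return the distance to the equilibrium (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, lr)
    return (x * x + y * y) ** 0.5

gda_distance = distance_to_equilibrium(gradient_step)
eg_distance = distance_to_equilibrium(extragradient_step)
```

On this game each descent-ascent step multiplies the distance to the equilibrium by sqrt(1 + lr^2), so the iterates spiral outward, while each extragradient step multiplies it by sqrt(1 - lr^2 + lr^4), which is below 1 for small `lr`.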

Differentiable Causal Discovery from Interventional Data

Philippe Brouillard

Sébastien Lachapelle

Alexandre Lacoste

Simon Lacoste-Julien

Alexandre Drouin

Discovering causal relationships in data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates the combinatorial problem as a continuous constrained optimization one, enabling the use of different powerful optimization techniques. However, methods based on this idea do not yet make use of interventional data, which can significantly alleviate identifiability issues. In this work, we propose a neural network-based method for this task that can leverage interventional data. We illustrate the flexibility of the continuous-constrained framework by taking advantage of expressive neural architectures such as normalizing flows. We show that our approach compares favorably to the state of the art in a variety of settings, including perfect and imperfect interventions for which the targeted nodes may even be unknown.

arxiv.org

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

Si Yi Meng

Sharan Vaswani

Issam Hadj Laradji

Mark Schmidt

Simon Lacoste-Julien

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

GAIT: A Geometric Approach to Information Theory

Jose Gallego-Posada

Ankit Vani

Max Schwarzer

Simon Lacoste-Julien

We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org
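The similarity-aware entropy mentioned in the abstract above can be illustrated with a small sketch. It assumes the order-1 form from the ecology literature, H(p) = -sum_i p_i * log((K p)_i), where K is a similarity matrix with ones on the diagonal. The function name and the example matrices are hypothetical, and the paper's general definition may differ in details not shown here.

```python
import math

def similarity_entropy(p, K):
    """Order-1 similarity-sensitive entropy: -sum_i p_i * log((K p)_i)."""
    total = 0.0
    for i, p_i in enumerate(p):
        if p_i > 0.0:  # symbols with zero mass contribute nothing
            kp_i = sum(K[i][j] * p[j] for j in range(len(p)))
            total -= p_i * math.log(kp_i)
    return total

p = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]   # symbols treated as fully distinct
similar = [[1.0, 0.9], [0.9, 1.0]]    # symbols treated as nearly identical

shannon_like = similarity_entropy(p, identity)
reduced = similarity_entropy(p, similar)
```

With the identity matrix the expression reduces to the Shannon entropy, log 2 for this distribution. When the two symbols are declared 90% similar, the entropy drops sharply because the alphabet behaves almost like a single symbol.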

How to make your optimizer generalize better

Sharan Vaswani

Reza Babanezhad

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum $\ell_2$-norm solution with favourable generalization properties. For under-parameterized linear classification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms $\|\cdot\|_P$ such that the classifier’s direction is the same as that of the maximum $P$-margin solution. We argue that analyzing convergence to the standard maximum $\ell_2$-margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.

2020-01-01

(published)

