Simon Lacoste-Julien

Core Academic Member

Canada CIFAR AI Chair

Associate Scientific Director, Mila; Associate Professor, Université de Montréal, Department of Computer Science and Operations Research

Vice President and Lab Director, Samsung Advanced Institute of Technology (SAIT) AI Lab, Montréal

Website

Google Scholar

Biography

Simon Lacoste-Julien is an associate professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a co-founding member of Mila (Quebec Artificial Intelligence Institute), and a Canada CIFAR AI Chair. He also leads the SAIT AI Lab Montréal on a part-time basis.

His research covers machine learning and applied mathematics, with applications to computer vision and natural language processing. He obtained a bachelor's degree in mathematics, physics and computer science from McGill University and a PhD in computer science from the University of California, Berkeley, and completed a postdoctoral fellowship at the University of Cambridge.

He spent a few years as a research professor at the National Institute for Research in Digital Science and Technology (INRIA) and the École normale supérieure in Paris before returning to Montréal in 2016 to answer Yoshua Bengio's call and help grow Montréal's AI ecosystem.

Current Students

Alexia Jolicoeur-Martineau

Independent visiting researcher - Samsung SAIT

PhD - Université de Montréal

antonio-miguel.gois@mila.quebec

Independent visiting researcher - Samsung SAIT

Independent visiting researcher - Université de Montréal

Independent visiting researcher - Samsung SAIT

PhD - McGill University

Principal supervisor:

Adam M. Oberman

george.orfanides@mila.quebec

Jaewoo Lee

Independent visiting researcher - Pohang University of Science and Technology in Pohang, Korea

jaewoo.lee@mila.quebec

Jose Gallego Posada

PhD - Université de Montréal

PhD - Université de Montréal

juan.ramirez@mila.quebec

Website

Github

Google Scholar

Kiho Cho

Independent visiting researcher - Samsung SAIT

kiho.cho@mila.quebec

Kwon Kisoo

Independent visiting researcher - Seoul National University, Korea

kwon.kisoo@mila.quebec

Lucas Maes

PhD - Université de Montréal

lucas.maes@mila.quebec

Website

Github

Mansi Rankawat

PhD - Université de Montréal

mansi.rankawat@mila.quebec

Independent visiting researcher - Samsung SAIT

marwa.el-halabi@mila.quebec

Research collaborator - Université de Montréal

merajhse@mila.quebec

Github

Michelle Liu

Research collaborator

liumiche@mila.quebec

Motahareh Sohrabi

Research Master's - Université de Montréal

motahareh.sohrabi@mila.quebec

Website

Pedram Khorsandi

PhD - Université de Montréal

pedram.khorsandi@mila.quebec

Github

Google Scholar

Quentin Bertrand

Postdoctorate - Université de Montréal

Principal supervisor:

Gauthier Gidel

quentin.bertrand@mila.quebec

Website

Github

Google Scholar

Reza Babanezhad Harikandeh

Independent visiting researcher - Samsung SAIT

babanezr@mila.quebec

Rozhin Nobahari

Research Master's - Université de Montréal

rozhin.nobahari@mila.quebec

Sébastien Lachapelle

PhD - Université de Montréal

lachaseb@mila.quebec

Website

Google Scholar

Helen Zhang

PhD - Université de Montréal

tianyue.zhang@mila.quebec

Website

Github

Vitoria Barin Pacela

PhD - Université de Montréal

vitoria.barin-pacela@mila.quebec

Website

Github

Google Scholar

Yan Zhang

Independent visiting researcher - Samsung SAIT

yan.zhang@mila.quebec

Website

Github

Google Scholar

Yash Goyal

Independent visiting researcher - Samsung SAIT

yash.goyal@mila.quebec

Website

Blog Posts

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

March 18, 2024

by

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

Read the article

Publications

An Analysis of the Adaptation Speed of Causal Models

Rémi Le Priol

Reza Babanezhad Harikandeh

Yoshua Bengio

Simon Lacoste-Julien

2021-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

Implicit Regularization in Deep Learning: A View from Function Space

Aristide Baratin

Thomas George

César Laurent

We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a possible regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. By extrapolating a new analysis of Rademacher complexity bounds in linear models, we propose and study a new heuristic complexity measure for neural networks which captures this phenomenon, in terms of sequences of tangent kernel classes along the learning trajectories.

2020-08-03

ArXiv (preprint)

arxiv.org

To Each Optimizer a Norm, To Each Norm its Generalization

Sharan Vaswani

Reza Babanezhad Harikandeh

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.

2020-06-11

ArXiv (preprint)

arxiv.org

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

arxiv.org

An Analysis of the Adaptation Speed of Causal Models

Rémi Le Priol

Reza Babanezhad Harikandeh

Yoshua Bengio

Simon Lacoste-Julien

We consider the problem of discovering the causal process that generated a collection of datasets. We assume that all these datasets were generated by unknown sparse interventions on a structural causal model (SCM)

2020-05-18

ArXiv (preprint)

arxiv.org

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Nicolas Loizou

Sharan Vaswani

Issam Hadj Laradji

Simon Lacoste-Julien

We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.

2020-02-24

ArXiv (preprint)

arxiv.org

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

Ioannis Mitliagkas

Simon Lacoste-Julien

Gauthier Gidel

We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges in solving smooth games. In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape. Using this perspective, we propose an optimal algorithm for bilinear games. For smooth and strongly monotone operators, we identify a continuum between convex minimization, where acceleration is possible using Polyak's momentum, and the worst case where gradient descent is optimal. Finally, going beyond first-order methods, we propose an accelerated version of consensus optimization.

2020-01-02

ArXiv (preprint)

arxiv.org

Differentiable Causal Discovery from Interventional Data

Philippe Brouillard

Sébastien Lachapelle

Alexandre Lacoste

Simon Lacoste-Julien

Alexandre Drouin

Discovering causal relationships in data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates the combinatorial problem as a continuous constrained optimization one, enabling the use of different powerful optimization techniques. However, methods based on this idea do not yet make use of interventional data, which can significantly alleviate identifiability issues. In this work, we propose a neural network-based method for this task that can leverage interventional data. We illustrate the flexibility of the continuous-constrained framework by taking advantage of expressive neural architectures such as normalizing flows. We show that our approach compares favorably to the state of the art in a variety of settings, including perfect and imperfect interventions for which the targeted nodes may even be unknown.

arxiv.org

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation

Si Yi Meng

Sharan Vaswani

Issam Hadj Laradji

Mark Schmidt

Simon Lacoste-Julien

We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel mapping. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

GAIT: A Geometric Approach to Information Theory

Jose Gallego-Posada

Ankit Vani

Max Schwarzer

Simon Lacoste-Julien

We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, computing image barycenters, approximating empirical measures and counting modes.

2020-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

How to make your optimizer generalize better

Sharan Vaswani

Reza Babanezhad

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum l2-norm solution with favourable generalization properties. For under-parameterized linear classification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. We argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.

2020-01-01

(published)
