Simon Lacoste-Julien

Biography

Simon Lacoste-Julien is an associate professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a co-founding member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair. He also leads the SAIT AI Lab Montréal part-time.

His research focuses on machine learning and applied mathematics, with applications to computer vision and natural language processing. He earned a bachelor's degree in mathematics, physics and computer science from McGill University and a PhD in computer science from the University of California, Berkeley, then completed a postdoctoral fellowship at the University of Cambridge.

He spent several years at the French National Institute for Research in Digital Science and Technology (INRIA) and at the École normale supérieure in Paris as a research professor before returning to Montréal in 2016 to answer Yoshua Bengio's call and contribute to the growth of Montréal's AI ecosystem.

Current Students

Reza Babanezhad Harikandeh

Independent visiting researcher - Samsung SAIT

Aristide Baratin

Independent visiting researcher - Samsung SAIT

PhD - UdeM

Independent visiting researcher - Samsung

Simon Dufort-Labbé

PhD - UdeM

Marwa El Halabi

Independent visiting researcher - Samsung SAIT

PhD - UdeM

Yash Goyal

Independent visiting researcher - Samsung SAIT

Meraj Hashemizadeh

Research collaborator - UdeM

Fahimeh HosseiniNoohdani

Research collaborator - UdeM

PhD - UdeM

Independent visiting researcher - UdeM

Independent visiting researcher - Samsung - SAIT

Lucas Maes

PhD - UdeM

PhD - UdeM

Co-supervisor:

PhD - UdeM

Co-supervisor:

Alumni collaborator - UdeM

PhD - UdeM

PhD - UdeM

Independent visiting researcher - University of Tübingen

Theo Saulus

PhD - UdeM

Co-supervisor:

Dhanya Sridhar

Damien Scieur

Independent visiting researcher - Samsung SAIT

Motahareh Sohrabi

Research collaborator - UdeM

Helen Zhang

PhD - UdeM

Yan Zhang

Independent visiting researcher - Samsung SAIT

Blog Posts

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

March 18, 2024

by

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

Read the article

Publications

Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise

Juan Ramirez

Constrained optimization is a powerful framework for enforcing requirements on neural networks. These constrained deep learning problems are typically solved using first-order methods on their min-max Lagrangian formulation, but such approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses these issues, practitioners often favor dual optimistic ascent schemes (PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This finding allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation.

2025-12-31

International Conference on Learning Representations (Accept (Poster))
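
To make the equivalence claimed in this abstract more concrete, here is a minimal numerical sketch. It is not the paper's construction: it assumes a single equality constraint, simultaneous gradient descent-ascent, and a toy quadratic objective, and the function names, step sizes, and penalty coefficient `c` are illustrative choices. Tracking the effective multiplier `mu = lam + c * g(x)` during gradient descent-ascent on the Augmented Lagrangian gives a recursion with an integral term plus a difference (proportional) term on the constraint, and the two loops below produce the same primal iterates.

```python
# Toy problem (assumed for illustration):
# minimize 0.5 * (x1^2 + x2^2) subject to x1 + x2 = 1.
def grad_f(x):
    return [x[0], x[1]]

def g(x):
    return x[0] + x[1] - 1.0

GRAD_G = [1.0, 1.0]  # gradient of the constraint, constant for this toy problem

def primal_step(x, multiplier, alpha):
    """One gradient step in x on f(x) + multiplier * g(x)."""
    gf = grad_f(x)
    return [x[i] - alpha * (gf[i] + multiplier * GRAD_G[i]) for i in range(2)]

def alm_gda(x0, lam0, alpha, eta, c, steps):
    """Simultaneous descent-ascent on f(x) + lam*g(x) + 0.5*c*g(x)**2."""
    x, lam = list(x0), lam0
    xs = [list(x)]
    for _ in range(steps):
        # The x-gradient of the Augmented Lagrangian uses lam + c*g(x).
        x_next = primal_step(x, lam + c * g(x), alpha)
        lam = lam + eta * g(x)  # plain dual ascent, evaluated at the old x
        x = x_next
        xs.append(list(x))
    return xs

def pi_dual(x0, lam0, alpha, eta, c, steps):
    """Descent on the plain Lagrangian with a PI-style multiplier update."""
    x = list(x0)
    mu = lam0 + c * g(x)  # effective multiplier at step 0
    xs = [list(x)]
    for _ in range(steps):
        x_next = primal_step(x, mu, alpha)
        # Integral term eta*g(x) plus a term on the change in the constraint.
        mu = mu + eta * g(x) + c * (g(x_next) - g(x))
        x = x_next
        xs.append(list(x))
    return xs

x0 = [2.0, -1.0]
traj_alm = alm_gda(x0, 0.0, alpha=0.1, eta=0.1, c=1.0, steps=200)
traj_pi = pi_dual(x0, 0.0, alpha=0.1, eta=0.1, c=1.0, steps=200)
print(traj_alm[-1], traj_pi[-1])
```

In this sketch the penalty coefficient `c` plays the role of the gain on the difference term, which is one way to read the abstract's remark about tuning the optimism hyper-parameter.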

Operationalizing Quantized Disentanglement

Vitória Barin-Pacela

Kartik Ahuja

P Vincent

2025-10-31

arXiv (published)

arxiv.org

Tight Lower Bounds and Improved Convergence in Performative Prediction

Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previous existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.

2025-09-17

NeurIPS.cc/2025/Conference (poster)

Alexia Jolicoeur-Martineau
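
As a rough illustration of the Repeated Risk Minimization loop that the abstract above builds on, here is a minimal sketch on a toy problem. The setting is an assumption of this example and is not taken from the paper: deploying the parameter `theta` shifts the data mean to `base_mean + shift * theta`, and the loss is squared error, so each retraining step has a closed form. The Affine Risk Minimizers that reuse historical datasets are not implemented here.

```python
def retrain(theta_deployed, base_mean, shift):
    """Exact risk minimizer under the distribution induced by theta_deployed.

    Toy assumption: the data z has mean base_mean + shift * theta_deployed,
    and the minimizer of E[(z - theta)^2] over theta is that mean.
    """
    return base_mean + shift * theta_deployed

def repeated_risk_minimization(theta0, base_mean, shift, steps):
    """Deploy the model, observe the shifted distribution, retrain, repeat."""
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(retrain(thetas[-1], base_mean, shift))
    return thetas

base_mean, shift = 1.0, 0.5
thetas = repeated_risk_minimization(0.0, base_mean, shift, steps=50)

# Fixed point of retrain: theta = base_mean + shift * theta.
# It exists and attracts the iterates when |shift| < 1.
stable_point = base_mean / (1.0 - shift)
print(thetas[-1], stable_point)
```

In this toy setting the distance to the stable point shrinks by a factor of `shift` at every retraining round, which gives a simple picture of the convergence rates the abstract is concerned with.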

Understanding Adam Requires Better Rotation Dependent Assumptions

Lucas Maes

Tianyue H. Zhang

Alan Milligan

Ioannis Mitliagkas

Damien Scieur

Charles Guille-escuret

Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We observe that Adam's performance in training transformers degrades under random rotations of the parameter space, indicating a crucial sensitivity to the choice of basis in practice. This reveals that conventional rotation-invariant assumptions are insufficient to capture Adam's advantages theoretically. To better understand the rotation-dependent properties that benefit Adam, we also identify structured rotations that preserve or even enhance its empirical performance. We then examine the rotation-dependent assumptions in the literature and find that they fall short in explaining Adam's behaviour across various rotation types. In contrast, we verify the orthogonality of the update as a promising indicator of Adam's basis sensitivity, suggesting it may be the key quantity for developing rotation-dependent theoretical frameworks that better explain its empirical success.

2025-09-17

NeurIPS.cc/2025/Conference (poster)
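
The basis sensitivity described in the abstract above can be seen on a very small example. The sketch below is an illustration under assumptions, not the paper's experimental setup: it uses a 2D quadratic, a fixed 45-degree rotation, and a plain textbook Adam without weight decay. Gradient descent gives the same trajectory whether it runs in the original or the rotated coordinates, while Adam does not, because its per-coordinate normalization depends on the basis.

```python
import math

CURVATURE = [1.0, 100.0]  # assumed toy objective f(x) = 0.5*(1*x1^2 + 100*x2^2)
C = S = math.sqrt(0.5)    # cosine and sine of 45 degrees

def rotate(v):
    """Map original coordinates x to rotated coordinates y = R x."""
    return [C * v[0] - S * v[1], S * v[0] + C * v[1]]

def unrotate(v):
    """Map rotated coordinates back: x = R^T y."""
    return [C * v[0] + S * v[1], -S * v[0] + C * v[1]]

def grad(x):
    return [CURVATURE[0] * x[0], CURVATURE[1] * x[1]]

def grad_rotated(y):
    # The gradient of f(R^T y) with respect to y is R * grad f(R^T y).
    return rotate(grad(unrotate(y)))

def run_gd(grad_fn, x0, lr, steps):
    x = list(x0)
    for _ in range(steps):
        gr = grad_fn(x)
        x = [x[i] - lr * gr[i] for i in range(2)]
    return x

def run_adam(grad_fn, x0, lr, steps, b1=0.9, b2=0.999, eps=1e-8):
    x, m, v = list(x0), [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        gr = grad_fn(x)
        m = [b1 * m[i] + (1 - b1) * gr[i] for i in range(2)]
        v = [b2 * v[i] + (1 - b2) * gr[i] ** 2 for i in range(2)]
        m_hat = [m[i] / (1 - b1 ** t) for i in range(2)]  # bias correction
        v_hat = [v[i] / (1 - b2 ** t) for i in range(2)]
        x = [x[i] - lr * m_hat[i] / (math.sqrt(v_hat[i]) + eps) for i in range(2)]
    return x

x0 = [1.0, 1.0]
# Run each optimizer in both bases, then map the rotated result back to compare.
gd_plain = run_gd(grad, x0, lr=0.005, steps=5)
gd_rot = unrotate(run_gd(grad_rotated, rotate(x0), lr=0.005, steps=5))
adam_plain = run_adam(grad, x0, lr=0.01, steps=5)
adam_rot = unrotate(run_adam(grad_rotated, rotate(x0), lr=0.01, steps=5))

print("GD gap:  ", max(abs(a - b) for a, b in zip(gd_plain, gd_rot)))
print("Adam gap:", max(abs(a - b) for a, b in zip(adam_plain, adam_rot)))
```

The example only shows that Adam's iterates change with the basis. It says nothing about which basis is better, which is the question the paper studies empirically on transformers.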

Quantized Disentanglement: A Practical Approach

Vitória Barin-Pacela

Kartik Ahuja

P Vincent

2025-06-08

ICML.cc/2025/Workshop/SIM (poster)

Performative Prediction on Games and Mechanism Design

Fernando P. Santos

2025-04-22

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Accelerating Training with Neuron Interaction and Nowcasting Networks

Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps and periodically, only every few steps, a WNN nowcasts (predicts near future) parameters. We improve WNNs by proposing neuron interaction and nowcasting (NiNo) networks. In contrast to WNNs, NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters. We further show that in some networks, such as Transformers, modeling neuron connectivity accurately is challenging. We address this and other limitations, which allows NiNo to accelerate Adam training by up to 50% in vision and language tasks.

2025-01-21

ICLR.cc/2025/Conference (poster)

Feasible Learning

Juan Ramirez

Ignacio Hounie

Juan Elenter

Jose Gallego-Posada

Meraj Hashemizadeh

Alejandro Ribeiro

We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample. In contrast to the ubiquitous Empirical Risk Minimization (ERM) framework, which optimizes for average performance, FL demands satisfactory performance on every individual data point. Since any model that meets the prescribed performance threshold is a valid FL solution, the choice of optimization algorithm and its dynamics play a crucial role in shaping the properties of the resulting solutions. In particular, we study a primal-dual approach which dynamically re-weights the importance of each sample during training. To address the challenge of setting a meaningful threshold in practice, we introduce a relaxation of FL that incorporates slack variables of minimal norm. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

proceedings.mlr.press

On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

Motahareh Sohrabi

Juan Ramirez

Tianyue H. Zhang

Jose Gallego-Posada

Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

Reza Babanezhad Harikandeh

proceedings.mlr.press

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Pranshu Malviya

Goncalo Mordido

Aristide Baratin

Adaptive gradient-based optimizers, notably Adam, have left their mark in training large-scale deep learning models, offering fast convergence and robustness to hyperparameter settings. However, they often struggle with generalization, attributed to their tendency to converge to sharp minima in the loss landscape. To address this, we propose a new memory-augmented version of Adam that encourages exploration towards flatter minima by incorporating a buffer of critical momentum terms during training. This buffer prompts the optimizer to overshoot beyond narrow minima, promoting exploration. Through comprehensive analysis in simple settings, we illustrate the efficacy of our approach in increasing exploration and bias towards flatter minima. We empirically demonstrate that it can improve model performance for image classification on ImageNet and CIFAR10/100, language modelling on Penn Treebank, and online learning tasks on TinyImageNet and 5-dataset. Our code is available at https://github.com/chandar-lab/CMOptimizer.

2024-06-08

TMLR (accepted)