Simon Lacoste-Julien

Biographie

Simon Lacoste-Julien est professeur agrégé au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre cofondateur de Mila – Institut québécois d’intelligence artificielle et titulaire d'une chaire en IA Canada-CIFAR. Il dirige également à temps partiel le SAIT AI Lab Montréal.

Ses recherches portent sur l'apprentissage automatique et les mathématiques appliquées, et intègrent des applications à la vision artificielle et au traitement du langage naturel. Il a obtenu une licence en mathématiques, physique et informatique à l’Université McGill, un doctorat en informatique à l’Université de Californie à Berkeley et un postdoctorat à l'Université de Cambridge.

Il a passé quelques années à l'Institut national de recherche en sciences et technologies du numérique (INRIA) et à l'École normale supérieure de Paris en tant que professeur de recherche avant de revenir à Montréal, en 2016, pour répondre à l'appel de Yoshua Bengio et contribuer à la croissance de l'écosystème de l'IA à Montréal.

Étudiants actuels

Reza Babanezhad Harikandeh

Visiteur de recherche indépendant - Samsung SAIT

Aristide Baratin

Visiteur de recherche indépendant - Samsung SAIT

Doctorat - UdeM

Visiteur de recherche indépendant - Samsung

Marwa El Halabi

Visiteur de recherche indépendant - Samsung SAIT

Doctorat - UdeM

Yash Goyal

Visiteur de recherche indépendant - Samsung SAIT

Meraj Hashemizadeh

Collaborateur·rice de recherche - UdeM

Fahimeh HosseiniNoohdani

Collaborateur·rice de recherche - UdeM

Alexia Jolicoeur-Martineau

Visiteur de recherche indépendant - Samsung SAIT

Doctorat - UdeM

Visiteur de recherche indépendant - UdeM

Visiteur de recherche indépendant - Samsung - SAIT

Lucas Maes

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice alumni - UdeM

Doctorat - UdeM

Doctorat - UdeM

Visiteur de recherche indépendant - Univeristy of Tübingen

Theo Saulus

Doctorat - UdeM

Co-superviseur⋅e :

Dhanya Sridhar

Damien Scieur

Visiteur de recherche indépendant - Samsung SAIT

Motahareh Sohrabi

Collaborateur·rice alumni - UdeM

Helen Zhang

Doctorat - UdeM

Yan Zhang

Visiteur de recherche indépendant - Samsung SAIT

Décodeurs additifs pour l’identification des variables latentes et l’extrapolation du produit cartésien

Billets de blogue

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

18 mars 2024

par

Sébastien Lachapelle

Divyat Mahajan

Ioannis Mitliagkas

Simon Lacoste-Julien

Lire l'article

Publications

SVRG meets AdaGrad: painless variance reduction

Benjamin Dubois-Taine

Sharan Vaswani

Reza Babanezhad Harikandeh

Mark Schmidt

2022-11-10

Machine Learning (publié)

Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints

Jose Gallego-Posada

The performance of trained neural networks is robust to harsh levels of pruning. Coupled with the ever-growing size of deep learning models,… (voir plus) this observation has motivated extensive research on learning sparse models. In this work, we focus on the task of controlling the level of sparsity when performing sparse learning. Existing methods based on sparsity-inducing penalties involve expensive trial-and-error tuning of the penalty factor, thus lacking direct control of the resulting model sparsity. In response, we adopt a constrained formulation: using the gate mechanism proposed by Louizos et al. (2018), we formulate a constrained optimization problem where sparsification is guided by the training objective and the desired sparsity target in an end-to-end fashion. Experiments on CIFAR-{10, 100}, TinyImageNet, and ImageNet using WideResNet and ResNet{18, 50} models validate the effectiveness of our proposal and demonstrate that we can reliably achieve pre-determined sparsity targets without compromising on predictive performance.

Partial Disentanglement via Mechanism Sparsity

Sébastien Lachapelle

2022-07-09

auai.org/UAI/2022/Workshop/CRL (présentation orale)

Bayesian Structure Learning with Generative Flow Networks

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian ne… (voir plus)tworks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and it compares favorably against other methods based on MCMC or variational inference.

2022-05-20

auai.org/UAI/2022/Conference (poster)

proceedings.mlr.press

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Antonio Orvieto

Nicolas Loizou

Recently, Loizou et al. (2021), proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed … (voir plus)SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.

2022-05-09

ArXiv (prépublication)

Data-Efficient Structured Pruning via Submodular Optimization

Marwa El Halabi

Suraj Srinivas

Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performa… (voir plus)nce. However, most current structured pruning methods do not provide any performance guarantees, and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select neurons/channels to prune and corresponding new weights for the next layer, that minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, thus it can be provably approximated using an efficient greedy algorithm. Our method is guaranteed to have an exponentially decreasing error between the original model and the pruned model outputs w.r.t the pruned size, under reasonable assumptions. It is also one of the few methods in the literature that uses only a limited-number of training data and no labels. Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.

Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

Sébastien Lachapelle

Pau Rodriguez

Yash Sharma

Katie E Everett

Rémi LE PRIOL

Alexandre Lacoste

This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent f… (voir plus)actors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.

2022-01-01

CLeaR (publié)

proceedings.mlr.press

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Antonio Orvieto

Nicolas Loizou

Recently Loizou et al. (2021), proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed S… (voir plus)PS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric Larsen

Andrea Lodi

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a method… (voir plus)ology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.

2022-01-01

INFORMS Journal on Computing (publié)

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and … (voir plus)the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.

A Survey of Self-Supervised and Few-Shot Object Detection

Gabriel Huang

Issam Hadj Laradji

David Vázquez

Pau Rodriguez

Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require … (voir plus)dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page: https://gabrielhuang.github.io/fsod-survey/.

2021-10-27

ArXiv (preprint)

Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence

Nicolas Loizou

Sharan Vaswani

Issam Hadj Laradji

We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing… (voir plus) the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.

2021-03-18

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press