Portrait of Simon Lacoste-Julien

Simon Lacoste-Julien

Core Academic Member
Canada CIFAR AI Chair
Associate Scientific Director, Mila, Associate Professor, Université de Montréal, Department of Computer Science and Operations Research
Vice President and Lab Director, Samsung Advanced Institute of Technology (SAIT) AI Lab, Montréal

Biography

Simon Lacoste-Julien is an associate professor at Mila – Quebec Artificial Intelligence Institute and in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. He is also a Canada CIFAR AI Chair and heads (part time) the SAIT AI Lab Montréal.

Lacoste-Julien‘s research interests are machine learning and applied mathematics, along with their applications to computer vision and natural language processing. He completed a BSc in mathematics, physics and computer science at McGill University, a PhD in computer science at UC Berkeley and a postdoc at the University of Cambridge.

After spending several years as a researcher at INRIA and the École normale supérieure in Paris, he returned to his home city of Montréal in 2016 to answer Yoshua Bengio’s call to help grow the Montréal AI ecosystem.

Current Students

Reza Babanezhad Harikandeh
Visiting Researcher - Samsung SAIT
babanezr@mila.quebec
Aristide Baratin
Visiting Researcher - Samsung SAIT
baratina@mila.quebec
Vitoria Barin Pacela
PhD - Université de Montréal
vitoria.barin-pacela@mila.quebec
Quentin Bertrand
Postdoctorate - Université de Montréal
Principal supervisor :
quentin.bertrand@mila.quebec
Kiho Cho
Visiting Researcher - Samsung SAIT
kiho.cho@mila.quebec
Marwa El Halabi
Visiting Researcher - Samsung SAIT
marwa.el-halabi@mila.quebec
Jose Gallego Posada
PhD - Université de Montréal
gallegoj@mila.quebec
Antonio Gois
PhD - Université de Montréal
antonio-miguel.gois@mila.quebec
Yash Goyal
Visiting Researcher - Samsung SAIT
yash.goyal@mila.quebec
Meraj Hashemizadeh
Collaborating Researcher - Université de Montréal
merajhse@mila.quebec
Alexia Jolicoeur-Martineau
Visiting Researcher - Samsung SAIT
jolicoea@mila.quebec
Pedram Khorsandi
PhD - Université de Montréal
pedram.khorsandi@mila.quebec
Kwon Kisoo
Visiting Researcher - Seoul National University, Korea
kwon.kisoo@mila.quebec
Boris Knyazev
Visiting Researcher - Université de Montréal
knyazevb@mila.quebec
Sébastien Lachapelle
PhD - Université de Montréal
lachaseb@mila.quebec
Jaewoo Lee
Visiting Researcher - Pohang University of Science and Technology in Pohang, Korea
jaewoo.lee@mila.quebec
Michelle Liu
Collaborating Researcher
liumiche@mila.quebec
Lucas Maes
PhD - Université de Montréal
lucas.maes@mila.quebec
Rozhin Nobahari
Master's Research - Université de Montréal
rozhin.nobahari@mila.quebec
Juan Ramirez
PhD - Université de Montréal
juan.ramirez@mila.quebec
Mansi Rankawat
PhD - Université de Montréal
mansi.rankawat@mila.quebec
Damien Scieur
Visiting Researcher - Samsung SAIT
scieurda@mila.quebec
Motahareh Sohrabi
Master's Research - Université de Montréal
motahareh.sohrabi@mila.quebec
Helen Zhang
PhD - Université de Montréal
tianyue.zhang@mila.quebec
Yan Zhang
Visiting Researcher - Samsung SAIT
yan.zhang@mila.quebec

Publications

PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau
Emy Gervais
Kilian FATRAS
Yan Zhang
Balancing Act: Constraining Disparate Impact in Sparse Models
Meraj Hashemizadeh
Juan Ramirez
Rohan Sukumaran
Jose Gallego-Posada
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or … (see more)storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate this disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups.
Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies
Sébastien Lachapelle
Pau Rodriguez
Yash Sharma
Katie Everett
Rémi LE PRIOL
Alexandre Lacoste
Weight-Sharing Regularization
Mehran Shakerinava
Motahareh Sohrabi
Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Sébastien Lachapelle
Divyat Mahajan
We tackle the problems of latent variables identification and "out-of-support'' image generation in representation learning. We show that bo… (see more)th are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Goncalo Mordido
Aristide Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Razvan Pascanu
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of su… (see more)ch optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
On the Identifiability of Quantized Factors
Vitória Barin Pacela
Kartik Ahuja
Disentanglement aims to recover meaningful latent ground-truth factors from the observed distribution solely, and is formalized through the … (see more)theory of identifiability. The identifiability of independent latent factors is proven to be impossible in the unsupervised i.i.d. setting under a general nonlinear map from factors to observations. In this work, however, we demonstrate that it is possible to recover quantized latent factors under a generic nonlinear diffeomorphism. We only assume that the latent factors have independent discontinuities in their density, without requiring the factors to be statistically independent. We introduce this novel form of identifiability, termed quantized factor identifiability, and provide a comprehensive proof of the recovery of the quantized factors.
Identifiability of Discretized Latent Coordinate Systems via Density Landmarks Detection
Vitória Barin-Pacela
Kartik Ahuja
Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev
DOHA HWANG
Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communi… (see more)ties with large-resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for initialization we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.
CrossSplit: Mitigating Label Noise Memorization through Data Splitting
Jihye Kim
Aristide Baratin
Yan Zhang
We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label cor… (see more)rection and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labeled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.
A Survey of Self-Supervised and Few-Shot Object Detection
Gabriel Huang
Issam Hadj Laradji
David Vazquez
Pau Rodriguez
Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require … (see more)dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page: https://gabrielhuang.github.io/fsod-survey/.
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning
Sébastien Lachapelle
Tristan Deleu
Divyat Mahajan
Quentin Bertrand
Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding … (see more)is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.