Nicolas Le Roux

Core Industry Member

Canada CIFAR AI Chair

Adjunct Professor, McGill University, School of Computer Science

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Research Scientist, Microsoft Research

Biography

I am an academic researcher specializing in machine learning, computer vision, neural networks, deep learning, optimization, large-scale learning, and statistical modeling in general.

Current Students

Alan Chan

PhD - Université de Montréal

Co-supervisor:

David Scott Krueger

alan.chan@mila.quebec

Website

Arnaud Bergeron

arnaud.bergeron1@mila.quebec

Kate Lobacheva

Postdoctorate

Co-supervisor:

Irina Rish

ekaterina.lobacheva@mila.quebec

Website

Github

Google Scholar

Reyhane Askari Hemmat

PhD - Université de Montréal

Principal supervisor:

Ioannis Mitliagkas

reyhane.askari.hemmat@mila.quebec

Publications

A general class of surrogate functions for stable and efficient reinforcement learning

Sharan Vaswani

Olivier Bachem

Simone Totaro

Robert Lynn Mueller

Shivam Garg

Matthieu Geist

Marlos C. Machado

Pablo Samuel Castro

Nicolas Le Roux

2022-01-01

AISTATS (published)

proceedings.mlr.press

arxiv.org

Multi-Head Adapter Routing for Data-Efficient Fine-Tuning

Lucas Caccia

Edoardo Ponti

Lu Liu

Matheus Pereira

Nicolas Le Roux

Alessandro Sordoni

Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small amount of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon [Ponti et al., 2022] jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient generalization. Thus, we propose less expressive variants where we perform weighted averaging of the adapters before few-shot adaptation (Poly-µ) instead of learning a routing function. Moreover, we introduce more expressive variants where finer-grained task–adapter allocation is learned through a multi-head routing function (Poly-S). We test these variants on three separate benchmarks for multi-task learning. We find that Poly-S achieves gains on all three (up to 5.3 points on average) over strong baselines, while incurring a negligible additional cost in parameter count. In particular, we find that instruction tuning, where models are fully fine-tuned on natural language instructions for each task, is inferior to modular methods such as Polytropon and our proposed variants.

2022-01-01

arXiv.org (preprint)

doi.org

On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

Chris Junchi Li

Yaodong Yu

Nicolas Loizou

Gauthier Gidel

Yi Ma

Nicolas Le Roux

Michael I. Jordan

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.

2022-01-01

AISTATS (published)

arxiv.org

Impact of Aliasing on Generalization in Deep Convolutional Networks

Cristina Vasconcelos

Hugo Larochelle

Vincent Dumoulin

Rob Romijnders

Nicolas Le Roux

Ross Goroshin

We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.

2021-10-10

2021 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

doi.org

arxiv.org

Beyond variance reduction: Understanding the true impact of baselines on policy optimization

Wesley Chung

Valentin Thomas

Marlos C. Machado

Nicolas Le Roux

2021-07-01

Proceedings of the 38th International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

Bridging the Gap Between Adversarial Robustness and Optimization Bias

Fartash Faghri

Cristina Vasconcelos

David J Fleet

Fabian Pedregosa

Nicolas Le Roux

2021-02-17

ArXiv (preprint)

arxiv.org

Batch Reinforcement Learning Through Continuation Method

Yijie Guo

Shengyu Feng

Nicolas Le Roux

Ed Chi

Honglak Lee

Minmin Chen

Many real-world applications of reinforcement learning (RL) require the agent to learn from a fixed set of trajectories, without collecting new interactions. Policy optimization under this setting is extremely challenging as: 1) the geometry of the objective function is hard to optimize efficiently; 2) the shift of data distributions causes high noise in the value estimation. In this work, we propose a simple yet effective policy iteration approach to batch RL using global optimization techniques known as continuation. By constraining the difference between the learned policy and the behavior policy that generates the fixed trajectories, and continuously relaxing the constraint, our method 1) helps the agent escape local optima; 2) reduces the error in policy evaluation in the optimization procedure. We present results on a variety of control tasks, game environments, and a recommendation task to empirically demonstrate the efficacy of our proposed method.

2021-01-01

ICLR (published)

openreview.net

An Effective Anti-Aliasing Approach for Residual Networks

Cristina Vasconcelos

Hugo Larochelle

Vincent Dumoulin

Nicolas Le Roux

Ross Goroshin

Image pre-processing in the frequency domain has traditionally played a vital role in computer vision and was even part of the standard pipeline in the early days of deep learning. However, with the advent of large datasets, many practitioners concluded that this was unnecessary due to the belief that these priors can be learned from the data itself. Frequency aliasing is a phenomenon that may occur when sub-sampling any signal, such as an image or feature map, causing distortion in the sub-sampled output. We show that we can mitigate this effect by placing non-trainable blur filters and using smooth activation functions at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in out-of-distribution generalization on both image classification under natural corruptions on ImageNet-C [10] and few-shot learning on Meta-Dataset [17], without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.

2020-11-20

ArXiv (preprint)

arxiv.org

To Each Optimizer a Norm, To Each Norm its Generalization

Sharan Vaswani

Reza Babanezhad Harikandeh

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. Since it is difficult to determine whether an optimizer converges to solutions that minimize a known norm, we flip the problem and investigate what is the corresponding norm minimized by an interpolating solution. Using this reasoning, we prove that for over-parameterized linear regression, projections onto linear spans can be used to move between different interpolating solutions. For under-parameterized linear classification, we prove that for any linear classifier separating the data, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. For linear classification, we argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data results in better generalization. Furthermore, for over-parameterized linear classification, projections onto the data-span enable us to use techniques from the under-parameterized setting. On the empirical side, we propose techniques to bias optimizers towards better generalizing solutions, improving their test performance. We validate our theoretical results via synthetic experiments, and use the neural tangent kernel to handle non-linear models.

2020-06-11

ArXiv (preprint)

arxiv.org

The Geometry of Sign Gradient Descent

Lukas Balles

Fabian Pedregosa

Nicolas Le Roux

2020-02-19

ArXiv (preprint)

arxiv.org

How to make your optimizer generalize better

Sharan Vaswani

Reza Babanezhad

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and over-parametrized regimes. For over-parameterized linear regression, where there are infinitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum l2-norm solution with favourable generalization properties. For under-parameterized linear classification, optimizers can converge to different decision boundaries separating the data. We prove that for any such classifier, there exists a family of quadratic norms ||.||_P such that the classifier's direction is the same as that of the maximum P-margin solution. We argue that analyzing convergence to the standard maximum l2-margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.

2020-01-01

(published)

www.semanticscholar.org

An operator view of policy gradient methods

Dibya Ghosh

Marlos C. Machado

Nicolas Le Roux

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator …

arxiv.org
