Nicolas Le Roux

Core Industry Member

Canada CIFAR AI Chair

Adjunct Professor, McGill University, School of Computer Science

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Research Scientist, Microsoft Research

Biography

I am an academic researcher specializing in machine learning, computer vision, neural networks, deep learning, optimization, large-scale learning, and statistical modeling in general.

Current Students

Alan Chan

PhD - Université de Montréal

Co-supervisor:

David Scott Krueger

alan.chan@mila.quebec

Website

Arnaud Bergeron

arnaud.bergeron1@mila.quebec

Kate Lobacheva

Postdoctorate

Co-supervisor:

Irina Rish

ekaterina.lobacheva@mila.quebec

Website

GitHub

Google Scholar

Reyhane Askari Hemmat

PhD - Université de Montréal

Principal supervisor:

Ioannis Mitliagkas

reyhane.askari.hemmat@mila.quebec

Publications

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Nicolas Le Roux

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
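As a rough illustration, the sketch below implements an IGT-style estimator in the form we understand it: with γ_t = t/(t+1), each new stochastic gradient is evaluated at the extrapolated point θ_t + (γ_t/(1 − γ_t))(θ_t − θ_{t−1}) = θ_t + t(θ_t − θ_{t−1}), averaged into v_t = γ_t v_{t−1} + (1 − γ_t) g, and the parameters move by θ_{t+1} = θ_t − α v_t. The function names, the noisy quadratic test problem, and the step size are hypothetical choices, and this is not the reference implementation. On a quadratic with a single shared Hessian (the restricted setting named in the abstract), this recursion makes v_t equal to the exact gradient at θ_t plus the average of all past noise terms, so the stale gradients are fully corrected.

```python
import numpy as np

def make_noisy_quadratic(h_diag, center, sigma):
    """Hypothetical test problem: f(theta) = 0.5 * (theta - c)^T diag(h) (theta - c),
    observed only through gradients corrupted by Gaussian noise of std sigma."""
    def grad_fn(theta, rng):
        noise = sigma * rng.standard_normal(theta.shape)
        return h_diag * (theta - center) + noise
    return grad_fn

def igt_sgd(grad_fn, theta0, lr, num_steps, rng):
    """Plain SGD driven by an IGT-style gradient estimate (sketch under stated assumptions)."""
    theta = np.array(theta0, dtype=float)
    theta_prev = theta.copy()
    v = np.zeros_like(theta)
    for t in range(num_steps):
        gamma = t / (t + 1.0)                       # gamma_0 = 0, so v starts as the first gradient
        shifted = theta + t * (theta - theta_prev)  # gamma / (1 - gamma) == t
        v = gamma * v + (1.0 - gamma) * grad_fn(shifted, rng)
        theta_prev, theta = theta, theta - lr * v
    return theta

rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
grad_fn = make_noisy_quadratic(np.array([1.0, 3.0]), c, sigma=1.0)
theta = igt_sgd(grad_fn, np.zeros(2), lr=0.1, num_steps=2000, rng=rng)
print(theta)  # close to c: the averaged noise shrinks as more gradients are reused
```

On this toy problem, constant-step SGD would stall at a noise floor, whereas the transported average keeps shrinking the noise. Feeding v into a heavy-ball or Adam update instead of the raw stochastic gradient corresponds to the drop-in use described in the abstract; the sketch shows plain SGD only.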

arxiv.org

Understanding the impact of entropy in policy learning

Zafarali Ahmed

Nicolas Le Roux

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.
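To make the regularizer concrete, here is a toy example that does not come from the paper: a three-armed bandit with a softmax policy π = softmax(z), optimized by exact gradient ascent on J(z) = π·r + τ H(π). The maximizer of this objective is softmax(r/τ), so a larger temperature τ yields a more stochastic policy. The rewards, step size, iteration count, and function names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p))  # entropy in nats

def entropy_regularized_pg(rewards, tau, lr=0.5, num_steps=5000):
    """Exact gradient ascent on J(z) = pi . r + tau * H(pi), where pi = softmax(z)."""
    z = np.zeros(len(rewards))     # start from the uniform policy
    for _ in range(num_steps):
        pi = softmax(z)
        q = rewards - tau * np.log(pi)   # dJ/dpi, dropping a constant that cancels below
        grad = pi * (q - pi @ q)         # chain rule through the softmax Jacobian
        z = z + lr * grad
    return softmax(z)

rewards = np.array([1.0, 0.5, 0.0])
for tau in (0.5, 1.0):
    pi = entropy_regularized_pg(rewards, tau)
    print(tau, pi, entropy(pi))
```

Both runs converge to softmax(r/τ), and the policy obtained with τ = 1.0 has the higher entropy. The sketch only shows the regularizing effect on the optimum; it does not reproduce the paper's landscape visualizations.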

2018-11-27

(published)

www.semanticscholar.org

Combining adaptive algorithms and hypergradient method: a performance and robustness study

Akram Erraqabi

Nicolas Le Roux

2018-09-27

(published)

www.semanticscholar.org

Online Hyper-Parameter Optimization

Damien Vincent

Sylvain Gelly

Nicolas Le Roux

Olivier Bousquet

2018-02-15

(published)

www.semanticscholar.org

Online variance-reducing optimization

Nicolas Le Roux

Reza Babanezhad Harikandeh

Pierre-Antoine Manzagol

2018-02-12

International Conference on Learning Representations (published)

dblp.uni-trier.de

Negative eigenvalues of the Hessian in deep neural networks

Guillaume Alain

Nicolas Le Roux

Pierre-Antoine Manzagol

The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.
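The quantity studied here can be illustrated on a model small enough to write down: a "network" made of two scalar weights in series, with prediction w_2 w_1 x, trained with squared error on the single example (x, y) = (1, 1). Its loss is non-convex; at the origin the Hessian is [[0, −1], [−1, 0]], with eigenvalues −1 and 1, so the origin is a saddle point rather than a minimum. The sketch estimates the Hessian by central finite differences (a simple stand-in for the automatic-differentiation tools one would use on a real network), eigendecomposes it with `numpy.linalg.eigh`, and steps along the eigenvector of the negative eigenvalue. That step is one simple way of exploiting negative curvature, not necessarily the procedure used in the paper, and all names are hypothetical.

```python
import numpy as np

def loss(w, x=1.0, y=1.0):
    """Squared error of a two-layer scalar linear network: prediction = w[1] * w[0] * x."""
    return 0.5 * (w[1] * w[0] * x - y) ** 2

def finite_difference_hessian(f, w, h=1e-3):
    """Central-difference estimate of the Hessian of f at w."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4.0 * h * h)
    return 0.5 * (H + H.T)  # symmetrize away round-off asymmetry

w_saddle = np.zeros(2)
eigvals, eigvecs = np.linalg.eigh(finite_difference_hessian(loss, w_saddle))
# eigh sorts eigenvalues in ascending order; here they are approximately [-1, 1].
direction = eigvecs[:, 0]              # eigenvector of the most negative eigenvalue
w_escaped = w_saddle + 0.5 * direction
print(eigvals, loss(w_saddle), loss(w_escaped))
```

Moving along either sign of that eigenvector lowers the loss, while a plain gradient step from the origin would not move at all, since the gradient there is zero.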

2018-01-01

ICLR (Workshop) (published)

arxiv.org

Bounds Lead to Improved Classifiers

Nicolas Le Roux

The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. While this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.
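The first claim of this abstract can be checked numerically for a binary logistic classifier with margin m = y f(x). Measured in bits, the log-loss log2(1 + e^{−m}) is at least the 0-1 loss 1[m ≤ 0] everywhere and equals exactly 1 at the boundary m = 0, but it keeps growing for examples misclassified far from the boundary. The function names are illustrative, and the sketch does not implement the paper's bound-updating procedure.

```python
import numpy as np

def zero_one_loss(margin):
    """1 if the example is misclassified (margin <= 0), else 0."""
    return (margin <= 0).astype(float)

def log_loss_bits(margin):
    """Binary logistic loss in bits, log2(1 + exp(-margin)), computed stably."""
    return np.logaddexp(0.0, -margin) / np.log(2.0)

margins = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(zero_one_loss(margins))
print(log_loss_bits(margins))
```

Here an example misclassified with margin −10 costs about 14.4 bits, against 1 bit for an example on the decision boundary, even though both count as exactly one error. This is the overemphasis that a tighter, updated bound is meant to reduce.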

2017-01-01

(published)