Nicolas Le Roux

Core Industry Member

Canada CIFAR AI Chair

Adjunct Professor, McGill University, School of Computer Science

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Research Scientist, Microsoft Research

Biography

I am an academic researcher with expertise in machine learning, computer vision, neural networks, deep learning, optimization, large-scale learning and statistical modelling in general.

Current Students

Alan Chan

PhD - Université de Montréal

Co-supervisor:

David Scott Krueger

alan.chan@mila.quebec

Website

Arnaud Bergeron

arnaud.bergeron1@mila.quebec

Kate Lobacheva

Postdoctorate

Co-supervisor:

Irina Rish

ekaterina.lobacheva@mila.quebec

Website

Github

Google Scholar

Reyhane Askari Hemmat

PhD - Université de Montréal

Principal supervisor:

Ioannis Mitliagkas

reyhane.askari.hemmat@mila.quebec

Publications

Reducing the variance in online optimization by transporting past gradients

Sébastien M. R. Arnold

Pierre-Antoine Manzagol

Reza Babanezhad Harikandeh

Ioannis Mitliagkas

Nicolas Le Roux

Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.

arxiv.org

Understanding the impact of entropy in policy learning

Zafarali Ahmed

Nicolas Le Roux

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

2018-11-27

(published)

www.semanticscholar.org

Combining adaptive algorithms and hypergradient method: a performance and robustness study

Akram Erraqabi

Nicolas Le Roux

2018-09-27

(published)

www.semanticscholar.org

Online Hyper-Parameter Optimization

Damien Vincent

Sylvain Gelly

Nicolas Le Roux

Olivier Bousquet

2018-02-15

(published)

www.semanticscholar.org

Online variance-reducing optimization

Nicolas Le Roux

Reza Babanezhad Harikandeh

Pierre-Antoine Manzagol

2018-02-12

International Conference on Learning Representations (published)

dblp.uni-trier.de

Negative eigenvalues of the Hessian in deep neural networks

Guillaume Alain

Nicolas Le Roux

Pierre-Antoine Manzagol

The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.

2018-01-01

ICLR (Workshop) (published)

arxiv.org

Bounds lead to improved classifiers

Nicolas Le Roux

The standard approach to supervised classification involves the minimization of a log-loss as an upper bound to the classification error. While this is a tight bound early on in the optimization, it overemphasizes the influence of incorrectly classified examples far from the decision boundary. Updating the upper bound during the optimization leads to improved classification rates while transforming the learning into a sequence of minimization problems. In addition, in the context where the classifier is part of a larger system, this modification makes it possible to link the performance of the classifier to that of the whole system, allowing the seamless introduction of external constraints.

2017-01-01

(published)

