Pierre-Antoine Manzagol

Alumni

Publications

On the interplay between noise and curvature and its effect on optimization and generalization
The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
Information matrices and generalization
This work revisits the use of information criteria to characterize the generalization of deep learning models. In particular, we empirically demonstrate the effectiveness of the Takeuchi information criterion (TIC), an extension of the Akaike information criterion (AIC) for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss H and the covariance of the gradients C. By exploring the similarities and differences between these two matrices as well as the Fisher information matrix F, we study the interplay between noise and curvature in deep models. We also address the question of whether C is a reasonable approximation to F, as is commonly assumed.
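As a rough illustration of how H and C enter the criterion, the sketch below evaluates a TIC-style gap estimate, Tr(H^-1 C) / n, for a one-parameter model, so both matrices reduce to scalars. The model (estimating a mean under the loss 0.5 * (y - theta)^2), the data, and the function name `tic_gap_1d` are assumptions made for this example and are not taken from the paper; normalisation conventions for the TIC also vary between references.

```python
def tic_gap_1d(ys):
    """TIC-style gap estimate Tr(H^-1 C) / n for a one-parameter mean model.

    Assumed per-example loss: 0.5 * (y - theta)^2 (unit-variance Gaussian NLL).
    """
    n = len(ys)
    theta_hat = sum(ys) / n                 # minimiser of the average loss
    grads = [theta_hat - y for y in ys]     # per-example gradients at theta_hat
    hessian = 1.0                           # second derivative of the loss (H)
    mean_grad = sum(grads) / n              # zero at the minimiser
    grad_cov = sum((g - mean_grad) ** 2 for g in grads) / n  # C
    return grad_cov / hessian / n


ys = [0.0, 2.0, 4.0, 6.0]       # toy data whose spread exceeds the assumed unit variance
tic = tic_gap_1d(ys)            # uses both H and C
aic_style = 1 / len(ys)         # parameter count divided by n, ignoring C
```

Here the data are more spread out than the model assumes, so C is larger than H and the TIC-style estimate exceeds the parameter-count estimate. When the model is well specified, C and H agree in expectation and the two estimates coincide.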
Reducing the variance in online optimization by transporting past gradients
Most stochastic optimization methods use gradients once before discarding them. While variance reduction methods have shown that reusing past gradients can be beneficial when there is a finite number of datapoints, they do not easily extend to the online setting. One issue is the staleness due to using past gradients. We propose to correct this staleness using the idea of implicit gradient transport (IGT) which transforms gradients computed at previous iterates into gradients evaluated at the current iterate without using the Hessian explicitly. In addition to reducing the variance and bias of our updates over time, IGT can be used as a drop-in replacement for the gradient estimate in a number of well-understood methods such as heavy ball or Adam. We show experimentally that it achieves state-of-the-art results on a wide range of architectures and benchmarks. Additionally, the IGT gradient estimator yields the optimal asymptotic convergence rate for online stochastic optimization in the restricted setting where the Hessians of all component functions are equal.
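The transport idea can be sketched on a toy problem. The code below is a minimal illustration, assuming the averaging weight gamma_t = t / (t + 1) and a new gradient evaluated at the extrapolated point theta_t + t * (theta_t - theta_{t-1}). The function names, the one-dimensional noisy quadratic, and all constants are hypothetical choices for this example, not the authors' implementation.

```python
import random


def igt_sgd(grad, theta0, lr, steps):
    """SGD driven by an IGT-style running gradient estimate (1-D sketch).

    grad(theta, t) returns a stochastic gradient at theta for step t.
    Returns the iterates and the gradient estimate used at each step.
    """
    theta_prev = theta0
    theta = theta0
    v = 0.0
    thetas, estimates = [theta0], []
    for t in range(steps):
        gamma = t / (t + 1.0)
        # Evaluate the fresh gradient at an extrapolated point instead of theta,
        # so that older gradients are implicitly moved to the current iterate.
        shifted = theta + t * (theta - theta_prev)
        v = gamma * v + (1.0 - gamma) * grad(shifted, t)
        theta_prev, theta = theta, theta - lr * v
        thetas.append(theta)
        estimates.append(v)
    return thetas, estimates


# Toy online problem: every sample shares the same curvature H (equal Hessians).
H, THETA_STAR = 2.0, 1.0
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]


def noisy_quadratic_grad(theta, t):
    return H * (theta - THETA_STAR) + noise[t]


thetas, estimates = igt_sgd(noisy_quadratic_grad, theta0=5.0, lr=0.1, steps=200)
```

In this equal-Hessian setting, the estimate at step t works out to the exact gradient at the current iterate plus the average of all noise terms seen so far, so the noise is averaged away without any stale-gradient bias and without ever forming the Hessian.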
Online variance-reducing optimization
Negative eigenvalues of the Hessian in deep neural networks
The loss function of deep networks is known to be non-convex but the precise nature of this nonconvexity is still an active area of research. In this work, we study the loss landscape of deep networks through the eigendecompositions of their Hessian matrix. In particular, we examine how important the negative eigenvalues are and the benefits one can observe in handling them appropriately.
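To make the role of negative eigenvalues concrete, here is a small sketch on a two-weight toy loss, 0.5 * (w1 * w2 - 1)^2, which is a common stand-in for a two-layer linear network. The loss, the helper names, and the step size are assumptions chosen for illustration and do not come from the paper.

```python
import math


def loss(w1, w2):
    return 0.5 * (w1 * w2 - 1.0) ** 2


def hessian(w1, w2):
    """Analytic Hessian entries (h11, h12, h22) of the toy loss."""
    return w2 ** 2, 2.0 * w1 * w2 - 1.0, w1 ** 2


def symmetric_2x2_eigenvalues(h11, h12, h22):
    """Closed-form eigenvalues of [[h11, h12], [h12, h22]], smallest first."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


saddle_eigs = symmetric_2x2_eigenvalues(*hessian(0.0, 0.0))   # one negative, one positive
minimum_eigs = symmetric_2x2_eigenvalues(*hessian(1.0, 1.0))  # both non-negative

# At the origin the gradient is zero, so plain gradient descent does not move,
# but a small step along the negative-curvature direction (1, 1) / sqrt(2)
# lowers the loss.
step = 0.1 / math.sqrt(2.0)
loss_at_saddle = loss(0.0, 0.0)
loss_after_step = loss(step, step)
```

The origin is a saddle point: the Hessian there has a negative eigenvalue, and moving along the corresponding eigenvector reduces the loss even though the gradient gives no signal. At (1, 1) all eigenvalues are non-negative, as expected at a minimum.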
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-Rfou
Amjad Almahairi
Christof Angermüller
Frédéric Bastien
Justin S. Bayer
A. Belikov
A. Belopolsky
J. Bergstra
Josh Bleecher Snyder
Paul F. Christiano
Marc-Alexandre Côté
Myriam Côté
Julien Demouth
Sander Dieleman
Mélanie Ducoffe
Ziye Fan
Mathieu Germain
Ian J. Goodfellow
Matthew Graham
Balázs Hidasi
Arjun Jain
Sébastien Jean
Kai Jia
Mikhail V. Korobov
Vivek Kulkarni
Pascal Lamblin
Eric P. Larsen
S. Lee
Simon-mark Lefrancois
J. Livezey
Cory R. Lorenz
Jeremiah L. Lowin
Qianli M. Ma
R. McGibbon
Mehdi Mirza
Alberto Orlandi
Colin Raffel
Daniel Renshaw
Matthew David Rocklin
Markus Roth
Peter Sadowski
John Salvatier
Jan Schlüter
John D. Schulman
Gabriel Schwartz
Iulian V. Serban
Samira Shabanian
Sigurd Spieckermann
S. Subramanyam
Gijs van Tulder
Joseph P. Turian
Sebastian Urban
Dustin J. Webb
M. Willson
Lijun Xue
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.