Gauthier Gidel

mehrnaz.mofakhami@mila.quebec

Postdoctorat - Université de Montréal

Co-superviseur⋅e :

quentin.bertrand@mila.quebec

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Jian Tang

sophie.xhonneux@mila.quebec

Zichu Liu

Doctorat - Université de Montréal

Co-superviseur⋅e :

Que nous apprennent les distributions des coefficients synaptiques au sujet de l’apprentissage dans le cerveau ?

zichu.liu@mila.quebec

Billets de blogue

13 juin 2024

par

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Lire l'article

Publications

Momentum Extragradient is Optimal for Games with Cross-Shaped Spectrum

Junhyung Lyle Kim

Anastasios Kyrillidis

Fabian Pedregosa

Google Research

The extragradient method has recently gained a lot of attention, due to its convergence behavior on smooth games. In games, the eigenvalues … (voir plus)of the Jacobian of the vector field are distributed on the complex plane, exhibiting more convoluted dynamics compared to minimization. In this work, we take a polynomial-based analysis of the extragradient with momentum for optimizing games with \emph{cross-shaped} spectrum on the complex plane. We show two results: first, the extragradient with momentum exhibits three different modes of convergence based on the hyperparameter setup: when the eigenvalues are distributed

2022-11-23

NeurIPS.cc/2022/Workshop/OPT (poster)

Dissecting adaptive methods in GANs

Samy Jelassi

David Dobre

Arthur Mensch

Yuanzhi Li

Adaptive methods are a crucial component widely used for training generative adversarial networks (GANs). While there has been some work to … (voir plus)pinpoint the “marginal value of adaptive methods” in standard tasks, it remains unclear why they are still critical for GAN training. In this paper, we formally study how adaptive methods help train GANs; inspired by the grafting method proposed in Agarwal et al. (2020), we separate the magnitude and direction components of the Adam updates, and graft them to the direction and magnitude of SGDA updates respectively. By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training. This motivates us to have a closer look at the class of normalized stochastic gradient descent ascent (nSGDA) methods in the context of GAN training. We propose a synthetic theoretical framework to compare the performance of nSGDA and SGDA for GAN training with neural networks. We prove that in that setting, GANs trained with nSGDA recover all the modes of the true distribution, whereas the same networks trained with SGDA (and any learning rate configuration) suffer from mode collapse. The critical insight in our analysis is that normalizing the gradients forces the discriminator and generator to be updated at the same pace. We also experimentally show that for several datasets, Adam’s performance can be recovered with nSGDA methods.

2022-10-09

ArXiv (prépublication)

doi.org

Only tails matter: Average-Case Universality and Robustness in the Convex Regime

Leonardo Cunha

Fabian Pedregosa

Damien Scieur

Courtney Paquette

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (publié)

doi.org

Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime

Leonardo Cunha

Fabian Pedregosa

Damien Scieur

Courtney Paquette

2022-06-20

ArXiv (prépublication)

doi.org

On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

Chris Junchi Li

Yaodong Yu

Nicolas Loizou

Yi Ma

Nicolas Le Roux

Michael I. Jordan

We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) meth… (voir plus)od with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.

2022-01-01

AISTATS (publié)

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Nicolas Loizou

Hugo Berard

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and … (voir plus)the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.

Linear Lower Bounds and Conditioning of Differentiable Games

Adam Ibrahim

Waiss Azizian

Recent successes of game-theoretic formulations in ML have caused a resurgence of research interest in differentiable games. Overwhelmingly,… (voir plus) that research focuses on methods and upper bounds on their speed of convergence. In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems. We cast saddle-point and min-max problems as 2-player games. We leverage tools from single-objective convex optimisation to propose new linear lower bounds for convex-concave games. Notably, we give a linear lower bound for

2020-11-21

Proceedings of the 37th International Conference on Machine Learning (publié)

proceedings.mlr.press

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

D. Balduzzi

Wojciech M. Czarnecki

M. Garnelo

Yoram Bachrach

Adversarial training, a special case of multi-objective optimization, is an increasingly useful tool in machine learning. For example, two-p… (voir plus)layer zero-sum games are important for generative modeling (GANs) and for mastering games like Go or Poker via self-play. A classic result in Game Theory states that one must mix strategies, as pure equilibria may not exist. Surprisingly, machine learning practitioners typically train a \emph{single} pair of agents -- instead of a pair of mixtures -- going against Nash's principle. Our main contribution is a notion of limited-capacity-equilibrium for which, as capacity grows, optimal agents -- not mixtures -- can learn increasingly expressive and realistic behaviors. We define \emph{latent games}, a new class of game where agents are mappings that transform latent distributions. Examples include generators in GANs, which transform Gaussian noise into distributions on images, and StarCraft II agents, which transform sampled build orders into policies. We show that minimax equilibria in latent games can be approximated by a \emph{single} pair of dense neural networks. Finally, we apply our latent game approach to solve differentiable Blotto, a game with an infinite strategy space.

2020-02-14

ArXiv (prépublication)

Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

D. Balduzzi

Wojciech M. Czarnecki

M. Garnelo

Yoram Bachrach

2020-02-14

ArXiv (preprint)

Accelerating Smooth Games by Manipulating Spectral Shapes

Waiss Azizian

Damien Scieur

We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set co… (voir plus)ntaining all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges in solving smooth games. In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape. Using this perspective, we propose an optimal algorithm for bilinear games. For smooth and strongly monotone operators, we identify a continuum between convex minimization, where acceleration is possible using Polyak's momentum, and the worst case where gradient descent is optimal. Finally, going beyond first-order methods, we propose an accelerated version of consensus optimization.

2020-01-02

ArXiv (preprint)

A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games.

Waiss Azizian