Gauthier Gidel

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Reinforcement Learning
Generative Models
Optimization
Machine Learning Theory

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute. I received a Borealis AI graduate fellowship and currently hold a Canada CIFAR AI Chair. I have worked at DeepMind and Element AI, and I was recently a long-term visitor at the Simons Institute at the University of California, Berkeley. My research interests lie at the intersection of game theory, optimization, and machine learning.

Current Students

Independent visiting researcher - UBC
Research Master's - Polytechnique
Principal supervisor:
Alumni collaborator - UdeM
Co-supervisor:
Research collaborator
PhD - UdeM
Postdoctorate - McGill
Principal supervisor:
PhD - McGill
Principal supervisor:
PhD - UdeM
Co-supervisor:
Research Master's - UdeM
Co-supervisor:
PhD - UdeM
Principal supervisor:
Independent visiting researcher - Technical University of Munich
PhD - UdeM
Co-supervisor:

Publications

Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Eduard Gorbunov
Samuel Horváth
Peter Richtárik
Byzantine-robustness has been gaining a lot of attention due to the growth of the interest in collaborative and federated learning. However, many fruitful directions, such as the usage of variance reduction for achieving robustness and communication compression for reducing communication costs, remain weakly explored in the field. This work addresses this gap and proposes Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and compression. A key message of our paper is that variance reduction is key to fighting Byzantine workers more effectively. At the same time, communication compression is a bonus that makes the process more communication efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA outperforming previous state-of-the-art for general non-convex and Polyak-Lojasiewicz loss functions. Unlike the concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as boundedness of the gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.
Momentum Extragradient is Optimal for Games with Cross-Shaped Spectrum
Junhyung Lyle Kim
Anastasios Kyrillidis
Fabian Pedregosa
The extragradient method has recently gained a lot of attention, due to its convergence behavior on smooth games. In games, the eigenvalues of the Jacobian of the vector field are distributed on the complex plane, exhibiting more convoluted dynamics compared to minimization. In this work, we take a polynomial-based analysis of the extragradient with momentum for optimizing games with cross-shaped spectrum on the complex plane. We show two results: first, the extragradient with momentum exhibits three different modes of convergence based on the hyperparameter setup: when the eigenvalues are distributed
Dissecting adaptive methods in GANs
Samy Jelassi
David Dobre
Arthur Mensch
Yuanzhi Li
Adaptive methods are a crucial component widely used for training generative adversarial networks (GANs). While there has been some work to pinpoint the “marginal value of adaptive methods” in standard tasks, it remains unclear why they are still critical for GAN training. In this paper, we formally study how adaptive methods help train GANs; inspired by the grafting method proposed in Agarwal et al. (2020), we separate the magnitude and direction components of the Adam updates, and graft them to the direction and magnitude of SGDA updates respectively. By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training. This motivates us to have a closer look at the class of normalized stochastic gradient descent ascent (nSGDA) methods in the context of GAN training. We propose a synthetic theoretical framework to compare the performance of nSGDA and SGDA for GAN training with neural networks. We prove that in that setting, GANs trained with nSGDA recover all the modes of the true distribution, whereas the same networks trained with SGDA (and any learning rate configuration) suffer from mode collapse. The critical insight in our analysis is that normalizing the gradients forces the discriminator and generator to be updated at the same pace. We also experimentally show that for several datasets, Adam’s performance can be recovered with nSGDA methods.
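The contrast between SGDA and nSGDA described in this abstract can be illustrated with a minimal sketch. It assumes that each player's gradient is divided by its own L2 norm, which is one plausible reading of "normalized"; the paper may use a different variant (for example, layer-wise normalization). The function names `sgda_step` and `nsgda_step` and the `eps` safeguard are hypothetical and are not taken from the paper.

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))

def sgda_step(x, y, grad_x, grad_y, lr):
    """Plain descent-ascent: each displacement scales with the gradient magnitude."""
    new_x = [xi - lr * g for xi, g in zip(x, grad_x)]  # descent for the min player
    new_y = [yi + lr * g for yi, g in zip(y, grad_y)]  # ascent for the max player
    return new_x, new_y

def nsgda_step(x, y, grad_x, grad_y, lr, eps=1e-12):
    """Normalized descent-ascent (assumed per-player normalization).

    Each player moves a distance of about lr, whatever the gradient scale.
    eps is a hypothetical safeguard against division by zero.
    """
    norm_x = l2_norm(grad_x) + eps
    norm_y = l2_norm(grad_y) + eps
    new_x = [xi - lr * g / norm_x for xi, g in zip(x, grad_x)]
    new_y = [yi + lr * g / norm_y for yi, g in zip(y, grad_y)]
    return new_x, new_y

# Gradients that differ by three orders of magnitude between the two players.
x0, y0 = [0.0, 0.0], [0.0, 0.0]
gx, gy = [3.0, 4.0], [0.003, 0.004]
plain = sgda_step(x0, y0, gx, gy, lr=0.1)
normalized = nsgda_step(x0, y0, gx, gy, lr=0.1)
print("SGDA displacements: ", l2_norm(plain[0]), l2_norm(plain[1]))
print("nSGDA displacements:", l2_norm(normalized[0]), l2_norm(normalized[1]))
```

With plain SGDA the two players move by very different amounts, while the normalized step moves both by roughly `lr`, which is the "same pace" property the abstract refers to.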
Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime
Leonardo Cunha
Fabian Pedregosa
Damien Scieur
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging
Chris Junchi Li
Yaodong Yu
Nicolas Loizou
Yi Ma
Michael I. Jordan
We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.
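As a rough illustration of same-sample extragradient with iteration averaging, the sketch below runs the method on the scalar bilinear game f(x, y) = x * y, whose Nash equilibrium is (0, 0). The toy game, the additive Gaussian noise model, the function name `seg_with_averaging` and all parameter values are assumptions made for this example; the restarting schedule from the paper is not reproduced.

```python
import math
import random

def seg_with_averaging(x0, y0, step, n_steps, noise_std=0.0, seed=0):
    """Same-sample stochastic extragradient on f(x, y) = x * y.

    x is the min player, y the max player. Returns the last iterate and the
    uniform average of all iterates. Toy setting, not the paper's experiments.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    sum_x = sum_y = 0.0
    for _ in range(n_steps):
        # One noise sample, reused for both the extrapolation and the update.
        noise_x = rng.gauss(0.0, noise_std)
        noise_y = rng.gauss(0.0, noise_std)
        # Extrapolation step: grad_x f = y and grad_y f = x, plus noise.
        x_half = x - step * (y + noise_x)
        y_half = y + step * (x + noise_y)
        # Update step: gradients evaluated at the extrapolated point.
        x, y = x - step * (y_half + noise_x), y + step * (x_half + noise_y)
        sum_x += x
        sum_y += y
    return (x, y), (sum_x / n_steps, sum_y / n_steps)

# Noiseless run: the last iterate contracts toward the equilibrium (0, 0).
last, avg = seg_with_averaging(1.0, 1.0, step=0.5, n_steps=100)
print("noiseless last iterate norm:", math.hypot(*last))

# Noisy run: compare the last iterate with the averaged iterate.
last, avg = seg_with_averaging(1.0, 1.0, step=0.5, n_steps=2000, noise_std=0.5, seed=1)
print("noisy last iterate norm:    ", math.hypot(*last))
print("noisy averaged iterate norm:", math.hypot(*avg))
```

In the noiseless case each step multiplies the squared distance to the equilibrium by 1 - step² + step⁴, which is below 1 for 0 < step < 1. With additive noise the last iterate keeps fluctuating, which is the situation where the abstract proposes averaging.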
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.
Linear Lower Bounds and Conditioning of Differentiable Games
Adam Ibrahim
Waiss Azizian
Recent successes of game-theoretic formulations in ML have caused a resurgence of research interest in differentiable games. Overwhelmingly, that research focuses on methods and upper bounds on their speed of convergence. In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e. geometric) upper bounds observed in the literature on a wide class of problems. We cast saddle-point and min-max problems as 2-player games. We leverage tools from single-objective convex optimisation to propose new linear lower bounds for convex-concave games. Notably, we give a linear lower bound for
Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets
D. Balduzzi
Wojciech M. Czarnecki
M. Garnelo
Yoram Bachrach
Adversarial training, a special case of multi-objective optimization, is an increasingly useful tool in machine learning. For example, two-player zero-sum games are important for generative modeling (GANs) and for mastering games like Go or Poker via self-play. A classic result in Game Theory states that one must mix strategies, as pure equilibria may not exist. Surprisingly, machine learning practitioners typically train a single pair of agents (instead of a pair of mixtures), going against Nash's principle. Our main contribution is a notion of limited-capacity-equilibrium for which, as capacity grows, optimal agents (not mixtures) can learn increasingly expressive and realistic behaviors. We define latent games, a new class of game where agents are mappings that transform latent distributions. Examples include generators in GANs, which transform Gaussian noise into distributions on images, and StarCraft II agents, which transform sampled build orders into policies. We show that minimax equilibria in latent games can be approximated by a single pair of dense neural networks. Finally, we apply our latent game approach to solve differentiable Blotto, a game with an infinite strategy space.
Accelerating Smooth Games by Manipulating Spectral Shapes
We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges in solving smooth games. In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape. Using this perspective, we propose an optimal algorithm for bilinear games. For smooth and strongly monotone operators, we identify a continuum between convex minimization, where acceleration is possible using Polyak's momentum, and the worst case where gradient descent is optimal. Finally, going beyond first-order methods, we propose an accelerated version of consensus optimization.
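To make the idea of a spectrum on the real line versus the complex plane concrete, the sketch below computes the Jacobian eigenvalues of the gradient vector field for three hand-picked two-variable problems. The example problems, the helper name `eig2x2`, and the sign convention v(x, y) = (∂f/∂x, -∂f/∂y) are assumptions chosen for illustration, not taken from the paper.

```python
import cmath

def eig2x2(a, b, c, d):
    """Eigenvalues of the matrix [[a, b], [c, d]] from its characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# Minimization of f(x1, x2) = x1^2 + 0.5 * x2^2: the Jacobian is the Hessian,
# which is symmetric, so the eigenvalues are real.
minimization = eig2x2(2.0, 0.0, 0.0, 1.0)

# Bilinear game f(x, y) = x * y with vector field v = (y, -x):
# the eigenvalues are purely imaginary.
bilinear = eig2x2(0.0, 1.0, -1.0, 0.0)

# Game f(x, y) = 0.5 * x^2 + x * y - 0.5 * y^2 with v = (x + y, -x + y):
# the eigenvalues have both real and imaginary parts.
mixed = eig2x2(1.0, 1.0, -1.0, 1.0)

for name, eigs in [("minimization", minimization), ("bilinear", bilinear), ("mixed", mixed)]:
    print(name, eigs)
```

The first case stays on the real line, as the abstract says of minimization, while the two games place eigenvalues off the real axis.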