
Gauthier Gidel

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Reinforcement Learning
Generative Models
Optimization
Machine Learning Theory

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute. I received a Borealis AI Graduate Fellowship and currently hold a Canada CIFAR AI Chair. I have worked at DeepMind and Element AI, and I was recently a long-term visitor at the Simons Institute at the University of California, Berkeley. My research interests lie at the intersection of game theory, optimization, and machine learning.

Current Students

Research Master's - UdeM
Research Collaborator - UdeM
PhD - UdeM
Independent Visiting Researcher - N/A
PhD - UdeM
Co-supervisor:
Research Intern - UdeM
PhD - UdeM
Co-supervisor:
PhD - UdeM
Co-supervisor:
Research Collaborator - UdeM
Co-supervisor:
Research Collaborator - UdeM
Independent Visiting Researcher - Technical University of Munich
Research Intern - UdeM
Postdoctoral Fellow - UdeM
PhD - UdeM
Co-supervisor:
Alumni Collaborator - N/A

Publications

Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure
Angela Yuan
Chris Junchi Li
Michael Jordan
Quanquan Gu
Simon Shaolei Du
We consider the problem of solving stochastic monotone variational inequalities with a separable structure using a stochastic first-order oracle. Building on standard extragradient for variational inequalities, we propose a novel algorithm, stochastic accelerated gradient-extragradient (AG-EG), for strongly monotone variational inequalities (VIs). Our approach combines the strengths of extragradient and Nesterov acceleration. By showing that its iterates remain in a bounded domain and applying scheduled restarting, we prove that AG-EG has an optimal convergence rate for strongly monotone VIs. Furthermore, when specialized to bilinearly coupled strongly-convex-strongly-concave saddle-point problems, including bilinear games, our algorithm achieves fine-grained convergence rates that match the respective lower bounds, with the stochasticity characterized by an additive statistical error term that is optimal up to a constant prefactor.
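For readers new to extragradient, the sketch below shows only the plain stochastic extragradient building block on a strongly monotone linear operator; it is not AG-EG (there is no Nesterov acceleration and no restarting). The operator matrix, step size `eta`, and Gaussian oracle noise are illustrative assumptions.

```python
import numpy as np

def extragradient(F, z0, eta=0.1, steps=500, noise=0.0, seed=0):
    """Plain (stochastic) extragradient for a variational inequality with operator F."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)

    def oracle(w):
        # Assumed stochastic first-order oracle: exact operator plus Gaussian noise.
        return F(w) + noise * rng.standard_normal(w.shape)

    for _ in range(steps):
        z_half = z - eta * oracle(z)   # extrapolation step
        z = z - eta * oracle(z_half)   # update with the operator at the extrapolated point
    return z

# Illustrative strongly monotone operator: symmetric part 0.5*I plus a skew (game) part.
M = np.array([[0.5, 1.0], [-1.0, 0.5]])
z_final = extragradient(lambda z: M @ z, [1.0, 1.0])
```

With `noise=0` the iterates contract toward the unique solution `z = 0`; with `noise > 0` they settle in a noise-dependent neighborhood, which is the kind of additive statistical error the abstract refers to.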
AI4GCC - Track 3: Consumption and the Challenges of Multi-Agent RL
Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
Adrien Taylor
Samuel Horváth
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
Abdurakhmon Sadiev
Marina Danilova
Samuel Horváth
Pavel Dvurechensky
Alexander Gasnikov
Peter Richtárik
In recent years, the optimization and machine learning communities have shown growing interest in the high-probability convergence of stochastic optimization methods. One of the main reasons is that high-probability complexity bounds are more accurate, yet less studied, than in-expectation ones. However, state-of-the-art high-probability non-asymptotic convergence results are derived under strong assumptions such as boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central…
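Gradient clipping is a standard tool for taming heavy-tailed gradient noise. The sketch below runs clipped SGD on a simple quadratic with Student-t noise (1.5 degrees of freedom, so the variance is infinite but the mean is finite). It illustrates the unbounded-variance setting only; the clipping level `lam`, step size, and test problem are assumptions, and this is not presented as the paper's algorithm.

```python
import numpy as np

def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def clipped_sgd(grad, x0, lr=0.05, lam=1.0, steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        # Heavy-tailed noise: Student-t with df=1.5 has infinite variance.
        noisy_grad = grad(x) + rng.standard_t(1.5, size=x.shape)
        x = x - lr * clip(noisy_grad, lam)
    return x

# Illustrative objective f(x) = 0.5 * ||x||^2, whose gradient is x.
x_final = clipped_sgd(lambda x: x, [5.0, 5.0])
```

Clipping bounds every step by `lr * lam`, so a single extreme noise sample cannot throw the iterate far away; this is what makes high-probability (rather than in-expectation) guarantees plausible without a variance bound.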
Omega: Optimistic EMA Gradients
Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an exponential moving average (EMA) of historical gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
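The abstract does not state Omega's exact update, so the following is a hypothetical illustration of the general idea only: the optimistic step `z - eta * (2 g_t - g_{t-1})` is applied to an EMA `m_t` of stochastic gradients instead of raw gradients. With `beta = 0` it reduces exactly to the optimistic gradient method. The bilinear game `f(x, y) = x * y`, step size, and noise model are assumptions.

```python
import numpy as np

def bilinear_field(z):
    # Game vector field (df/dx, -df/dy) for f(x, y) = x * y.
    x, y = z
    return np.array([y, -x])

def ema_optimistic(z0, eta=0.1, beta=0.0, noise=0.0, steps=2000, seed=0):
    """Hypothetical EMA-based optimistic update; beta = 0 is plain optimistic gradient."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z0, dtype=float)
    m_prev = None
    for _ in range(steps):
        g = bilinear_field(z) + noise * rng.standard_normal(z.shape)
        m = g if m_prev is None else beta * m_prev + (1 - beta) * g
        if m_prev is None:
            m_prev = m  # first step degenerates to a plain gradient step
        z = z - eta * (2 * m - m_prev)  # optimistic extrapolation on the averaged gradients
        m_prev = m
    return z

z_det = ema_optimistic([1.0, 1.0])
```

Averaging smooths the noisy correction term `2 m_t - m_{t-1}`, which is exactly the quantity that makes the plain optimistic step sensitive to noise.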
Raising the Bar for Certified Adversarial Robustness with Diffusion Models
Thomas Altstidl
David Dobre
Bjoern Eskofier
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement when using additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the…
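As a concrete example of a deterministic certificate, the sketch below uses interval bound propagation (IBP), one well-known certification technique, on a small ReLU network. It is an illustration of what "certified" means, not the certification method used in the paper; the network weights are placeholders.

```python
import numpy as np

def interval_linear(lo, hi, W, b):
    # Exact box image of [lo, hi] under x -> W @ x + b.
    center, radius = (hi + lo) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def certify(x, eps, layers, label):
    """True if every input in the L-inf ball of radius eps provably keeps `label` on top."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_linear(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)  # ReLU is monotone
    # Sound but loose: worst case of the true logit vs. best case of every other logit.
    return bool(lo[label] > np.delete(hi, label).max())

# Placeholder two-layer network (identity weights) for illustration.
layers = [(np.eye(2), np.zeros(2)), (np.eye(2), np.zeros(2))]
x = np.array([1.0, 0.0])
```

If `certify` returns True, no perturbation within the radius can change the prediction, regardless of the attack used; a False answer only means the bound was too loose to decide.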
A General Framework for Proving the Equivariant Strong Lottery Ticket Hypothesis
The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a subnetwork within a sufficiently overparameterized (dense) neural network that, when initialized randomly and without any training, achieves the accuracy of a fully trained target network. Recent works by da Cunha et al. (2022) and Burkholz (2022) demonstrate that the SLTH can be extended to translation-equivariant networks (i.e., CNNs) with the same level of overparameterization as needed for SLTs in dense networks. However, modern neural networks are capable of incorporating more than just translation symmetry, and developing general equivariant architectures, such as rotation- and permutation-equivariant ones, has been a powerful design principle. In this paper, we generalize the SLTH to functions that preserve the action of the group…
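A common ingredient in SLTH proofs is the random subset-sum argument: a target weight can be approximated by summing a well-chosen subset of a modest number of random weights, so pruning alone (choosing a 0/1 mask) can reproduce a trained value. The brute-force sketch below illustrates that single step; the target value, sample count, and uniform distribution are assumptions, and it is not the paper's general equivariant framework.

```python
import itertools
import numpy as np

def best_subset_mask(target, samples):
    """Brute-force the 0/1 mask whose masked sum of `samples` is closest to `target`."""
    best_mask, best_err = None, float("inf")
    for mask in itertools.product((0, 1), repeat=len(samples)):
        err = abs(float(np.dot(mask, samples)) - target)
        if err < best_err:
            best_mask, best_err = mask, err
    return np.array(best_mask), best_err

rng = np.random.default_rng(0)
random_weights = rng.uniform(-1.0, 1.0, size=12)  # untrained "overparameterized" weights
mask, err = best_subset_mask(0.37, random_weights)
```

Brute force is exponential in the number of samples and only shows that a good mask exists; the proofs rely on the fact that, with enough random samples, such a mask exists with high probability.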
Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples
Avishek (Joey) Bose
Ian Gemp
Chongli Qin
Yoram Bachrach
The past few years have seen impressive progress in the development of deep generative models capable of producing high-dimensional, complex, and photo-realistic data. However, current methods for evaluating such models remain incomplete: standard likelihood-based metrics do not always apply and rarely correlate with perceptual fidelity, while sample-based metrics, such as FID, are insensitive to overfitting, i.e., the inability to generalize beyond the training set. To address these limitations, we propose a new metric called the Feature Likelihood Divergence (FLD), a parametric sample-based metric that uses density estimation to provide a comprehensive trichotomic evaluation accounting for novelty (i.e., difference from the training samples), fidelity, and diversity of generated samples. We empirically demonstrate the ability of FLD to identify overfitting problem cases, even when previously proposed metrics fail. We also extensively evaluate FLD on various image datasets and model classes, demonstrating its ability to match the intuitions of previous metrics like FID while offering a more comprehensive evaluation of generative models. Code is available at https://github.com/marcojira/fld.
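The intuition that density estimation on generated samples can expose overfitting can be shown with a simplified stand-in: fit a Gaussian kernel density estimate on generated features, then compare the mean log-likelihood of training versus held-out features. A generator that memorizes its training set assigns much higher likelihood to training points. This is not FLD's actual definition (see the linked repository for that); the KDE, bandwidth, and 2-D toy features are assumptions.

```python
import numpy as np

def kde_log_likelihood(queries, centers, bandwidth):
    """Mean log-density of `queries` under an isotropic Gaussian KDE built on `centers`."""
    d = queries.shape[1]
    sq = ((queries[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    log_k = -sq / (2 * bandwidth**2) - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    m = log_k.max(axis=1, keepdims=True)  # stable log-mean-exp over centers
    log_p = m[:, 0] + np.log(np.exp(log_k - m).mean(axis=1))
    return float(log_p.mean())

rng = np.random.default_rng(0)
train = rng.standard_normal((50, 2))  # toy "training features"
test = rng.standard_normal((50, 2))   # toy held-out features
memorized = train.copy()              # a generator that copies its training set

# Positive gap: generated samples fit training data far better than unseen data.
mem_gap = (kde_log_likelihood(train, memorized, 0.1)
           - kde_log_likelihood(test, memorized, 0.1))
```

A sample-based metric such as FID compares only the distributions of generated and real features, so a perfect copy of the training set can score well; a train/test likelihood gap is one way to make memorization visible.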
Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization
Chris Junchi Li
Huizhuo Yuan
Angela Yuan
Quanquan Gu
Michael Jordan
We propose a new first-order optimization algorithm, Accelerated Gradient-Optimistic Gradient (AG-OG) Descent Ascent, for separable convex-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual components and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both the bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.
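The splitting idea (treat the individual terms and the coupling term differently) can be sketched on a bilinearly coupled quadratic problem, min over x, max over y of (mu_x/2)||x||^2 + x^T B y - (mu_y/2)||y||^2. The sketch takes plain gradient steps on the individual terms and optimistic steps on the coupling only; it deliberately omits Nesterov acceleration and restarting, so it is a simplified illustration, not AG-OG. The matrix `B`, strong-convexity constants, and step size are assumptions.

```python
import numpy as np

def split_optimistic_gda(B, mu_x=1.0, mu_y=1.0, eta=0.1, steps=500):
    """Gradient steps on the individual terms, optimistic steps on the coupling x^T B y."""
    x, y = np.ones(B.shape[0]), np.ones(B.shape[1])
    prev_By, prev_Btx = B @ y, B.T @ x
    for _ in range(steps):
        By, Btx = B @ y, B.T @ x
        # Optimistic extrapolation 2*c_t - c_{t-1} is applied to the coupling gradients only.
        x_new = x - eta * (mu_x * x + 2 * By - prev_By)      # descent in x
        y_new = y + eta * (-mu_y * y + 2 * Btx - prev_Btx)   # ascent in y
        prev_By, prev_Btx = By, Btx
        x, y = x_new, y_new
    return x, y

B = np.array([[1.0, 0.5], [0.0, 1.0]])
x_star, y_star = split_optimistic_gda(B)
```

Each iteration calls the coupling oracle once per player (a single-call scheme), because the optimistic step reuses the previous coupling gradient instead of computing an extra extrapolation point as extragradient does.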
Performative Prediction with Neural Networks
Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e., optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous with respect to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.
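Repeated risk minimization can be illustrated on a one-dimensional toy problem: a constant predictor `theta` is deployed, the population reacts by shifting its outcomes to `a + eps * theta`, and the squared-loss minimizer on the new data (the sample mean) becomes the next `theta`. For `|eps| < 1` the iterates approach the performatively stable point `a / (1 - eps)`. The shift model and constants are assumptions for illustration; the paper's resampling procedure is different.

```python
import numpy as np

def repeated_risk_minimization(a, eps, rounds=50, n=10_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(rounds):
        # Deploy theta; outcomes shift in response to the deployed prediction.
        y = rng.normal(loc=a + eps * theta, scale=1.0, size=n)
        # The squared-loss risk minimizer for a constant predictor is the sample mean.
        theta = float(y.mean())
    return theta

theta_stable = repeated_risk_minimization(a=1.0, eps=0.5)  # fixed point a / (1 - eps) = 2
```

At the stable point, retraining on the distribution the model induces returns the same model, which is exactly the "optimal for the data distribution they induce" condition.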
On the Limitations of Elo: Real-World Games, are Transitive, not Additive
Wojciech M. Czarnecki
Real-world competitive games, such as chess, Go, or StarCraft II, rely on Elo models to measure the strength of their players. Since these games are not fully transitive, using Elo implicitly assumes they have a strong transitive component that can be correctly identified and extracted. In this study, we investigate the challenge of identifying the strength of the transitive component in games. First, we show that Elo models can fail to extract this transitive component, even in elementary transitive games. Then, based on this observation, we propose an extension of the Elo score: a disc ranking system that assigns each player two scores, which we refer to as skill and consistency. Finally, we provide an empirical validation on payoff matrices from real-world games played by bots and humans.
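For reference, the standard Elo model predicts a win probability from the rating difference alone and updates ratings in proportion to the surprise of each result. The sketch uses the common 400-point logistic scale and `k = 32`; both are conventional choices rather than values from the paper, and the disc ranking extension is not implemented here.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32.0):
    """Return updated ratings after one game; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    delta = k * (score_a - elo_expected(r_a, r_b))
    return r_a + delta, r_b - delta

new_a, new_b = elo_update(1500.0, 1500.0, 1.0)  # A beats an equally rated B
```

Because the prediction depends only on `r_a - r_b`, a single Elo rating per player cannot represent cyclic relationships such as rock-paper-scissors, which is why a second per-player score is needed to capture non-transitive structure.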
Variance Reduction is an Antidote to Byzantines: Better Rates, Weaker Assumptions and Communication Compression as a Cherry on the Top
Samuel Horváth
Peter Richtárik
Byzantine robustness has been gaining a lot of attention due to the growing interest in collaborative and federated learning. However, many fruitful directions, such as using variance reduction to achieve robustness and communication compression to reduce communication costs, remain weakly explored in the field. This work addresses this gap and proposes Byz-VR-MARINA, a new Byzantine-tolerant method with variance reduction and compression. A key message of our paper is that variance reduction is key to fighting Byzantine workers more effectively. At the same time, communication compression is a bonus that makes the process more communication-efficient. We derive theoretical convergence guarantees for Byz-VR-MARINA that outperform the previous state of the art for general non-convex and Polyak-Łojasiewicz loss functions. Unlike concurrent Byzantine-robust methods with variance reduction and/or compression, our complexity results are tight and do not rely on restrictive assumptions such as boundedness of the gradients or limited compression. Moreover, we provide the first analysis of a Byzantine-tolerant method supporting non-uniform sampling of stochastic gradients. Numerical experiments corroborate our theoretical findings.
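The two ingredients named above, unbiased compression and robust aggregation, can each be shown in a few lines. The sketch uses random-k sparsification (an unbiased compressor) and the coordinate-wise median (a simple robust aggregator). These are generic textbook components chosen for illustration; Byz-VR-MARINA's actual aggregator and compressor choices are not reproduced here.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased random-k sparsification: keep k random coordinates, rescale by d / k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def coordinate_median(vectors):
    """Coordinate-wise median: a simple aggregator robust to a minority of outliers."""
    return np.median(np.stack(vectors), axis=0)

rng = np.random.default_rng(0)
honest = [np.ones(3) for _ in range(4)]
byzantine = [np.full(3, 1e6)]  # one worker sends an arbitrary, huge vector
aggregate = coordinate_median(honest + byzantine)
```

The rescaling by `d / k` makes the compressed vector correct on average at the cost of extra variance, which is one reason variance reduction pairs naturally with compression.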