Gauthier Gidel

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a core academic member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair.

Previously, I was awarded a Borealis AI Graduate Fellowship, worked at DeepMind and Element AI, and was a Long-Term Visitor at the Simons Institute at UC Berkeley.

My research interests lie at the intersection of game theory, optimization and machine learning.

Current Students

Sadhana Anand

Master's Research - Université de Montréal

Manfred Diaz Cabrera

Collaborating researcher - Université de Montréal

David Dobre

PhD - Université de Montréal

Damien Ferbach

PhD - Université de Montréal

Co-supervisor :

Research Intern - Université de Montréal

PhD - Université de Montréal

Zichu Liu

PhD - Université de Montréal

Co-supervisor :

Ioannis Mitliagkas

Andjela Mladenovic

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Brice Rauby

PhD - Polytechnique Montréal

Principal supervisor :

Maxime Gasse

Leo Schwinn

Independent visiting researcher - Technical Univeristy of Munich

Tom Stanic

Research Intern - Université de Montréal

Danilo Vucetic

PhD - Université de Montréal

Sophie Xhonneux

PhD - Université de Montréal

Co-supervisor :

Jian Tang

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain ?

Bora Yongacoglu

Collaborating Alumni - N/A

Blog Posts

June 13, 2024

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards

Read the article

Publications

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev

Marina Danilova

Eduard Gorbunov

Samuel Horváth

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

During the recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimiza… (see more)tion methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as boundedness of the gradient noise variance or of the objective’s gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Omega: Optimistic EMA Gradients

Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training… (see more). Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.

2023-07-02

ICML.cc/2023/Workshop/LXAI_Regular_Deadline (oral)

Raising the Bar for Certified Adversarial Robustness with Diffusion Models

Thomas Altstidl

David Dobre

Bjoern Eskofier

Leo Schwinn

Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirica… (see more)l methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement when using additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the

2023-05-17

ArXiv (preprint)

arxiv.org

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev

Marina Danilova

Eduard Gorbunov

Samuel Horváth

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization… (see more) methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, SOTA high-probability non-asymptotic convergence results are derived under strong assumptions such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central

2023-04-24

ICML.cc/2023/Conference (poster)

Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

Aleksandr Beznosikov

David Dobre

The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints that arise in machine learni… (see more)ng applications. In recent years, stochastic versions of FW have gained popularity, motivated by large datasets for which the computation of the full gradient is prohibitively expensive. In this paper, we present two new variants of the FW algorithms for stochastic finite-sum minimization. Our algorithms have the best convergence guarantees of existing stochastic FW approaches for both convex and non-convex objective functions. Our methods do not have the issue of permanently collecting large batches, which is common to many stochastic projection-free approaches. Moreover, our second approach does not require either large batches or full deterministic gradients, which is a typical weakness of many techniques for finite-sum problems. The faster theoretical rates of our approaches are confirmed experimentally.

2023-04-23

ArXiv (preprint)

arxiv.org

A General Framework For Proving The Equivariant Strong Lottery Ticket Hypothesis

The Strong Lottery Ticket Hypothesis (SLTH) stipulates the existence of a subnetwork within a sufficiently overparameterized (dense) neural … (see more)network that -- when initialized randomly and without any training -- achieves the accuracy of a fully trained target network. Recent works by Da Cunha et. al 2022; Burkholz 2022 demonstrate that the SLTH can be extended to translation equivariant networks -- i.e. CNNs -- with the same level of overparametrization as needed for the SLTs in dense networks. However, modern neural networks are capable of incorporating more than just translation symmetry, and developing general equivariant architectures such as rotation and permutation has been a powerful design principle. In this paper, we generalize the SLTH to functions that preserve the action of the group

2023-02-01

ICLR.cc/2023/Conference (poster)

Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity

Eduard Gorbunov

Adrien Taylor

Samuel Horváth

Algorithms for min-max optimization and variational inequalities are often studied under monotonicity assumptions. Motivated by non-monotone… (see more) machine learning applications, we follow the line of works (Diakonikolas et al., 2021; Lee & Kim, 2021; Pethick et al., 2022; Bohm,2022) aiming at going beyond monotonicity by considering the weaker *negative comonotonicity* assumption. In this work, we provide tight complexity analyses for the Proximal Point (PP), Extragradient (EG), and Optimistic Gradient (OG) methods in this setup, closing several questions on their working guarantees beyond monotonicity. In particular, we derive the first non-asymptotic convergence rates for PP under negative comonotonicity and star-negative comonotonicity and show their tightness via constructing worst-case examples; we also relax the assumptions for the last-iterate convergence guarantees for EG and OG and prove the tightness of the existing best-iterate guarantees for EG and OG via constructing counter-examples.

2023-01-01

ICML (published)

Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples

Marco Jiralerspong

Joey Bose

Ian Gemp

Chongli Qin

Yoram Bachrach

Feature Likelihood Score: Evaluating Generalization of Generative Models Using Samples

Marco Jiralerspong

Joey Bose

Deep generative models have demonstrated the ability to generate complex, high-dimensional, and photo-realistic data. However, a uniﬁed fr… (see more)amework for evaluating different generative modeling families remains a challenge. Indeed, likelihood-based metrics do not apply in many cases while pure sample-based metrics such as FID fail to capture known failure modes such as overﬁtting on training data. In this work, we introduce the Feature Likelihood Score (FLS), a parametric sample-based score that uses density estimation to quantitatively measure the quality/diversity of generated samples while taking into account overﬁtting. We empirically demonstrate the ability of FLS to identify speciﬁc overﬁtting problem cases, even when previously proposed metrics fail. We further perform an extensive experimental evaluation on various image datasets and model classes. Our results indicate that FLS matches intuitions of previous metrics, such as FID, while providing a more holistic evaluation of generative models that highlights models whose generalization abilities are under or overappreciated. Code for computing FLS is provided at https://github.com/marcojira/ﬂs.

2023-01-01

arXiv.org (preprint)

Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization

Chris Junchi Li

Huizhuo Yuan

Angela Yuan

Quanquan Gu

Michael Jordan

We propose a new first-order optimization algorithm — AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent—for separable convex… (see more)-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.

2023-01-01

ICML (published)

proceedings.mlr.press

Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization

Chris Junchi Li

Angela Yuan

Quanquan Gu

Michael Jordan

We propose a new first-order optimization algorithm --- AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent---for separable convex… (see more)-concave minimax optimization. The main idea of our algorithm is to carefully leverage the structure of the minimax problem, performing Nesterov acceleration on the individual component and optimistic gradient on the coupling component. Equipped with proper restarting, we show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings, including bilinearly coupled strongly convex-strongly concave minimax optimization (bi-SC-SC), bilinearly coupled convex-strongly concave minimax optimization (bi-C-SC), and bilinear games. We also extend our algorithm to the stochastic setting and achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings. AG-OG is the first single-call algorithm with optimal convergence rates in both deterministic and stochastic settings for bilinearly coupled minimax optimization problems.

2023-01-01

ICML (published)

Performative Prediction with Neural Networks

Mehrnaz Mofakhami

Ioannis Mitliagkas

Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers t… (see more)hat are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.

2023-01-01

AISTATS (published)