
Gauthier Gidel

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal and a core academic member of Mila – Quebec Artificial Intelligence Institute. I received a Borealis AI Graduate Fellowship and currently hold a Canada CIFAR AI Chair. I have worked at DeepMind and Element AI, and was recently a long-term visitor at the Simons Institute at the University of California, Berkeley. My research interests lie at the intersection of game theory, optimization, and machine learning.

Current Students

Research Master's - Université de Montréal
Research Collaborator
PhD - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Research Intern - McGill University
Research Master's - Université de Montréal
Co-supervisor:
Postdoctorate - Université de Montréal
Co-supervisor:
PhD - Université de Montréal
Principal supervisor:
PhD - Université de Montréal
Co-supervisor:

Publications

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport
Damien Ferbach
Baptiste Goujaud
Aymeric Dieuleveut
The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework that theoretically explains this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates, is correlated with linear mode connectivity.
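As an illustration of the phenomenon the paper studies, here is a minimal self-contained sketch (not the paper's construction; the toy data and hyperparameters are made up): train two wide two-layer networks from different seeds, align the hidden neurons of one to the other with an optimal assignment between their weight vectors, and print the loss along the linear path between them.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n, d, h = 256, 5, 512                            # samples, input dim, width
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=(d, 1)))         # toy regression target

def train(seed, steps=2000, lr=0.05):
    r = np.random.default_rng(seed)
    W1 = r.normal(size=(d, h)) / np.sqrt(d)
    W2 = r.normal(size=(h, 1)) / np.sqrt(h)
    for _ in range(steps):                       # plain full-batch gradient descent
        A = np.tanh(X @ W1)
        err = A @ W2 - y
        W2 -= lr * A.T @ err / n
        W1 -= lr * X.T @ ((err @ W2.T) * (1 - A ** 2)) / n
    return W1, W2

def loss(W1, W2):
    return float(((np.tanh(X @ W1) @ W2 - y) ** 2).mean())

W1a, W2a = train(1)
W1b, W2b = train(2)

# Match the hidden neurons of net B to net A: an optimal assignment between the
# two empirical distributions of neuron weight vectors [in-weights; out-weight].
cost = -np.concatenate([W1a, W2a.T]).T @ np.concatenate([W1b, W2b.T])
_, perm = linear_sum_assignment(cost)
W1b, W2b = W1b[:, perm], W2b[perm, :]

for t in np.linspace(0, 1, 5):                   # loss along the linear path
    print(f"t={t:.2f}  loss={loss((1 - t) * W1a + t * W1b, (1 - t) * W2a + t * W2b):.4f}")
```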
Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma
Quentin Bertrand
Juan Duque
Emilio Calvano
The deployment of machine learning systems in the market economy has triggered academic and institutional fears over potential tacit collusion between fully automated agents. Multiple recent economics studies have empirically shown the emergence of collusive strategies from agents guided by machine learning algorithms. In this work, we prove that multi-agent Q-learners playing the iterated prisoner's dilemma can learn to collude. The complexity of the cooperative multi-agent setting yields multiple fixed-point policies for…
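A hedged toy sketch of the setting (not the paper's proof construction; all hyperparameters below are made up): two independent tabular Q-learners play the iterated prisoner's dilemma with the previous joint action as state. Depending on exploration and learning rates, the greedy joint policy they settle on can sustain mutual cooperation through punishment, i.e., collusion.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 0, 1                                     # cooperate, defect
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

Q = [np.zeros((4, 2)), np.zeros((4, 2))]        # state = previous joint action
alpha, gamma = 0.1, 0.95
state = 0
for t in range(200_000):
    eps = max(0.01, 1.0 - t / 100_000)          # decaying epsilon-greedy exploration
    acts = tuple(
        rng.integers(2) if rng.random() < eps else int(np.argmax(Q[i][state]))
        for i in range(2))
    rewards = PAYOFF[acts]
    nxt = 2 * acts[0] + acts[1]
    for i in range(2):                          # independent Q-learning updates
        Q[i][state, acts[i]] += alpha * (
            rewards[i] + gamma * Q[i][nxt].max() - Q[i][state, acts[i]])
    state = nxt

# 0 = cooperate, 1 = defect; cooperating in state (C, C) while defecting after a
# defection is the punishment-backed collusive pattern of interest.
print([(int(np.argmax(Q[0][s])), int(np.argmax(Q[1][s]))) for s in range(4)])
```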
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
Leo Schwinn
David Dobre
Stephan Günnemann
Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.
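To make the embedding-space threat model concrete, here is a minimal sketch on a toy language model (the tiny GRU model, dimensions, and hyperparameters are illustrative assumptions, not the paper's setup): instead of searching over discrete tokens, the attacker directly optimizes a continuous perturbation of the input embeddings to force a target continuation, which is possible whenever the model weights are open.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
V, D, H, T = 50, 16, 32, 6                      # vocab, embed dim, hidden dim, length

embed = nn.Embedding(V, D)
lm = nn.GRU(D, H, batch_first=True)             # toy stand-in for a transformer LM
head = nn.Linear(H, V)

prompt = torch.randint(0, V, (1, T))
target = torch.randint(0, V, (1, T))            # continuation the attacker forces

delta = torch.zeros(1, T, D, requires_grad=True)  # perturbation in embedding space
opt = torch.optim.Adam([delta], lr=0.05)

for step in range(200):
    x = embed(prompt).detach() + delta          # attack the continuous embeddings;
    hidden, _ = lm(x)                           #   only delta is optimized
    logits = head(hidden)
    loss = nn.functional.cross_entropy(logits.reshape(-1, V), target.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                       # keep the perturbation small
        delta.clamp_(-0.5, 0.5)

print("forced-target loss:", loss.item())
```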
A Persuasive Approach to Combating Misinformation
Safwan Hossain
Andjela Mladenovic
Yiling Chen
Bayesian Persuasion is proposed as a tool for social media platforms to combat the spread of misinformation. Since platforms can use machine learning to predict the popularity and misinformation features of to-be-shared posts, and users are largely motivated to share popular content, platforms can strategically signal this informational advantage to change user beliefs and persuade them not to share misinformation. We characterize the optimal signaling scheme with imperfect predictions as a linear program and give necessary and sufficient conditions on the classifier to ensure optimal platform utility is non-decreasing and continuous. Next, this interaction is considered under a performative model, wherein platform intervention affects the user's future behaviour. The convergence and stability of optimal signaling under this performative process are fully characterized. Lastly, we experimentally validate that our approach significantly reduces misinformation in both the single round and performative setting and discuss the broader scope of using information design to combat misinformation.
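A hedged sketch of the generic Bayesian-persuasion linear program underlying this kind of characterization (the prior and the utilities below are made-up numbers, not the paper's model): a direct scheme recommends an action with some probability per state, subject to obedience constraints that make following the recommendation optimal for the user.

```python
import numpy as np
from scipy.optimize import linprog

mu = np.array([0.7, 0.3])                 # prior: P(benign), P(misinfo)
# actions: 0 = share, 1 = don't share
u_user = np.array([[1.0, 0.0],            # user likes sharing benign posts,
                   [-0.5, 0.0]])          #   mildly dislikes sharing misinformation
u_plat = np.array([[1.0, 0.0],            # platform: benign shares are good,
                   [-4.0, 0.0]])          #   misinformation shares are very costly

S, A = u_user.shape
c = -(mu[:, None] * u_plat).ravel()       # maximize expected platform utility
A_ub, b_ub = [], []
for a in range(A):                        # obedience: recommended a beats any a'
    for ap in range(A):
        if a == ap:
            continue
        row = np.zeros((S, A))
        row[:, a] = -mu * (u_user[:, a] - u_user[:, ap])
        A_ub.append(row.ravel()); b_ub.append(0.0)
A_eq = np.kron(np.eye(S), np.ones(A))     # each state's recommendations sum to 1
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=A_eq, b_eq=np.ones(S), bounds=(0, 1))
print("signaling scheme x[state, action]:\n", res.x.reshape(S, A).round(3))
```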
High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise
Eduard Gorbunov
Abdurakhmon Sadiev
Marina Danilova
Samuel Horváth
Pavel Dvurechensky
Alexander Gasnikov
Peter Richtárik
Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure
Angela Yuan
Chris Junchi Li
Michael Jordan
Quanquan Gu
Simon Shaolei Du
We consider the problem of solving stochastic monotone variational inequalities with a separable structure using a stochastic first-order oracle. Building on standard extragradient for variational inequalities, we propose a novel algorithm, stochastic accelerated gradient-extragradient (AG-EG), for strongly monotone variational inequalities (VIs). Our approach combines the strengths of extragradient and Nesterov acceleration. By showing that its iterates remain in a bounded domain and applying scheduled restarting, we prove that AG-EG has an optimal convergence rate for strongly monotone VIs. Furthermore, when specializing to the particular case of bilinearly coupled strongly-convex-strongly-concave saddle-point problems, including bilinear games, our algorithm achieves fine-grained convergence rates that match the respective lower bounds, with the stochasticity being characterized by an additive statistical error term that is optimal up to a constant prefactor.
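For intuition, here is a minimal sketch of the plain stochastic extragradient step on a bilinear saddle-point problem (illustrative parameters; AG-EG itself additionally interleaves Nesterov acceleration and scheduled restarting, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.normal(size=(n, n))
A /= np.linalg.norm(A, 2)                       # normalize so the step size is safe
x, y = rng.normal(size=n), rng.normal(size=n)
eta, sigma = 0.3, 0.01                          # step size, gradient-noise level

def F(x, y):                                    # operator of min_x max_y x^T A y
    return A @ y, -A.T @ x

def noisy(g):                                   # stochastic first-order oracle
    return g + sigma * rng.normal(size=n)

for _ in range(2000):
    gx, gy = map(noisy, F(x, y))
    xh, yh = x - eta * gx, y - eta * gy         # extrapolation (look-ahead) step
    gx, gy = map(noisy, F(xh, yh))
    x, y = x - eta * gx, y - eta * gy           # update using the look-ahead gradient

print("distance to the equilibrium (0, 0):", np.linalg.norm(x) + np.linalg.norm(y))
```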
AI4GCC - Track 3: Consumption and the Challenges of Multi-Agent RL
Marco Jiralerspong
Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
Eduard Gorbunov
Adrien Taylor
Samuel Horváth
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance
Abdurakhmon Sadiev
Marina Danilova
Eduard Gorbunov
Samuel Horváth
Pavel Dvurechensky
Alexander Gasnikov
Peter Richtárik
In recent years, the interest of the optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, state-of-the-art high-probability non-asymptotic convergence results are derived under strong assumptions such as boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central…
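The workhorse behind such high-probability results under heavy-tailed noise is gradient clipping. A minimal illustrative sketch (toy quadratic objective and made-up constants, not the paper's algorithms):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
x = np.ones(d)
eta, lam = 0.05, 1.0                            # step size, clipping level

for _ in range(5000):
    noise = rng.standard_t(df=2.1, size=d)      # heavy tails: variance barely finite
    g = x + 0.5 * noise                         # stochastic gradient of f(x) = ||x||^2 / 2
    g *= min(1.0, lam / np.linalg.norm(g))      # clip to norm at most lam, so rare
    x -= eta * g                                #   huge-noise draws cannot derail x

print("||x|| after clipped SGD:", np.linalg.norm(x))
```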
Omega: Optimistic EMA Gradients
Juan Ramirez
Rohan Sukumaran
Quentin Bertrand
Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an EMA of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
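A hedged sketch of an optimistic-style update that substitutes an EMA of past gradients for the previous gradient, in the spirit of Omega, applied to a noisy bilinear game (the exact update rule and constants in the paper may differ; everything below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
A = rng.normal(size=(n, n))
A /= np.linalg.norm(A, 2)
x, y = rng.normal(size=n), rng.normal(size=n)
eta, beta, sigma = 0.1, 0.9, 0.1                # step size, EMA decay, noise level
mx, my = np.zeros(n), np.zeros(n)

for _ in range(5000):
    gx = A @ y + sigma * rng.normal(size=n)     # noisy gradients of x^T A y
    gy = -A.T @ x + sigma * rng.normal(size=n)
    mx = beta * mx + (1 - beta) * gx            # EMA of historic gradients
    my = beta * my + (1 - beta) * gy
    x -= eta * (2 * gx - mx)                    # optimistic step: extrapolate the
    y -= eta * (2 * gy - my)                    #   current gradient against the EMA

print("distance to the equilibrium (0, 0):", np.linalg.norm(x) + np.linalg.norm(y))
```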
Raising the Bar for Certified Adversarial Robustness with Diffusion Models
Thomas R. Altstidl
David Dobre
Björn M. Eskofier
Leo Schwinn
Certified defenses against adversarial attacks offer formal guarantees on the robustness of a model, making them more reliable than empirical methods such as adversarial training, whose effectiveness is often later reduced by unseen attacks. Still, the limited certified robustness that is currently achievable has been a bottleneck for their practical adoption. Gowal et al. and Wang et al. have shown that generating additional training data using state-of-the-art diffusion models can considerably improve the robustness of adversarial training. In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses. In addition, we provide a list of recommendations to scale the robustness of certified training approaches. One of our main insights is that the generalization gap, i.e., the difference between the training and test accuracy of the original model, is a good predictor of the magnitude of the robustness improvement when using additional generated data. Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the…