Gauthier Gidel

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a core academic member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair.

Previously, I was awarded a Borealis AI Graduate Fellowship, worked at DeepMind and Element AI, and was a Long-Term Visitor at the Simons Institute at UC Berkeley.

My research interests lie at the intersection of game theory, optimization and machine learning.

Current Students

Sadhana Anand

Master's Research - Université de Montréal

Manfred Diaz Cabrera

Collaborating researcher - Université de Montréal

David Dobre

PhD - Université de Montréal

Damien Ferbach

PhD - Université de Montréal

Co-supervisor:

Research Intern - Université de Montréal

PhD - Université de Montréal

Zichu Liu

PhD - Université de Montréal

Co-supervisor:

Ioannis Mitliagkas

Andjela Mladenovic

PhD - Université de Montréal

Co-supervisor:

Collaborating researcher - Université de Montréal

Leo Schwinn

Independent visiting researcher - Technical University of Munich

Tom Stanic

Research Intern - Université de Montréal

Danilo Vucetic

PhD - Université de Montréal

Sophie Xhonneux

PhD - Université de Montréal

Co-supervisor:

Jian Tang

Bora Yongacoglu

Collaborating Alumni - N/A

Blog Posts

What Do Synaptic Weight Distributions Tell Us About Learning in the Brain?

June 13, 2024

Roman Pogodin

Jonathan Cornford

Arna Ghosh

Gauthier Gidel

Guillaume Lajoie

Blake Richards


Publications

On the Stability of Iterative Retraining of Generative Models on their own Data

Alexandre Duplessis

Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is enabled by the massive amounts of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets, from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.

2024-01-16

ICLR.cc/2024/Conference (spotlight)

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

Eduard Gorbunov

Abdurakhmon Sadiev

Marina Danilova

Samuel Horváth

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

2024-01-01

International Conference on Machine Learning (published)

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

Damien Ferbach

Baptiste Goujaud

Aymeric Dieuleveut

The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found after two runs of a stochastic training are often connected by very simple continuous paths (e.g., linear) modulo a permutation of the weights. In this paper, we provide a framework theoretically explaining this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two wide enough two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we express upper and lower bounds on the width of each layer of two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing how the dimension of the support of the weight distribution of neurons, which dictates Wasserstein convergence rates, is correlated with linear mode connectivity.

2024-01-01

AISTATS (published)

Stochastic Frank-Wolfe: Unified Analysis and Zoo of Special Cases

Ruslan Nazykov

Aleksandr Shestakov

Vladimir Solodkin

Aleksandr Beznosikov

Alexander Gasnikov

The Conditional Gradient (or Frank-Wolfe) method is one of the most well-known methods for solving constrained optimization problems appearing in various machine learning tasks. The simplicity of iteration and applicability to many practical problems helped the method to gain popularity in the community. In recent years, the Frank-Wolfe algorithm received many different extensions, including stochastic modifications with variance reduction and coordinate sampling for training of huge models or distributed variants for big data problems. In this paper, we present a unified convergence analysis of the Stochastic Frank-Wolfe method that covers a large number of particular practical cases that may have completely different nature of stochasticity, intuitions and application areas. Our analysis is based on a key parametric assumption on the variance of the stochastic gradients. But unlike most works on unified analysis of other methods, such as SGD, we do not assume an unbiasedness of the real gradient estimation. We conduct analysis for convex and non-convex problems due to the popularity of both cases in machine learning. With this general theoretical framework, we not only cover rates of many known methods, but also develop numerous new methods. This shows the flexibility of our approach in developing new algorithms based on the Conditional Gradient approach. We also demonstrate the properties of the new methods through numerical experiments.

2024-01-01

International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma

Quentin Bertrand

Juan Agustin Duque

Emilio Calvano

The deployment of machine learning systems in the market economy has triggered academic and institutional fears over potential tacit collusion between fully automated agents. Multiple recent economics studies have empirically shown the emergence of collusive strategies from agents guided by machine learning algorithms. In this work, we prove that multi-agent Q-learners playing the iterated prisoner's dilemma can learn to collude. The complexity of the cooperative multi-agent setting yields multiple fixed-point policies for

2023-12-13

ArXiv (preprint)

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

Damien Ferbach

Baptiste Goujaud

Aymeric Dieuleveut

2023-10-29

ArXiv (preprint)

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

Leo Schwinn

David Dobre

Stephan Günnemann

Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.

2023-10-27

NeurIPS.cc/2023/Workshop/ICBINB (published)

A Persuasive Approach to Combating Misinformation

Safwan Hossain

Andjela Mladenovic

Yiling Chen

Bayesian Persuasion is proposed as a tool for social media platforms to combat the spread of misinformation. Since platforms can use machine learning to predict the popularity and misinformation features of to-be-shared posts, and users are largely motivated to share popular content, platforms can strategically signal this informational advantage to change user beliefs and persuade them not to share misinformation. We characterize the optimal signaling scheme with imperfect predictions as a linear program and give sufficient and necessary conditions on the classifier to ensure optimal platform utility is non-decreasing and continuous. Next, this interaction is considered under a performative model, wherein platform intervention affects the user's future behaviour. The convergence and stability of optimal signaling under this performative process are fully characterized. Lastly, we experimentally validate that our approach significantly reduces misinformation in both the single round and performative setting and discuss the broader scope of using information design to combat misinformation.

2023-10-18

ArXiv (preprint)

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

Eduard Gorbunov

Abdurakhmon Sadiev

Marina Danilova

Samuel Horváth

Pavel Dvurechensky

Alexander Gasnikov

Peter Richtárik

2023-10-03

ArXiv (preprint)

Optimal Extragradient-Based Algorithms for Stochastic Variational Inequalities with Separable Structure

Angela Yuan

Chris Junchi Li

Michael Jordan

Quanquan Gu

Simon Shaolei Du

We consider the problem of solving stochastic monotone variational inequalities with a separable structure using a stochastic first-order oracle. Building on standard extragradient for variational inequalities, we propose a novel algorithm, stochastic accelerated gradient-extragradient (AG-EG), for strongly monotone variational inequalities (VIs). Our approach combines the strengths of extragradient and Nesterov acceleration. By showing that its iterates remain in a bounded domain and applying scheduled restarting, we prove that AG-EG has an optimal convergence rate for strongly monotone VIs. Furthermore, when specializing to the particular case of bilinearly coupled strongly-convex-strongly-concave saddle-point problems, including bilinear games, our algorithm achieves fine-grained convergence rates that match the respective lower bounds, with the stochasticity being characterized by an additive statistical error term that is optimal up to a constant prefactor.

AI4GCC - Track 3: Consumption and the Challenges of Multi-Agent RL

Marco Jiralerspong

2023-08-09

ArXiv (preprint)