
Gauthier Gidel

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Reinforcement Learning
Generative Models
Optimization
Machine Learning Theory

Biography

I am an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. I received a Borealis AI Graduate Fellowship and currently hold a Canada CIFAR AI Chair. I have worked at DeepMind and Element AI, and I was recently a long-term visitor at the Simons Institute at the University of California, Berkeley. My research interests lie at the intersection of game theory, optimization, and machine learning.

Current Students

Master's Research - UdeM
Research Intern - UdeM
PhD - UdeM
Independent Visiting Researcher - N/A
PhD - UdeM
Co-supervisor:
Research Intern - UdeM
Principal supervisor:
PhD - UdeM
Co-supervisor:
PhD - UdeM
Co-supervisor:
Research Collaborator - UdeM
Co-supervisor:
Research Collaborator - UdeM
Independent Visiting Researcher - Technical University of Munich
Research Intern - UdeM
PhD - UdeM
Co-supervisor:
Collaborating Alumni - N/A

Publications

Why Open Source? A Game-Theoretic Analysis of the AI Race
In recent years, with the advancement of frontier AI, we have observed certain dynamics in open-sourcing and closed-sourcing decisions. We propose a game-theoretic model to analyze these dynamics in the current landscape of the AI race. Our model builds on an R&D race framework under a winner-takes-all setting, and it accounts for cases where the players' actions are either discrete or continuous (i.e., partial open-sourcing, such as open weights). We show that determining the existence of a discrete pure non-trivial Nash equilibrium is NP-hard in general, but that the discrete Nash existence problem can be transformed into a MIP (Mixed-Integer Programming) problem, making it tractable for small instances with a standard MIP solver. Next, we show the existence and tractability of pure Nash equilibria in the continuous version of our problem by leveraging standard convex analysis results and constructing an equivalent MIP formulation. Throughout this work, we use both our main technical results and the surrounding technical analysis to derive socially relevant insights that we believe can help explain existing decisions and dynamics and potentially inform new policies.
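To make "existence of a pure Nash equilibrium" concrete: a pure profile is an equilibrium when no single player can raise its own payoff by switching actions alone. The brute-force check below enumerates every action profile, so its cost grows exponentially with the number of players; this is the kind of blow-up that motivates the MIP reformulation above. It is a minimal sketch, not the paper's MIP formulation, and the textbook prisoner's dilemma payoffs are a stand-in for the paper's R&D race model.

```python
from itertools import product


def pure_nash_equilibria(action_sets, payoff):
    """Return every pure profile from which no player gains by deviating alone.

    action_sets: one list of actions per player.
    payoff(profile) -> tuple with one payoff per player.
    """
    equilibria = []
    for profile in product(*action_sets):
        current = payoff(profile)
        is_equilibrium = True
        for i, actions in enumerate(action_sets):
            for alternative in actions:
                if alternative == profile[i]:
                    continue
                # Unilateral deviation by player i only.
                deviation = profile[:i] + (alternative,) + profile[i + 1:]
                if payoff(deviation)[i] > current[i]:
                    is_equilibrium = False
                    break
            if not is_equilibrium:
                break
        if is_equilibrium:
            equilibria.append(profile)
    return equilibria


# Textbook prisoner's dilemma (illustrative only, not the paper's model).
PD = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
print(pure_nash_equilibria([["C", "D"], ["C", "D"]], lambda p: PD[p]))  # [('D', 'D')]
```

The same function handles any number of players, but the loop visits every element of the Cartesian product of the action sets.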
Soft Mellowmax Monte Carlo Planning
Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample-efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained through planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
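For orientation, mellowmax replaces the hard max in a value backup with (1/ω)·log(mean(exp(ω·q))), which moves from the mean toward the max as ω grows. The sketch below assumes a soft mellowmax form in which that uniform mean is reweighted by softmax(α·q); with α = 0 it reduces to mellowmax. It shows only the backup operator a tree search node might apply to its children's values, not SMM-TS itself, and the parameter names `omega` and `alpha` are this sketch's own.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mellowmax(q_values, omega):
    """Mellowmax: (1/omega) * log(mean(exp(omega * q)))."""
    n = len(q_values)
    return (logsumexp([omega * q for q in q_values]) - math.log(n)) / omega


def soft_mellowmax(q_values, omega, alpha):
    """Soft mellowmax (assumed form): softmax(alpha * q) weights replace the mean.

    (1/omega) * log(sum_i softmax(alpha*q)_i * exp(omega*q_i))
      = (LSE((alpha + omega) * q) - LSE(alpha * q)) / omega
    """
    lse_mixed = logsumexp([(alpha + omega) * q for q in q_values])
    lse_weights = logsumexp([alpha * q for q in q_values])
    return (lse_mixed - lse_weights) / omega
```

Working in log space keeps both operators stable for large ω, where exp(ω·q) alone would overflow.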
Beyond Reward Maximization: Evaluating the Diversity of Trajectories in Reinforcement Learning with Temporal Vendi Score
In domains such as scientific discovery and automated design using reinforcement learning (RL), the final task of an agent should extend beyond maximising a single scalar reward; it requires identifying diverse sets of high-quality trajectories to uncover distinct solutions that can provide novel insights on how to solve the problems of interest and transfer robustly from simulation to the real world. However, the RL literature currently lacks a holistic, domain-agnostic standard for measuring trajectory diversity. Existing metrics have been developed to improve exploration at training time but not to evaluate and compare the diversity induced by different agents, rendering cross-method comparisons inconsistent and challenging. To address this, we introduce the Temporal Vendi Score (TVS), a novel metric designed to evaluate the diversity of an RL agent by computing the entropy of the eigenvalues of the similarity matrix of sampled trajectories. Unlike previous approaches, our metric captures the behavioural diversity of trajectories by accounting for both the sequential nature of state visitations and the temporal structure of the underlying MDP, rather than relying on order-agnostic state comparisons. We validate the TVS on simple environments where we can control the number of different ways a problem can be solved, demonstrating that it provides a more robust, semantically meaningful ranking of diversity than standard baselines. We then show that our metric can scale to a high-dimensional, continuous environment.
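The Vendi Score underlying the TVS is exp(H), where H is the Shannon entropy of the eigenvalues of K/n for an n×n similarity matrix K with unit diagonal: n identical items score 1 and n fully dissimilar items score n. The pure-Python sketch below computes it with Jacobi rotations. The `time_aligned_similarity` kernel is a hypothetical stand-in, not the kernel used by the TVS; it only illustrates the point about order: two trajectories that visit the same states in a different order count as different.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(similarity):
    """exp(entropy of eigenvalues of K / n); K must be PSD with unit diagonal."""
    n = len(similarity)
    eigenvalues = [lam / n for lam in symmetric_eigenvalues(similarity)]
    entropy = -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)
    return math.exp(entropy)


def time_aligned_similarity(traj_a, traj_b):
    """Hypothetical kernel: fraction of time steps where both trajectories share a state."""
    if len(traj_a) != len(traj_b):
        raise ValueError("this sketch assumes equal-length trajectories")
    return sum(1 for s_a, s_b in zip(traj_a, traj_b) if s_a == s_b) / len(traj_a)


def trajectory_vendi_score(trajectories):
    n = len(trajectories)
    k = [[time_aligned_similarity(trajectories[i], trajectories[j]) for j in range(n)]
         for i in range(n)]
    return vendi_score(k)
```

For example, `[0, 1]` and `[1, 0]` visit the same set of states, yet this time-aligned kernel gives them similarity 0 and a score of 2, whereas an order-agnostic state comparison would treat them as identical.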
Logarithmic-time Schedules for Scaling Language Models with Momentum
In practice, the hyperparameters …
Accelerated and Stable Convergence with Anchored Generalized Optimistic Method
We study first-order methods for solving monotone variational inequalities arising in min-max optimization. Classical approaches such as the extragradient method rely on two gradient queries per iteration, which limits their analysis and applicability in the online and stochastic settings. We propose a family of Generalized Optimistic Methods with Anchoring (GOMA), which combine two-time-scale optimistic updates with an anchoring term inspired by Halpern iteration. In particular, we show that for monotone Lipschitz operators, GOMA achieves an accelerated last-iterate convergence rate of
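The one-query-per-iteration idea behind optimistic methods can be seen on the bilinear game f(x, y) = x·y, whose operator is F(x, y) = (y, −x) and whose unique solution is the origin. Simultaneous gradient descent-ascent spirals outward on this problem, while plain optimistic gradient, z_{k+1} = z_k − η(2F(z_k) − F(z_{k−1})), converges while evaluating F only once per step and reusing the previous evaluation. This is a minimal sketch of the plain optimistic method; GOMA's two-time-scale steps and its anchoring term are not reproduced, and the step size 0.1 is an arbitrary choice.

```python
import math


def operator(z):
    """F(x, y) = (df/dx, -df/dy) for the bilinear game f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def gradient_descent_ascent(z0, step, iterations):
    """Simultaneous GDA: z_{k+1} = z_k - step * F(z_k)."""
    z = z0
    for _ in range(iterations):
        g = operator(z)
        z = (z[0] - step * g[0], z[1] - step * g[1])
    return z


def optimistic_gradient(z0, step, iterations):
    """Optimistic GDA: z_{k+1} = z_k - step * (2 F(z_k) - F(z_{k-1})).

    One new operator evaluation per iteration; the previous one is reused.
    """
    z = z0
    previous = operator(z0)  # convention: F(z_{-1}) = F(z_0)
    for _ in range(iterations):
        current = operator(z)
        z = (z[0] - step * (2 * current[0] - previous[0]),
             z[1] - step * (2 * current[1] - previous[1]))
        previous = current
    return z


print(math.hypot(*gradient_descent_ascent((1.0, 1.0), 0.1, 2000)))  # grows large
print(math.hypot(*optimistic_gradient((1.0, 1.0), 0.1, 2000)))      # shrinks toward 0
```

On this game each GDA step multiplies the distance to the origin by sqrt(1 + η²) > 1, which is why it diverges for every positive step size.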
A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness
Moritz Ladenburger
Tim Beyer
Stephan Günnemann
Automated "LLM-as-a-Judge" frameworks have become the de facto standard for scalable evaluation across natural language processing. For instance, in safety evaluation, these judges are relied upon to assess harmfulness in order to benchmark safety robustness against adversarial attacks. However, we show that existing validation protocols fail to account for substantial distribution shifts inherent to red-teaming: diverse victim models exhibit distinct generation styles, attacks distort output patterns, and semantic ambiguity varies significantly across jailbreak scenarios. Through a comprehensive audit using 6,642 human-verified labels, we reveal that the unpredictable interaction of these shifts often causes judge performance to degrade to near random chance. This stands in stark contrast to the high human agreement reported in prior work. Crucially, we find that many attacks inflate their success rates by exploiting judge insufficiencies rather than eliciting genuinely harmful content. To enable more reliable evaluation, we propose ReliableBench, a benchmark of behaviors that remain more consistently judgeable, and JudgeStressTest, a dataset designed to expose judge failures. (Data in supplement).
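The abstract does not name the agreement statistic behind "near random chance". One standard way to make that phrase precise is Cohen's kappa, which corrects raw judge-human agreement for the agreement two raters would reach by chance given their label frequencies: 0 means chance level and 1 means perfect agreement. The sketch below is illustrative only; the paper's actual metric may differ, and the binary labels (1 = harmful, 0 = not harmful) are hypothetical.

```python
from collections import Counter


def cohens_kappa(human_labels, judge_labels):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    if len(human_labels) != len(judge_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human_labels)
    # Observed agreement: fraction of items where the judge matches the human label.
    observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n
    # Expected agreement if both raters labeled independently at their own base rates.
    human_freq = Counter(human_labels)
    judge_freq = Counter(judge_labels)
    expected = sum(human_freq[c] * judge_freq[c] for c in human_freq) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1.0 - expected)
```

Note that a judge can reach 50% raw agreement on a balanced set and still have a kappa of 0, which is why raw accuracy alone can hide chance-level behavior.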
Position: LLM-Safety Evaluations Lack Robustness
Tim Beyer
Simon Geisler
Stephan Günnemann
In this position paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing research progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field’s ability to generate easily comparable results and make measurable progress.
Dimension-adapted Momentum Outscales SGD
We investigate scaling laws for stochastic momentum algorithms with small batches on the power-law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields the same scaling-law exponents as SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.
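A scaling-law exponent of the kind compared above is the slope of the loss curve on log-log axes: if loss ≈ c·t^(−a), then log(loss) = log(c) − a·log(t). A minimal sketch of reading a off a curve by ordinary least squares, assuming a pure power law with no irreducible loss term; the synthetic curve in the example is invented and this is not the paper's procedure.

```python
import math


def fit_power_law_exponent(steps, losses):
    """Least-squares slope of log(loss) vs log(step); returns a for loss ~ c * step**(-a)."""
    xs = [math.log(t) for t in steps]
    ys = [math.log(value) for value in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


steps = [2 ** k for k in range(1, 15)]
synthetic = [3.0 * t ** -0.5 for t in steps]  # invented curve with exponent 0.5
print(fit_power_law_exponent(steps, synthetic))
```

A larger fitted exponent means the loss falls faster per doubling of steps, which is what "improves these exponents" refers to.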
Tight Lower Bounds and Improved Convergence in Performative Prediction
Performative prediction is a framework that accounts for the shift in the data distribution induced by the predictions of a model deployed in the real world. Ensuring rapid convergence to a stable solution, where the data distribution remains the same after model deployment, is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove, for the first time, the tightness of both this new bound and the previously existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last-iterate RRM, and we empirically observe faster convergence to the stable point on various performative prediction benchmarks. In addition, we provide the first lower-bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
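A toy model shows why reusing an older snapshot can beat last-iterate RRM. Assume, hypothetically, that deploying θ shifts the data mean to μ + ε·θ and that the loss is squared error, so minimizing the risk on one snapshot returns that snapshot's mean, and the stable point is θ* = μ / (1 − ε). Plain RRM then contracts the error by ε per step. Minimizing an affine combination w·risk(latest) + (1 − w)·risk(previous) (still a convex quadratic, since the weights sum to 1) gives the error recursion e_{t+1} = ε·(w·e_t + (1 − w)·e_{t−1}), and a suitable w, possibly larger than 1, gives a faster rate. Everything here, including the weight choice, is an illustrative assumption, not the paper's algorithm or its bounds.

```python
import math


def repeated_risk_minimization(mu, epsilon, theta0, steps, weight_latest=1.0):
    """Toy performative mean estimation (hypothetical model, exact expectations).

    weight_latest = 1.0 is plain RRM on the latest snapshot; other values minimize
    weight_latest * risk(latest) + (1 - weight_latest) * risk(previous), whose
    minimizer is the same affine combination of the two snapshot means.
    """
    previous = current = theta0  # convention: no history before the first step
    for _ in range(steps):
        mean_latest = mu + epsilon * current
        mean_previous = mu + epsilon * previous
        previous, current = current, (
            weight_latest * mean_latest + (1.0 - weight_latest) * mean_previous
        )
    return current


def double_root_weight(epsilon):
    """Hypothetical weight (0 < epsilon < 1) giving the error recursion a repeated
    characteristic root 1 - sqrt(1 - epsilon), smaller than the RRM rate epsilon."""
    return (2.0 - 2.0 * math.sqrt(1.0 - epsilon)) / epsilon


stable_point = 1.0 / (1.0 - 0.5)  # theta* = mu / (1 - epsilon) with mu = 1, epsilon = 0.5
rrm = repeated_risk_minimization(1.0, 0.5, 0.0, 20)
with_history = repeated_risk_minimization(1.0, 0.5, 0.0, 20, double_root_weight(0.5))
```

With ε = 0.5 the hypothetical weight is 4 − 2√2 ≈ 1.17, which places a negative weight on the older snapshot's risk; the resulting per-step contraction is about 0.29 instead of 0.5.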
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
A major bottleneck in scientific discovery consists of narrowing an exponentially large set of objects, such as proteins or molecules, to a small set of promising candidates with desirable properties. While this process can rely on expert knowledge, recent methods leverage reinforcement learning (RL) guided by a proxy reward function to enable this filtering. By employing various forms of entropy regularization, these methods aim to learn samplers that generate diverse candidates that are highly rated by the proxy function. In this work, we make two main contributions. First, we show that these methods are liable to generate overly diverse, suboptimal candidates in large search spaces. To address this issue, we introduce a novel unified operator that combines several regularized RL operators into a general framework that better targets peakier sampling distributions. Second, we offer a novel, robust RL perspective on this filtering process. The regularization can be interpreted as robustness to a compositional form of uncertainty in the proxy function (i.e., the true evaluation of a candidate differs from the proxy's evaluation). Our analysis leads us to a novel, easy-to-use algorithm we name trajectory general mellowmax (TGM): we show that it identifies higher-quality, diverse candidates than baselines in both synthetic and real-world tasks. Code: https://github.com/marcojira/tgm.
Jailbreak Distillation: Renewable Safety Benchmarking
Jingyu Zhang
Ahmed Elgohary
Xiawei Wang
A S M Iftekhar
Ahmed Magooda
Benjamin Van Durme
Daniel Khashabi
Kyle Jackson
Self-Play Q-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
Juan Agustin Duque
Emilio Calvano
A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated “always defect” policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
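The Pavlov (win-stay, lose-shift) policy named above is a memory-one rule: repeat your last action after a good outcome (mutual cooperation or successful defection), switch after a bad one. Equivalently, cooperate exactly when both players chose the same action last round. The minimal simulation below uses the standard payoffs T = 5, R = 3, P = 1, S = 0, an assumption since the abstract gives none. It illustrates only the two policies, not the Q-learning dynamics analysed in the paper: in self-play, Pavlov earns 3 per round versus 1 for always-defect, and it returns to mutual cooperation after a one-off defection.

```python
# Standard prisoner's dilemma payoffs (assumed): reward of the first player.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def pavlov(my_last, their_last):
    """Win-stay, lose-shift: cooperate iff both players chose the same action last round."""
    return "C" if my_last == their_last else "D"


def always_defect(my_last, their_last):
    return "D"


def self_play_average_reward(policy, rounds, first_actions=("C", "C")):
    """Average per-round reward of the first player when both players use `policy`."""
    a, b = first_actions
    total = 0
    for _ in range(rounds):
        total += PAYOFF[(a, b)]
        # Each player conditions on (own last action, opponent's last action).
        a, b = policy(a, b), policy(b, a)
    return total / rounds
```

Starting Pavlov self-play from a single defection, ("C", "D"), the play goes (C, D), then (D, D), then (C, C) forever, so the average over 100 rounds is 2.95.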