
Gauthier Gidel

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Generative Models
Machine Learning Theory
Optimization
Reinforcement Learning

Biography

I am an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a core academic member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair.

Previously, I was awarded a Borealis AI Graduate Fellowship, worked at DeepMind and Element AI, and was a Long-Term Visitor at the Simons Institute at UC Berkeley.

My research interests lie at the intersection of game theory, optimization and machine learning.

Current Students

Master's Research - Université de Montréal
Collaborating researcher - Université de Montréal
PhD - Université de Montréal
Independent visiting researcher - N/A
PhD - Université de Montréal
Co-supervisor :
Research Intern - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Co-supervisor :
Collaborating researcher - Université de Montréal
Independent visiting researcher - Technical University of Munich
Research Intern - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Collaborating Alumni - N/A

Publications

Soft Mellowmax Monte Carlo Planning
Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
Logarithmic-time Schedules for Scaling Language Models with Momentum
In practice, the hyperparameters …
Dimension-adapted Momentum Outscales SGD
We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.
Tight Lower Bounds and Improved Convergence in Performative Prediction
Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previous existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
Discrete Compositional Generation via General Soft Operators and Robust Reinforcement Learning
A major bottleneck in scientific discovery consists of narrowing an exponentially large set of objects, such as proteins or molecules, to a small set of promising candidates with desirable properties. While this process can rely on expert knowledge, recent methods leverage reinforcement learning (RL) guided by a proxy reward function to enable this filtering. By employing various forms of entropy regularization, these methods aim to learn samplers that generate diverse candidates that are highly rated by the proxy function. In this work, we make two main contributions. First, we show that these methods are liable to generate overly diverse, suboptimal candidates in large search spaces. To address this issue, we introduce a novel unified operator that combines several regularized RL operators into a general framework that better targets peakier sampling distributions. Secondly, we offer a novel, robust RL perspective of this filtering process. The regularization can be interpreted as robustness to a compositional form of uncertainty in the proxy function (i.e., the true evaluation of a candidate differs from the proxy's evaluation). Our analysis leads us to a novel, easy-to-use algorithm we name trajectory general mellowmax (TGM): we show it identifies higher quality, diverse candidates than baselines in both synthetic and real-world tasks. Code: https://github.com/marcojira/tgm.
Jailbreak Distillation: Renewable Safety Benchmarking
Jingyu Zhang
Ahmed Elgohary
Xiawei Wang
A S M Iftekhar
Ahmed Magooda
Benjamin Van Durme
Daniel Khashabi
Kyle Jackson
Self-Play Q-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
Juan Agustin Duque
Emilio Calvano
A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated “always defect” policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
Performative Prediction on Games and Mechanism Design
A Generative Approach to LLM Harmfulness Detection with Red Flag Tokens
Most safety training methods for large-language models (LLMs) based on fine-tuning rely on dramatically changing the output distribution of the model when faced with a harmful request, shifting it from an unsafe answer to a refusal to respond. These methods inherently compromise model capabilities and might make auto-regressive models vulnerable to attacks that make likely an initial token of affirmative response. To avoid that, we propose to expand the model's vocabulary with a special token we call a *red flag token* (
LLM-Safety Evaluations Lack Robustness
Tim Beyer
Simon Geisler
Stephan Günnemann
In this paper, we argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise, such as small datasets, methodological inconsistencies, and unreliable evaluation setups. This can, at times, make it impossible to evaluate and compare attacks and defenses fairly, thereby slowing progress. We systematically analyze the LLM safety evaluation pipeline, covering dataset curation, optimization strategies for automated red-teaming, response generation, and response evaluation using LLM judges. At each stage, we identify key issues and highlight their practical impact. We also propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers. Lastly, we offer an opposing perspective, highlighting practical reasons for existing limitations. We believe that addressing the outlined problems in future research will improve the field's ability to generate easily comparable results and make measurable progress.
Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives
Yan Scholten
Tom Wollschlager
Stephen Casper
Stephan Günnemann
Advantage Alignment Algorithms
The growing presence of artificially intelligent agents in everyday decision-making, from LLM assistants to autonomous vehicles, hints at a future in which conflicts may arise from each agent optimizing individual interests. In general-sum games these conflicts are apparent, where naive Reinforcement Learning agents get stuck in Pareto-suboptimal Nash equilibria. Consequently, opponent shaping has been introduced as a method with success at finding socially beneficial equilibria in social dilemmas. In this work, we introduce Advantage Alignment, a family of algorithms derived from first principles that perform opponent shaping efficiently and intuitively. This is achieved by aligning the advantages of conflicting agents in a given game by increasing the probability of mutually-benefiting actions. We prove that existing opponent shaping methods, including LOLA and LOQA, implicitly perform Advantage Alignment. Compared to these works, Advantage Alignment mathematically simplifies the formulation of opponent shaping and seamlessly works for continuous action domains. We also demonstrate the effectiveness of our algorithm in a wide range of social dilemmas, achieving state of the art results in each case, including a social dilemma version of the Negotiation Game.