Publications

Scaling Trends in Language Model Robustness

Nikolaus Howe

Ian McKenzie

Oskar Hollinsworth

Michał Zając

Tom Tseng

Aaron Tucker

Pierre-Luc Bacon

Adam Gleave

Increasing model size has unlocked a dazzling array of capabilities in modern language models. At the same time, even frontier models remain vulnerable to jailbreaks and prompt injections, despite concerted efforts to make them robust. As both attack and defense gain access to more compute, and as models become larger, what happens to robustness? We argue that answering this question requires a scaling approach, which we employ in an extensive study of language model robustness across several classification tasks, model families, and adversarial attacks. We find that in the absence of explicit safety training, larger models are not consistently more robust; however, scale improves sample efficiency in adversarial training, though it worsens compute efficiency. Further, we find that increasing attack compute smoothly improves attack success rate against both undefended and adversarially trained models. Finally, after exploring robustness transfer across attacks and threat models, we combine attack and defense scaling rates to study the offense-defense balance. We find that while attack scaling outpaces adversarial training across all models studied, larger adversarially trained models might give defense the advantage in the long run. These results underscore the utility of the scaling lens, and provide a paradigm for evaluating future attacks and defenses on frontier models.

2025-04-30

ICML.cc/2025/Conference (poster)

doi.org

proceedings.mlr.press

SCAR: Shapley Credit Assignment for More Efficient RLHF

Meng Cao

Shuyuan Zhang

Xiao-Wen Chang

Doina Precup

2025-04-30

arXiv (published)

doi.org

arxiv.org

SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs

Roozbeh Aghili

Xingfang Wu

Foutse Khomh

Heng Li

2025-04-30

arXiv (published)

doi.org

arxiv.org

Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models

Lucas Berry

Axel Brando

Wei-Di Chang

Juan Higuera

David Meger

2025-04-30

arXiv (published)

doi.org

openreview.net

Self-Evolving Curriculum for LLM Reasoning

Nicolas Gontier

Ehsan Kamalloo

2025-04-30

arXiv (published)

doi.org

arxiv.org

Self-Play Q-Learners Can Provably Collude in the Iterated Prisoner's Dilemma

Quentin Bertrand

Juan Agustin Duque

Emilio Calvano

Gauthier Gidel

A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated “always defect” policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.

2025-04-30

ICML.cc/2025/Conference (poster)

doi.org

proceedings.mlr.press

Structure-Aligned Protein Language Model

Can Chen

David Heurtel-Depeiges

Robert M. Vernon

Christopher J. Langmead

Yoshua Bengio

Quentin Fournier

2025-04-30

arXiv (published)

doi.org

arxiv.org

On the generalization of language models from in-context learning and finetuning: a controlled study

Andrew Lampinen

Arslan Chaudhry

Stephanie C.Y. Chan

Cody Wild

Diane Wan

Alexander Y. Ku

Jörg Bornschein

Razvan Pascanu

Murray P. Shanahan

James L McClelland

2025-04-30

arXiv (published)

doi.org

arxiv.org

The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages

Chris Emezue

The NaijaVoices Community

Busayo Awobade

Abraham Owodunni

Handel Emezue

Gloria Monica Tobechukwu Emezue

N. N. Emezue

Sewade Ogun

Bunmi Akinremi

David Ifeoluwa Adelani

Christopher Pal

2025-04-30

arXiv (published)

doi.org

arxiv.org

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Honghua Dong

Jiacheng Yang

Xun Deng

Yuhe Jiang

Gennady Pekhimenko

Fan Long

Xujie Si

2025-04-30

ICML.cc/2025/Conference (poster)

proceedings.mlr.press

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Michael Craig

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-04-30

arXiv (published)

doi.org

arxiv.org

Caffeine induces age-dependent increases in brain complexity and criticality during sleep

Philipp Thölke

Maxine Arcand-Lavigne

Tarek Lajnef

Sonia Frenette

Julie Carrier

Karim Jerbi

Caffeine is the most widely consumed psychoactive stimulant worldwide. Yet important gaps persist in understanding its effects on the brain, especially during sleep. We analyzed sleep electroencephalography (EEG) in 40 subjects, contrasting 200 mg of caffeine against a placebo condition, utilizing inferential statistics and machine learning. We found that caffeine ingestion led to an increase in brain complexity, a widespread flattening of the power spectrum’s 1/f-like slope, and a reduction in long-range temporal correlations. Being most prominent during non-rapid eye movement (NREM) sleep, these results suggest that caffeine shifts the brain towards a critical regime and more diverse neural dynamics. Interestingly, this was more pronounced in younger adults (20–27 years) compared to middle-aged participants (41–58 years) during rapid eye movement (REM) sleep, while no significant age effects were observed during NREM. Interpreting these data in the light of modeling and empirical work on EEG-derived measures of excitation-inhibition balance suggests that caffeine promotes a shift in brain dynamics towards increased neural excitation and closer proximity to a critical regime, particularly during NREM sleep.

2025-04-29

Communications Biology (published)

doi.org
