Publications

Scaling Trends in Language Model Robustness
Nikolaus Howe
Ian McKenzie
Oskar Hollinsworth
Michał Zając
Tom Tseng
Aaron Tucker
Adam Gleave
Increasing model size has unlocked a dazzling array of capabilities in modern language models. At the same time, even frontier models remain vulnerable to jailbreaks and prompt injections, despite concerted efforts to make them robust. As both attack and defense gain access to more compute, and as models become larger, what happens to robustness? We argue that answering this question requires a scaling approach, which we employ in an extensive study of language model robustness across several classification tasks, model families, and adversarial attacks. We find that in the absence of explicit safety training, larger models are not consistently more robust; however, scale improves sample efficiency in adversarial training, though it worsens compute efficiency. Further, we find that increasing attack compute smoothly improves attack success rate against both undefended and adversarially trained models. Finally, after exploring robustness transfer across attacks and threat models, we combine attack and defense scaling rates to study the offense-defense balance. We find that while attack scaling outpaces adversarial training across all models studied, larger adversarially trained models might give defense the advantage in the long run. These results underscore the utility of the scaling lens and provide a paradigm for evaluating future attacks and defenses on frontier models.
SCAR: Shapley Credit Assignment for More Efficient RLHF
Meng Cao
Xiao-Wen Chang
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Heng Li
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models
Lucas Berry
Axel Brando
Wei-Di Chang
Juan Higuera
Self-Evolving Curriculum for LLM Reasoning
Self-Play Q-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
Juan Agustin Duque
Emilio Calvano
A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated “always defect” policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
Structure-Aligned Protein Language Model
Can Chen
David Heurtel-Depeiges
Robert M. Vernon
Christopher J. Langmead
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Lampinen
Arslan Chaudhry
Stephanie C.Y. Chan
Cody Wild
Diane Wan
Alexander Y. Ku
Alex Ku
Murray P. Shanahan
James L McClelland
The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages
The NaijaVoices Community
Busayo Awobade
Abraham Owodunni
Handel Emezue
Gloria Monica Tobechukwu Emezue
N. N. Emezue
Sewade Ogun
Bunmi Akinremi
Christopher Pal
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Honghua Dong
Jiacheng Yang
Xun Deng
Yuhe Jiang
Gennady Pekhimenko
Fan Long
Virtual Cells: Predict, Explain, Discover
Emmanuel Noutahi
Jason Hartford
Ali Denton
Kristina Ulicna
Michael Craig
Jonathan Hsu
Michael Cuccarese
Christopher Gibson
Daniel Cohen
Berton Earnshaw
Caffeine induces age-dependent increases in brain complexity and criticality during sleep
Maxine Arcand-Lavigne
Tarek Lajnef
Sonia Frenette
Julie Carrier
Caffeine is the most widely consumed psychoactive stimulant worldwide. Yet important gaps persist in understanding its effects on the brain, especially during sleep. We analyzed sleep electroencephalography (EEG) in 40 subjects, contrasting 200 mg of caffeine against a placebo condition, utilizing inferential statistics and machine learning. We found that caffeine ingestion led to an increase in brain complexity, a widespread flattening of the power spectrum’s 1/f-like slope, and a reduction in long-range temporal correlations. Being most prominent during non-rapid eye movement (NREM) sleep, these results suggest that caffeine shifts the brain towards a critical regime and more diverse neural dynamics. Interestingly, this was more pronounced in younger adults (20–27 years) compared to middle-aged participants (41–58 years) during rapid eye movement (REM) sleep, while no significant age effects were observed during NREM. Interpreting these data in the light of modeling and empirical work on EEG-derived measures of excitation-inhibition balance suggests that caffeine promotes a shift in brain dynamics towards increased neural excitation and closer proximity to a critical regime, particularly during NREM sleep.