Publications

On the Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models
Tianyang Zhao
Kunwar Yashraj Singh
Srikar Appalaraju
Peng Tang
Ying Nian Wu
Li Erran Li
The BrowserGym Ecosystem for Web Agent Research
Maxime Gasse
Alexandre Lacoste
Massimo Caccia
Lawrence Keunho Jang
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Ruslan Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) introduced BrowserGym, which aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature and includes AgentLab, a complementary framework that aids in agent creation, testing, and analysis. Our proposed ecosystem offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks made available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI's and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
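The gym-like interface the abstract refers to can be pictured with a short agent loop. This is an illustrative sketch only, assuming a gymnasium-style API; the task id and the stub policy below are placeholders, not actual BrowserGym task names or agent code.

```python
# Sketch of a gym-style web-agent episode: the environment exposes page
# observations and accepts actions, and the loop runs until the task ends.
import gymnasium as gym

def choose_action(obs):
    # Stub policy for illustration; a real agent would prompt an LLM with obs.
    return "noop()"

def run_episode(env_id: str = "browsergym/some-task", max_steps: int = 50):
    # env_id is a placeholder; real task names come from the BrowserGym docs.
    env = gym.make(env_id)
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    env.close()
    return total_reward
```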
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i)
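The sensitivity described above can be illustrated with a toy experiment: train two copies of the same network whose initial weights differ by a tiny perturbation on identical data, and track the L2 distance between their parameter vectors over training. This is a minimal sketch, not the paper's protocol; the model, data, learning rate, and perturbation scale are arbitrary choices.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model_a = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
model_b = copy.deepcopy(model_a)
# Perturb one copy very slightly (the "butterfly" in the initial conditions).
with torch.no_grad():
    for p in model_b.parameters():
        p.add_(1e-6 * torch.randn_like(p))

opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(256, 10), torch.randn(256, 1)

def param_distance(m1, m2):
    # L2 distance between the flattened parameter vectors of the two networks.
    v1 = torch.cat([p.flatten() for p in m1.parameters()])
    v2 = torch.cat([p.flatten() for p in m2.parameters()])
    return torch.norm(v1 - v2).item()

# Both copies see identical data in identical order, so any growth in the
# distance is driven purely by the initial perturbation being amplified.
for step in range(200):
    for model, opt in ((model_a, opt_a), (model_b, opt_b)):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if step % 50 == 0:
        print(step, param_distance(model_a, model_b))
```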
"On the goals of linguistic theory": Revisiting Chomskyan theories in the era of AI
Masoud Jasbi
Theoretical linguistics seeks to explain what human language is, and why. Linguists and cognitive scientists have proposed different theoretical models of what language is, as well as cognitive factors that shape it and allow humans to 'produce', 'understand', and 'acquire' natural languages. However, humans may no longer be the only ones learning to 'generate', 'parse', and 'learn' natural language: artificial intelligence (AI) models such as large language models are proving to have impressive linguistic capabilities. Many are thus asking what role, if any, such models should play in helping theoretical linguistics reach its ultimate research goals. In this paper, we propose to answer this question by reiterating the tenets of generative linguistics, a leading school of thought in the field, and by considering how AI models as theories of language relate to each of these important concepts. Specifically, we consider three foundational principles, finding roots in the early works of Noam Chomsky: (1) levels of theoretical adequacy; (2) procedures for linguistic theory development; (3) language learnability and Universal Grammar. In our discussions of each principle, we give special attention to two types of AI models: neural language models and neural grammar induction models. We argue that such models, in particular neural grammar induction models, do have a role to play, but that this role is largely modulated by the stance one takes regarding each of these three guiding principles.
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Yet the standard RL "thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence, this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate that at a 96K average thinking length, LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows that off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.
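The chunked-reasoning loop described above can be sketched as follows. The `generate` function is a stand-in for any LLM call, and the chunk size, carryover length, and stop condition are illustrative rather than the paper's settings; the key point is that the context passed to the model is always the prompt plus a short carryover, never the full reasoning trace.

```python
def generate(context: str, max_tokens: int) -> str:
    # Stand-in for an LLM call; returns up to max_tokens of reasoning text.
    raise NotImplementedError("plug in your model here")

def markovian_think(prompt: str, chunk_tokens: int = 8000,
                    carryover_chars: int = 500, max_chunks: int = 4) -> str:
    """Reason in fixed-size chunks while keeping a bounded context (sketch)."""
    carryover = ""
    trace = []
    for _ in range(max_chunks):
        # The context never grows: prompt plus a short textual state.
        context = prompt + "\n" + carryover
        chunk = generate(context, max_tokens=chunk_tokens)
        trace.append(chunk)
        if "FINAL ANSWER" in chunk:      # illustrative stop condition
            break
        # Keep only the tail of the chunk as the carryover for the next chunk.
        carryover = chunk[-carryover_chars:]
    return "\n".join(trace)
```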
The Normative Leadership of the World Health Organization: a quantitative analysis
Jean-Louis Denis
Pierre Larouche
Miriam Cohen
The role of AI for MRI-analysis in multiple sclerosis—A brief overview
Jean-Pierre R. Falet
Steven Nobile
Aliya Szpindel
Joshua D. Durso-Finley
Douglas Arnold
The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Ima Bello
Malo Bourgon
Mark Brakel
Simeon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Chua Kuan Seah
Jeff Clune
Juntao Dai
Agnes Delaborde
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Kalai
Meindert Kamphuis
Mohan Kankanhalli
Subhash Kantamneni
Mathias Kirk Bonde
Thomas Kwa
Jeffrey Ladish
Kwok Yan Lam
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nicolas Moës
Simon Moeller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Seán Ó hÉigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Anu Sinha
Martin Soto
Cheston Tan
Anthony Tung
William Tjhi
Robert Trager
Brian Tse
John Willes
Denise Wong
Wei Xu
Rongwu Xu
Yi Zeng
Hongjiang Zhang
Djordje Zikelic
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential – it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. This requires policymakers, industry, researchers and the broader public to collectively work toward securing positive outcomes from AI's development. AI safety research is a key dimension. Given that the state of science today for building trustworthy AI does not fully cover all risks, accelerated investment in research is required to keep pace with commercially driven growth in system capabilities. Goals: The 2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety aims to support research in this important space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. The result, The Singapore Consensus on Global AI Safety Research Priorities, builds on the International AI Safety Report (IAISR) chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this document organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control). Through the Singapore Consensus, we hope to globally facilitate meaningful conversations between AI scientists and AI policymakers for maximally beneficial outcomes. Our goal is to enable more impactful R&D efforts to rapidly develop safety and evaluation mechanisms and foster a trusted ecosystem where AI is harnessed for the public good.
The Size of Teachers as a Measure of Data Complexity: PAC-Bayes Excess Risk Bounds and Scaling Laws
We study the generalization properties of randomly initialized neural networks, under the assumption that the network is larger than some unknown "teacher" network that achieves low risk. We extend the analysis of Buzaglo et al. (2024) to allow for student networks of arbitrary width and depth, and to the setting where no (small) teacher network perfectly interpolates the data. We obtain an oracle inequality, relating the risk of Gibbs posterior sampling to that of narrow teacher networks. As a result, the sample complexity is once again bounded in terms of the size of narrow teacher networks that themselves achieve small risk. We then introduce a new notion of data complexity, based on the minimal size of a teacher network required to achieve a certain level of excess risk. By comparing the scaling laws resulting from our bounds to those observed in empirical studies, we are able to estimate the data complexity of standard benchmarks according to our measure.
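For orientation, the generic PAC-Bayes bound underlying this line of work (the standard McAllester/Maurer-style statement, not the paper's specific oracle inequality) reads: with probability at least $1-\delta$ over an i.i.d. sample of size $n$, for every posterior $Q$ over hypotheses,

\[
\mathbb{E}_{h \sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h \sim Q}\big[\hat{L}_n(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]

where $P$ is the prior (here, the distribution over randomly initialized networks), $L$ the population risk, and $\hat{L}_n$ the empirical risk. Roughly speaking, teacher-size bounds of the kind described in the abstract control the KL term through the number of parameters of the smallest teacher network that achieves low risk.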
Towards contrast-agnostic soft segmentation of the spinal cord
Enamundram Naga Karthik
Charidimos Tsagkas
Emanuele Pravatà
Cristina Granziera
Andrew Smith
Kenneth Arnold Weber II
Spinal cord segmentation is clinically relevant and is notably used to compute spinal cord cross-sectional area (CSA) for the diagnosis and monitoring of cord compression or neurodegenerative diseases such as multiple sclerosis. While several semi-automatic and automatic methods exist, one key limitation remains: the segmentation depends on the MRI contrast, resulting in different CSA across contrasts. This is partly due to the varying appearance of the boundary between the spinal cord and the cerebrospinal fluid, which depends on the sequence and acquisition parameters. This contrast-sensitive CSA adds variability in multi-center studies where protocols can vary, reducing the sensitivity to detect subtle atrophies. Moreover, existing methods increase the CSA variability by training one model per contrast, while also producing binary masks that do not account for partial volume effects. In this work, we present a deep learning-based method that produces soft segmentations of the spinal cord. Using the Spine Generic Public Database of healthy participants (
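One practical consequence of soft (rather than binary) masks is how CSA is computed: instead of counting thresholded voxels, the soft values are summed, so partial-volume voxels at the cord boundary contribute fractionally. A minimal sketch, assuming a single axial slice stored as a 2D probability map with a known in-plane resolution (the pixel size below is illustrative):

```python
import numpy as np

def cross_sectional_area(soft_mask: np.ndarray, pixel_dims_mm=(0.8, 0.8)) -> float:
    """CSA in mm^2 from a soft segmentation of one axial slice.

    soft_mask: 2D array with values in [0, 1]; boundary voxels may be fractional.
    pixel_dims_mm: in-plane pixel size (illustrative value, not a fixed protocol).
    """
    pixel_area = pixel_dims_mm[0] * pixel_dims_mm[1]
    return float(soft_mask.sum() * pixel_area)

# A binary mask would force boundary voxels to 0 or 1; summing soft values
# instead lets partial-volume voxels contribute fractionally to the area.
example = np.array([[0.0, 0.4, 0.0],
                    [0.6, 1.0, 0.7],
                    [0.0, 0.5, 0.0]])
print(cross_sectional_area(example))  # 2.048 mm^2 for this toy slice
```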
Towards More Realistic Extraction Attacks: An Adversarial Perspective
Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a considerably larger attack surface due to access to models across various sizes and checkpoints, and repeated prompting. In this paper, we revisit extraction attacks from an adversarial perspective, with multi-faceted access to the underlying data. We find significant churn in extraction trends, i.e., even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining multiple attacks, our adversary doubles (
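The multi-faceted adversary described above amounts, schematically, to looping over model sizes/checkpoints and prompt variants and pooling whatever each combination extracts. The sketch below uses placeholder names and a stub extraction function; the paper's actual attacks and matching criteria are more involved.

```python
def extract(model_name: str, prompt: str) -> set[str]:
    # Placeholder: query the named model with the prompt and return candidate
    # memorized strings (e.g., continuations matching a reference corpus).
    raise NotImplementedError

def multi_faceted_attack(model_names: list[str], prompts: list[str]) -> set[str]:
    """Union of extractions across model sizes/checkpoints and prompt variants."""
    recovered: set[str] = set()
    for name in model_names:
        for prompt in prompts:
            # Even small prompt edits or earlier checkpoints can surface
            # different strings, so the union grows with each combination.
            recovered |= extract(name, prompt)
    return recovered
```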
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Vahid Majdinasab
Amin Nikanjam
Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement, as developers' code is already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert whether an LLM used during development has been trained on specific copyrighted code, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC: a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of codes that were used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC has low resource overhead in contrast to the pair-wise clone detection conducted during the auditing process of tools like CodeWhisperer's reference tracker, across thousands of code snippets.
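Stripped to its essentials, the pipeline the abstract describes is: extract identifiers and other surface features from a program, then train a classifier to predict whether the program was part of the LLM's training data. The features and classifier below are illustrative stand-ins, not TraWiC's actual feature set or model.

```python
import ast
from sklearn.ensemble import RandomForestClassifier

def identifier_features(source: str) -> list[float]:
    """Toy syntactic features: counts of function, class, and name nodes."""
    tree = ast.parse(source)
    funcs = sum(isinstance(n, ast.FunctionDef) for n in ast.walk(tree))
    classes = sum(isinstance(n, ast.ClassDef) for n in ast.walk(tree))
    names = sum(isinstance(n, ast.Name) for n in ast.walk(tree))
    return [float(funcs), float(classes), float(names)]

# X: feature vectors for programs; y: 1 if the program is known to be in the
# training data (membership label), 0 otherwise. Labels would come from an
# audit set; these two examples are synthetic.
X = [identifier_features("def f(x):\n    return x + 1\n"),
     identifier_features("class A:\n    pass\n")]
y = [1, 0]
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([identifier_features("def g(y):\n    return y * 2\n")]))
```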