Publications

The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Imane Bello
Malo Bourgon
Mark Brakel
Siméon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Kuan Chua Seah
Jeff Clune
Juntao Dai
Agnès Delaborde
Nouha Dziri
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah D. Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low Kian Hsiang
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Tauman Kalai
Meindert Kamphuis
Mohan S. Kankanhalli
Subhash Kantamneni
Mathias Bonde Kirk
Thomas Kwa
Jeffrey Ladish
Kwok-Yan Lam
Wan Lee Sie
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nick Moës
Simon Möller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Sean O hEigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Aradhana Sinha
Martín Soto
Cheston Tan
Dong Ting
William Tjhi
Robert Trager
Brian Tse
Anthony K. H. Tung
John Willes
Denise Wong
Wei Xu
Rongwu Xu
Yi Zeng
HongJiang Zhang
Djordje Zikelic
The Size of Teachers as a Measure of Data Complexity: PAC-Bayes Excess Risk Bounds and Scaling Laws
We study the generalization properties of randomly initialized neural networks, under the assumption that the network is larger than some unknown "teacher" network that achieves low risk. We extend the analysis of Buzaglo et al. (2024) to allow for student networks of arbitrary width and depth, and to the setting where no (small) teacher network perfectly interpolates the data. We obtain an oracle inequality, relating the risk of Gibbs posterior sampling to that of narrow teacher networks. As a result, the sample complexity is once again bounded in terms of the size of narrow teacher networks that themselves achieve small risk. We then introduce a new notion of data complexity, based on the minimal size of a teacher network required to achieve a certain level of excess risk. By comparing the scaling laws resulting from our bounds to those observed in empirical studies, we are able to estimate the data complexity of standard benchmarks according to our measure.
The Superposition of Diffusion Models Using the Itô Density Estimator
Lazar Atanackovic
Alexander Tong
The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation and design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log likelihood of the diffusion SDE which incurs no additional overhead compared to the well-known Hutchinson's estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models as superposition is performed solely through composition during inference, and also enjoys painless implementation as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient during inference time, and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of using SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt conditioned image editing using Stable Diffusion, as well as improved conditional molecule generation and unconditional de novo structure design of proteins. https://github.com/necludov/super-diffusion
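For readers unfamiliar with the baseline mentioned in the abstract above, here is a minimal sketch of Hutchinson's trace estimator, the standard tool for estimating a divergence (the trace of a Jacobian) from matrix-vector products alone. It is not the paper's Itô density estimator and is not taken from the linked repository; the function names `hutchinson_trace` and `make_matvec`, the Rademacher probes, and the explicit small matrix standing in for a Jacobian are all illustrative assumptions.

```python
import random


def hutchinson_trace(matvec, dim, num_probes=64, seed=0):
    """Estimate tr(A) as the average of v^T A v over random probe vectors v.

    `matvec` is any callable returning A @ v for a list v of length `dim`.
    Probes are Rademacher vectors (entries +1 or -1), for which the
    estimator is unbiased. All names here are hypothetical.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        av = matvec(v)
        total += sum(vi * avi for vi, avi in zip(v, av))
    return total / num_probes


def make_matvec(matrix):
    """Wrap an explicit matrix (list of rows) as a matrix-vector product."""
    def matvec(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return matvec


if __name__ == "__main__":
    # Toy stand-in for a Jacobian; its exact trace is 1 + 4 = 5.
    jacobian = [[1.0, 2.0], [3.0, 4.0]]
    print(hutchinson_trace(make_matvec(jacobian), dim=2, num_probes=1000))
```

In a diffusion setting the matrix would never be formed explicitly; `matvec` would instead be a Jacobian-vector product of the learned vector field, which is where the extra cost of this estimator comes from.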
Towards contrast-agnostic soft segmentation of the spinal cord
Sandrine Bédard
Enamundram Naga Karthik
Charidimos Tsagkas
Emanuele Pravatà
Cristina Granziera
Andrew C. Smith
Kenneth Arnold Weber
Spinal cord segmentation is clinically relevant and is notably used to compute spinal cord cross-sectional area (CSA) for the diagnosis and monitoring of cord compression or neurodegenerative diseases such as multiple sclerosis. While several semi-automatic and automatic methods exist, one key limitation remains: the segmentation depends on the MRI contrast, resulting in different CSA across contrasts. This is partly due to the varying appearance of the boundary between the spinal cord and the cerebrospinal fluid that depends on the sequence and acquisition parameters. This contrast-sensitive CSA adds variability in multi-center studies where protocols can vary, reducing the sensitivity to detect subtle atrophies. Moreover, existing methods increase the CSA variability by training one model per contrast, while also producing binary masks that do not account for partial volume effects. In this work, we present a deep learning-based method that produces soft segmentations of the spinal cord. Using the Spine Generic Public Database of healthy participants (
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Vahid Majdinasab
Amin Nikanjam
Code auditing ensures that the developed code adheres to standards, regulations, and copyright protection by verifying that it does not contain code from protected sources. The recent advent of Large Language Models (LLMs) as coding assistants in the software development process poses new challenges for code auditing. The dataset for training these models is mainly collected from publicly available sources. This raises the issue of intellectual property infringement as developers' code is already included in the dataset. Therefore, auditing code developed using LLMs is challenging, as it is difficult to reliably assert if an LLM used during development has been trained on specific copyrighted code, given that we do not have access to the training datasets of these models. Given the non-disclosure of the training datasets, traditional approaches such as code clone detection are insufficient for asserting copyright infringement. To address this challenge, we propose a new approach, TraWiC, a model-agnostic and interpretable method based on membership inference for detecting code inclusion in an LLM's training dataset. We extract syntactic and semantic identifiers unique to each program to train a classifier for detecting code inclusion. In our experiments, we observe that TraWiC is capable of detecting 83.87% of the code that was used to train an LLM. In comparison, the prevalent clone detection tool NiCad is only capable of detecting 47.64%. In addition to its remarkable performance, TraWiC has low resource overhead in contrast to pair-wise clone detection that is conducted during the auditing process of tools like CodeWhisperer reference tracker, across thousands of code snippets.
Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning
Supriyo Chakraborty
Nima Chitsazan
This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl
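The idea of destructive interference in per-example loss changes can be made concrete with a small sketch. The ratio below is an assumed, illustrative metric and may differ from the definition used in the paper or its repository: it is 0 when every example's loss moves in the same direction and 1 when improvements and degradations cancel exactly. The function name `destructive_interference` is hypothetical.

```python
def destructive_interference(loss_before, loss_after):
    """Return 1 - |sum(d)| / sum(|d|), where d is the per-example loss change.

    0.0 means all examples move together (no cancellation); 1.0 means gains
    on some examples are fully offset by losses on others. This is an
    illustrative metric, not necessarily the paper's exact definition.
    """
    if len(loss_before) != len(loss_after):
        raise ValueError("loss lists must have the same length")
    deltas = [after - before for before, after in zip(loss_before, loss_after)]
    total_abs = sum(abs(d) for d in deltas)
    if total_abs == 0:
        return 0.0  # nothing changed, so there is nothing to cancel
    return 1.0 - abs(sum(deltas)) / total_abs


if __name__ == "__main__":
    # One example improves by 0.5 while the other degrades by 0.5.
    print(destructive_interference([2.0, 2.0], [1.5, 2.5]))
    # Both examples improve together.
    print(destructive_interference([2.0, 2.0], [1.5, 1.5]))
```

Tracking such a quantity across training steps is one way to check whether an update is making net progress or merely trading loss between subsets of examples.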
Training Language Models to Self-Correct via Reinforcement Learning
Aviral Kumar
Vincent Zhuang
Yi Su
John D Co-Reyes
Avi Singh
Kate Baumli
Shariq Iqbal
Colton Bishop
Rebecca Roelofs
Lei M Zhang
Kay McKinney
Disha Shrivastava
Cosmin Paduraru
George Tucker
Feryal Behbahani
Aleksandra Faust
Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian R. Bartoldson
James Diffenderfer
Moksh J. Jain
Tal Ben-Nun
Minsu Kim
Bhavya Kailkhura
TransCeption: Enhancing medical image segmentation with an inception-like transformer design for efficient feature fusion
Reza Azad
Yiwei Jia
Ehsan Khodapanah Aghdam
Dorit Merhof
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems
Emma Harvey
Emily Sheng
Su Lin Blodgett
Alexandra Chouldechova
Jean Garcia-Gathright
Hanna Wallach
The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments - even useful instruments - are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.
Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation
Pavel Rumiantsev
Neural Architecture Search (NAS) is a powerful automatic alternative to manual design of a neural network. In the zero-shot version, a fast ranking function is used to compare architectures without training them. The outputs of the ranking functions often vary significantly due to different sources of randomness, including the evaluated architecture's weights' initialization or the batch of data used for calculations. A common approach to addressing the variation is to average a ranking function output over several evaluations. We propose taking into account the variation in a different manner, by viewing the ranking function output as a random variable representing a proxy performance metric. During the search process, we strive to construct a stochastic ordering of the performance metrics to determine the best architecture. Our experiments show that the proposed stochastic ordering can effectively boost performance of a search on standard benchmark search spaces.
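To illustrate how treating ranking-function outputs as random variables differs from averaging them, here is a minimal sketch that compares two architectures by empirical first-order stochastic dominance over repeated evaluations. The abstract does not specify which ordering or statistical test the authors use, so the dominance criterion, the function name `stochastically_dominates`, and the sample scores are all assumptions made for illustration.

```python
from bisect import bisect_right


def stochastically_dominates(scores_a, scores_b):
    """True if sample A first-order stochastically dominates sample B.

    A dominates B when the empirical CDF of A is never above that of B
    (A's scores are shifted towards larger values) and is strictly below
    it at some point. Counts are cross-multiplied so the comparison uses
    exact integer arithmetic even when the sample sizes differ.
    """
    sorted_a, sorted_b = sorted(scores_a), sorted(scores_b)
    n_a, n_b = len(sorted_a), len(sorted_b)
    strictly_better_somewhere = False
    for x in sorted(set(sorted_a) | set(sorted_b)):
        count_a = bisect_right(sorted_a, x)  # number of A scores <= x
        count_b = bisect_right(sorted_b, x)  # number of B scores <= x
        if count_a * n_b > count_b * n_a:    # F_A(x) > F_B(x): not dominant
            return False
        if count_a * n_b < count_b * n_a:
            strictly_better_somewhere = True
    return strictly_better_somewhere


if __name__ == "__main__":
    # Hypothetical ranking-function outputs over three random seeds each.
    arch_a = [0.9, 1.1, 1.3]
    arch_b = [0.5, 0.7, 1.0]
    print(stochastically_dominates(arch_a, arch_b))
    print(stochastically_dominates(arch_b, arch_a))
```

Unlike a comparison of means, this criterion can report that neither architecture dominates the other, for example when one has the higher maximum score but also the lower minimum.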