Publications

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Baharan Mirzasoleiman
Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly… (see more) prone to learning spurious correlations in the training data, that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Bertie Vidgen
Adarsh Agrawal
Ahmed M. Ahmed
Victor Akinwande
Namir Al-nuaimi
Najla Alfaraj
Elie Alhajjar
Lora Aroyo
Trupti Bavalatti
Borhane Blili-Hamelin
K. Bollacker
Rishi Bomassani
Marisa Ferrara Boston
Sim'eon Campos
Kal Chakra
Canyu Chen
Cody Coleman
Zacharie Delpierre Coudert
Leon Strømberg Derczynski
Debojyoti Dutta … (see 77 more)
Ian Eisenberg
James R. Ezick
Heather Frase
Brian Fuller
Ram Gandikota
Agasthya Gangavarapu
Ananya Gangavarapu
James Gealy
Rajat Ghosh
James Goel
Usman Gohar
Sujata Goswami
Scott A. Hale
Wiebke Hutiri
Joseph Marvin Imperial
Surgan Jandial
Nicholas C. Judd
Felix Juefei-Xu
Bhavya Kailkhura
Hannah Rose Kirk
Kevin Klyman
Chris Knotz
Michael Kuchnik
Shachi H. Kumar
Chris Lengerich
Bo Li
Zeyi Liao
Eileen Peters Long
Victor Lu
Yifan Mai
Priyanka Mary Mammen
Kelvin Manyeki
Sean McGregor
Virendra Mehta
Shafee Mohammed
Emanuel Moss
Lama Nachman
Dinesh Jinenhally Naganna
Amin Nikanjam
Besmira Nushi
Luis Oala
Iftach Orr
Alicia Parrish
Çigdem Patlak
William Pietri
Forough Poursabzi-Sangdeh
Eleonora Presani
Fabrizio Puletti
Paul Rottger
Saurav Sahay
Tim Santos
Nino Scherrer
Alice Schoenauer Sebag
Patrick Schramowski
Abolfazl Shahbazi
Vin Sharma
Xudong Shen
Vamsi Sistla
Leonard Tang
Davide Testuggine
Vithursan Thangarasa
Elizabeth A Watkins
Rebecca Weiss
Christoper A. Welty
Tyler Wilbers
Adina Williams
Carole-Jean Wu
Poonam Yadav
Xianjun Yang
Yi Zeng
Wenhui Zhang
Fedor Zhdanov
Jiacheng Zhu
Percy Liang
Peter Mattson
Joaquin Vanschoren
On learning history-based policies for controlling Markov decision processes
Gandharv Patil
Multi-phase black-hole feedback and a bright [CII] halo in a Lo-BAL quasar at $z\sim6.6$
Manuela Bischetti
Hyunseop Choi
Fabrizio Fiore
Chiara Feruglio
Stefano Carniani
Valentina D'odorico
Eduardo Banados
Huanqing Chen
Roberto Decarli
Simona Gallerani
Julie Hlavacek-larrondo
Samuel Lai
K. Leighly
Chiara Mazzucchelli
Roberta Tripodi
Fabian Walter
Feige Wang
Jinyi Yang
Maria Vittoria Zanchettin … (see 1 more)
Yongda Zhu
Multi-resolution Time-Series Transformer for Long-term Forecasting
Yitian Zhang
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Simulating weighted automata over sequences and trees with transformers
Michael Rizvi
Maude Lizaire
Clara Lacroce
On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding … (see more)noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand, when instantiated using Gaussian noise, standard analyses only yield approximate DP guarantees despite the fact that the outputs of these mechanisms lie in a discrete space. In this work, we revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise and show that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold. The resulting bounds are tight and depend on closed-form expressions that can be numerically evaluated using standard methods. Empirically we find these lead to tighter privacy accounting in the high privacy, low data regime. Further, we propose a simple privacy filter for composing pure ex-post DP guarantees, and use it to derive a fully adaptive Gaussian Sparse Vector Technique mechanism. Finally, we provide experiments on mobility and energy consumption datasets demonstrating that our Sparse Vector Technique is practically competitive with previous approaches and requires less hyper-parameter tuning.
Weight-Sharing Regularization
Mehran Shakerinava
Motahareh Sohrabi
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G Parameswaran
Effects of gene dosage on cognitive ability: A function-based association study across brain and non-brain processes
Guillaume Huguet
Thomas Renne
Cécile Poulain
Alma Dubuc
Kuldeep Kumar
Sayeh Kazem
Worrawat Engchuan
Omar Shanta
Elise Douard
Catherine Proulx
Martineau Jean-Louis
Zohra Saci
Josephine Mollon
Laura Schultz
Emma E M Knowles
Simon R. Cox
David Porteous
Gail Davies
Paul Redmond
Sarah E. Harris … (see 10 more)
Gunter Schumann
Aurélie Labbe
Zdenka Pausova
Tomas Paus
Stephen W Scherer
Jonathan Sebat
Laura Almasy
David C. Glahn
Sébastien Jacquemont
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar
Abulhair Saparov
Javier Rando
Daniel Paleka
Miles Turpin
Peter Hase
Ekdeep Singh Lubana
Erik Jenner
Stephen Casper
Oliver Sourbut
Benjamin L. Edelman
Zhaowei Zhang
Mario Gunther
Anton Korinek
Jose Hernandez-Orallo
Lewis Hammond
Eric J Bigelow
Alexander Pan
Lauro Langosco
Tomasz Korbak … (see 18 more)
Heidi Zhang
Ruiqi Zhong
Sean 'o H'eigeartaigh
Gabriel Recchia
Giulio Corsi
Alan Chan
Markus Anderljung
Lilian Edwards
Danqi Chen
Samuel Albanie
Jakob Nicolaus Foerster
Florian Tramèr
He He
Atoosa Kasirzadeh
Yejin Choi
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are o… (see more)rganized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose