Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaço Carr
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Baharan Mirzasoleiman
Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.
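The abstract outlines a general recipe: separate examples by the model's early-training outputs, then reweight sampling so the presumed minority (non-spurious) examples are not drowned out by the majority group. The sketch below illustrates that idea only; it is not the authors' SPARE implementation, and the clustering method, cluster count, and the epoch at which outputs are collected are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not SPARE's reference code): cluster each class's
# early-training outputs, then importance-sample inversely to cluster size so
# that small (presumed non-spurious) groups are seen as often as large ones.
import numpy as np
import torch
from sklearn.cluster import KMeans
from torch.utils.data import WeightedRandomSampler


def spurious_aware_sampler(outputs: np.ndarray, labels: np.ndarray, n_clusters: int = 2):
    """outputs: (N, C) model outputs collected after a few epochs; labels: (N,) class ids."""
    weights = np.zeros(len(labels), dtype=np.float64)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Cluster the outputs of class c; early in training these are assumed
        # to separate examples with and without the spurious feature.
        assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(outputs[idx])
        sizes = np.bincount(assign, minlength=n_clusters)
        # Importance weights inversely proportional to cluster size.
        weights[idx] = 1.0 / sizes[assign]
    return WeightedRandomSampler(torch.as_tensor(weights), num_samples=len(weights), replacement=True)

# Illustrative usage:
#   sampler = spurious_aware_sampler(early_outputs, train_labels)
#   loader = torch.utils.data.DataLoader(train_set, batch_size=128, sampler=sampler)
```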
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Bertie Vidgen
Adarsh Agrawal
Ahmed M. Ahmed
Victor Akinwande
Namir Al-nuaimi
Najla Alfaraj
Elie Alhajjar
Lora Aroyo
Trupti Bavalatti
Borhane Blili-Hamelin
K. Bollacker
Rishi Bommasani
Marisa Ferrara Boston
Siméon Campos
Kal Chakra
Canyu Chen
Cody Coleman
Zacharie Delpierre Coudert
Leon Strømberg Derczynski
Debojyoti Dutta
Ian Eisenberg
James R. Ezick
Heather Frase
Brian Fuller
Ram Gandikota
Agasthya Gangavarapu
Ananya Gangavarapu
James Gealy
Rajat Ghosh
James Goel
Usman Gohar
Sujata Goswami
Scott A. Hale
Wiebke Hutiri
Joseph Marvin Imperial
Surgan Jandial
Nicholas C. Judd
Felix Juefei-Xu
Bhavya Kailkhura
Hannah Rose Kirk
Kevin Klyman
Chris Knotz
Michael Kuchnik
Shachi H. Kumar
Chris Lengerich
Bo Li
Zeyi Liao
Eileen Peters Long
Victor Lu
Yifan Mai
Priyanka Mary Mammen
Kelvin Manyeki
Sean McGregor
Virendra Mehta
Shafee Mohammed
Emanuel Moss
Lama Nachman
Dinesh Jinenhally Naganna
Amin Nikanjam
Besmira Nushi
Luis Oala
Iftach Orr
Alicia Parrish
Çiğdem Patlak
William Pietri
Forough Poursabzi-Sangdeh
Eleonora Presani
Fabrizio Puletti
Paul Röttger
Saurav Sahay
Tim Santos
Nino Scherrer
Alice Schoenauer Sebag
Patrick Schramowski
Abolfazl Shahbazi
Vin Sharma
Xudong Shen
Vamsi Sistla
Leonard Tang
Davide Testuggine
Vithursan Thangarasa
Elizabeth A Watkins
Rebecca Weiss
Christopher A. Welty
Tyler Wilbers
Adina Williams
Carole-Jean Wu
Poonam Yadav
Xianjun Yang
Yi Zeng
Wenhui Zhang
Fedor Zhdanov
Jiacheng Zhu
Percy Liang
Peter Mattson
Joaquin Vanschoren
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts, with 43,090 test items in total, created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
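As a purely illustrative aside on how test items can be generated from templates, the sketch below crosses a few placeholder templates with the three personas named in the abstract. The hazard labels and template strings are hypothetical and are not drawn from the benchmark, its taxonomy, or ModelBench.

```python
# Illustrative only: template-based construction of prompt-style test items.
# The persona list comes from the abstract; hazards and templates are placeholders.
from itertools import product

PERSONAS = ["typical user", "malicious user", "vulnerable user"]
HAZARDS = ["hazard_category_1", "hazard_category_2"]  # placeholder labels
TEMPLATES = [
    "As a {persona}, tell me how to {hazard}.",
    "I am a {persona}. Explain {hazard} step by step.",
]


def build_test_items(personas, hazards, templates):
    """Cross product of templates x personas x hazards -> list of prompt strings."""
    return [t.format(persona=p, hazard=h) for t, p, h in product(templates, personas, hazards)]


test_items = build_test_items(PERSONAS, HAZARDS, TEMPLATES)
print(len(test_items))  # 2 templates x 3 personas x 2 hazards = 12 items
```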
On learning history-based policies for controlling Markov decision processes
Gandharv Patil
Length independent PAC-Bayes bounds for Simple RNNs
Volodimir Mitarchuk
Clara Lacroce
Rémi Eyraud
Rémi Emonet
Amaury Habrard
Multiphase Black Hole Feedback and a Bright [C ii] Halo in a LoBAL Quasar at z ∼ 6.6
Manuela Bischetti
Hyunseop Choi
Fabrizio Fiore
Chiara Feruglio
Stefano Carniani
Valentina D'Odorico
Eduardo Bañados
Huanqing Chen
Roberto Decarli
Simona Gallerani
Julie Hlavacek-Larrondo
Samuel Lai
K. Leighly
Chiara Mazzucchelli
Roberta Tripodi
Fabian Walter
Feige Wang
Jinyi Yang
Maria Vittoria Zanchettin
Yongda Zhu
Multi-resolution Time-Series Transformer for Long-term Forecasting
Yitian Zhang
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Simulating weighted automata over sequences and trees with transformers
Michael Rizvi-Martel
Maude Lizaire
Clara Lacroce