Publications
The Topic Confusion Task: A Novel Scenario for Authorship Attribution
Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship style, by the topic shift, or by other factors. Motivated by this, we propose the topic confusion task, where we switch the author-topic configuration between the training and testing sets. This setup allows us to probe errors in the attribution process. We investigate the accuracy and two error measures: one arising when the model is confused by the switch because its features capture topic rather than style, and one arising when the features fail to capture writing style at all, leading to weaker models. By evaluating different features, we show that stylometric features with part-of-speech tags are less susceptible to topic variations and can increase the accuracy of the attribution process. We further show that combining them with word-level n-grams can outperform the state-of-the-art technique in the cross-topic scenario. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task and are outperformed by simple n-gram features.
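To make the setup concrete, here is a minimal sketch of how such an author-topic switch between training and testing could be constructed; the author and topic labels are hypothetical placeholders, not the paper's corpus.

```python
# Illustrative sketch of the topic confusion split (hypothetical labels):
# train on (author_1, topic_1) and (author_2, topic_2), then test on the
# switched pairing (author_1, topic_2) and (author_2, topic_1).
def topic_confusion_split(documents):
    # documents: iterable of dicts with "author", "topic", and "text" keys
    train_pairs = {("author_1", "topic_1"), ("author_2", "topic_2")}
    test_pairs = {("author_1", "topic_2"), ("author_2", "topic_1")}
    train, test = [], []
    for doc in documents:
        pair = (doc["author"], doc["topic"])
        if pair in train_pairs:
            train.append(doc)
        elif pair in test_pairs:
            test.append(doc)
    return train, test
```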
The problem of characterizing quantum channels arises in a number of contexts such as quantum process tomography and quantum error correction. However, direct approaches to parameterizing and optimizing the Choi matrix representation of quantum channels face a curse of dimensionality: the number of parameters scales exponentially in the number of qubits. Recently, Torlai et al. [2020] proposed using locally purified density operators (LPDOs), a tensor network representation of Choi matrices, to overcome the unfavourable scaling in parameters. While the LPDO structure allows it to satisfy a ‘complete positivity’ (CP) constraint required of physically valid quantum channels, it makes no guarantees about a similarly required ‘trace preservation’ (TP) constraint. In practice, the TP constraint is violated, and the learned quantum channel may even be trace-increasing, which is non-physical. In this work, we present the problem of optimizing over TP LPDOs, discuss two approaches to characterizing the TP constraints on LPDOs, and outline the next steps for developing an optimization scheme.
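For reference, the two constraints discussed above can be stated on the Choi matrix J(Λ) of a channel Λ (standard textbook conditions, in notation not taken from the paper):

```latex
% Complete positivity (CP): the Choi matrix is positive semidefinite.
J(\Lambda) \succeq 0
% Trace preservation (TP): tracing out the output system gives the identity on
% the input system; LPDOs enforce CP by construction but may violate TP.
\operatorname{Tr}_{\mathrm{out}} J(\Lambda) = \mathbb{I}_{\mathrm{in}}
```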
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we introduce a few-shot classification evaluation protocol named VTAB+MD with the explicit goal of facilitating sharing of insights from each community. We demonstrate its accessibility in practice by performing a cross-family study of the best transfer and meta learners which report on both a large-scale meta-learning benchmark (Meta-Dataset, MD) and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. We hope that this work contributes to accelerating progress on few-shot learning research.
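As a point of reference for how a transfer method can be scored on few-shot episodes, here is a minimal sketch of a nearest-centroid evaluation on frozen pretrained features; the function and variable names are assumptions, and this is not the VTAB+MD protocol itself.

```python
# Minimal sketch (not the paper's protocol): evaluating a frozen "transfer"
# backbone on a few-shot episode with a nearest-centroid classifier.
# `embed`, `support_x/y`, and `query_x/y` are hypothetical placeholders.
import numpy as np

def episode_accuracy(embed, support_x, support_y, query_x, query_y):
    # Embed support and query images with the frozen pretrained backbone.
    s_feat = embed(support_x)            # shape: (n_support, d)
    q_feat = embed(query_x)              # shape: (n_query, d)
    classes = np.unique(support_y)
    # One prototype per class: the mean of its support embeddings.
    prototypes = np.stack([s_feat[support_y == c].mean(axis=0) for c in classes])
    # Classify each query by its nearest prototype (Euclidean distance).
    dists = np.linalg.norm(q_feat[:, None, :] - prototypes[None, :, :], axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return (preds == query_y).mean()
```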
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples. We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources. This problem has seen growing interest and has inspired the development of benchmarks such as Meta-Dataset. A key challenge in this multi-domain setting is to effectively integrate the feature representations from the diverse set of training domains. Here, we propose a Universal Representation Transformer (URT) layer that meta-learns to leverage universal features for few-shot classification by dynamically re-weighting and composing the most appropriate domain-specific representations. In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.
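A rough sketch of the kind of attention-based re-weighting of domain-specific features that the abstract describes is below; the module, its dimensions, and the task summary are assumptions rather than the released URT code.

```python
# Illustrative sketch (assumed names, not the official URT implementation):
# attention-based re-weighting of domain-specific features into one representation.
import torch
import torch.nn.functional as F

class DomainAttentionHead(torch.nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.query = torch.nn.Linear(feat_dim, feat_dim)
        self.key = torch.nn.Linear(feat_dim, feat_dim)

    def forward(self, domain_feats: torch.Tensor, task_summary: torch.Tensor):
        # domain_feats: (n_domains, batch, feat_dim) features from per-domain backbones
        # task_summary: (feat_dim,) e.g. the mean support-set embedding for the episode
        q = self.query(task_summary)                        # (feat_dim,)
        k = self.key(domain_feats.mean(dim=1))              # (n_domains, feat_dim)
        attn = F.softmax(k @ q / q.shape[0] ** 0.5, dim=0)  # (n_domains,)
        # Weighted combination of the domain-specific features.
        return torch.einsum("d,dbf->bf", attn, domain_feats)
```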
Neurons in the dorsal visual pathway of the mammalian brain are selective for motion stimuli, with the complexity of stimulus representations increasing along the hierarchy. This progression is similar to that of the ventral visual pathway, which is well characterized by artificial neural networks (ANNs) optimized for object recognition. In contrast, there are no image-computable models of the dorsal stream with comparable explanatory power. We hypothesized that the properties of dorsal stream neurons could be explained by a simple learning objective: the need for an organism to orient itself during self-motion. To test this hypothesis, we trained a 3D ResNet to predict an agent’s self-motion parameters from visual stimuli in a simulated environment. We found that the responses in this network accounted well for the selectivity of neurons in a large database of single-neuron recordings from the dorsal visual stream of non-human primates. In contrast, ANNs trained on an action recognition dataset through supervised or self-supervised learning could not explain responses in the dorsal stream, despite also being trained on naturalistic videos with moving objects. These results demonstrate that an ecologically relevant cost function can account for dorsal stream properties in the primate brain.
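A minimal sketch of the training objective described above, assuming a 6-parameter self-motion encoding and standard PyTorch components; hyperparameters and the output encoding are illustrative, not taken from the paper.

```python
# Rough sketch: train a 3D ResNet to regress self-motion parameters from video clips.
import torch
from torchvision.models.video import r3d_18

model = r3d_18()                                      # a 3D ResNet over video clips
model.fc = torch.nn.Linear(model.fc.in_features, 6)   # e.g. 3 translation + 3 rotation params
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

def train_step(clips, self_motion):
    # clips: (batch, 3, frames, height, width) simulated visual input
    # self_motion: (batch, 6) ground-truth self-motion parameters from the simulator
    optimizer.zero_grad()
    pred = model(clips)
    loss = loss_fn(pred, self_motion)
    loss.backward()
    optimizer.step()
    return loss.item()
```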
The human pineal gland regulates day-night dynamics of multiple physiological processes, especially through the secretion of melatonin. Using mass-spectrometry-based proteomics and dedicated analysis tools, we identify proteins in the human pineal gland, systematically analyze their variation throughout the day, and compare these changes in the pineal proteome between control specimens and donors diagnosed with autism. Results reveal diverse regulated clusters of proteins with, among others, carbohydrate catabolic processes and cytoplasmic membrane-bounded vesicle-related proteins differing between day and night and/or between control and autism pineal glands. These data reveal novel and unexpected processes occurring in the human pineal gland across the day/night rhythm, as well as specific differences between autism donor pineal glands and those from controls.
The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.
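The general idea of regularizing a predictor with evolutionary information can be sketched as follows; the loss form, weighting, and names are assumptions, and PhyloReg's actual formulation may differ.

```python
# Hedged sketch of phylogeny-based regularization: labeled sequences provide a
# supervised loss, while predictions on orthologous sequence pairs from related
# species are encouraged to agree even without labels.
import torch

def phylo_regularized_loss(model, labeled_x, labeled_y, ortholog_a, ortholog_b,
                           lam: float = 0.1):
    # Supervised term on labeled sequences (e.g., TF binding in a reference species).
    supervised = torch.nn.functional.binary_cross_entropy_with_logits(
        model(labeled_x), labeled_y)
    # Consistency term on unlabeled orthologous pairs.
    consistency = ((torch.sigmoid(model(ortholog_a))
                    - torch.sigmoid(model(ortholog_b))) ** 2).mean()
    return supervised + lam * consistency
```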
2020-12-16
2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (published)
BACKGROUND
True evidence-informed decision making in public health relies on incorporating evidence from a number of sources in addition to traditional scientific evidence. Lack of access to these types of data, as well as issues with the ease of use and interpretability of scientific evidence, contribute to limited uptake of evidence-informed decision making in practice. An electronic evidence system that includes multiple sources of evidence and potentially novel computational processing approaches or artificial intelligence holds promise as a solution for overcoming barriers to evidence-informed decision making in public health.
OBJECTIVE
To understand the needs and preferences for an electronic evidence system among public health professionals in Canada.
METHODS
An invitation to participate in an anonymous online survey was distributed via listservs of two Canadian public health organizations. Eligible participants were English or French speaking individuals currently working in public health. The survey contained both multiple choice and open-ended questions about needs and preferences relevant to an electronic evidence system. Quantitative responses were analyzed to explore differences by public health role. Inductive and deductive analysis methods were used to code and interpret the qualitative data. Ethics review was not required by the host institution.
RESULTS
Respondents (n = 371) were heterogeneous, spanning organizations, positions, and areas of practice within public health. Nearly all (98.0%) respondents indicated that an electronic evidence system would support their work. Respondents had high preferences for local contextual data, research and intervention evidence, and information about human and financial resources. Qualitative analyses identified a number of concerns, needs, and suggestions for development of such a system. Concerns ranged from personal use of such a system, to the ability of their organization to use such a system. Identified needs spanned the different sources of evidence including local context, research and intervention evidence, and resources and tools. Additional suggestions were identified to improve system usability.
CONCLUSIONS
Canadian public health professionals have positive perceptions towards an electronic evidence system that would bring together evidence from the local context, scientific research, and resources. Elements were also identified to increase the usability of an electronic evidence system.
Social media trends are increasingly taking a significant role in the understanding of modern social dynamics. In this work, we take a look at how the Twitter landscape is constantly shaped by automatically generated content. Twitter bot activity can be traced via network abstractions which, we hypothesize, can be learned through state-of-the-art graph neural network techniques. We employ a large bot database, continuously updated by Twitter, to learn how likely it is that a user, or a hashtag, is mentioned by a bot. We thus model this likelihood as a link prediction task between the set of users and hashtags. Moreover, we contrast our results by performing similar experiments on a crawled dataset of real users.
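A simplified stand-in for the described setup is a bipartite link-prediction model scoring (user, hashtag) pairs; the sketch below uses plain learned embeddings rather than a graph neural network, and all names and sizes are assumptions.

```python
# Minimal sketch (assumed setup, not the authors' model): bipartite link prediction
# between users and hashtags, scoring links as dot products of learned embeddings.
import torch

class BipartiteLinkPredictor(torch.nn.Module):
    def __init__(self, n_users: int, n_hashtags: int, dim: int = 64):
        super().__init__()
        self.user_emb = torch.nn.Embedding(n_users, dim)
        self.tag_emb = torch.nn.Embedding(n_hashtags, dim)

    def forward(self, user_ids, tag_ids):
        # Score of a (user, hashtag) pair; higher means a bot-mention link is more likely.
        return (self.user_emb(user_ids) * self.tag_emb(tag_ids)).sum(dim=-1)

model = BipartiteLinkPredictor(n_users=10_000, n_hashtags=5_000)
loss_fn = torch.nn.BCEWithLogitsLoss()
# Positive pairs come from observed bot mentions; negatives are sampled at random.
```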
2020-12-12
LatinX in AI at Neural Information Processing Systems Conference 2020 (published)
A fundamental task in data exploration is to extract simplified low-dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from the latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out-of-sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
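As a sketch of how a geometric regularization term can sit in the bottleneck, the loss below combines reconstruction with a penalty matching latent pairwise distances to precomputed diffusion-potential distances; the exact penalty and all names are assumptions, not the paper's implementation.

```python
# Hedged sketch of a geometry-regularized autoencoder loss.
import torch

def geometry_regularized_loss(encoder, decoder, x, phate_dists, lam: float = 0.1):
    z = encoder(x)                               # latent (bottleneck) codes
    x_hat = decoder(z)                           # reconstruction
    recon = torch.nn.functional.mse_loss(x_hat, x)
    # Penalize mismatch between pairwise latent distances and precomputed
    # diffusion-potential distances (e.g., from PHATE) for the same batch.
    latent_dists = torch.cdist(z, z)
    geom = torch.nn.functional.mse_loss(latent_dists, phate_dists)
    return recon + lam * geom
```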
2020-12-10
2020 IEEE International Conference on Big Data (Big Data) (published)
The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called the *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics, which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned. Finally, we discuss how our work impacts the theoretical understanding of FOA and their performance.
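To see why these two metrics drive tuning, recall the textbook choice for gradient descent on an L-smooth, μ-strongly convex function (a standard result, not the paper's *-norm analysis): a small perturbation that inflates L or deflates μ directly degrades both the step size and the guaranteed rate.

```latex
x_{k+1} = x_k - \frac{2}{L+\mu}\,\nabla f(x_k),
\qquad
\|x_k - x^\star\| \le \left(\frac{L-\mu}{L+\mu}\right)^{k} \|x_0 - x^\star\|.
```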