Publications

Categorical Generative Model Evaluation via Synthetic Distribution Coarsening

Florence Regol

As we expect to see a rapid integration of generative models in our day-to-day lives, the development of rigorous methods of evaluation and analysis for generative models has never been more pressing. Multiple works have highlighted the shortcomings of widely used metrics and exposed how they fail to behave as expected in some settings. So far, the response has been to use a variety of metrics that target different desirable and interpretable properties such as fidelity, diversity, and authenticity, to obtain a clearer picture of a generative model’s capabilities. These methods mainly focus on ordinal data and they all suffer from the same unavoidable issues stemming from estimating quantities of high-dimensional data from a limited number of samples. We propose to take an alternative approach and to return to the synthetic data setting where the ground truth is explicit and known. We focus on nominal categorical data and introduce an evaluation method that can scale to the high-dimensional settings often encountered in practice. Our method involves successively binning the large space to obtain smaller probability spaces and coarser distributions where meaningful statistical estimates can be obtained. This allows us to provide probabilistic guarantees and sample complexities and we illustrate how our method can be applied to distinguish between the capabilities of several state-of-the-art categorical models.

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies

Jonathan Colaco Carr

Prakash Panangaden

Doina Precup

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

doi.org

arxiv.org

Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias

Yu Yang

Eric Gan

Gintare Karolina Dziugaite

Baharan Mirzasoleiman

Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

doi.org

openreview.net

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Bertie Vidgen

Adarsh Agrawal

Ahmed M. Ahmed

Victor Akinwande

Namir Al-nuaimi

Najla Alfaraj

Elie Alhajjar

Lora Aroyo

Trupti Bavalatti

Borhane Blili-Hamelin

K. Bollacker

Rishi Bomassani

Marisa Ferrara Boston

Siméon Campos

Kal Chakra

Canyu Chen

Cody Coleman

Zacharie Delpierre Coudert

Leon Strømberg Derczynski

Debojyoti Dutta

Ian Eisenberg

James R. Ezick

Heather Frase

Brian Fuller

Ram Gandikota

Agasthya Gangavarapu

Ananya Gangavarapu

James Gealy

Rajat Ghosh

James Goel

Usman Gohar

Sujata Goswami

Scott A. Hale

Wiebke Hutiri

Joseph Marvin Imperial

Surgan Jandial

Nicholas C. Judd

Felix Juefei-Xu

Foutse Khomh

Bhavya Kailkhura

Hannah Rose Kirk

Kevin Klyman

Chris Knotz

Michael Kuchnik

Shachi H. Kumar

Chris Lengerich

Bo Li

Zeyi Liao

Eileen Peters Long

Victor Lu

Yifan Mai

Priyanka Mary Mammen

Kelvin Manyeki

Sean McGregor

Virendra Mehta

Shafee Mohammed

Emanuel Moss

Lama Nachman

Dinesh Jinenhally Naganna

Amin Nikanjam

Besmira Nushi

Luis Oala

Iftach Orr

Alicia Parrish

Çigdem Patlak

William Pietri

Forough Poursabzi-Sangdeh

Eleonora Presani

Fabrizio Puletti

Paul Röttger

Saurav Sahay

Tim Santos

Nino Scherrer

Alice Schoenauer Sebag

Patrick Schramowski

Abolfazl Shahbazi

Vin Sharma

Xudong Shen

Vamsi Sistla

Leonard Tang

Davide Testuggine

Vithursan Thangarasa

Elizabeth A Watkins

Rebecca Weiss

Christopher A. Welty

Tyler Wilbers

Adina Williams

Carole-Jean Wu

Poonam Yadav

Xianjun Yang

Yi Zeng

Wenhui Zhang

Fedor Zhdanov

Jiacheng Zhu

Percy Liang

Peter Mattson

Joaquin Vanschoren

This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

2024-04-18

ArXiv (preprint)

doi.org

arxiv.org

On learning history-based policies for controlling Markov decision processes

Gandharv Patil

Aditya Mahajan

Doina Precup

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

doi.org

openreview.net