
Alan Chan

Alumni

Publications

Visibility into AI Agents
Carson Ezell
Max Kaufmann
Kevin Wei
Lewis Hammond
Herbie Bradley
Emma Bluemke
Noam Kolt
Lennart Heim
Markus Anderljung
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents—systems capable of pursuing complex goals with limited supervision—may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar
Abulhair Saparov
Javier Rando
Daniel Paleka
Miles Turpin
Peter Hase
Ekdeep Singh Lubana
Erik Jenner
Stephen Casper
Oliver Sourbut
Benjamin L. Edelman
Zhaowei Zhang
Mario Günther
Anton Korinek
Jose Hernandez-Orallo
Lewis Hammond
Eric J Bigelow
Alexander Pan
Lauro Langosco
Tomasz Korbak
Heidi Chenyu Zhang
Ruiqi Zhong
Sean O hEigeartaigh
Gabriel Recchia
Giulio Corsi
Markus Anderljung
Lilian Edwards
Aleksandar Petrov
Christian Schroeder de Witt
Sumeet Ramesh Motwani
Danqi Chen
Philip Torr
Samuel Albanie
Jakob Nicolaus Foerster
Florian Tramèr
He He
Atoosa Kasirzadeh
Yejin Choi
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Benjamin Bucknall
Herbie Bradley
Public release of the weights of pretrained foundation models, otherwise known as downloadable access (Solaiman, 2023), enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.
Characterizing Manipulation from AI Systems
Micah Carroll
Henry Ashton
Manipulation is a concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our digital interactions, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring this kind of manipulation from AI systems. Firstly, we build upon prior literature on manipulation and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, covertness, and harm. We review proposals on how to operationalize each concept and we outline challenges in including each concept in a definition of manipulation. Second, we discuss the connections between manipulation and related concepts, such as deception and coercion. We then analyze how our characterization of manipulation applies to recommender systems and language models, and give a brief overview of the regulation of manipulation in other domains. While some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. Manipulation could pose a significant threat to human autonomy and precautionary actions to mitigate it are likely warranted.
Harms from Increasingly Agentic Algorithmic Systems
Alva Markelius
Chris Pang
Dmitrii Krasheninnikov
Lauro Langosco
Zhonghao He
Yawen Duan
Micah Carroll
Alex Mayhew
Katherine Collins
John Burden
Wanru Zhao
Konstantinos Voudouris
Umang Bhatt
Adrian Weller … (2 more authors)
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed, typically without strong regulatory barriers, threatening the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms, rather than just responding to them. Anticipation of harms is especially important given the rapid pace of developments in machine learning (ML). Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency – notably, these include systemic and/or long-range impacts, often on marginalized or unconsidered stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems.
Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.