Alan Chan

Visibility into AI Agents

Alan Chan

Carson Ezell

Max Kaufmann

Kevin Wei

Lewis Hammond

Herbie Bradley

Emma Bluemke

Nitarshan Rajkumar

David Krueger

Noam Kolt

Lennart Heim

Markus Anderljung

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.

2024-06-04

The 2024 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Usman Anwar

Abulhair Saparov

Javier Rando

Daniel Paleka

Miles Turpin

Peter Hase

Ekdeep Singh Lubana

Erik Jenner

Stephen Casper

Oliver Sourbut

Benjamin L. Edelman

Zhaowei Zhang

Mario Günther

Anton Korinek

Jose Hernandez-Orallo

Lewis Hammond

Eric Bigelow

Alexander Pan

Lauro Langosco

Tomasz Korbak

Heidi Zhang

Ruiqi Zhong

Seán Ó hÉigeartaigh

Gabriel Recchia

Giulio Corsi

Alan Chan

Markus Anderljung

Lilian Edwards

Aleksandar Petrov

Christian Schroeder de Witt

Sumeet Ramesh Motwani

Samuel Albanie

Yoshua Bengio

Danqi Chen

Philip H.S. Torr

Tegan Maharaj

Jakob Foerster

Florian Tramèr

He He

Atoosa Kasirzadeh

Yejin Choi

David Krueger

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose

2023-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Characterizing Manipulation from AI Systems

Micah Carroll

Alan Chan

Henry Ashton

David Krueger

Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. Firstly, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.

2023-10-28

Equity and Access in Algorithms, Mechanisms, and Optimization (published)

doi.org

arxiv.org

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Alan Chan

Benjamin Bucknall

Herbie Bradley

David M. Krueger

2023-10-22

NeurIPS.cc/2023/Workshop/SoLaR (spotlight)

doi.org

openreview.net

Harms from Increasingly Agentic Algorithmic Systems

Alan Chan

Rebecca Salganik

Alva Markelius

Chris Pang

Nitarshan Rajkumar

Dmitrii Krasheninnikov

Lauro Langosco

Zhonghao He

Yawen Duan

Micah Carroll

Michelle Lin

Alex Mayhew

Katherine Collins

Maryam Molamohammadi

John Burden

Wanru Zhao

Shalaleh Rismani

Konstantinos Voudouris

Umang Bhatt

Adrian Weller

David Krueger

Tegan Maharaj

Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency -- notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.

2023-06-11

2023 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org
