Portrait of Alan Chan is unavailable

Alan Chan

Alumni

Publications

Visibility into AI Agents
Carson Ezell
Max Kaufmann
Kevin Wei
Lewis Hammond
Herbie Bradley
Emma Bluemke
David Krueger
Noam Kolt
Lennart Heim
Markus Anderljung
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex go… (see more)als with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar
Abulhair Saparov
Javier Rando
Daniel Paleka
Miles Turpin
Peter Hase
Ekdeep Singh Lubana
Erik Jenner
Stephen Casper
Oliver Sourbut
Benjamin L. Edelman
Zhaowei Zhang
Mario Günther
Anton Korinek
Jose Hernandez-Orallo
Lewis Hammond
Eric Bigelow
Alexander Pan
Lauro Langosco
Tomasz Korbak … (see 22 more)
Heidi Zhang
Ruiqi Zhong
Seán Ó hÉigeartaigh
Gabriel Recchia
Giulio Corsi
Markus Anderljung
Lilian Edwards
Aleksandar Petrov
Christian Schroeder de Witt
Sumeet Ramesh Motwani
Samuel Albanie
Danqi Chen
Philip H.S. Torr
Jakob Foerster
Florian Tramèr
He He
Atoosa Kasirzadeh
Yejin Choi
David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are o… (see more)rganized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose
Characterizing Manipulation from Al Systems
MICAH CARROLL
Henry Ashton
David Krueger
Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interac… (see more)tions with the world, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. Firstly, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Benjamin Bucknall
Herbie Bradley
David M. Krueger
Harms from Increasingly Agentic Algorithmic Systems
ALVA MARKELIUS
CHRIS PANG
Dmitrii Krasheninnikov
Lauro Langosco
ZHONGHAO HE
Yawen Duan
MICAH CARROLL
ALEX MAYHEW
KATHERINE COLLINS
John Burden
WANRU ZHAO
KONSTANTINOS VOUDOURIS
UMANG BHATT
Adrian Weller … (see 2 more)
David Krueger
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains… (see more) as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency -- notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.