Portrait of Alan Chan is unavailable

Alan Chan

Alumni

Publications

Open Problems in Technical AI Governance
Anka Reuel
Benjamin Bucknall
Stephen Casper
Timothy Fist
Lisa Soder
Onni Aarne
Lewis Hammond
Lujain Ibrahim
Peter Wills
Markus Anderljung
Ben Garfinkel
Lennart Heim
Andrew Trask
Gabriel Mukobi
Rylan Schaeffer
Mauricio Baker
Sara Hooker
Irene Solaiman
Alexandra Luccioni
Nicolas Moës
Jeffrey Ladish
David Bau
Paul Bricman
Neel Guha
Jessica Newman
Tobin South
Alex Pentland
Sanmi Koyejo
Mykel Kochenderfer
Robert Trager
AI progress is creating a growing range of risks and opportunities, but it is often unclear how they should be navigated. In many cases, the… (see more) barriers and uncertainties faced are at least partly technical. Technical AI governance, referring to technical analysis and tools for supporting the effective governance of AI, seeks to address such challenges. It can help to (a) identify areas where intervention is needed, (b) identify and assess the efficacy of potential governance actions, and (c) enhance governance options by designing mechanisms for enforcement, incentivization, or compliance. In this paper, we explain what technical AI governance is, why it is important, and present a taxonomy and incomplete catalog of its open problems. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. … (see more)A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. … (see more)A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. … (see more)A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. … (see more)A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. … (see more)A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
IDs for AI Systems
Noam Kolt
Peter Wills
Usman Anwar
Christian Schroeder de Witt
Lewis Hammond
David M. Krueger
Lennart Heim
Markus Anderljung
Black-Box Access is Insufficient for Rigorous Al Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Benjamin Bucknall
Andreas Haupt
Kevin Wei
Jérémy Scheurer
Marius Hobbhahn
Lee Sharkey
Satyapriya Krishna
Marvin Von Hagen
Silas Alberti
Qinyi Sun
Michael Gerovitch
David Bau
Max Tegmark
David Krueger … (see 1 more)
Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depe… (see more)nds on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.