David Scott Krueger

Frank Hutter

Atilim Güneş Baydin

Sheila McIlraith

Qiqi Gao

Ashwin Acharya

Anca Dragan

Philip Torr

Stuart Russell

Daniel Kahneman

Jan Brauner

Sören Mindermann

In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose priorities for AI R&D and governance.

2023-10-26

ArXiv (preprint)

Managing AI Risks in an Era of Rapid Progress

Yoshua Bengio

Geoffrey Hinton

Andrew Yao

Dawn Song

Pieter Abbeel

Yuval Noah Harari

Ya-Qin Zhang

Lan Xue

Shai Shalev-Shwartz

Gillian K. Hadfield

Jeff Clune

Frank Hutter

Atilim Güneş Baydin

Sheila McIlraith

Qiqi Gao

Ashwin Acharya

Anca Dragan

Philip Torr

Stuart Russell

Daniel Kahneman

Jan Brauner

Sören Mindermann

2023-10-26

ArXiv (preprint)

Managing AI Risks in an Era of Rapid Progress

Yoshua Bengio

Geoffrey Hinton

Andrew Yao

Dawn Song

Pieter Abbeel

Yuval Noah Harari

Trevor Darrell

Ya-Qin Zhang

Lan Xue

Shai Shalev-Shwartz

Gillian K. Hadfield

Jeff Clune

Frank Hutter

Atilim Güneş Baydin

Sheila McIlraith

Qiqi Gao

Ashwin Acharya

Anca Dragan

Philip Torr

Stuart Russell

Daniel Kahneman

Jan Brauner

Sören Mindermann

2023-10-26

Science (published)

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Alan Chan

Benjamin Bucknall

Herbie Bradley

2023-10-23

NeurIPS.cc/2023/Workshop/SoLaR (spotlight)

Meta- (out-of-context) learning in neural networks

Dmitrii Krasheninnikov

Egor Krasheninnikov

Bruno Mlodozeniec

Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs). We establish the existence of a phenomenon we call meta-out-of-context learning (meta-OCL) via carefully designed synthetic experiments with LLMs. Our results suggest that meta-OCL leads LLMs to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and use it in appropriate circumstances. We further demonstrate meta-OCL in a synthetic computer vision setting, and propose two hypotheses for the emergence of meta-OCL: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based optimizers may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks. Our code can be found at https://github.com/krasheninnikov/internalization.

2023-10-23

ArXiv (preprint)

Thinker: Learning to Plan and Act

Stephen Chung

Ivan Anokhin

We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for handcrafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. Thinker is the first work showing that an RL agent can learn to plan with a learned world model in complex environments.

Mechanistic Mode Connectivity

Ekdeep Singh Lubana

Eric J Bigelow

Robert P. Dick

Hidenori Tanaka

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Towards Out-of-Distribution Adversarial Robustness

Adam Ibrahim

Charles Guille-escuret

Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fails to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different

2023-06-20

ICML.cc/2023/Workshop/AdvML-Frontiers (published)

Harms from Increasingly Agentic Algorithmic Systems

Alan Chan

Rebecca Salganik

Alva Markelius

Chris Pang

Nitarshan Rajkumar

Dmitrii Krasheninnikov

Lauro Langosco

Zhonghao He

Yawen Duan

Micah Carroll

Michelle Lin

Alex Mayhew

Katherine Collins

Maryam Molamohammadi

John Burden

Wanru Zhao

Shalaleh Rismani

Konstantinos Voudouris

Umang Bhatt

Adrian Weller … (see 2 more)

Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed, typically without strong regulatory barriers, threatening the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms, rather than just responding to them. Anticipation of harms is especially important given the rapid pace of developments in machine learning (ML). Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency – notably, these include systemic and/or long-range impacts, often on marginalized or unconsidered stakeholders. We emphasize that recognizing agency of algorithmic systems does not absolve or shift the human responsibility for algorithmic harms. Rather, we use the term agency to highlight the increasingly evident fact that ML systems are not fully under human control. Our work explores increasingly agentic algorithmic systems in three parts. First, we explain the notion of an increase in agency for algorithmic systems in the context of diverse perspectives on agency across disciplines. Second, we argue for the need to anticipate harms from increasingly agentic systems. Third, we discuss important harms from increasingly agentic systems and ways forward for addressing them. We conclude by reflecting on implications of our work for anticipating algorithmic harms from emerging systems.

2023-06-12

2023 ACM Conference on Fairness, Accountability, and Transparency (published)

The Flag and the Cross: White Christian Nationalism and the Threat to American Democracy by Philip S. Gorski and Samuel L. Perry (review)
