Sören Mindermann

International AI Safety Report Second Key Update: Technical Safeguards and Risk Management

Yoshua Bengio

Stephen Clare

Carina Prunkl

Maksym Andriushchenko

BEN BUCKNALL

Philip Fox

Nestor Maslej

Conor McGlynn

Malcolm Murray

Shalaleh Rismani

Stephen Casper

Jessica Newman

Daniel Privitera

Sören Mindermann

Daron Acemoglu

Thomas G. Dietterich

Fredrik Heintz

Geoffrey Hinton

Nick Jennings

Susan Leavy … (voir 17 de plus)

Teresa Ludermir

Vidushi Marda

Helen Margetts

John McDermid

Jane Munga

Arvind Narayanan

Alondra Nelson

Clara Neppel

Sarvapali D. (Gopal) Ramchurn

Stuart Russell

Marietje Schaake

Bernhard Schölkopf

Alvaro Soto

Lee Tiedrich

Gael Varoquaux

Andrew Yao

Ya-Qin Zhang

This is the Second Key Update to the 2025 International AI Safety Report. The First Key Update (1) discussed developments in the capabilitie… (voir plus)s of general-purpose AI models and systems and associated risks. This Key Update covers how various actors, including researchers, companies, and governments, are approaching risk management and technical mitigations for AI. The past year has seen important developments in AI risk management, including better techniques for training safer models and monitoring their outputs. While this represents tangible progress, significant gaps remain. It is often uncertain how effective current measures are at preventing harms, and effectiveness varies across time and applications. There are many opportunities to further strengthen existing safeguard techniques and to develop new ones. This Key Update provides a concise overview of critical developments in risk management practices and technical risk mitigation since the publication of the 2025 AI Safety Report in January. It highlights where progress is being made and where gaps remain. Above all, it aims to support policymakers, researchers, and the public in navigating a rapidly changing environment, helping them to make informed and timely decisions about the governance of general-purpose AI. Professor Yoshua BengioUniversité de Montréal / LawZero /Mila – Quebec AI Institute & Chair

2025-12-06

SuperIntelligence - Robotics - Safety & Alignment (publié)

doi.org

International AI Safety Report: First Key Update, Capabilities and Risk Implications

Prof. Yoshua Bengio

Stephen Clare

Carina Prunkl

Maksym Andriushchenko

BEN BUCKNALL

Philip Fox

Tiancheng Hu

Cameron Jones

Sam Manning

Nestor Maslej

Vasilios Mavroudis

Conor McGlynn

Malcolm Murray

Shalaleh Rismani

Charlotte Stix

Lucia Velasco

Nicole Wheeler

Daniel Privitera

Sören Mindermann

Daron Acemoglu … (voir 36 de plus)

Thomas G. Dietterich

Fredrik Heintz

Geoffrey Hinton

Nick Jennings

Susan Leavy

Teresa Ludermir

Vidushi Marda

Helen Margetts

John McDermid

Jane Munga

Arvind Narayanan

Alondra Nelson

Clara Neppel

Sarvapali D. (Gopal) Ramchurn

Stuart Russell

Marietje Schaake

Bernhard Schölkopf

Alvaro Soto

Lee Tiedrich

Gael Varoquaux

Andrew Yao

Ya-Qin Zhang

Lambrini Das

Claire Dennis

Arianna Dini

Freya Hempleman

Samuel Kenny

Patrick King

Hannah Merchant

Jamie-Day Rawal

Rose Woolhouse

The field of AI is moving too quickly for a single yearly publication to keep pace. Significant changes can occur on a timescale of months, … (voir plus)sometimes weeks. This is why we are releasing Key Updates: shorter, focused reports that highlight the most important developments between full editions of the International AI Safety Report. With these updates, we aim to provide policymakers, researchers, and the public with up-to-date information to support wise decisions about AI governance. This first Key Update focuses on areas where especially significant changes have occurred since January 2025: advances in general-purpose AI systems' capabilities, and the implications for several critical risks. New training techniques have enabled AI systems to reason step-by-step and operate autonomously for longer periods, allowing them to tackle more kinds of work. However, these same advances create new challenges across biological risks, cyber security, and oversight of AI systems themselves. The International AI Safety Report is intended to help readers assess, anticipate, and manage risks from general-purpose AI systems. These Key Updates ensure that critical developments receive timely attention as the field rapidly evolves.

2025-10-21

SuperIntelligence - Robotics - Safety & Alignment (publié)

doi.org

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Michael Cohen

Joumana Ghosn

Adam Oberman

Jesse Richardson

Oliver Richardson

Marc-Antoine Rondeau

Pierre-Luc St-Charles

David Williams-King

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue go… (voir plus)als across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.

2025-08-14

SuperIntelligence - Robotics - Safety & Alignment (publié)

doi.org

arxiv.org

Open Problems in Machine Unlearning for AI Safety

Fazl Barez

Tingchen Fu

Ameya Prabhu

Stephen Casper

Amartya Sanyal

Adel Bibi

Aidan O'Gara

Robert Kirk

Benjamin Bucknall

Timothy Fist

Luke Ong

Philip Torr

Kwok-Yan Lam

Robert Trager

David M. Krueger

Sören Mindermann

Jose Hernandez-Orallo

Mor Geva

Yarin Gal

As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (voir plus), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.

2025-01-08

ArXiv (prépublication)

doi.org

arxiv.org

International AI Safety Report

Yoshua Bengio

Bronwyn Fox

André Carlos Ponce de Leon Ferreira de Carvalho

Mona Nemer

Raquel Pezoa Rivera

Yi Zeng

Juha Heikkilä

Guillaume Avrin

Antonio Krüger

Balaraman Ravindran

Hammam Riza

Ciarán Seoighe

Ziv Katzir

Andrea Monti

Hiroaki Kitano

Nusu Mwamanzi

Fahad Albalawi

José Ramón López Portillo

Haroon Sheikh

Gill Jolly … (voir 86 de plus)

Olubunmi Ajala

Jerry Sheehan

Dominic Vincent Ligot

Kyoung Mu Lee

Crystal Rugege

Denise Wong

Nuria Oliver

Christian Busch

Ahmet Halit Hatip

Oleksii Molchanovskyi

Marwan Alserkal

Chris Johnson

Amandeep Singh Gill

Saif M. Khan

Sören Mindermann

Daniel Privitera

Tamay Besiroglu

Rishi Bommasani

Stephen Casper

Yejin Choi

Philip Fox

Ben Garfinkel

Danielle Goldfarb

Hoda Heidari

Anson Ho

Sayash Kapoor

Leila Khalatbari

Shayne Longpre

Sam Manning

Vasilios Mavroudis

Mantas Mazeika

Julian Michael

Jessica Newman

Kwan Yee Ng

Chinasa T. Okolo

Deborah Raji

Girish Sastry

Elizabeth Seger

Theodora Skeadas

Tobin South

Daron Acemoglu

Olubayo Adekanmbi

David Dalrymple

Thomas G. Dietterich

Edward W. Felten

Pascale Fung

Pierre-Olivier Gourinchas

Fredrik Heintz

Geoffrey Hinton

Nick Jennings

Andreas Krause

Susan Leavy

Percy Liang

Teresa Ludermir

Vidushi Marda

Emma Strubell

Florian Tramèr

Lucia Velasco

Nicole Wheeler

Helen Margetts

John McDermid

Jane Munga

Arvind Narayanan

Alondra Nelson

Clara Neppel

Alice Oh

Gopal Ramchurn

Stuart Russell

Marietje Schaake

Bernhard Schölkopf

Dawn Song

Alvaro Soto

Lee Tiedrich

Gael Varoquaux

Andrew Yao

Ya-Qin Zhang

Baran Acar

Ben Clifford

Lambrini Das

Claire Dennis

Freya Hempleman

Hannah Merchant

Rian Overy

Ben Snodin

Jonathan Barry

Benjamin Prud’homme

The first International AI Safety Report comprehensively synthesizes the current evidence on the capabilities, risks, and safety of advanced… (voir plus) AI systems. The report was mandated by the nations attending the AI Safety Summit in Bletchley, UK. Thirty nations, the UN, the OECD, and the EU each nominated a representative to the report's Expert Advisory Panel. A total of 100 AI experts contributed, representing diverse perspectives and disciplines. Led by the report's Chair, these independent experts collectively had full discretion over the report's content.

2024-12-31

arXiv (prépublication)

doi.org

arxiv.org

Open Technical Problems in Open-Weight AI Model Risk Management

Stephen Casper

Kyle O'Brien

Shayne Longpre

Elizabeth Seger

Kevin Klyman

Rishi Bommasani

Aniruddha Nrusimha

Ilia Shumailov

Sören Mindermann

Steven Basart

Frank Rudzicz

Kellin Pelrine

Avijit Ghosh

Andrew Strait

Robert Kirk

Dan Hendrycks

Peter Henderson

J. Zico Kolter

Geoffrey Irving

Yarin Gal … (voir 2 de plus)

Yoshua Bengio

Dylan Hadfield-Menell

2024-12-31

SSRN Electronic Journal (accepté)

doi.org

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Michael Cohen

Joumana Ghosn

Adam Oberman

Jesse Richardson

Oliver Richardson

Marc-Antoine Rondeau

Pierre-Luc St-Charles

David Williams-King

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue go… (voir plus)als across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.

2024-12-31

arXiv (prépublication)

doi.org

arxiv.org

The Singapore Consensus on Global AI Safety Research Priorities

Yoshua Bengio

Tegan Maharaj

Luke Ong

Stuart Russell

Dawn Song

Max Tegmark

Lan Xue

Ya-Qin Zhang

Stephen Casper

Wan Sie Lee

Sören Mindermann

Vanessa Wilfred

Vidhisha Balachandran

Fazl Barez

Michael Belinsky

Ima Bello

Malo Bourgon

Mark Brakel

Simeon Campos

Duncan Cass-Beggs … (voir 67 de plus)

Jiahao Chen

Rumman Chowdhury

Chua Kuan Seah

Jeff Clune

Juntao Dai

Agnes Delaborde

Nouha Dziri

Francisco Eiras

Joshua Engels

Jinyu Fan

Adam Gleave

Noah Goodman

Fynn Heide

Johannes Heidecke

Dan Hendrycks

Cyrus Hodes

Bryan Low

Minlie Huang

Sami Jawhar

Jingyu Wang

Adam Kalai

Meindert Kamphuis

Mohan Kankanhalli

Subhash Kantamneni

Mathias Kirk Bonde

Thomas Kwa

Jeffrey Ladish

Kwok Yan Lam

Wan Sie Lee

Taewhi Lee

Xiaojian Li

Jiajun Liu

Chaochao Lu

Yifan Mai

Richard Mallah

Julian Michael

Nicolas Moës

Simon Moeller

Kihyuk Nam

Kwan Yee Ng

Mark Nitzberg

Besmira Nushi

Seán Ó hÉigeartaigh

Alejandro Ortega

Pierre Peigné

James Petrie

Benjamin Prud'homme

Reihaneh Rabbany

Nayat Sanchez-Pi

Sarah Schwettmann

Buck Shlegeris

Saad Siddiqui

Anu Sinha

Martin Soto

Cheston Tan

Anthony Tung

William Tjhi

Robert Trager

Brian Tse

Anthony Tung

John Willes

Denise Wong

Wei Xu

Rongwu Xu

Yi Zeng

Hongjiang Zhang

Djordje Zikelic

Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to en… (voir plus)sure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential – it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. This requires policymakers, industry, researchers and the broader public to collectively work toward securing positive outcomes from AI’s development. AI safety research is a key dimension. Given that the state of science today for building trustworthy AI does not fully cover all risks, accelerated investment in research is required to keep pace with commercially driven growth in system capabilities. Goals: The 2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety aims to support research in this important space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. The result, The Singapore Consensus on Global AI Safety Research Priorities, builds on the International AI Safety Report-A (IAISR) chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this document organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control). Through the Singapore Consensus, we hope to globally facilitate meaningful conversations between AI scientists and AI policymakers for maximally beneficial outcomes. Our goal is to enable more impactful R&D efforts to rapidly develop safety and evaluation mechanisms and foster a trusted ecosystem where AI is harnessed for the public good.

2024-12-31

arXiv (prépublication)

doi.org

arxiv.org

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

BEN BUCKNALL

Saad Siddiqui

LARA THURNHERR

CONOR MCGURK

BEN HARACK

Anka Reuel

PATRICIA PASKOV

CASEY MAHONEY

Sören Mindermann

Scott Singer

VINAY HIREMATH

Charbel-Raphael Segerie

OSCAR DELANEY

Alessandro Abate

Fazl Barez

Michael K. Cohen

Philip Torr

FERENC HUSZÁR

ANISOARA CALINESCU

GABRIEL DAVIS JONES … (voir 2 de plus)

Yoshua Bengio

Robert Trager

International cooperation is common in AI research, including between geopolitical rivals. While many experts advocate for greater internati… (voir plus)onal cooperation on AI safety to address shared global risks, some view cooperation on AI with suspicion, arguing that it can pose unacceptable risks to national security. However, the extent to which cooperation on AI safety poses such risks, as well as provides benefits, depends on the specific area of cooperation. In this paper, we consider technical factors that impact the risks of international cooperation on AI safety research, focusing on the degree to which such cooperation can advance dangerous capabilities, result in the sharing of sensitive information, or provide opportunities for harm. We begin by why nations historically cooperate on strategic technologies and analyse current US-China cooperation in AI as a case study. We further argue that existing frameworks for managing associated risks can be supplemented with consideration of key risks specific to cooperation on technical AI safety research. Through our analysis, we find that research into AI verification mechanisms and shared protocols may be suitable areas for such cooperation. Through this analysis we aim to help researchers and governments identify and mitigate the risks of international cooperation on AI safety research, so that the benefits of cooperation can be fully realised.

2024-12-31

ACM Conference on Fairness, Accountability, and Transparency (publié)

doi.org

arxiv.org

Managing extreme AI risks amid rapid progress

Yoshua Bengio

Geoffrey Hinton

Andrew Yao

Dawn Song

Pieter Abbeel

Yuval Noah Harari

Trevor Darrell

Ya-Qin Zhang

Lan Xue

Shai Shalev-Shwartz

Gillian Hadfield

Jeff Clune

Tegan Maharaj

Frank Hutter

Atilim Güneş Baydin

Sheila McIlraith

Qiqi Gao

Ashwin Acharya

David Krueger

Anca Dragan … (voir 5 de plus)

Philip Torr

Stuart Russell

Daniel Kahneman

Jan Brauner

Sören Mindermann

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can aut… (voir plus)onomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.

2023-10-25

Science (publié)

doi.org

arxiv.org

Mila Techaide 2026

Désinformation 2.0 : quand l’IA brouille nos ondes

Avantage IA : productivité dans la fonction publique

Sören Mindermann

Publications

Mila Techaide 2026

Désinformation 2.0 : quand l’IA brouille nos ondes

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Sören Mindermann

Publications