
Gael Varoquaux

Alumni

Publications

International AI Safety Report Second Key Update: Technical Safeguards and Risk Management
Stephen Clare
Carina Prunkl
Maksym Andriushchenko
Ben Bucknall
Philip Fox
Nestor Maslej
Conor McGlynn
Malcolm Murray
Stephen Casper
Jessica Newman
Daniel Privitera
Daron Acemoglu
Thomas G. Dietterich
Fredrik Heintz
Geoffrey Hinton
Nick Jennings
Susan Leavy …
Teresa Ludermir
Vidushi Marda
Helen Margetts
John McDermid
Jane Munga
Arvind Narayanan
Alondra Nelson
Clara Neppel
Sarvapali D. (Gopal) Ramchurn
Stuart Russell
Marietje Schaake
Bernhard Schölkopf
Alvaro Soto
Lee Tiedrich
Andrew Yao
Ya-Qin Zhang
This is the Second Key Update to the 2025 International AI Safety Report. The First Key Update (1) discussed developments in the capabilities of general-purpose AI models and systems and associated risks. This Key Update covers how various actors, including researchers, companies, and governments, are approaching risk management and technical mitigations for AI. The past year has seen important developments in AI risk management, including better techniques for training safer models and monitoring their outputs. While this represents tangible progress, significant gaps remain. It is often uncertain how effective current measures are at preventing harms, and effectiveness varies across time and applications. There are many opportunities to further strengthen existing safeguard techniques and to develop new ones. This Key Update provides a concise overview of critical developments in risk management practices and technical risk mitigation since the publication of the 2025 AI Safety Report in January. It highlights where progress is being made and where gaps remain. Above all, it aims to support policymakers, researchers, and the public in navigating a rapidly changing environment, helping them to make informed and timely decisions about the governance of general-purpose AI. Professor Yoshua Bengio, Université de Montréal / LawZero / Mila – Quebec AI Institute & Chair
International AI Safety Report: First Key Update, Capabilities and Risk Implications
Prof. Yoshua Bengio
Stephen Clare
Carina Prunkl
Maksym Andriushchenko
Ben Bucknall
Philip Fox
Tiancheng Hu
Cameron Jones
Sam Manning
Nestor Maslej
Vasilios Mavroudis
Conor McGlynn
Malcolm Murray
Charlotte Stix
Lucia Velasco
Nicole Wheeler
Daniel Privitera
Daron Acemoglu …
Thomas G. Dietterich
Fredrik Heintz
Geoffrey Hinton
Nick Jennings
Susan Leavy
Teresa Ludermir
Vidushi Marda
Helen Margetts
John McDermid
Jane Munga
Arvind Narayanan
Alondra Nelson
Clara Neppel
Sarvapali D. (Gopal) Ramchurn
Stuart Russell
Marietje Schaake
Bernhard Schölkopf
Alvaro Soto
Lee Tiedrich
Andrew Yao
Ya-Qin Zhang
Lambrini Das
Claire Dennis
Arianna Dini
Freya Hempleman
Samuel Kenny
Patrick King
Hannah Merchant
Jamie-Day Rawal
Rose Woolhouse
The field of AI is moving too quickly for a single yearly publication to keep pace. Significant changes can occur on a timescale of months, sometimes weeks. This is why we are releasing Key Updates: shorter, focused reports that highlight the most important developments between full editions of the International AI Safety Report. With these updates, we aim to provide policymakers, researchers, and the public with up-to-date information to support wise decisions about AI governance. This first Key Update focuses on areas where especially significant changes have occurred since January 2025: advances in general-purpose AI systems' capabilities, and the implications for several critical risks. New training techniques have enabled AI systems to reason step-by-step and operate autonomously for longer periods, allowing them to tackle more kinds of work. However, these same advances create new challenges across biological risks, cyber security, and oversight of AI systems themselves. The International AI Safety Report is intended to help readers assess, anticipate, and manage risks from general-purpose AI systems. These Key Updates ensure that critical developments receive timely attention as the field rapidly evolves.
Advancing science- and evidence-based AI policy.
Rishi Bommasani
Sanjeev Arora
Jennifer Chayes
Yejin Choi
Mariano-Florentino Cuéllar
Li Fei-Fei
Daniel E. Ho
Dan Jurafsky
Sanmi Koyejo
Hima Lakkaraju
Arvind Narayanan
Alondra Nelson
Emma Pierson
Scott Singer
Suresh Venkatasubramanian
Ion Stoica
Percy Liang
Dawn Song
International AI Safety Report
Bronwyn Fox
André Carlos Ponce de Leon Ferreira de Carvalho
Mona Nemer
Raquel Pezoa Rivera
Yi Zeng
Juha Heikkilä
Guillaume Avrin
Antonio Krüger
Balaraman Ravindran
Hammam Riza
Ciarán Seoighe
Ziv Katzir
Andrea Monti
Hiroaki Kitano
Nusu Mwamanzi
Fahad Albalawi
José Ramón López Portillo
Haroon Sheikh
Gill Jolly …
Olubunmi Ajala
Jerry Sheehan
Dominic Vincent Ligot
Kyoung Mu Lee
Crystal Rugege
Denise Wong
Nuria Oliver
Christian Busch
Ahmet Halit Hatip
Oleksii Molchanovskyi
Marwan Alserkal
Chris Johnson
Amandeep Singh Gill
Saif M. Khan
Daniel Privitera
Tamay Besiroglu
Rishi Bommasani
Stephen Casper
Yejin Choi
Philip Fox
Ben Garfinkel
Danielle Goldfarb
Hoda Heidari
Anson Ho
Sayash Kapoor
Leila Khalatbari
Shayne Longpre
Sam Manning
Vasilios Mavroudis
Mantas Mazeika
Julian Michael
Jessica Newman
Kwan Yee Ng
Chinasa T. Okolo
Deborah Raji
Girish Sastry
Elizabeth Seger
Theodora Skeadas
Tobin South
Daron Acemoglu
Olubayo Adekanmbi
David Dalrymple
Thomas G. Dietterich
Edward W. Felten
Pascale Fung
Pierre-Olivier Gourinchas
Fredrik Heintz
Geoffrey Hinton
Nick Jennings
Andreas Krause
Susan Leavy
Percy Liang
Teresa Ludermir
Vidushi Marda
Emma Strubell
Florian Tramèr
Lucia Velasco
Nicole Wheeler
Helen Margetts
John McDermid
Jane Munga
Arvind Narayanan
Alondra Nelson
Clara Neppel
Alice Oh
Gopal Ramchurn
Stuart Russell
Marietje Schaake
Bernhard Schölkopf
Dawn Song
Alvaro Soto
Lee Tiedrich
Andrew Yao
Ya-Qin Zhang
Baran Acar
Ben Clifford
Lambrini Das
Claire Dennis
Freya Hempleman
Hannah Merchant
Rian Overy
Ben Snodin
Benjamin Prud’homme
The first International AI Safety Report comprehensively synthesizes the current evidence on the capabilities, risks, and safety of advanced AI systems. The report was mandated by the nations attending the AI Safety Summit in Bletchley, UK. Thirty nations, the UN, the OECD, and the EU each nominated a representative to the report's Expert Advisory Panel. A total of 100 AI experts contributed, representing diverse perspectives and disciplines. Led by the report's Chair, these independent experts collectively had full discretion over the report's content.
Individual Brain Charting dataset extension, third release for movie watching and retinotopy data
Ana Luísa Pinho
Hugo Richard
Ana Fernanda Ponce
Michael Eickenberg
Alexis Amadon
Elvis Dopgima Dohmatob
Isabelle Denghien
Juan Jesús Torre
Swetha Shankar
Himanshu Aggarwal
Alexis Thual
Thomas Chapalain
Chantal Ginisty
Séverine Becuwe-Desmidt
Séverine Roger
Yann Lecomte
Valérie Berland
Laurence Laurier
Véronique Joly-Testault
Gaëlle Médiouni-Cloarec …
Christine Doublé
Bernadette Martins
Stanislas Dehaene
Lucie Hertz-Pannier
Bertrand Thirion
Metrics reloaded: recommendations for image analysis validation.
Lena Maier-Hein
Annika Reinke
Evangelia Christodoulou
Ben Glocker
Patrick Godau
Fabian Isensee
Jens Kleesiek
Michal Kozubek
Mauricio Reyes
Michael A. Riegler
Manuel Wiesenfarth
Michael Baumgartner
Matthias Eisenmann
Doreen Heckmann-Nötzel
A. Emre Kavur
Tim Rädsch
Minu Dietlinde Tizabi
Laura Acion
Michela Antonelli
Spyridon Bakas
Peter Bankhead
Arriel Benis
M. Jorge Cardoso
Veronika Cheplygina
Beth A. Cimini
Gary S. Collins
Keyvan Farahani
Bram van Ginneken
Daniel A. Hashimoto
Michael M. Hoffman
Merel Huisman
Pierre Jannin
Charles E. Kahn
Alexandros Karargyris
Alan Karthikesalingam
H. Kenngott
Annette Kopp-Schneider
Anna Kreshuk
Tahsin Kurc
Bennett Landman
Geert Litjens
Amin Madani
Klaus Maier-Hein
Anne L. Martel
Peter Mattson
Erik Meijering
Bjoern Menze
David Moher
Karel G.M. Moons
Henning Müller
Felix Nickel
Jens Petersen
Nasir Rajpoot
Nicola Rieke
Julio Saez-Rodriguez
Clarisa Sánchez Gutiérrez
Shravya Shetty
M. Smeden
Carole H. Sudre
Ronald M. Summers
Abdel Aziz Taha
Sotirios A. Tsaftaris
Ben Van Calster
Paul F. Jäger
Understanding metric-related pitfalls in image analysis validation
Annika Reinke
Minu D. Tizabi
Michael Baumgartner
Matthias Eisenmann
Doreen Heckmann-Nötzel
A. Emre Kavur
Tim Rädsch
Carole H. Sudre
Laura Acion
Michela Antonelli
Spyridon Bakas
Arriel Benis
Matthew B. Blaschko
Florian Buettner
M. Jorge Cardoso
Veronika Cheplygina
Jianxu Chen
Evangelia Christodoulou …
Beth A. Cimini
Keyvan Farahani
Luciana Ferrer
Gary S. Collins
Adrian Galdran
Bram van Ginneken
Ben Glocker
Patrick Godau
Daniel A. Hashimoto
Michael M. Hoffman
Robert Haase
Merel Huisman
Fabian Isensee
Pierre Jannin
Charles E. Kahn
Dagmar Kainmueller
Bernhard Kainz
Alexandros Karargyris
Jens Kleesiek
Florian Kofler
Thijs Kooi
Annette Kopp-Schneider
Alan Karthikesalingam
Hannes Kenngott
Michal Kozubek
Anna Kreshuk
Tahsin Kurc
Bennett A. Landman
Geert Litjens
Amin Madani
Klaus Maier-Hein
Anne L. Martel
Erik Meijering
Bjoern Menze
Karel G.M. Moons
Henning Müller
Felix Nickel
Peter Mattson
Jens Petersen
Susanne M. Rafelski
Nasir Rajpoot
Mauricio Reyes
Michael A. Riegler
Nicola Rieke
Julio Saez-Rodriguez
Clara I. Sánchez
Shravya Shetty
Ronald M. Summers
Abdel A. Taha
Aleksei Tiulpin
Sotirios A. Tsaftaris
Ben Van Calster
Ziv R. Yaniv
Paul F. Jäger
Lena Maier-Hein
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
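One of the classic pitfalls this line of work catalogues is reporting plain accuracy on a class-imbalanced task. The toy sketch below (hypothetical data, not taken from the paper) shows a trivial majority-class predictor scoring high accuracy while balanced accuracy exposes its failure.

```python
# Illustrative example of a metric pitfall: accuracy under class imbalance.
# A predictor that always outputs the majority class looks strong on accuracy
# but is useless, which balanced accuracy (mean of per-class recalls) reveals.

y_true = [1] * 90 + [0] * 10   # 90% positive class, 10% negative
y_pred = [1] * 100             # degenerate predictor: always "positive"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(cls):
    """Fraction of class `cls` samples that were correctly predicted."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total

balanced_accuracy = (recall(0) + recall(1)) / 2

print(accuracy)            # 0.9
print(balanced_accuracy)   # 0.5
```

The 0.9 accuracy is an artifact of class prevalence; the 0.5 balanced accuracy shows the predictor is no better than chance on a class-balanced view of the problem.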
Metrics Reloaded - A new recommendation framework for biomedical image analysis validation
Annika Reinke
Lena Maier-Hein
Evangelia Christodoulou
Ben Glocker
Patrick Scholz
Fabian Isensee
Jens Kleesiek
Michal Kozubek
Mauricio Reyes
Michael Alexander Riegler
Manuel Wiesenfarth
Michael Baumgartner
Matthias Eisenmann
Doreen Heckmann-Nötzel
Ali Emre Kavur
Tim Rädsch
Minu D. Tizabi
Laura Acion
Michela Antonelli
Spyridon Bakas
Peter Bankhead
Arriel Benis
M. Jorge Cardoso
Veronika Cheplygina
Beth A Cimini
Gary S. Collins
Keyvan Farahani
Bram van Ginneken
Fred A Hamprecht
Daniel A. Hashimoto
Michael M. Hoffman
Merel Huisman
Pierre Jannin
Charles Kahn
Alexandros Karargyris
Alan Karthikesalingam
Hannes Kenngott
Annette Kopp-Schneider
Anna Kreshuk
Tahsin Kurc
Bennett Landman
Geert Litjens
Amin Madani
Klaus Maier-Hein
Anne Martel
Peter Mattson
Erik Meijering
Bjoern Menze
David Moher
Karel G.M. Moons
Henning Müller
Felix Nickel
Jens Petersen
Nasir Rajpoot
Nicola Rieke
Julio Saez-Rodriguez
Clara I. Sánchez
Shravya Shetty
Maarten van Smeden
Carole H. Sudre
Ronald M. Summers
Abdel A. Taha
Sotirios A. Tsaftaris
Ben Van Calster
Paul F. Jäger
Meaningful performance assessment of biomedical image analysis algorithms depends on objective and appropriate performance metrics. There are major shortcomings in the current state of the art. Yet, so far, limited attention has been paid to the practical pitfalls associated with using particular metrics for image analysis tasks. Therefore, a number of international initiatives have collaborated to offer researchers guidance and tools for selecting performance metrics in a problem-aware manner. In our proposed framework, the characteristics of the given biomedical problem are first captured in a problem fingerprint, which identifies properties related to domain interests, the target structure(s), the input datasets, and algorithm output. In the second step, a problem category-specific mapping matches fingerprints to metrics that reflect domain requirements. Based on input from experts from more than 60 institutions worldwide, we believe our metric recommendation framework will be useful to the MIDL community and will enhance the quality of biomedical image analysis algorithm validation.
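The two-step idea (capture problem properties in a fingerprint, then map the fingerprint to candidate metrics) can be sketched as follows. This is a minimal illustration with hypothetical names and a toy mapping, not the authors' implementation or their actual metric rules.

```python
# Sketch of a fingerprint -> metric recommendation flow (hypothetical).
from dataclasses import dataclass

@dataclass(frozen=True)
class ProblemFingerprint:
    task: str                # e.g. "segmentation" or "classification"
    class_imbalance: bool    # is the class distribution skewed?
    small_structures: bool   # are target structures small relative to the image?

def recommend_metrics(fp: ProblemFingerprint) -> list[str]:
    """Toy category-specific mapping from a fingerprint to candidate metrics."""
    if fp.task == "segmentation":
        metrics = ["Dice"]
        if fp.small_structures:
            # Overlap metrics alone penalize small structures harshly;
            # add a boundary-based complement.
            metrics.append("Normalized Surface Distance")
        return metrics
    if fp.task == "classification":
        return ["Balanced accuracy"] if fp.class_imbalance else ["Accuracy"]
    return []

fp = ProblemFingerprint(task="segmentation", class_imbalance=True, small_structures=True)
print(recommend_metrics(fp))  # ['Dice', 'Normalized Surface Distance']
```

The value of the design is that metric choice becomes a function of explicitly stated problem properties rather than an ad hoc decision made per paper.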
Population modeling with machine learning can enhance measures of mental health
Kamalaker Dadi
Josselin Houenou
Bertrand Thirion
Denis Engemann
We applied machine learning to more than 10,000 individuals from the general population to define empirical approximations of health-related psychological measures that do not require human judgment. We found that machine learning enriched the given psychological measures via approximation from brain and sociodemographic data: the resulting proxy measures related as well as or better to real-world health behavior than the original measures. Model comparisons showed that sociodemographic information contributed most to characterizing psychological traits beyond aging.
Accounting for Variance in Machine Learning Benchmarks
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in light of this variance. We show a counter-intuitive result: adding more sources of variation to an imperfect estimator better approximates the ideal estimator, at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
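The core recommendation (compare score distributions over re-randomized trials, not single runs) can be sketched as below. All names are illustrative; `simulate_run` is a stand-in for a real train-and-evaluate job whose noise models run-to-run variation from data sampling, initialization, and hyperparameter draws.

```python
# Sketch: benchmark two algorithms as distributions over randomized trials
# rather than as one number each (hypothetical simulation, not the paper's code).
import random
import statistics

def simulate_run(true_quality: float, rng: random.Random) -> float:
    # Stand-in for training + evaluation: underlying quality plus noise
    # from data split, parameter initialization, and hyperparameter choice.
    return true_quality + rng.gauss(0, 0.02) + rng.gauss(0, 0.01)

def benchmark(true_quality: float, n_trials: int = 20, seed: int = 0) -> list[float]:
    rng = random.Random(seed)  # fixed seed so the benchmark is reproducible
    return [simulate_run(true_quality, rng) for _ in range(n_trials)]

scores_a = benchmark(0.80)  # algorithm A
scores_b = benchmark(0.81)  # algorithm B, slightly better on average

# Report mean and spread; a difference smaller than the spread is weak evidence.
print(statistics.mean(scores_a), statistics.stdev(scores_a))
print(statistics.mean(scores_b), statistics.stdev(scores_b))
```

With the trial-to-trial standard deviation in hand, one can ask whether the observed gap between A and B exceeds what randomization alone would produce, which is the question single-run comparisons cannot answer.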
Prediction, Not Association, Paves the Road to Precision Medicine
Ewout W. Steyerberg
Patterns of autism symptoms: hidden structure in the ADOS and ADI-R instruments
Jeremy Lefort-Besnard
Kai Vogeley
Leonhard Schilbach
Bertrand Thirion