Sören Mindermann

Research Collaborator - UdeM
Principal supervisor

Publications

Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Amartya Sanyal
Adel Bibi
Aidan O'Gara
Robert Kirk
Benjamin Bucknall
Tim Fist
Luke Ong
Philip H. S. Torr
Kwok-Yan Lam
Robert F. Trager
Jose Hernandez-Orallo
Mor Geva
Yarin Gal
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Imane Bello
Malo Bourgon
Mark Brakel
Siméon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Kuan Chua Seah
Jeff Clune
Juntao Dai
Agnès Delaborde
Nouha Dziri
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah D. Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low Kian Hsiang
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Tauman Kalai
Meindert Kamphuis
Mohan S. Kankanhalli
Subhash Kantamneni
Mathias Bonde Kirk
Thomas Kwa
Jeffrey Ladish
Kwok-Yan Lam
Wan Lee Sie
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nick Moës
Simon Möller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Seán Ó hÉigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Aradhana Sinha
Martín Soto
Cheston Tan
Dong Ting
William-Chandra Tjhi
Robert Trager
Brian Tse
Anthony K. H. Tung
John Willes
Denise Wong
Wei Xu
Rongwu Xu
Yi Zeng
HongJiang Zhang
Djordje Zikelic
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. This resulting report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
Managing extreme AI risks amid rapid progress
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
Trevor Darrell
Yuval Noah Harari
Ya-Qin Zhang
Lan Xue
Shai Shalev-Shwartz
Gillian K. Hadfield
Jeff Clune
Frank Hutter
Atilim Güneş Baydin
Sheila McIlraith
Qiqi Gao
Ashwin Acharya
Anca Dragan
Philip Torr
Stuart Russell
Daniel Kahneman
Jan Brauner
Preparation requires technical research and development, as well as adaptive, proactive governance. Artificial intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI’s impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI (1), there is a lack of consensus about how to manage them. Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.