Publications

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Michael Cohen

Joumana Ghosn

Adam Oberman

Jesse Richardson

Oliver Richardson

Marc-Antoine Rondeau

Pierre-Luc St-Charles

David Williams-King

The leading AI companies are increasingly focused on building generalist AI agents -- systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. We discuss how these risks arise from current AI training methods. Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, we see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, we propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which we call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions. In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, our system can be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. We hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.

2024-12-31

arXiv (preprint)

doi.org

arxiv.org

A Survey of Contextual Optimization Methods for Decision-Making under Uncertainty

Utsav Sadana

Abhilash Chenreddy

Erick Delage

Alexandre Forel

Emma Frejinger

Thibaut Vidal

2024-12-31

European Journal of Operational Research (published)

doi.org

arxiv.org

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

Prateek Yadav

Colin Raffel

Mohammed Muqeeth

Lucas Caccia

Haokun Liu

Tianlong Chen

Mohit Bansal

Leshem Choshen

Alessandro Sordoni

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved performance or generalization. A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application. The promise, effectiveness, and large design space of MoErging has spurred the development of many new methods over the past few years. This rapid pace of development has made it challenging to compare different MoErging methods, which are rarely compared to one another and are often validated in different experimental setups. To remedy such gaps, we present a comprehensive survey of MoErging methods that includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method. Apart from surveying MoErging research, we inventory software tools and applications that make use of MoErging. We additionally discuss related fields of study such as model merging, multitask learning, and mixture-of-experts models. Taken as a whole, our survey provides a unified overview of existing MoErging methods and creates a solid foundation for future work in this burgeoning field.

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net
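
The MoErging abstract above describes a router that decides which expert model(s) to use for a given input. As a rough illustration only, the sketch below routes an input embedding to the most similar expert by cosine similarity. The prototype vectors, the expert names, and the `route` function are all hypothetical; the survey covers many routing designs, and this toy example is not taken from it.

```python
import math

# Hypothetical expert "prototype" embeddings (assumed for illustration only).
EXPERT_PROTOTYPES = {
    "code": [1.0, 0.0, 0.0],
    "math": [0.0, 1.0, 0.0],
    "legal": [0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(x, prototypes, top_k=1):
    """Return the names of the top_k experts whose prototypes are most similar to x."""
    ranked = sorted(
        prototypes.items(),
        key=lambda item: cosine(x, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]
```

With `top_k=1` this is a hard router that selects a single expert; a larger `top_k` returns several candidates whose outputs or parameters could then be combined.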

Task Mapping Strategies for Electric Power System Simulations on Heterogeneous Clusters

Julie Durette

Gunes Karabulut Kurt

Antoine Lesage-Landry

In this work, we propose improved task mapping strategies for real-time electric power system simulations on heterogeneous computing clusters, considering both heterogeneous communication links and processing capacities, with a focus on bottleneck objectives. We approach the problem through two complementary models: the bottleneck quadratic semi-assignment problem (BQSAP), which optimizes task configuration for a fixed number of computing nodes while minimizing communication and computation costs; and the variable-size bin packing problem with quadratic communication constraints (Q-VSBPP), which minimizes the required number of computing nodes, valuable for resource provisioning scenarios. We extend the PuLP library to solve approximately both problems, explicitly including communication costs and processing constraints, and formalize the nomenclature and definitions for bottleneck objectives in graph partitioning. This formalization fills a gap in the existing literature and provides a framework for the rigorous analysis and application of task mapping techniques to real-time electric power system simulation. Finally, we provide a quantitative study and benchmark the extended PuLP library with the SCOTCH partitioning library in the context of real-time electromagnetic transient (EMT) simulation task mapping.

2024-12-31

SmartGridComm (published)

doi.org

A Text-guided Protein Design Framework

Shengchao Liu

Yutao Zhu

Yanjing Li

Zhuoxinran Li

Jiarui Lu

Zhao Xu

Weili Nie

Anthony Gitter

Chaowei Xiao

Jian Tang

Arvind Ramanathan

Hongyu Guo

Anima Anandkumar

Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three subsequent steps: ProteinCLAP which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality, and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.

2024-12-31

Nat. Mac. Intell. (published)

doi.org

arxiv.org

On the Analysis and Distillation of Emergent Outlier Properties in Pre-trained Language Models

Tianyang Zhao

Kunwar Yashraj Singh

Srikar Appalaraju

Peng Tang

Ying Nian Wu

Li Erran Li

2024-12-31

NAACL (Long Papers) (published)

doi.org

The BrowserGym Ecosystem for Web Agent Research

Thibault Le Sellier De Chezelles

Maxime Gasse

Alexandre Lacoste

Massimo Caccia

Lawrence Keunho Jang

Ori Yoran

Dehan Kong

Frank F. Xu

Siva Reddy

Quentin Cappart

Graham Neubig

Ruslan Salakhutdinov

Nicolas Chapados

The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs). Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. In an earlier work, Drouin et al. (2024) introduced BrowserGym, which aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature and includes AgentLab, a complementary framework that aids in agent creation, testing, and analysis. Our proposed ecosystem offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks made available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

Devin Kwok

Gül Sena Altıntaş

Colin Raffel

David Rolnick

Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i)

2024-12-31

ICML (published)

doi.org

proceedings.mlr.press

"On the goals of linguistic theory": Revisiting Chomskyan theories in the era of AI

Eva Portelance

Masoud Jasbi

Theoretical linguistics seeks to explain what human language is, and why. Linguists and cognitive scientists have proposed different theoretical models of what language is, as well as cognitive factors that shape it, and allow humans to 'produce', 'understand', and 'acquire' natural languages. However, humans may no longer be the only ones learning to 'generate', 'parse', and 'learn' natural language: artificial intelligence (AI) models such as large language models are proving to have impressive linguistic capabilities. Many are thus questioning what role, if any, such models should play in helping theoretical linguistics reach its ultimate research goals. In this paper, we propose to answer this question by reiterating the tenets of generative linguistics, a leading school of thought in the field, and by considering how AI models as theories of language relate to each of these important concepts. Specifically, we consider three foundational principles, finding roots in the early works of Noam Chomsky: (1) levels of theoretical adequacy; (2) procedures for linguistic theory development; (3) language learnability and Universal Grammar. In our discussions of each principle, we give special attention to two types of AI models: neural language models and neural grammar induction models. We will argue that such models, in particular neural grammar induction models, do have a role to play, but that this role is largely modulated by the stance one takes regarding each of these three guiding principles.

2024-12-31

Nat. Comput. Sci. (published)

doi.org

arxiv.org

The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning

Milad Aghajohari

Kamran Chitsaz

Amirhossein Kazemnejad

Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Yet the standard RL "thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate at 96K average thinking length LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.

2024-12-31

arXiv (preprint)

doi.org

openreview.net
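
The Markovian Thinker abstract above contrasts quadratic compute for a growing context with linear compute for fixed-size chunks. The sketch below illustrates that scaling under a deliberately simplified cost model, which is an assumption of this example and not the paper's accounting: each generated token costs one unit per position it attends to, and all function and parameter names are hypothetical.

```python
def longcot_attention_cost(prompt_len: int, think_len: int) -> int:
    """Cost when every new token attends to the prompt plus all earlier thinking tokens."""
    # The t-th generated token (1-indexed) sees prompt_len + t positions, itself included.
    return sum(prompt_len + t for t in range(1, think_len + 1))


def chunked_attention_cost(
    prompt_len: int, think_len: int, chunk_len: int, carry_len: int = 0
) -> int:
    """Cost when the context is reset at each chunk boundary.

    Assumed model: every chunk after the first restarts from the prompt plus a
    carryover of carry_len tokens, then generates up to chunk_len new tokens.
    """
    total = 0
    remaining = think_len
    first_chunk = True
    while remaining > 0:
        generated = min(chunk_len, remaining)
        prefix = prompt_len if first_chunk else prompt_len + carry_len
        # Within a chunk, attention still grows token by token, but only up to chunk_len.
        total += sum(prefix + t for t in range(1, generated + 1))
        remaining -= generated
        first_chunk = False
    return total
```

Under this model, doubling the thinking length roughly quadruples the growing-context cost, while the chunked cost roughly doubles because each chunk costs about the same bounded amount.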

The Normative Leadership of the World Health Organization: a quantitative analysis

Gaëlle Foucault

Catherine Régis

Jean-Louis Denis

Pierre Larouche

Miriam Cohen

2024-12-31

SSRN Electronic Journal (published)

doi.org

The Singapore Consensus on Global AI Safety Research Priorities

Yoshua Bengio

Tegan Maharaj

Luke Ong

Stuart Russell

Dawn Song

Max Tegmark

Lan Xue

Ya-Qin Zhang

Stephen Casper

Wan Sie Lee

Sören Mindermann

Vanessa Wilfred

Vidhisha Balachandran

Fazl Barez

Michael Belinsky

Imane Bello

Malo Bourgon

Mark Brakel

Simeon Campos

Duncan Cass-Beggs

Jiahao Chen

Rumman Chowdhury

Kuan Chua Seah

Jeff Clune

Juntao Dai

Agnes Delaborde

Nouha Dziri

Francisco Eiras

Joshua Engels

Jinyu Fan

Adam Gleave

Noah Goodman

Fynn Heide

Johannes Heidecke

Dan Hendrycks

Cyrus Hodes

Bryan Low Kian Hsiang

Minlie Huang

Sami Jawhar

Wang Jingyu

Adam Tauman Kalai

Meindert Kamphuis

Mohan Kankanhalli

Subhash Kantamneni

Mathias Bonde Kirk

Thomas Kwa

Jeffrey Ladish

Kwok-Yan Lam

Wan Lee Sie

Taewhi Lee

Xiaojian Li

Jiajun Liu

Chaochao Lu

Yifan Mai

Richard Mallah

Julian Michael

Nick Moës

Simon Möller

Kihyuk Nam

Kwan Yee Ng

Mark Nitzberg

Besmira Nushi

Seán Ó hÉigeartaigh

Alejandro Ortega

Pierre Peigné

James Petrie

Benjamin Prud'homme

Reihaneh Rabbany

Nayat Sanchez-Pi

Sarah Schwettmann

Buck Shlegeris

Saad Siddiqui

Aradhana Sinha

Martín Soto

Cheston Tan

Dong Ting

William Tjhi

Robert Trager

Brian Tse

Anthony Tung K. H.

Vanessa Wilfred

John Willes

Denise Wong

Wei Xu

Rongwu Xu

Yi Zeng

HongJiang Zhang

Djordje Žikelić

Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential – it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. This requires policymakers, industry, researchers and the broader public to collectively work toward securing positive outcomes from AI’s development. AI safety research is a key dimension. Given that the state of science today for building trustworthy AI does not fully cover all risks, accelerated investment in research is required to keep pace with commercially driven growth in system capabilities. Goals: The 2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety aims to support research in this important space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. The result, The Singapore Consensus on Global AI Safety Research Priorities, builds on the International AI Safety Report-A (IAISR) chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this document organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control). Through the Singapore Consensus, we hope to globally facilitate meaningful conversations between AI scientists and AI policymakers for maximally beneficial outcomes. Our goal is to enable more impactful R&D efforts to rapidly develop safety and evaluation mechanisms and foster a trusted ecosystem where AI is harnessed for the public good.

2024-12-31

arXiv (preprint)

doi.org

arxiv.org
