
Nouha Dziri

Alumni

Publications

The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Imane Bello
Malo Bourgon
Mark Brakel
Siméon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Kuan Chua Seah
Jeff Clune
Juntao Dai
Agnès Delaborde
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah D. Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low Kian Hsiang
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Tauman Kalai
Meindert Kamphuis
Mohan S. Kankanhalli
Subhash Kantamneni
Mathias Bonde Kirk
Thomas Kwa
Jeffrey Ladish
Kwok-Yan Lam
Wan Lee Sie
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nick Moës
Simon Möller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Sean O hEigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Aradhana Sinha
Martín Soto
Cheston Tan
Dong Ting
William-Chandra Tjhi
Robert Trager
Brian Tse
Anthony K. H. Tung
John Willes
Denise Wong
W. Xu
Rongwu Xu
Yi Zeng
HongJiang Zhang
Djordje Zikelic
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential – it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. This requires policymakers, industry, researchers and the broader public to collectively work toward securing positive outcomes from AI’s development. AI safety research is a key dimension. Given that the state of science today for building trustworthy AI does not fully cover all risks, accelerated investment in research is required to keep pace with commercially driven growth in system capabilities. Goals: The 2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety aims to support research in this important space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. The result, The Singapore Consensus on Global AI Safety Research Priorities, builds on the International AI Safety Report-A (IAISR) chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this document organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control). Through the Singapore Consensus, we hope to globally facilitate meaningful conversations between AI scientists and AI policymakers for maximally beneficial outcomes. Our goal is to enable more impactful R&D efforts to rapidly develop safety and evaluation mechanisms and foster a trusted ecosystem where AI is harnessed for the public good.
The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Imane Bello
Malo Bourgon
Mark Brakel
Siméon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Kuan Chua Seah
Jeff Clune
Juntao Dai
Agnès Delaborde
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah D. Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low Kian Hsiang
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Tauman Kalai
Meindert Kamphuis
Mohan S. Kankanhalli
Subhash Kantamneni
Mathias Bonde Kirk
Thomas Kwa
Jeffrey Ladish
Kwok-Yan Lam
Wan Lee Sie
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nick Moës
Simon Möller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Sean O hEigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Aradhana Sinha
Martín Soto
Cheston Tan
Dong Ting
William-Chandra Tjhi
Robert Trager
Brian Tse
Anthony K. H. Tung
John Willes
Denise Wong
W. Xu
Rongwu Xu
Yi Zeng
HongJiang Zhang
Djordje Zikelic
Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential – it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on AI (SCAI): International Scientific Exchange on AI Safety" aimed to support research in this space by bringing together AI scientists across geographies to identify and synthesise research priorities in AI safety. This resulting report builds on the International AI Safety Report chaired by Yoshua Bengio and backed by 33 governments. By adopting a defence-in-depth model, this report organises AI safety research domains into three types: challenges with creating trustworthy AI systems (Development), challenges with evaluating their risks (Assessment), and challenges with monitoring and intervening after deployment (Control).
The Singapore Consensus on Global AI Safety Research Priorities
Luke Ong
Stuart Russell
Dawn Song
Max Tegmark
Lan Xue
Ya-Qin Zhang
Stephen Casper
Wan Sie Lee
Vanessa Wilfred
Vidhisha Balachandran
Fazl Barez
Michael Belinsky
Imane Bello
Malo Bourgon
Mark Brakel
Siméon Campos
Duncan Cass-Beggs
Jiahao Chen
Rumman Chowdhury
Kuan Chua Seah
Jeff Clune
Juntao Dai
Agnès Delaborde
Francisco Eiras
Joshua Engels
Jinyu Fan
Adam Gleave
Noah D. Goodman
Fynn Heide
Johannes Heidecke
Dan Hendrycks
Cyrus Hodes
Bryan Low Kian Hsiang
Minlie Huang
Sami Jawhar
Jingyu Wang
Adam Tauman Kalai
Meindert Kamphuis
Mohan S. Kankanhalli
Subhash Kantamneni
Mathias Bonde Kirk
Thomas Kwa
Jeffrey Ladish
Kwok-Yan Lam
Wan Lee Sie
Taewhi Lee
Xiaojian Li
Jiajun Liu
Chaochao Lu
Yifan Mai
Richard Mallah
Julian Michael
Nick Moës
Simon Möller
Kihyuk Nam
Kwan Yee Ng
Mark Nitzberg
Besmira Nushi
Sean O hEigeartaigh
Alejandro Ortega
Pierre Peigné
James Petrie
Nayat Sanchez-Pi
Sarah Schwettmann
Buck Shlegeris
Saad Siddiqui
Aradhana Sinha
Martín Soto
Cheston Tan
Dong Ting
William Tjhi
Robert Trager
Brian Tse
Anthony K. H. Tung
John Willes
Denise Wong
Wei Xu
Rongwu Xu
Yi Zeng
HongJiang Zhang
Djordje Zikelic
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Zhangzhi Peng
Zachary Quinn
Cheng-Hao Liu
Michael M. Bronstein
Pranam Chatterjee
Generative modeling of discrete data underlies important applications spanning text-based agents like ChatGPT to the design of the very building blocks of life in protein sequences. However, application domains need to exert control over the generated data by steering the generative process (typically via RLHF) to satisfy a specified property, reward, or affinity metric. In this paper, we study the problem of steering Masked Diffusion Models (MDMs), a recent class of discrete diffusion models that offer a compelling alternative to traditional autoregressive models. We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference by learning to sample from a target Bayesian posterior. Our DDPP framework leads to a family of three novel objectives that are all simulation-free, and thus scalable, while applying to general non-differentiable reward functions. Empirically, we instantiate DDPP by steering MDMs to perform class-conditional pixel-level image modeling, RLHF-based alignment of MDMs using text-based rewards, and finetuning protein language models to generate more diverse secondary structures and shorter proteins. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
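The "sample from a target Bayesian posterior" framing above can be illustrated on a toy problem. This sketch is not DDPP: it assumes a generic reward-tilted target π(x) ∝ p_pre(x)·exp(β·r(x)) (the paper's exact parameterization may differ) and computes it exactly by enumerating every sequence, which is only feasible for a tiny vocabulary and length. The two-token vocabulary, the independent-token "pretrained" model, the reward, and the names `pretrained_logprob`, `reward`, and `tilted_posterior` are all hypothetical. DDPP's contribution is learning to sample from such a target at scale without enumeration or simulation.

```python
import itertools
import math

# Hypothetical toy setup (not from the paper): length-3 sequences over a
# 2-token vocabulary.
VOCAB = ["A", "B"]
LENGTH = 3


def pretrained_logprob(seq):
    """Hypothetical 'pretrained' model: independent tokens with P(A)=0.7, P(B)=0.3."""
    p = {"A": 0.7, "B": 0.3}
    return sum(math.log(p[tok]) for tok in seq)


def reward(seq):
    """Hypothetical non-differentiable reward: 1.0 if the sequence contains a B, else 0.0."""
    return 1.0 if "B" in seq else 0.0


def tilted_posterior(beta=1.0):
    """Exact target pi(x) proportional to p_pre(x) * exp(beta * r(x)), by enumeration."""
    seqs = ["".join(s) for s in itertools.product(VOCAB, repeat=LENGTH)]
    # Unnormalized log-weights: log p_pre(x) + beta * r(x).
    logw = {s: pretrained_logprob(s) + beta * reward(s) for s in seqs}
    # Subtract the max before exponentiating for numerical stability.
    m = max(logw.values())
    z = sum(math.exp(v - m) for v in logw.values())
    return {s: math.exp(v - m) / z for s, v in logw.items()}


if __name__ == "__main__":
    prior = tilted_posterior(beta=0.0)  # beta=0 recovers the pretrained model
    post = tilted_posterior(beta=2.0)
    for s in sorted(post):
        print(f"{s}: pretrained={prior[s]:.3f}  tilted={post[s]:.3f}")
```

With β = 2, the mass on `AAA` (the only zero-reward sequence) falls well below its pretrained probability of 0.343, while β = 0 leaves the pretrained model unchanged; β controls how far the target moves from the pretrained distribution.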
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Ehsan Kamalloo
Osmar Zaiane
Mo Yu
Edoardo Ponti
The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded in knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as a training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts performance by 12.8 F1 points on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.
On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?
Mo Yu
Osmar R Zaiane
Knowledge-grounded conversational models are known to suffer from producing factually invalid statements, a phenomenon commonly called hallucination. In this work, we investigate the underlying causes of this phenomenon: is hallucination due to the training data, or to the models? We conduct a comprehensive human study on both existing knowledge-grounded conversational benchmarks and several state-of-the-art models. Our study reveals that the standard benchmarks consist of more than 60% hallucinated responses, leading to models that not only hallucinate but even amplify hallucinations. Our findings raise important questions on the quality of existing datasets and models trained using them. We make our annotations publicly available for future research.