Portrait of Amartya Sanyal is unavailable

Amartya Sanyal

Alumni

Publications

Protecting against simultaneous data poisoning attacks
Neel Alex
Shoaib Ahmed Siddiqui
Current backdoor defense methods are evaluated against a single attack at a time. This is unrealistic, as powerful machine learning systems … (see more)are trained on large datasets scraped from the internet, which may be attacked multiple times by one or more attackers. We demonstrate that multiple backdoors can be simultaneously installed in a single model through parallel data poisoning attacks without substantially degrading clean accuracy. Furthermore, we show that existing backdoor defense methods do not effectively defend against multiple simultaneous attacks. Finally, we leverage insights into the nature of backdoor attacks to develop a new defense, BaDLoss (**Ba**ckdoor **D**etection via **Loss** Dynamics), that is effective in the multi-attack setting. With minimal clean accuracy degradation, BaDLoss attains an average attack success rate in the multi-attack setting of 7.98% in CIFAR-10, 10.29% in GTSRB, and 19.17% in Imagenette, compared to the average of other defenses at 63.44%, 74.83%, and 41.74% respectively. BaDLoss scales to ImageNet-1k, reducing the average attack success rate from 88.57% to 15.61%.
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Adel Bibi
Aidan O'Gara
Robert Kirk
Benjamin Bucknall
Tim Fist
Luke Ong
Philip H. S. Torr
Kwok-Yan Lam
Robert F. Trager
Jose Hernandez-Orallo
Mor Geva
Yarin Gal
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (see more), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Adel Bibi
Aidan O'Gara
Robert Kirk
Benjamin Bucknall
Timothy Fist
Luke Ong
Philip Torr
Kwok-Yan Lam
Robert Trager
Jose Hernandez-Orallo
Mor Geva
Yarin Gal
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research… (see more), and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes -- unlearning this information could strongly affect beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.