Gintare Karolina Dziugaite

Improved Localized Machine Unlearning Through the Lens of Memorization

Reihaneh Torkzadehmahani

Reza Nasirigerdeh

Georgios Kaissis

Daniel Rueckert

Eleni Triantafillou

Machine unlearning refers to removing the influence of a specified subset of training data from a machine learning model, efficiently, after… (voir plus) it has already been trained. This is important for key applications, including making the model more accurate by removing outdated, mislabeled, or poisoned data. In this work, we study localized unlearning, where the unlearning algorithm operates on a (small) identified subset of parameters. Drawing inspiration from the memorization literature, we propose an improved localization strategy that yields strong results when paired with existing unlearning algorithms. We also propose a new unlearning algorithm, Deletion by Example Localization (DEL), that resets the parameters deemed-to-be most critical according to our localization strategy, and then finetunes them. Our extensive experiments on different datasets, forget sets and metrics reveal that DEL sets a new state-of-the-art for unlearning metrics, against both localized and full-parameter methods, while modifying a small subset of parameters, and outperforms the state-of-the-art localized unlearning in terms of test accuracy too.

2025-10-20

TMLR (accepté)

doi.org

openreview.net

Leveraging Per-Instance Privacy for Machine Unlearning

Nazanin Mohammadi Sepahvand

Anvith Thudi

Berivan Isik

Ashmita Bhattacharyya

Nicolas Papernot

Eleni Triantafillou

Daniel M. Roy

Gintare Karolina Dziugaite

We present a principled, per-instance approach to quantifying the difficulty of unlearning via fine-tuning. We begin by sharpening an analys… (voir plus)is of noisy gradient descent for unlearning (Chien et al., 2024), obtaining a better utility–unlearning trade-off by replacing worst-case privacy loss bounds with per-instance privacy losses (Thudi et al., 2024), each of which bounds the (R ényi) divergence to retraining without an individual datapoint. To demonstrate the practical applicability of our theory, we present empirical results showing that our theoretical predictions are born out both for Stochastic Gradient Langevin Dynamics (SGLD) as well as for standard fine-tuning without explicit noise. We further demonstrate that per-instance privacy losses correlate well with several existing data difficulty metrics, while also identifying harder groups of data points, and introduce novel evaluation methods based on loss barriers. All together, our findings provide a foundation for more efficient and adaptive unlearning strategies tailored to the unique properties of individual data points.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (publié)

proceedings.mlr.press

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Phillip Huang Guo

Aaquib Syed

Abhay Sheshadri

Aidan Ewart

Gintare Karolina Dziugaite

Methods for knowledge editing and unlearning in large language models seek to edit or remove undesirable knowledge or capabilities without c… (voir plus)ompromising general language modeling performance. This work investigates how mechanistic interpretability—which, in part, aims to identify model components (circuits) associated to specific interpretable mechanisms that make up a model capability—can improve the precision and effectiveness of editing and unlearning. We find a stark difference in unlearning and edit robustness when training components localized by different methods. We highlight an important distinction between methods that localize components based primarily on preserving outputs, and those finding high level mechanisms with predictable intermediate states. In particular, localizing edits/unlearning to components associated with the lookup-table mechanism for factual recall 1) leads to more robust edits/unlearning across different input/output formats, and 2) resists attempts to relearn the unwanted information, while also reducing unintended side effects compared to baselines, on both a sports facts dataset and the CounterFact dataset across multiple models. We also find that certain localized edits disrupt the latent knowledge in the model more than any other baselines, making unlearning more robust to various attacks.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (publié)

proceedings.mlr.press

Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar

Gintare Karolina Dziugaite

Anurag Arnab

Ahmet Iscen

Pablo Samuel Castro

Cordelia Schmid

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors pl… (voir plus)asticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

2025-06-01

arXiv (publié)

doi.org

arxiv.org

Continual Learning in Vision-Language Models via Aligned Model Merging

Ghada Sokar

Gintare Karolina Dziugaite

Anurag Arnab

Ahmet Iscen

Pablo Samuel Castro

Cordelia Schmid

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors pl… (voir plus)asticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.

2025-05-30

ArXiv (prépublication)

arxiv.org

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

Shoaib Ahmed Siddiqui

Adrian Weller

David Scott Krueger

Gintare Karolina Dziugaite

Michael Curtis Mozer

Eleni Triantafillou

Recent unlearning methods for LLMs are vulnerable to relearning attacks: knowledge believed-to-be-unlearned re-emerges by fine-tuning on a s… (voir plus)mall set of (even seemingly-unrelated) examples. We study this phenomenon in a controlled setting for example-level unlearning in vision classifiers. We make the surprising discovery that forget-set accuracy can recover from around 50% post-unlearning to nearly 100% with fine-tuning on just the retain set -- i.e., zero examples of the forget set. We observe this effect across a wide variety of unlearning methods, whereas for a model retrained from scratch excluding the forget set (gold standard), the accuracy remains at 50%. We observe that resistance to relearning attacks can be predicted by weight-space properties, specifically,

2025-05-28

ArXiv (prépublication)

doi.org

arxiv.org

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

Shoaib Ahmed Siddiqui

Adrian Weller

David Krueger 0001

Gintare Karolina Dziugaite

M. C. Mozer

Eleni Triantafillou

Recent unlearning methods for LLMs are vulnerable to relearning attacks: knowledge believed-to-be-unlearned re-emerges by fine-tuning on a s… (voir plus)mall set of (even seemingly-unrelated) examples. We study this phenomenon in a controlled setting for example-level unlearning in vision classifiers. We make the surprising discovery that forget-set accuracy can recover from around 50% post-unlearning to nearly 100% with fine-tuning on just the retain set -- i.e., zero examples of the forget set. We observe this effect across a wide variety of unlearning methods, whereas for a model retrained from scratch excluding the forget set (gold standard), the accuracy remains at 50%. We observe that resistance to relearning attacks can be predicted by weight-space properties, specifically,

2025-05-28

ArXiv (prépublication)

arxiv.org