Portrait de Zhuan Shi

Zhuan Shi

Postdoctorat - McGill
Superviseur⋅e principal⋅e
Sujets de recherche
Apprentissage fédéré
Désapprentissage
Modèles de diffusion
Modèles génératifs
Sécurité de l'IA

Publications

Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models
Concept erasure in text-to-image diffusion models seeks to remove undesired concepts while preserving overall generative capability. Localiz… (voir plus)ed erasure methods aim to restrict edits to the spatial region occupied by the target concept. However, we observe that suppressing a concept can unintentionally weaken semantically related neighbor concepts, reducing fidelity in fine-grained domains. We propose Neighbor-Aware Localized Concept Erasure (NLCE), a training-free framework designed to better preserve neighboring concepts while removing target concepts. It operates in three stages: (1) a spectrally-weighted embedding modulation that attenuates target concept directions while stabilizing neighbor concept representations, (2) an attention-guided spatial gate that identifies regions exhibiting residual concept activation, and (3) a spatially-gated hard erasure that eliminates remaining traces only where necessary. This neighbor-aware pipeline enables localized concept removal while maintaining the surrounding concept neighborhood structure. Experiments on fine-grained datasets (Oxford Flowers, Stanford Dogs) show that our method effectively removes target concepts while better preserving closely related categories. Additional results on celebrity identity, explicit content and artistic style demonstrate robustness and generalization to broader erasure scenarios.
Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts present… (voir plus)s unique challenges. While existing research on machine unlearning has primarily focused on monolingual settings, typically English, multilingual environments introduce additional complexities due to cross-lingual knowledge transfer and biases embedded in both pretraining and fine-tuning data. In this work, we study multilingual unlearning using the Aya-Expanse 8B model under two settings: (1) data unlearning and (2) concept unlearning. We extend benchmarks for factual knowledge and stereotypes to ten languages through translation: English, French, Arabic, Japanese, Russian, Farsi, Korean, Hindi, Hebrew, and Indonesian. These languages span five language families and a wide range of resource levels. Our experiments show that unlearning in high-resource languages is generally more stable, with asymmetric transfer effects observed between typologically related languages. Furthermore, our analysis of linguistic distances indicates that syntactic similarity is the strongest predictor of cross-lingual unlearning behavior.
Localized-Attention-Guided Concept Erasure for Text-to-Image Diffusion Models