Geraldin Nanfack

geraldin.nanfack@mila.quebec

Postdoctorate - Concordia University

Principal supervisor

Eugene Belilovsky

Co-supervisor

Guy Wolf

Publications

Adversarial Attacks on the Interpretation of Neuron Activation Maximization

Géraldin Nanfack

Alexander Fulleringer

Jonathan Marty

Michael Eickenberg

Eugene Belilovsky

Feature visualization is one of the most popular techniques used to interpret the internal behavior of individual units of trained deep neural networks. Based on activation maximization, it consists of finding synthetic or natural inputs that maximize neuron activations. This paper introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation. It consists of fine-tuning a pre-trained model with a specifically introduced loss that aims to maintain model performance while also significantly changing feature visualization. We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet.

2024-03-24

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org
