Eugene Belilovsky

From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

G'eraldin Nanfack

Michael Eickenberg

Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applicat… (voir plus)ions. Mechanistic inter- pretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.

2024-06-03

ArXiv (prépublication)

μLO: Compute-Efficient Meta-Generalization of Learned Optimizers

Benjamin Thérien

Charles-Étienne Joseph

Boris Knyazev

Edouard Oyallon

2024-05-31

ArXiv (prépublication)

μLO: Compute-Efficient Meta-Generalization of Learned Optimizers

Benjamin Thérien

Charles-Étienne Joseph

Boris Knyazev

Edouard Oyallon

2024-05-31

ArXiv (prépublication)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Guy Wolf

Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs signi… (voir plus)ficant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder many models setting where more than 2 models are merged, and we find that CCA Merge works significantly better in this setting than past methods.

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net

Adversarial Attacks on the Interpretation of Neuron Activation Maximization

G'eraldin Nanfack

Alexander Fulleringer

Jonathan Marty

Michael Eickenberg

Feature visualization is one of the most popular techniques used to interpret the internal behavior of individual units of trained deep neur… (voir plus)al networks. Based on activation maximization, they consist of finding synthetic or natural inputs that maximize neuron activations. This paper introduces an optimization framework that aims to deceive feature visualization through adversarial model manipulation. It consists of finetuning a pre-trained model with a specifically introduced loss that aims to maintain model performance, while also significantly changing feature visualization. We provide evidence of the success of this manipulation on several pre-trained models for the classification task with ImageNet.

2024-03-24

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (voir plus)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (prépublication)

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (voir plus)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (prépublication)

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (voir plus)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (prépublication)

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (voir plus)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (prépublication)

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (voir plus)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (prépublication)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho