Publications

Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning

Benjamin Leblanc

Mathieu Bazinet

Nathaniel D'Amours

Alexandre Drouin

Pascal Germain

Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While usually fixed, we propose to learn reconstruction functions. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. The results of promising preliminary experiments are then reported.

2024-10-09

NeurIPS.cc/2024/Workshop/Compression (published)

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

Fazl Barez

We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.

2024-10-09

ArXiv (preprint)

Spiral volumetric optoacoustic tomography of reduced oxygen saturation in the spinal cord of M83 mouse model of Parkinson’s disease

Benjamin F. Combes

Sandeep Kumar Kalva

Pierre-Louis Benveniste

Agathe Tournant

Man Hoi Law

Joshua Newton

Maik Krüger

Rebecca Z Weber

Inês Dias

Daniela Noain

Xose Luis Dean-Ben

Uwe Konietzko

Christian R. Baumann

Per-Göran Gillberg

Christoph Hock

Roger M. Nitsch

Julien Cohen-Adad

Daniel Razansky

Ruiqing Ni

2024-10-09

European Journal of Nuclear Medicine and Molecular Imaging (published)

Steering Clear: A Systematic Study of Activation Steering in a Toy Setup

Dmitrii Krasheninnikov

Activation steering is a promising family of methods for controlling LLM outputs via targeted interventions on model activations. We introduce a toy multi-label classification setup to systematically study activation steering methods, and experiment with several types of steering adapters, from steering vectors (adding a fixed vector to activations) to more expressive adapters involving projections. We evaluate the adapters across steering tasks of different complexities, for three notions of complexity: 1) how densely the features are packed in the representation space (roughly, number of features divided by the dimensionality of the activations), 2) number of attributes steered, and 3) number of values the steered attribute can take. We find that as task complexity is increased, steering vector methods perform worse, while the more expressive methods only take a performance hit when there is not enough data. On the other hand, steering vectors usually outperform more expressive methods in the low-data regime, regardless of task complexity. We conclude by discussing this work's limitations, which include our toy setup not modeling features represented in superposition or continuous features, and the lack of experiments with LLMs.

2024-10-09

NeurIPS.cc/2024/Workshop/MINT (accepted)

Towards Interpreting Visual Information Processing in Vision-Language Models

Clement Neo

Luke Ong

Philip Torr

Mor Geva

Fazl Barez

Vision-Language Models (VLMs) are powerful tools for processing and understanding text and images. We study the processing of visual tokens in the language model component of LLaVA, a prominent VLM. Our approach focuses on analyzing the localization of object information, the evolution of visual token representations across layers, and the mechanism of integrating visual information for predictions. Through ablation studies, we demonstrated that object identification accuracy drops by over 70% when object-specific tokens are removed. We observed that visual token representations become increasingly interpretable in the vocabulary space across layers, suggesting an alignment with textual tokens corresponding to image content. Finally, we found that the model extracts object information from these refined representations at the last token position for prediction, mirroring the process in text-only language models for factual association tasks. These findings provide crucial insights into how VLMs process and integrate visual information, bridging the gap between our understanding of language and vision models, and paving the way for more interpretable and controllable multimodal systems.

2024-10-09

ArXiv (preprint)

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Itamar Pres

Laura Ruis

Ekdeep Singh Lubana

Representation engineering methods have recently shown promise for enabling efficient steering of model behavior. However, evaluation pipelines for these methods have primarily relied on subjective demonstrations, instead of quantitative, objective metrics. We aim to take a step towards addressing this issue by advocating for four properties missing from current evaluations: (i) contexts sufficiently similar to downstream tasks should be used for assessing intervention quality; (ii) model likelihoods should be accounted for; (iii) evaluations should allow for standardized comparisons across different target behaviors; and (iv) baseline comparisons should be offered. We introduce an evaluation pipeline grounded in these criteria, offering both a quantitative and visual analysis of how effectively a given method works. We use this pipeline to evaluate two representation engineering methods on how effectively they can steer behaviors such as truthfulness and corrigibility, finding that some interventions are less effective than previously reported.

2024-10-09

NeurIPS.cc/2024/Workshop/MINT (accepted)

VCR: Visual Caption Restoration

Tianyu Zhang

Suyuchen Wang

Lu Li

Ge Zhang

Perouz Taslakian

Sai Rajeswar

Jie Fu

Bang Liu

Yoshua Bengio

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images. This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images. While numerous works have integrated text embedded in images into visual question-answering tasks, approaches to these tasks generally rely on optical character recognition or masked language modeling, thus reducing the task to mainly text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct a dataset for VCR called VCR-Wiki using images with captions from Wikipedia, comprising 2.11M English and 346K Chinese entities in both easy and hard split variants. Our results reveal that current vision language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-Wiki and the data construction code to facilitate future research.

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly required to solve complex reasoning tasks, like mathematical problems, that involve multiple reasoning steps before feedback is received. Effectively identifying and prioritizing key steps by accurately assigning credit to these intermediate steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a state-of-the-art reinforcement learning algorithm for finetuning LLMs, addresses the credit assignment problem by employing value networks to predict the expected cumulative rewards of intermediate states. In this work, we identify significant limitations with this value estimation method. To address this, we propose VinePPO, which leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates of the intermediate values. VinePPO consistently outperforms standard PPO, doing so more efficiently and with lower divergence from the reference model. Our findings underscore the critical importance of accurate credit assignment in LLM post-training and present a simple, yet effective solution.

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)