LLMs and Personalities: Inconsistencies Across Scales
Tosato Tommaso
Mahmood Hegazy
David Lemay
Mohammed Abukalam
This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and malleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and shed light on the consequences of scaling.
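The key measurement here is how far trait scores move when only the order of the questionnaire items changes. The following is a minimal sketch of that bookkeeping under stated assumptions: answers use a 5-point Likert scale, `ask_model` is a hypothetical callable standing in for querying an LLM with the items in a given order, and the four items and their reverse-keying are invented placeholders, not real BFI or EPQ-R content.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical toy inventory: (item_id, trait, reverse_keyed).
# These are NOT real BFI items; they only illustrate the scoring logic.
ITEMS: List[Tuple[str, str, bool]] = [
    ("q1", "extraversion", False),
    ("q2", "extraversion", True),
    ("q3", "neuroticism", False),
    ("q4", "neuroticism", True),
]


def score(answers: Dict[str, int], scale_max: int = 5) -> Dict[str, float]:
    """Average Likert answers per trait, flipping reverse-keyed items."""
    per_trait: Dict[str, List[int]] = {}
    for item_id, trait, reverse in ITEMS:
        a = answers[item_id]
        value = (scale_max + 1 - a) if reverse else a
        per_trait.setdefault(trait, []).append(value)
    return {t: statistics.mean(v) for t, v in per_trait.items()}


def order_sensitivity(ask_model: Callable[[List[str]], Dict[str, int]],
                      n_shuffles: int = 10, seed: int = 0) -> Dict[str, float]:
    """Population std. dev. of each trait score across shuffled item orders."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_shuffles):
        order = [item_id for item_id, _, _ in ITEMS]
        rng.shuffle(order)
        runs.append(score(ask_model(order)))
    return {t: statistics.pstdev([r[t] for r in runs]) for t in runs[0]}


# Stub "model" whose answer depends only on an item's position in the prompt,
# so any spread in the scores comes purely from the shuffling.
def positional_stub(order: List[str]) -> Dict[str, int]:
    return {item_id: 1 + (i % 5) for i, item_id in enumerate(order)}


print(order_sensitivity(positional_stub))
```

An order-invariant model yields a spread of zero for every trait; a nonzero spread means the "personality" measurement depends on presentation order rather than on the model alone.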
Mitigating Downstream Model Risks via Model Provenance
Keyu Wang
Abdullah Norozi Iranzad
Scott Schaffter
Jonathan Lebensold
Meg Risdal
Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate adoption, we introduce the unified model record (UMR) repository, a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.
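To make "explicitly traces relationships between upstream and downstream models" concrete, here is a minimal sketch of a machine-readable record with an upstream link, where a derived model inherits its parent's attributes and overrides only what changed. The field names, model names, and inheritance rule are assumptions for illustration; this is not the UMR schema or its API.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# MAJOR.MINOR.PATCH only; pre-release and build metadata are ignored here.
SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


@dataclass
class ModelRecord:
    # Hypothetical fields for illustration; this is not the UMR schema.
    name: str
    version: str                      # semantic version, e.g. "1.2.0"
    base_model: Optional[str] = None  # name of the upstream model, if any
    attributes: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not SEMVER.match(self.version):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {self.version!r}")


def lineage(record: ModelRecord, registry: Dict[str, ModelRecord]) -> List[ModelRecord]:
    """Follow base_model links from a record up to its root foundation model."""
    chain = [record]
    seen = {record.name}
    while chain[-1].base_model is not None:
        parent = registry[chain[-1].base_model]
        if parent.name in seen:
            raise ValueError("cycle in model lineage")
        seen.add(parent.name)
        chain.append(parent)
    return chain


def resolve(record: ModelRecord, registry: Dict[str, ModelRecord]) -> Dict[str, str]:
    """Inherit attributes from upstream records; downstream values override."""
    merged: Dict[str, str] = {}
    for ancestor in reversed(lineage(record, registry)):
        merged.update(ancestor.attributes)
    return merged


# Usage with made-up model names.
registry = {
    "base-7b": ModelRecord("base-7b", "1.0.0",
                           attributes={"architecture": "decoder-only", "license": "apache-2.0"}),
}
registry["chat-7b"] = ModelRecord("chat-7b", "0.3.1", base_model="base-7b",
                                  attributes={"license": "custom"})
print([r.name for r in lineage(registry["chat-7b"], registry)])  # ['chat-7b', 'base-7b']
print(resolve(registry["chat-7b"], registry))  # {'architecture': 'decoder-only', 'license': 'custom'}
```

The downstream record only states what differs from its base, which is the kind of saving the abstract describes when a new model inherits most of its design from a foundation model.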
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Daniel Toyama
Rejecting Hallucinated State Targets during Planning
Mingde Zhao
Tristan Sylvain
Romain Laroche
Retrieval-Augmented Decision Transformer: External Memory for In-context RL
Thomas Schmied
Fabian Paischer
Vihang P. Patil
Markus Hofmarcher
Sepp Hochreiter
In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games. On grid-worlds, RA-DT outperforms baselines, while using only a fraction of their context length. Furthermore, we illuminate the limitations of current in-context RL methods on complex environments and discuss future directions. To facilitate future research, we release datasets for four of the considered environments.
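The retrieval idea can be sketched as an external memory of fixed-length sub-trajectories searched by embedding similarity. This is a simplified sketch, not RA-DT's implementation: it stores observations only (real trajectories would also hold actions and rewards), uses a fixed random projection as a training-free stand-in for whatever embedding model RA-DT uses, and does brute-force cosine search instead of an approximate nearest-neighbour index.

```python
from typing import List

import numpy as np


class SubTrajectoryMemory:
    """Toy external memory: stores fixed-length sub-trajectories and
    retrieves the k most similar ones by cosine similarity."""

    def __init__(self, obs_dim: int, window: int, embed_dim: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random projection: a training-free embedding (an assumption).
        self.proj = rng.standard_normal((obs_dim * window, embed_dim))
        self.window = window
        self.keys: List[np.ndarray] = []
        self.values: List[np.ndarray] = []

    def _embed(self, segment: np.ndarray) -> np.ndarray:
        v = segment.reshape(-1) @ self.proj
        return v / (np.linalg.norm(v) + 1e-8)

    def add_episode(self, observations: np.ndarray) -> None:
        # Slide a window over the episode and store every segment.
        for start in range(len(observations) - self.window + 1):
            seg = observations[start:start + self.window]
            self.keys.append(self._embed(seg))
            self.values.append(seg)

    def retrieve(self, recent: np.ndarray, k: int = 4) -> List[np.ndarray]:
        # Query with the most recent window of the current episode.
        query = self._embed(recent[-self.window:])
        sims = np.stack(self.keys) @ query
        top = np.argsort(-sims)[:k]
        return [self.values[i] for i in top]


rng = np.random.default_rng(1)
memory = SubTrajectoryMemory(obs_dim=3, window=5)
for _ in range(10):
    memory.add_episode(rng.standard_normal((50, 3)))
context = memory.retrieve(rng.standard_normal((20, 3)), k=4)
print([seg.shape for seg in context])
```

In a retrieval-augmented agent, these short retrieved segments are what would be supplied to the transformer in place of whole episodes, which is why the required context length can stay small.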
Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning
Benjamin Leblanc
Mathieu Bazinet
Nathaniel D'Amours
Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While reconstruction functions are usually fixed, we propose to learn them. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. We conclude by reporting promising preliminary experimental results.
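The interface of a learned reconstruction function can be sketched as a hypernetwork that maps a compression set and a real-valued message to the parameters of a predictor. This is a minimal sketch under stated assumptions: the two-layer architecture, mean pooling over the compression set, and the linear classifier output are illustrative choices, the weights are random and untrained, and no generalization bound is computed. It is not the authors' architecture.

```python
import numpy as np


class ReconstructionHypernet:
    """Toy hypernetwork: (compression set, message) -> linear predictor.

    Weights are random and untrained; only the data flow is illustrated.
    """

    def __init__(self, x_dim: int, msg_dim: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((x_dim + 1, hidden)) * 0.5            # per-example encoder
        self.W2 = rng.standard_normal((hidden + msg_dim, x_dim + 1)) * 0.5  # -> predictor params

    def __call__(self, X_c: np.ndarray, y_c: np.ndarray, message: np.ndarray):
        # Encode each (x, y) pair of the compression set, then mean-pool so the
        # output does not depend on the order of the compression set.
        pairs = np.hstack([X_c, y_c[:, None]])
        pooled = np.tanh(pairs @ self.W1).mean(axis=0)
        # The real-valued message adds information beyond the selected examples.
        params = np.concatenate([pooled, message]) @ self.W2
        w, b = params[:-1], params[-1]
        # The reconstructed predictor: a linear classifier with labels in {-1, +1}.
        return lambda X: np.where(X @ w + b >= 0, 1, -1)


rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 2))
y_train = np.sign(X_train[:, 0])
idx = [0, 1, 2]  # a hypothetical compression set of three training examples
hyper = ReconstructionHypernet(x_dim=2, msg_dim=4)
predictor = hyper(X_train[idx], y_train[idx], message=np.zeros(4))
preds = predictor(X_train)
```

Meta-learning would then fit `W1` and `W2` across many tasks so that the reconstructed predictors perform well; that training loop is omitted here.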
Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models
Michael Lan
Philip Torr
Austin Meek
Ashkan Khakzar
Fazl Barez
We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models represent concepts similarly in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than to a single distinct one. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.
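The two-step comparison (match SAE features by activation correlation, then score the matched spaces with SVCCA) can be sketched with NumPy. Assumptions: both models' SAE activations are recorded on the same tokens; each feature of model A is paired with its single most correlated feature of model B (a greedy rule chosen for illustration); SVCCA keeps enough singular directions to explain 99% of the variance and averages the canonical correlations. The synthetic dense Gaussian data below only stands in for real, sparse SAE activations.

```python
import numpy as np


def match_features(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """For each SAE feature of model A, the index of the most correlated feature of B.

    A: (n_tokens, n_feat_a), B: (n_tokens, n_feat_b), activations on the same tokens.
    """
    Az = (A - A.mean(0)) / (A.std(0) + 1e-8)
    Bz = (B - B.mean(0)) / (B.std(0) + 1e-8)
    corr = Az.T @ Bz / A.shape[0]  # Pearson correlations, (n_feat_a, n_feat_b)
    return corr.argmax(axis=1)


def svcca(X: np.ndarray, Y: np.ndarray, keep: float = 0.99) -> float:
    """Mean canonical correlation after SVD-reducing each view (SVCCA)."""
    def reduce(M: np.ndarray) -> np.ndarray:
        M = M - M.mean(0)
        U, S, _ = np.linalg.svd(M, full_matrices=False)
        # Smallest number of directions explaining `keep` of the variance.
        k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), keep)) + 1
        return U[:, :k] * S[:k]

    Xr, Yr = reduce(X), reduce(Y)
    # CCA via orthonormal bases: canonical correlations are the singular
    # values of Qx^T Qy.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(rho, 0, 1).mean())


rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))                      # stand-in for model A's SAE activations
perm = rng.permutation(20)
B = A[:, perm] + 0.1 * rng.standard_normal((2000, 20))   # "model B": shuffled, noisy copy
match = match_features(A, B)
score_matched = svcca(A, B[:, match])
print(score_matched)
```

Matching first matters because SAE feature indices are arbitrary in each model; reordering B's features to line up with A's lets the similarity metric compare corresponding features.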
Spiral volumetric optoacoustic tomography of reduced oxygen saturation in the spinal cord of M83 mouse model of Parkinson's disease
Benjamin F. Combes
Sandeep Kumar Kalva
Pierre-Louis Benveniste
Agathe Tournant
Man Hoi Law
Joshua Newton
Maik Krüger
Rebecca Z. Weber
Inês Dias
Daniela Noain
Xose Luis Dean-Ben
Uwe Konietzko
Christian R. Baumann
Per-Göran Gillberg
Christoph Hock
Roger M. Nitsch
Daniel Razansky
Ruiqing Ni
Steering Clear: A Systematic Study of Activation Steering in a Toy Setup
Dmitrii Krasheninnikov
Activation steering is a promising family of methods for controlling LLM outputs via targeted interventions on model activations. We introduce a toy multi-label classification setup to systematically study activation steering methods, and experiment with several types of steering adapters, from steering vectors (adding a fixed vector to activations) to more expressive adapters involving projections. We evaluate the adapters on steering tasks of different complexity, using three notions of complexity: 1) how densely the features are packed in the representation space (roughly, the number of features divided by the dimensionality of the activations), 2) the number of attributes steered, and 3) the number of values the steered attribute can take. We find that as task complexity increases, steering vector methods perform worse, while the more expressive methods only take a performance hit when there is not enough data. On the other hand, steering vectors usually outperform more expressive methods in the low-data regime, regardless of task complexity. We conclude by discussing this work's limitations, which include that our toy setup does not model features represented in superposition or continuous features, and the lack of experiments with LLMs.
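The difference between a steering vector and a projection-based adapter can be sketched on synthetic activations. The abstract does not say how the steering vectors are computed; this sketch assumes the common mean-difference recipe. The `project_and_set` adapter is a hypothetical example of a projection-based intervention, not one of the paper's adapters, and the single attribute direction `u` is an invented toy setup.

```python
import numpy as np


def mean_difference_vector(acts_src: np.ndarray, acts_tgt: np.ndarray) -> np.ndarray:
    """Steering vector = mean(target activations) - mean(source activations)."""
    return acts_tgt.mean(axis=0) - acts_src.mean(axis=0)


def steer(acts: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Additive steering: add the same fixed vector to every activation."""
    return acts + alpha * v


def project_and_set(acts: np.ndarray, direction: np.ndarray, value: float) -> np.ndarray:
    """Hypothetical projection adapter: remove the component along a unit
    direction and replace it with a fixed value."""
    u = direction / np.linalg.norm(direction)
    return acts - np.outer(acts @ u, u) + value * u


rng = np.random.default_rng(0)
d = 16
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                               # hypothetical attribute direction
acts0 = rng.standard_normal((1000, d)) - 2 * u       # attribute value 0
acts1 = rng.standard_normal((1000, d)) + 2 * u       # attribute value 1

v = mean_difference_vector(acts0, acts1)
steered = steer(acts0, v)
# Fraction of value-0 activations that a linear readout along u now reads as value 1.
frac_flipped = float(np.mean(steered @ u > 0))
print(frac_flipped)
```

The additive vector shifts every input by the same amount, so inputs that start far from the class boundary can remain on the wrong side; the projection adapter instead sets the attribute component exactly, at the cost of needing a well-estimated direction.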