Publications

LLMs and Personalities: Inconsistencies Across Scales

Tosato Tommaso

Mahmood Hegazy

This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and m… (see more)alleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and reflect on the consequences of scaling.

2024-10-09

NeurIPS.cc/2024/Workshop/Behavioral_ML (oral)

openreview.net

LLMs and Personalities: Inconsistencies Across Scales

Tosato Tommaso

Mahmood Hegazy

This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and m… (see more)alleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and reflect on the consequences of scaling.

2024-10-09

NeurIPS.cc/2024/Workshop/Behavioral_ML (oral)

openreview.net

Mitigating Downstream Model Risks via Model Provenance

Keyu Wang

Abdullah Norozi Iranzad

Scott Schaffter

Doina Precup

Jonathan Lebensold

Meg Risdal

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these … (see more)models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate the adoption, we introduce the unified model record (UMR) repository , a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.

2024-10-09

NeurIPS.cc/2024/Workshop/SoLaR (poster)

doi.org

openreview.net

Not All LLM Reasoners Are Created Equal

Daniel Toyama

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

doi.org

openreview.net

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

2024-10-09

ArXiv (preprint)

arxiv.org

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

The Universality Hypothesis in large language models (LLMs) claims that different models converge towards similar concept representations in… (see more) their latent spaces. Providing evidence for this hypothesis would enable researchers to exploit universal properties, facilitating the generalization of mechanistic interpretability techniques across models. Previous works studied if LLMs learned the same features, which are internal representations that activate on specific concepts. Since comparing features across LLMs is challenging due to polysemanticity, in which LLM neurons often correspond to multiple unrelated features rather than to distinct concepts, sparse autoencoders (SAEs) have been employed to disentangle LLM neurons into SAE features corresponding to distinct concepts. In this paper, we introduce a new variation of the universality hypothesis called Analogous Feature Universality: we hypothesize that even if SAEs across different models learn different feature representations, the spaces spanned by SAE features are similar, such that one SAE space is similar to another SAE space under rotation-invariant transformations. Evidence for this hypothesis would imply that interpretability techniques related to latent spaces, such as steering vectors, may be transferred across models via certain transformations. To investigate this hypothesis, we first pair SAE features across different models via activation correlation, and then measure spatial relation similarities between paired features via representational similarity measures, which transform spaces into representations that reveal hidden relational similarities. Our experiments demonstrate high similarities for SAE feature spaces across various LLMs, providing evidence for feature space universality.

2024-10-09

ArXiv (preprint)

arxiv.org

Rejecting Hallucinated State Targets during Planning

Mingde Zhao

Tristan Sylvain

Romain Laroche

Doina Precup

Yoshua Bengio

2024-10-09

ArXiv (preprint)

arxiv.org

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

Thomas Schmied

Fabian Paischer

Vihang P. Patil

Markus Hofmarcher

Razvan Pascanu

Sepp Hochreiter

In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP,… (see more) this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games. On grid-worlds, RA-DT outperforms baselines, while using only a fraction of their context length. Furthermore, we illuminate the limitations of current in-context RL methods on complex environments and discuss future directions. To facilitate future research, we release datasets for four of the considered environments.

2024-10-09

ArXiv (preprint)

doi.org

arxiv.org

Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning

Benjamin Leblanc

Mathieu Bazinet

Nathaniel D'Amours

Alexandre Drouin

Pascal Germain

Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample… (see more) of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While usually fixed, we propose to learn reconstruction functions. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. The results of promising preliminary experiments are then reported.

2024-10-09

NeurIPS.cc/2024/Workshop/Compression (published)

openreview.net

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly… (see more) represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.

2024-10-09

ArXiv (preprint)

doi.org

arxiv.org

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

2024-10-09

ArXiv (preprint)

doi.org

openreview.net

Spiral volumetric optoacoustic tomography of reduced oxygen saturation in the spinal cord of M83 mouse model of Parkinson's disease.

Benjamin F. Combes

Sandeep Kumar Kalva

Pierre-Louis Benveniste

Agathe Tournant

Man Hoi Law

Joshua Newton

Maik Krüger

Rebecca Z. Weber

Inês Dias

Daniela Noain

Xose Luis Dean-Ben

Uwe Konietzko

Christian R. Baumann

Per-Göran Gillberg

Christoph Hock

Roger M. Nitsch

Julien Cohen-Adad

Daniel Razansky

Ruiqing Ni

2024-10-09

European Journal of Nuclear Medicine and Molecular Imaging (published)

doi.org

Opening Conference | Building Safer AI for Youth Mental Health

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

Indigenous Pathfinders in AI

Publications

Opening Conference | Building Safer AI for Youth Mental Health

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

Indigenous Pathfinders in AI

Popular keywords:

Publications