Publications

Library Learning Doesn’t: The Curious Case of the Single-Use “Library”

Ian Berlot-Attwell

Frank Rudzicz

Advances in Large Language Models (LLMs) have spurred a wave of LLM library learning systems for mathematical reasoning. These systems aim to learn a reusable library of *tools*, such as formal Isabelle lemmas or Python programs that are tailored to a family of tasks. Many of these systems are inspired by the human structuring of knowledge into reusable and extendable concepts, but do current methods actually learn reusable libraries of tools? We study two library learning systems for mathematics which both reported increased accuracy: LEGO-Prover and TroVE. We find that function reuse is extremely infrequent on miniF2F and MATH. Our follow-up ablation experiments suggest that, rather than reuse, self-correction and self-consistency are the primary drivers of the observed performance gains.

2024-10-09

NeurIPS.cc/2024/Workshop/MATH-AI (accepted)

openreview.net

LLMs and Personalities: Inconsistencies Across Scales

Tosato Tommaso

Mahmood Hegazy

David Lemay

Mohammed Abukalam

Irina Rish

Guillaume Dumas

This study investigates the application of human psychometric assessments to large language models (LLMs) to examine their consistency and malleability in exhibiting personality traits. We administered the Big Five Inventory (BFI) and the Eysenck Personality Questionnaire-Revised (EPQ-R) to various LLMs across different model sizes and persona prompts. Our results reveal substantial variability in responses due to question order shuffling, challenging the notion of a stable LLM "personality." Larger models demonstrated more consistent responses, while persona prompts significantly influenced trait scores. Notably, the assistant persona led to more predictable scaling, with larger models exhibiting more socially desirable and less variable traits. In contrast, non-conventional personas displayed unpredictable behaviors, sometimes extending personality trait scores beyond the typical human range. These findings have important implications for understanding LLM behavior under different conditions and reflect on the consequences of scaling.

2024-10-09

NeurIPS.cc/2024/Workshop/Behavioral_ML (oral presentation)

openreview.net

Mitigating Downstream Model Risks via Model Provenance

Keyu Wang

Abdullah Norozi Iranzad

Scott Schaffter

Doina Precup

Jonathan Lebensold

Meg Risdal

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate adoption, we introduce the unified model record (UMR) repository, a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.

2024-10-09

NeurIPS.cc/2024/Workshop/SoLaR (poster)

doi.org

openreview.net

Not All LLM Reasoners Are Created Equal

Arian Hosseini

Alessandro Sordoni

Daniel Toyama

Aaron Courville

Rishabh Agarwal

2024-10-09

NeurIPS.cc/2024/Workshop/Sys2-Reasoning (poster)

doi.org

openreview.net

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

The Universality Hypothesis in large language models (LLMs) claims that different models converge towards similar concept representations in their latent spaces. Providing evidence for this hypothesis would enable researchers to exploit universal properties, facilitating the generalization of mechanistic interpretability techniques across models. Previous works studied if LLMs learned the same features, which are internal representations that activate on specific concepts. Since comparing features across LLMs is challenging due to polysemanticity, in which LLM neurons often correspond to multiple unrelated features rather than to distinct concepts, sparse autoencoders (SAEs) have been employed to disentangle LLM neurons into SAE features corresponding to distinct concepts. In this paper, we introduce a new variation of the universality hypothesis called Analogous Feature Universality: we hypothesize that even if SAEs across different models learn different feature representations, the spaces spanned by SAE features are similar, such that one SAE space is similar to another SAE space under rotation-invariant transformations. Evidence for this hypothesis would imply that interpretability techniques related to latent spaces, such as steering vectors, may be transferred across models via certain transformations. To investigate this hypothesis, we first pair SAE features across different models via activation correlation, and then measure spatial relation similarities between paired features via representational similarity measures, which transform spaces into representations that reveal hidden relational similarities. Our experiments demonstrate high similarities for SAE feature spaces across various LLMs, providing evidence for feature space universality.

2024-10-09

ArXiv (preprint)

arxiv.org

Rejecting Hallucinated State Targets during Planning

Mingde Zhao

Tristan Sylvain

Romain Laroche

Doina Precup

Yoshua Bengio

2024-10-09

ArXiv (preprint)

arxiv.org

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

Thomas Schmied

Fabian Paischer

Vihang P. Patil

Markus Hofmarcher

Razvan Pascanu

Sepp Hochreiter

In-context learning (ICL) is the ability of a model to learn a new task by observing a few exemplars in its context. While prevalent in NLP, this capability has recently also been observed in Reinforcement Learning (RL) settings. Prior in-context RL methods, however, require entire episodes in the agent's context. Given that complex environments typically lead to long episodes with sparse rewards, these methods are constrained to simple environments with short episodes. To address these challenges, we introduce Retrieval-Augmented Decision Transformer (RA-DT). RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation. The retrieval component in RA-DT does not require training and can be entirely domain-agnostic. We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games. On grid-worlds, RA-DT outperforms baselines, while using only a fraction of their context length. Furthermore, we illuminate the limitations of current in-context RL methods on complex environments and discuss future directions. To facilitate future research, we release datasets for four of the considered environments.

2024-10-09

ArXiv (preprint)

doi.org

arxiv.org

Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning

Benjamin Leblanc

Mathieu Bazinet

Nathaniel D'Amours

Alexandre Drouin

Pascal Germain

Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While usually fixed, we propose to learn reconstruction functions. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. The results of promising preliminary experiments are then reported.

2024-10-09

NeurIPS.cc/2024/Workshop/Compression (published)

openreview.net

Sparse Autoencoders Reveal Universal Feature Spaces Across Large Language Models

Michael Lan

Philip Torr

Austin Meek

Ashkan Khakzar

David Scott Krueger

Fazl Barez

We investigate feature universality in large language models (LLMs), a research field that aims to understand how different models similarly represent concepts in the latent spaces of their intermediate layers. Demonstrating feature universality allows discoveries about latent representations to generalize across several models. However, comparing features across LLMs is challenging due to polysemanticity, in which individual neurons often correspond to multiple features rather than distinct ones. This makes it difficult to disentangle and match features across different models. To address this issue, we employ a method known as dictionary learning by using sparse autoencoders (SAEs) to transform LLM activations into more interpretable spaces spanned by neurons corresponding to individual features. After matching feature neurons across models via activation correlation, we apply representational space similarity metrics like Singular Value Canonical Correlation Analysis to analyze these SAE features across different LLMs. Our experiments reveal significant similarities in SAE feature spaces across various LLMs, providing new evidence for feature universality.

2024-10-09

ArXiv (preprint)

doi.org

arxiv.org
