Pascal Vincent

Compositional Risk Minimization

Charles Arnal

Compositional generalization is a crucial step towards developing data-efficient intelligent machines that generalize in human-like ways. In this work, we tackle a challenging form of distribution shift, termed compositional shift, where some attribute combinations are completely absent at training but present in the test distribution. This shift tests the model’s ability to generalize compositionally to novel attribute combinations in discriminative tasks. We model the data with flexible additive energy distributions, where each energy term represents an attribute, and derive a simple alternative to empirical risk minimization termed compositional risk minimization (CRM). We first train an additive energy classifier to predict the multiple attributes and then adjust this classifier to tackle compositional shifts. We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirm the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Compositional Risk Minimization

Charles Arnal

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

openreview.net

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat

Ali Rahimi-Kalahroudi

Mohammad Pezeshki

Sarath Chandar

Pascal Vincent

2025-07-07

colmweb.org/COLM/2025/Conference (accepted)

doi.org

openreview.net

Quantized Disentanglement: A Practical Approach

Vitória Barin-Pacela

Kartik Ahuja

Simon Lacoste-Julien

Pascal Vincent

2025-06-09

ICML.cc/2025/Workshop/SIM (poster)

openreview.net

Steering Large Language Model Activations in Sparse Spaces

Reza Bayat

Ali Rahimi-Kalahroudi

Mohammad Pezeshki

Sarath Chandar

Pascal Vincent

A key challenge in AI alignment is guiding large language models (LLMs) to follow desired behaviors at test time. Activation steering, which modifies internal model activations during inference, offers a potential solution. However, prior work in dense activation spaces struggles with superposition, wherein multiple features become entangled, limiting interpretability and precise control. In contrast, sparse representations provide an untapped opportunity for more interpretable behavior modulation. In this work, we introduce sparse activation steering (SAS), a method that leverages sparse autoencoders (SAEs) to steer LLM behavior in sparse spaces. By isolating behavior-specific features through a contrastive prompt-pairing approach, we define a set of features that can selectively reinforce or suppress behaviors. Experiments on Gemma 2 LLMs show that SAS vectors enable nuanced behavioral modulation and finer-grained control. Furthermore, scaling SAEs improves monosemanticity of SAS vectors, suggesting more reliable and interpretable interventions.

2025-02-28

ArXiv (preprint)

doi.org

arxiv.org

MaestroMotif: Skill Design from Artificial Intelligence Feedback

Martin Klissarov

Mikael Henaff

Roberta Raileanu

Marlos C. Machado

Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.

2025-01-22

ICLR.cc/2025/Conference (oral presentation)

doi.org

openreview.net

The Pitfalls of Memorization: When Memorization Hurts Generalization

David Lopez-Paz

Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This behavior leads to poor generalization when the learned explanations rely on spurious correlations. In this work, we formalize the interplay between memorization and generalization, showing that spurious correlations would particularly lead to poor generalization when they are combined with memorization. Memorization can reduce training loss to zero, leaving no incentive to learn robust, generalizable patterns. To address this, we propose memorization-aware training (MAT), which uses held-out predictions as a signal of memorization to shift a model's logits. MAT encourages learning robust patterns invariant across distributions, improving generalization under distribution shifts.

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

MaestroMotif: Skill Design from Artificial Intelligence Feedback

Martin Klissarov

Mikael Henaff

Roberta Raileanu

Marlos C. Machado

Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.

2024-12-11

ArXiv (preprint)

doi.org

arxiv.org

The Pitfalls of Memorization: When Memorization Hinders Generalization

David Lopez-Paz

Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This leads to poor generalization when the learned explanations are spurious. In this work, we formalize

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net
