Publications

Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute
Kieran Didi
Guoqing Zhou
Danny Reidenbach
Zhonglin Cao
Sooyoung Cha
Tomas Geffner
Christian Dallago
Michael Bronstein
Martin Steinegger
Emine Kucukbenli
Arash Vahdat
Karsten Kreis
Protein interaction modeling is central to protein design, which has been transformed by machine learning with broad applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method unifying both paradigms. We extend a recent flow-based latent protein generation architecture and leverage the domain-domain interactions of monomeric, computationally predicted protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Complexa sets a new state of the art in computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We further demonstrate explicit interface hydrogen bond optimization, fold-class-guided binder generation, and extensions to small-molecule targets and enzyme design tasks, again surpassing prior methods. Code, models, and new data will be publicly released.
SelvaBox: A high-resolution dataset for tropical tree crown detection
Detecting individual tree crowns in tropical forests is essential to study these complex and crucial ecosystems impacted by human interventions and climate change. However, tropical crowns vary widely in size, structure, and pattern, and are largely overlapping and intertwined, requiring advanced remote sensing methods applied to high-resolution imagery. Despite growing interest in tropical tree crown detection, annotated datasets remain scarce, hindering robust model development. We introduce SelvaBox, the largest open-access dataset for tropical tree crown detection in high-resolution drone imagery. It spans three countries and contains more than
Sex Classification Based on the Functional Connectivity Patterns of the Language Network: A Resting State fMRI Study
Xanthy Lajoie
C. DeRoy
C. Bedetti
Bérengère Houzé
N. Clarke
Sébastien Hétu
M.‐È. Picard
S. M. Brambati
Research on sex differences in the brain is essential for a better understanding of how the brain develops and ages, and how neurological and psychiatric conditions can impact men and women differently. While numerous studies have focused on sex differences in brain structures, few have examined the characteristics of functional networks, particularly the language network. Although previous research suggests similar overall language performance across sexes, differences may still exist in the brain networks that underlie language processing. In addition, prior studies on sex differences in language have predominantly relied on task-based fMRI, which may fail to capture subtle differences in underlying functional activity. In this study, we applied a machine learning approach to classify participants' sex based on resting-state functional connectivity patterns of the language network in healthy young adults (270 men and 288 women; age: 22–36 years), and to identify the most predictive functional connectivity features. The classifier achieved 91.3% accuracy, with key discriminant features anchored to the left opercular part of the inferior frontal gyrus, the left planum temporale, and the left anterior middle temporal gyrus. These regions show distinctive connectivity patterns with heteromodal association cortices, including the occipital poles, angular gyrus, posterior cingulate gyrus, and intraparietal sulcus. Although there was overlap between men and women, men displayed stronger functional connectivity values in these regions. These findings highlight sex-related differences in functional connectivity patterns of the language network at rest, underscoring the importance of considering sex as a variable in future research on language and brain function.
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor's sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor's epistemic uncertainty. Analytically, we show that this adjustment implicitly reweights policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already-safe ones, thereby biasing learning toward conservative behavior in under-explored regions. Across several continuous-control tasks, our method consistently improves both safety and task performance over existing baselines, significantly expanding their Pareto frontiers.
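The core mechanic described in the abstract, evaluating the gradient at perturbed parameters and updating pessimistically, can be illustrated with a generic SAM-style step on a toy objective. This is a minimal sketch of the idea, with the perturbation radius `rho` and learning rate `lr` as illustrative parameters; it is not the paper's exact SHAPO update rule.

```python
import numpy as np

def sharpness_aware_step(theta, loss_grad, rho=0.05, lr=0.1):
    """One toy sharpness-aware update: ascend to a worst-case nearby
    parameter point, then descend using the gradient evaluated there.
    `loss_grad(theta)` returns the gradient of the loss at theta."""
    g = loss_grad(theta)
    # Perturb toward the worst case within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Evaluate the gradient at the perturbed parameters ...
    g_perturbed = loss_grad(theta + eps)
    # ... and update the original parameters with that gradient,
    # making the step pessimistic w.r.t. parameter sensitivity.
    return theta - lr * g_perturbed

# Toy quadratic bowl: loss = 0.5 * ||theta||^2, so grad = theta.
theta = np.array([1.0, -2.0])
theta_new = sharpness_aware_step(theta, lambda t: t)
```

On this convex toy problem the step simply moves toward the minimum; the sharpness-aware correction matters when the loss surface has sharp, uncertain regions, which the abstract uses as a proxy for epistemic uncertainty.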
Species Loss Scenarios Identify Canada's Northern Ecosystems as Disproportionately Vulnerable
Isaac Eckert
Dominique Caron
SSFL: Discovering Sparse Unified Subnetworks at Initialization for Efficient Federated Learning
Riyasat Ohib
Bishal Thapaliya
Jingyu Liu
Vince D. Calhoun
Sergey Plis
In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios and then aggregated to determine a global mask. Only the sparse model weights are trained and communicated each round between the clients and the server. On standard benchmarks including CIFAR-10, CIFAR-100, and Tiny-ImageNet, SSFL consistently improves the accuracy-sparsity trade-off, achieving more than 20% relative error reduction on CIFAR-10 compared to the strongest sparse baseline, while reducing communication costs by
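The mask-at-initialization idea in the abstract, per-client saliency scores aggregated into one global mask, can be sketched as follows. The averaging aggregation and top-k selection here are illustrative assumptions, not SSFL's exact scoring or aggregation procedure.

```python
import numpy as np

def global_mask_from_saliency(client_saliencies, sparsity=0.8):
    """Toy sketch: aggregate per-client parameter saliency scores
    (here, by averaging) and keep only the top (1 - sparsity)
    fraction of parameters as a shared global mask."""
    agg = np.mean(np.stack(client_saliencies), axis=0)
    k = max(1, int(round((1.0 - sparsity) * agg.size)))
    keep = np.argsort(agg)[-k:]          # indices of most salient weights
    mask = np.zeros(agg.shape, dtype=bool)
    mask[keep] = True
    return mask

# Two clients' saliency scores over 10 parameters; non-IID data
# would naturally produce different scores per client.
s1 = np.arange(10, dtype=float)
s2 = np.zeros(10); s2[0] = 5.0
mask = global_mask_from_saliency([s1, s2], sparsity=0.8)
```

Only the weights selected by `mask` would then be trained and exchanged each round, which is where the communication savings described in the abstract come from.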
Suspected Biliary Atresia in Brazil: Impact of Regional Healthcare Variations on Diagnostic Timeliness
Luiza Telles
Paulo Henrique Moreira Melo
Ana Maria Bicudo Diniz
Gabriele Lech
Ayla Gerk
Lauren Kratky
David P. Mooney
Joaquim Bustorff-Silva
The 2nd Workshop on Foundation Models for Science: Real-World Impact and Science-First Design
Wuyang Chen
Yongji Wang
N. Benjamin Erichson
Bo Li
Damian Borth
Swarat Chaudhuri
Scientific foundation models should be built for science, not for generic AI tastes or leaderboard prestige. This workshop centers problem-driven design: models that measurably advance real scientific inquiries, e.g., forecasting extreme climate events, accelerating materials discovery, understanding biological mechanisms, co-developed with domain experts and validated against field data, experiments, and downstream impact. We argue that foundation models for science must be built differently from language and vision. Scientific data are physical, causal, spatiotemporal, and often scarce or biased; objectives must reflect mechanistic fidelity, not just predictive accuracy. This calls for scientific priors and constraints, robust uncertainty quantification (UQ), and architectures that natively handle multi-modality (e.g., grids, meshes, spectra, time series, point clouds, text, images, code). It also demands tight integration with classical scientific tools (simulators, PDE solvers, optimization and inference engines, and HPC workflows) to yield hybrid systems that are faster, more accurate, and more trustworthy. We will highlight opportunities and hard problems unique to science: enforcing conservation laws and symmetries; learning across vast spatial and temporal scales; representing extreme events and tipping points; calibrating and validating UQ; and developing evaluation protocols that reward mechanistic insight and actionable reliability. The goal is a roadmap for building, training, and deploying scientific foundation models that accelerate discovery while respecting the structure of the natural world.
The Expressive Limits of Diagonal SSMs for State-Tracking
State-Space Models (SSMs) have recently been shown to achieve strong empirical performance on a variety of long-range sequence modeling tasks while remaining efficient and highly parallelizable. However, the theoretical understanding of their expressive power remains limited. In this work, we study the expressivity of input-Dependent Complex-valued Diagonal (DCD) SSMs on sequential state-tracking tasks for abstract groups. It is easy to show that a single DCD SSM layer with a universal decoder can track any Abelian group at finite precision by decomposing it into a product of cyclic groups. We show that this is tight by proving that such a model cannot track any non-Abelian group at finite precision. We further establish the expressivity of multi-layer DCD SSMs. We show that a
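The Abelian-group construction the abstract alludes to can be made concrete for a single cyclic group: a complex diagonal state multiplied by input-dependent unit rotations accumulates the running sum modulo n in its phase. This is an illustrative one-dimensional sketch of the idea, not the paper's construction or notation.

```python
import numpy as np

def track_cyclic_sum(xs, n):
    """Track the cyclic group Z_n with a scalar complex diagonal
    recurrence: encode each input x as the rotation e^{2*pi*i*x/n},
    multiply rotations into the state, and decode the running sum
    mod n from the accumulated phase."""
    h = 1.0 + 0.0j                       # identity element (zero sum)
    for x in xs:
        h *= np.exp(2j * np.pi * x / n)  # diagonal state update
    # Read the group element back off the unit circle.
    return int(round(np.angle(h) * n / (2 * np.pi))) % n
```

Because rotations commute, this mechanism extends to any finite Abelian group via its decomposition into cyclic factors, one complex channel per factor; the commutativity requirement is exactly why, per the abstract, the same mechanism cannot track non-Abelian groups.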
The Geometry and Topology of Circuits: the Manifolds of Modular Addition
The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to ar… (voir plus)gue that different architectural designs can yield distinct circuits for modular addition. In this work, we show that this is not the case, and that both the uniform and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations. Our methodology goes beyond the interpretation of individual neurons and weights. Instead, we identify all of the neurons corresponding to each learned representation and then study the collective group of neurons as one entity. This method reveals that each learned representation is a manifold that we can study utilizing tools from topology. Based on this insight, we can statistically analyze the learned representations across hundreds of circuits to demonstrate the similarity between learned modular addition circuits that arise naturally from common deep learning paradigms.
The Impact of Pediatric Surgery Global Travel Fellowships: A Study by the Canadian Association of Paediatric Surgeons Global Partnership Committee.
Sacha Williams
Natasha Bejjani
Elena Guadagno
Robert Baird
Shahrzad Joharifard
Melanie Morris
Robin Petroze
Sherif Emil
The Intricate Dance of Prompt Complexity, Quality, Diversity and Consistency in T2I Models
Zhang Xiaofeng
Adriana Romero-Soriano
Text-to-image (T2I) models offer great potential for creating virtually limitless synthetic data, a valuable resource compared to fixed and finite real datasets. Previous works evaluate the utility of synthetic data from T2I models on three key desiderata: quality, diversity, and consistency. While prompt engineering is the primary means of interacting with T2I models, the systematic impact of prompt complexity on these critical utility axes remains underexplored. In this paper, we first conduct synthetic experiments to motivate the difficulty of generalization w.r.t. prompt complexity and explain the observed difficulty with theoretical derivations. Then, we introduce a new evaluation framework that can compare the utility of real data and synthetic data, and present a comprehensive analysis of how prompt complexity influences the utility of synthetic data generated by commonly used T2I models. We conduct our study across diverse datasets, including CC12M, ImageNet-1k, and DCI, and evaluate different inference-time intervention methods. Our synthetic experiments show that generalizing to more general conditions is harder than the other way round, since the former needs an estimated likelihood that is not learned by diffusion models. Our large-scale empirical experiments reveal that increasing prompt complexity results in lower conditional diversity and prompt consistency, while reducing the synthetic-to-real distribution shift, which aligns with the synthetic experiments. Moreover, current inference-time interventions can augment the diversity of the generations at the expense of moving outside the support of real data. Among those interventions, prompt expansion, by deliberately using a pre-trained language model as a likelihood estimator, consistently achieves the highest performance in both image diversity and aesthetics, even higher than that of real data. Combining advanced guidance interventions with prompt expansion results in the most appealing utility trade-offs of synthetic data.