Publications

NeuroFaith: Evaluating LLM Self-Explanation Faithfulness via Internal Representation Alignment

Jean-Noël Vittaut

Nicolas Chesneau

A. Chandar

Marie-Jeanne Lesot

Large Language Models (LLMs) can generate plausible free-text self-explanations to justify their answers. However, these natural language explanations may not accurately reflect the model's actual reasoning process, in which case they lack faithfulness. Existing faithfulness evaluation methods rely primarily on behavioral tests or computational block analysis without examining the semantic content of internal neural representations. This paper proposes NeuroFaith, a flexible framework that measures the faithfulness of LLM free-text self-explanations by identifying key concepts within explanations and mechanistically testing whether these concepts actually influence the model's predictions. We show the versatility of NeuroFaith across 2-hop reasoning and classification tasks. Additionally, we develop a linear faithfulness probe based on NeuroFaith to detect unfaithful self-explanations from representation space and improve faithfulness through steering. NeuroFaith provides a principled approach to evaluating and enhancing the faithfulness of LLM free-text self-explanations, addressing critical needs for trustworthy AI systems.

2025-06-09

arXiv (preprint)

arxiv.org

Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs

Jayesh Khullar

François Rivest

A. Chandar

2025-06-09

ICML.cc/2025/Workshop/MOSS (published)

doi.org

openreview.net

Preservice Teachers’ Computational Thinking Profiles

Tanya Chichekian

Maria Cutumisu

Annie Savard

2025-06-09

Proceedings (International Conference of the Learning Sciences) (published)

doi.org

Robust Reward Modeling via Causal Rubrics

Pragya Srivastava

Harman Singh

Rahul Madhavan

Gandharv Patil

Sravanti Addepalli

Arun Suggala

Rengarajan Aravamudhan

Soumya Sharma

Anirban Laha

Aravindan Raghuveer

Karthikeyan Shanmugam

Doina Precup

Reward models (RMs) for LLM alignment often exhibit reward hacking: they mistake spurious correlates (e.g., length, format) for causal quality drivers (e.g., factuality, relevance), which makes them brittle. We introduce CROME (Causally Robust Reward Modeling), a causally grounded framework that uses targeted augmentations to mitigate this. CROME employs (1) Causal Augmentations, pairs that isolate changes to specific causal attributes, to enforce sensitivity, and (2) Neutral Augmentations, tie-labeled pairs that vary spurious attributes while preserving causal content, to enforce invariance. Crucially, the augmentations target LLM-identified causal rubrics and require no prior knowledge of spurious factors. CROME significantly outperforms baselines on RewardBench (average +5.4%, Safety +13.2%, Reasoning +7.2%) and demonstrates enhanced robustness via improved Best-of-N performance across RewardBench, WildGuardTest, and GSM8K.

2025-06-09

ICML.cc/2025/Workshop/MoFA (poster)

doi.org

openreview.net

Spectral State Space Model for Rotation-Invariant Visual Representation Learning

Sahar Dastani

Ali Bahri

Moslem Yazdanpanah

Mehrdad Noori

David Osowiechi

Gustavo Adolfo Vargas Hakim

Farzad Beizaee

Milad Cheraghalikhani

Arnab Kumar Mondal

Hervé Lombaert

Christian Desrosiers

State Space Models (SSMs) have recently emerged as an alternative to Vision Transformers (ViTs) owing to their ability to model global relationships with linear complexity. SSMs are specifically designed to capture relationships between spatially proximate image patches; however, they fail to identify relationships between patches that are conceptually related but not adjacent. This limitation arises from the non-causal nature of image data, which lacks inherent directional relationships. Additionally, current vision-based SSMs are highly sensitive to transformations such as rotation: their predefined scanning directions depend on the original image orientation, so rotating an image can cause the model to produce inconsistent patch-processing sequences. To address these limitations, we introduce Spectral VMamba, a novel approach that captures the global structure within an image by leveraging spectral information derived from the graph Laplacian of image patches. Through spectral decomposition, our approach encodes patch relationships independently of image orientation, achieving rotation invariance with the aid of our Rotational Feature Normalizer (RFN) module. Our experiments on classification tasks show that Spectral VMamba outperforms leading vision SSMs such as VMamba while remaining invariant to rotations and providing similar runtime efficiency.

2025-06-09

2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

doi.org

arxiv.org

A Systematic Literature Review of Large Language Model Applications in the Algebra Domain

Yajie Song

Yimei Zhang

Doina Precup

Reihaneh Rabbany

Maria Cutumisu

2025-06-09

Proceedings (International Conference of the Learning Sciences) (published)

doi.org

Test Time Adaptation Using Adaptive Quantile Recalibration

2025-06-09

ICML.cc/2025/Workshop/PUT (poster)

doi.org

openreview.net

What Matters when Modeling Human Behavior using Imitation Learning?

As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. Imitation Learning (IL), a central approach to AI alignment, trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans (1) are risk-sensitive and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.

2025-06-09

ICML.cc/2025/Workshop/MoFA (poster)

openreview.net

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura

Shahrad Mohammadzadeh

Taz Scott-Talib

Nishanth Anand

2025-06-08

CODEML @ International Conference on Machine Learning (published)

openreview.net

Improving Context Fidelity via Native Retrieval-Augmented Reasoning

Suyuchen Wang

Jinlin Wang

Xinyu Wang

Shiqi Li

Xiangru Tang

Sirui Hong

Xiao-Wen Chang

Chenglin Wu

Bang Liu

Large language models (LLMs) often struggle with context fidelity, producing inconsistent answers when responding to questions based on provided information. Existing approaches either rely on expensive supervised fine-tuning to generate evidence post-answer or train models to perform web searches without necessarily improving utilization of the given context. We propose CARE, a novel native retrieval-augmented reasoning framework that teaches LLMs to explicitly integrate in-context evidence into their reasoning process using the model's own retrieval capabilities. Our method requires minimal labeled evidence data while significantly enhancing both retrieval accuracy and answer generation performance through strategically retrieved in-context tokens in the reasoning chain. Extensive experiments on multiple real-world and counterfactual QA benchmarks demonstrate that our approach substantially outperforms supervised fine-tuning, traditional retrieval-augmented generation methods, and external retrieval solutions. This work represents a fundamental advancement in making LLMs more accurate, reliable, and efficient for knowledge-intensive tasks.

2025-06-08

ICML.cc/2025/Workshop/LCFM (accepted)

openreview.net

A Meta-Learning Approach to Causal Inference

Dragos Cristian Manta

Philippe Brouillard

Dhanya Sridhar

Predicting the effect of unseen interventions is at the heart of many scientific endeavours. While causal discovery is often used to answer these causal questions, it involves learning a full causal model, which is not tailored to the specific goal of predicting unseen interventions, and it operates under stringent assumptions. We introduce a novel method based on meta-learning that predicts interventional effects without explicitly assuming a causal model. Our preliminary results on synthetic data show that it generalizes well to unseen interventions and even compares favorably to a causal discovery method. Our model-agnostic method opens up many avenues for future exploration, particularly for settings where causal discovery cannot be applied.

2025-06-08

ICML.cc/2025/Workshop/SIM (poster)

openreview.net

PyLO: Towards Accessible Learned Optimizers in PyTorch

Paul Janson

Benjamin Therien

Quentin Gregory Anthony

Xiaolong Huang

Abhinav Moudgil

Eugene Belilovsky

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances (such as VeLO, which was meta-trained for 4000 TPU-months) remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from Metz et al. (2022a), delivering substantial speedups: throughput rises from 39.36 to 205.59 samples/sec when training ViT-B/16 with batch size 32. PyLO also makes it easy to combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay, and we find that learned optimizers can benefit substantially from doing so. Our code is available at https://github.com/Belilovsky-Lab/pylo

2025-06-08

ICML.cc/2025/Workshop/CODEML (published)

doi.org

openreview.net
