Publications

NeuroFaith: Evaluating Mechanistic Faithfulness of LLM Free Text Self-Explanation at the Concept Level

Jean-Noël Vittaut

Nicolas Chesneau

Marie-Jeanne Lesot

Large Language Models (LLMs) can generate plausible free text self-explanations to justify their answers. However, these natural language explanations may not accurately reflect the model's actual reasoning process, indicating a lack of faithfulness. Existing faithfulness evaluation methods rely primarily on behavioral tests or computational block analysis without examining the semantic content of internal neural representations. This paper proposes NeuroFaith, a flexible framework that measures the faithfulness of LLM free text self-explanation by identifying key concepts within explanations and mechanistically testing whether these concepts actually influence the model's predictions. We show the versatility of NeuroFaith across 2-hop reasoning and classification tasks. Additionally, we develop a linear faithfulness probe based on NeuroFaith to detect unfaithful self-explanations from representation space and improve faithfulness through steering. NeuroFaith provides a principled approach to evaluating and enhancing the faithfulness of LLM free text self-explanations, addressing critical needs for trustworthy AI systems.

2026-06-10

ICML.cc/2026/Workshop/Mech_Interp (poster)

openreview.net

Sirop: A Small IR for HLS with Parallel Patterns

L.T. Hildebrand

Christophe Dubach

Designers of custom streaming accelerators traditionally use HDLs (Hardware Description Languages), but this is time-consuming and requires advanced hardware expertise. C-based HLS (High-Level Synthesis) offers a higher level of abstraction and faster design time, but still requires some hardware expertise and performance is often left on the table. A promising direction is to use HLS with high-level functional parallel patterns such as map and reduce. Prior works have shown that high performance is achievable this way. However, designing such compiler systems is challenging because the optimizer must handle a large number of language primitives and interactions between them.

2026-06-10

Proceedings of the 27th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (published)

doi.org

The strength of flow refueling location problem formulations and an extension to cyclic routing

Nagisa Sugishita

Margarida Carvalho

Ribal Atallah

2026-06-10

4OR (published)

doi.org

When Does Interleaving Prevent Emergent Misalignment?

Chen Sun

Large language models finetuned on narrow harmful tasks are prone to emergent misalignment (EM), where harmful behavior generalizes beyond the training distribution. Interleaving benign data during finetuning has been proposed as a mitigation, but recent work disagrees on whether it prevents EM. In this paper, we investigate this disagreement on Qwen-2.5 7B and 32B, and find that no single property of the interleaved data, taken in isolation, accounts for the gap. Instead, much of it traces to the evaluation itself, as the standard EM benchmark is sensitive to the length of the prompts it uses, and lengthening the evaluation prompts substantially shifts measured misalignment across model sizes. We then identify a region in the model's activations that predicts whether a given interleaved set will prevent EM, and show that reformulating benign data to fall within it substantially reduces EM on both 7B and 32B. This suggests that the standard EM benchmark, which relies on short prompts, may misrepresent the effectiveness of proposed mitigations.

2026-06-10

ICML.cc/2026/Workshop/Mech_Interp (poster)

openreview.net

AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages

Happy Buzaaba

Cheikh Mouhamadou Bamba Dione

David Ifeoluwa Adelani

Sylvain Kahane

Kim Gerdes

Bruno Guillaume

Kevin Guan

Aremu Anuoluwapo

Naome A. Etori

Shamsuddeen Hassan Muhammad

Utitofon Inyang

Peter Nabende

David Sabiiti Bamutura

Andiswa Bukula

Chinedu Uchechukwu

Rooweither Mabuya

Idris Akinade

Christiane Fellbaum

Despite their linguistic diversity and global significance, African languages remain underrepresented in research and resources to support NLP. We aim to bridge this gap by introducing AfriSUD, the first large-scale collection of syntactically annotated treebanks for nine diverse African languages spanning major language families and regions across Sub-Saharan Africa. Using the Surface-Syntactic Universal Dependencies (SUD) framework, our community-led effort provides high-quality, native-speaker verified data that capture key typological features such as agglutination and tone. We evaluate a range of models on AfriSUD for part-of-speech tagging and dependency parsing, including non-transformer baselines, multilingual pretrained encoders, and LLMs. Our results reveal a significant syntax gap, where models still show clear limitations across the nine languages, suggesting that existing architectures may not fully capture the structural diversity of African-language syntax.

2026-06-09

arXiv (preprint)

doi.org

arxiv.org

Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space

Di Wu

Huan Liu

Zhixiang Chi

Yuanhao Yu

Konstantinos N. Plataniotis

Yang Wang

The rapid advancements in using neural networks as implicit data representations have attracted significant interest in developing machine learning methods that analyze and process the weight spaces of other neural networks. However, efficiently handling these high-dimensional weight spaces remains challenging. Existing methods often overlook the sequential nature of layer-by-layer processing in neural network inference. In this work, we propose a novel approach using dynamic graphs to represent neural network parameters, capturing the temporal dynamics of inference. Our Dynamic Neural Graph Encoder (DNG-Encoder) processes these graphs, preserving the sequential nature of neural processing. Additionally, we leverage DNG-Encoder to develop INR2JLS (Implicit Neural Representation to Joint Latent Space) to facilitate downstream applications, such as classifying Implicit Neural Representations (INRs). Our approach demonstrates significant improvements across multiple tasks, surpassing the state-of-the-art INR classification accuracy by approximately 10% on the CIFAR-100-INR. Our code is available at https://github.com/dddiowww/DNG.

2026-06-09

Transactions on Machine Learning Research (accepted)

openreview.net

Establishment of a tissue culture system with adventitious bud regeneration for the new raspberry germplasm 'autumn–winter yellow raspberry'

Zihan Zhu

Hanqing Zhao

Jinyu Liu

Ye Guo

Chenxing Zhang

Yingyue Li

2026-06-09

Plant (published)

doi.org

Hierarchical Integration of Predictive Representations of State from General Value Functions

Sonny Jones

Doina Precup

Patrick M. Pilarski

Ashley N Dalrymple

In this work, we investigate how predictive representations of state in the form of continually learned General Value Functions (GVFs) interact with downstream policy networks. Intelligent agents deployed in real-world environments need to adapt to changing conditions in their environment. Adapting to one’s environment requires a model or representation of the environment on which to base decision-making. Models that take the form of predictions and GVFs have been shown to provide temporally abstracted predictive representations of state that can forecast useful elements of an agent's or environment's future behaviour. While GVFs have been concretely deployed in rehabilitation and robotic domains, existing approaches treat predictions as input features into model frameworks, without examining or comparing how best to integrate them into downstream learning processes. In this work, we compare multiple strategies for integrating observations and GVF predictions into another learning architecture: 1) actual observations solely in the input layer, 2) predictions solely in the input layer, 3) actual observations and predictions in the input layer, and 4) actual observations in the input layer and predictions in the later latent representations. We evaluate these strategies in a rehabilitation setting, using GVFs to learn predictive representations of kinetic and kinematic signals collected from wearable sensors on the lower limb during ambulation across varied terrains, and policy networks to classify walking terrain.

2026-06-09

rl-conference.cc/RLC/2026/Workshop/RL_in_Big_Worlds (poster)

openreview.net

Optimizing Dec-POMDP Agent-State Policies via Risk-Seeking Utility

Amit Sinha

Matthieu Geist

Aditya Mahajan

Solving decentralized decision-making problems modeled as Dec-POMDPs is notoriously NEXP-complete, as optimal solutions require policies conditioned on an agent's entire action-observation history. To maintain tractability, it is common to restrict agents to finite-memory models, known as agent-state policies. Although this constrained policy class may not contain the globally optimal solution, finding the highest-performing agent-state policy remains a critical objective for practical applications. Addressing the challenge of planning under bounded memory, we introduce an iterated best-response algorithm that converges monotonically to a local optimum in polynomial runtime relative to the Dec-POMDP model size. To discover superior policies within this restricted memory space, we employ a novel objective that pairs a risk-seeking incentive with conservative policy updates. Our experiments on standard Dec-POMDP benchmarks demonstrate that this approach is competitive with state-of-the-art methods, delivering near-optimal results despite the limited memory.

2026-06-09

rl-conference.cc/RLC/2026/Workshop/RL_in_Big_Worlds (poster)

openreview.net

Representing Time Series as Structured Programs for LLM Reasoning

Jaeho Kim

Changhun Oh

Seokhyun Lee

Irina Rish

Changhee Lee

Large language models (LLMs) have demonstrated strong reasoning and instruction-following capabilities, making them potentially powerful tools for time-series analysis. However, time series lie outside their native textual modality, raising a fundamental question: how should time series be represented so that LLMs can reason about them effectively? Existing work typically serializes raw numerical sequences or fine-tunes pre-trained LLMs on time-series data. These approaches place the burden of extracting temporal structure directly on the LLM, creating a modality mismatch that often degrades performance on long sequences and introduces substantial computational overhead. In this work, we introduce Time-Series-to-Structured-Program representation (T2SP), a deterministic, training-free method that represents a time series as a structured symbolic program. T2SP decomposes time series into trends, periods, and salient events, expressing them in a program-friendly format aligned with the textual and code-like modalities on which LLMs are natively trained. By shifting temporal-structure extraction from the model to the representation itself, T2SP enables off-the-shelf LLMs to leverage their existing reasoning capabilities for time-series understanding. We evaluate T2SP on three reasoning tasks -- editing, captioning, and question answering -- where it consistently improves performance, reduces reasoning time, and lowers failure rates compared with raw-string representations. Our results demonstrate that T2SP provides an effective interface between time series and LLMs.

2026-06-09

arXiv (preprint)

doi.org

arxiv.org

SLowRL: Safe Low-Rank Adaptation for Bridging the Sim-to-Real Gap in Legged Locomotion

Elham Daneshmand

Shafeef Omar

Glen Berseth

Majid Khadiv

Hsiu-Chin Lin

A simulator is, at best, a coarse low-fidelity model of the real world the agent eventually has to act in. Closing this residual gap on hardware is a canonical instance of operating in a big world: the real environment exposes contact dynamics, latencies, and disturbances that the agent was never given the capacity (parameters or data) to model during pretraining. Naive on-hardware fine-tuning is risky --- the policy can damage the robot before it improves --- and full-parameter updates require prohibitive interaction time. We propose SLowRL, a continual fine-tuning framework that confronts this big-world adaptation problem with two complementary forms of capacity limitation: (i) a rank-1 LoRA adapter applied per layer to both actor and critic, restricting each layer's update to a single direction in its image space (

2026-06-09

rl-conference.cc/RLC/2026/Workshop/RL_in_Big_Worlds (poster)

openreview.net

The blueprint of human functional architecture shifts from cognition to anatomy during perturbations of consciousness

Andrea I. Luppi

Dragana Manasova

Justine Y. Hansen

Zhen-Qi Liu

Asa Farahani

Yonatan Sanz Perl

Jakub Vohryzek

Daniel Golkowski

Andreas Ranft

R. Ilg

Denis Jordan

Vincent Bonhomme

Audrey Vanhaudenhuyse

Athéna Demertzi

Océane Jaquet

Mohamed Ali Bahri

Naji Alnagger

Paolo Cardone

Lorina Naci

Adrian M. Owen

John Pickard

Guy Williams

Judith Allanson

Enrico Amico

Danilo Bzdok

Jacobo Sitt

David Menon

Emmanuel A. Stamatakis

Bratislav Misic

Consciousness and cognition arise from the ongoing interactions between brain regions. Synchronous fluctuations of fMRI signals may indicate that two brain regions perform similar cognitive functions, but neural interactions are also constrained by anatomical connectivity and regions' molecular, cytoarchitectonic, and metabolic profiles. Here we disentangle the respective contributions of ongoing cognition and multimodal neurobiological constraints in shaping functional connectivity. We jointly contextualise haemodynamic FC against eight distinct multimodal representations of the human connectome: (i) structural connectivity from diffusion tractography; (ii) spatial embedding; (iii) similarity of transcriptional profiles from gene expression; (iv) similarity of receptor profiles from Positron Emission Tomography; (v) laminar profile similarity from histology; (vi) correlated electrophysiological activity from magnetoencephalography; (vii) correlated metabolic activity from PET glucose uptake; (viii) coordinated activation across 123 cognitive operations from the NeuroSynth meta-analytic engine. We demonstrate that cognitive co-activation is the dominant predictor of inter-regional fMRI synchrony in the awake human brain, even when quantified using intracranial electrical stimulation. Crucially, this predominance of cognitive co-activation for shaping functional connectivity is systematically obliterated across five datasets of pharmacological and pathological perturbations of consciousness (chronic disorders of consciousness; anaesthesia with sevoflurane, propofol, or ketamine) when cognition is disconnected from the environment or altogether abolished. Altogether, we show that multimodal predictors of functional architecture shift away from cognitive co-activation and toward anatomical-molecular constraints during pharmacological and pathological perturbations of consciousness.

2026-06-09

bioRxiv (accepted)

doi.org
