Publications

Soft Mellowmax Monte Carlo Planning

Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.

2026-03-04

RSI @ International Conference on Learning Representations (poster)

openreview.net

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

Luca Della Libera

Cem Subakan

Mirco Ravanelli

Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference. Demo samples are available at https://lucadellalib.github.io/wavslm-web/.

2026-03-04

arXiv (preprint)

doi.org

arxiv.org

AMP2026: A Multi-Platform Marine Robotics Dataset for Tracking and Mapping

Edwin Meriaux

Shuo Wen

David Widhalm

Zhizun Wang

Junming Shi

Mariana Sosa Guzmán

Kalvik Jakkala

Bennett Carley

Elias Sokolova

Yogesh Girdhar

Monika Roznere

Jason O’Kane

Junaed Sattar

Gregory Dudek

Marine environments present significant challenges for perception and autonomy due to dynamic surfaces, limited visibility, and complex interactions between aerial, surface, and submerged sensing modalities. This paper introduces the Aerial Marine Perception Dataset (AMP2026), a multi-platform marine robotics dataset collected across multiple field deployments designed to support research in two primary areas: multi-view tracking and marine environment mapping. The dataset includes synchronized data from aerial drones, boat-mounted cameras, and submerged robotic platforms, along with associated localization and telemetry information. The goal of this work is to provide a publicly available dataset enabling research in marine perception and multi-robot observation scenarios. This paper describes the data collection methodology, sensor configurations, dataset organization, and intended research tasks supported by the dataset.

2026-03-03

arXiv (preprint)

doi.org

arxiv.org

Efficient Refusal Ablation in LLM through Optimal Transport

Geraldin Nanfack

Eugene Belilovsky

Elvis Dohmatob

Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.

2026-03-03

arXiv (preprint)

doi.org

openreview.net

A Fast Generative Framework for High-dimensional Posterior Sampling: Application to CMB Delensing

Hadi Sotoudeh

Pablo Lemos

Laurence Perreault-Levasseur

We introduce a deep generative framework for high-dimensional Bayesian inference that enables efficient posterior sampling. As telescopes and simulations rapidly expand the volume and resolution of astrophysical data, fast simulation-based inference methods are increasingly needed to extract scientific insights. While diffusion-based approaches offer high-quality generative capabilities, they are hindered by slow sampling speeds. Our method performs posterior sampling an order of magnitude faster than a diffusion baseline. Applied to the problem of CMB delensing, it successfully recovers the unlensed CMB power spectrum from simulated observations. The model also remains robust to shifts in cosmological parameters, demonstrating its potential for out-of-distribution generalization and application to observational cosmological data.

2026-03-03

arXiv (preprint)

doi.org

arxiv.org

Multimodal Manifold Learning for Clonally Constrained Trajectory Inference

Irene Bonafonte Pardàs

Myriam Lizotte

Guy Wolf

Benjamin Schubert

A central goal of single-cell transcriptomics is to reconstruct dynamic cellular processes from static scRNA-seq snapshots, yet most trajectory inference methods rely on transcriptomic similarity as a proxy for developmental linkage, an assumption that frequently fails. While lineage tracing overcomes this limitation, it requires genetic perturbations and specialized longitudinal designs. In adaptive immune cells, T and B cell receptors (AIRs) naturally encode clonal ancestry and are routinely sequenced alongside the transcriptome, providing lineage information in standard snapshot datasets, but existing trajectory methods are not adapted to exploit this signal. Here, we lay the foundation for incorporating AIR-encoded lineage information into trajectory inference by biasing RNA-based diffusion maps toward AIR-consistent paths, thereby integrating lineage constraints into learned cell-state representations. Across simulations of increasing complexity, our multimodal approach recovers more biologically plausible trajectories than RNA-only baselines. While optimized for lymphocyte differentiation, the framework generalizes to other endogenous lineage barcodes, such as mitochondrial mutations.

2026-03-03

LMRL @ International Conference on Learning Representations (poster)

openreview.net

On Closed-Form Couplings

Tobias Höppe

Stefan Bauer

Qiang Liu

Andrea Dittadi

Kirill Neklyudov

Few-step generative modelling is an open challenge for flow models. Rectified flows tackle it by distilling a pre-trained “teacher” into a few-step “student”, using strong noise–data couplings supplied by the teacher. For a finite dataset and a Gaussian probability path, the probability-flow vector field induced by the empirical distribution is available in closed form, which would allow us to skip training a teacher model. Surprisingly, these couplings turn out to be poor teachers and significantly reduce the performance of the student. We analyse this phenomenon empirically and theoretically, arguing that it stems from intrinsic ambiguity in the induced couplings caused by the strong sensitivity of terminal states to small initialisation perturbations. Under symmetry assumptions, we further prove that the closed-form probability-flow vector field preserves dataset symmetries and induces invariant Voronoi partitions.
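As an illustration of the closed-form field mentioned in the abstract above, here is a minimal sketch. It is not the authors' code, and it assumes one particular Gaussian path: the linear interpolation x_t = (1 - t) x_0 + t x_1 with x_0 ~ N(0, I) and x_1 drawn uniformly from the dataset. Under that assumption the marginal velocity is a softmax-weighted average of (d_i - x) / (1 - t) over the data points d_i. The names `closed_form_velocity` and `integrate` are hypothetical, and the Euler integrator is only a simple way to trace the induced noise-to-data coupling.

```python
import numpy as np


def closed_form_velocity(x, t, data):
    """Marginal velocity u_t(x) for an empirical data distribution.

    Assumes the path x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I)
    and x_1 uniform over the rows of `data` (an illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    sigma = 1.0 - t
    # p(x_t = x | x_1 = d_i) is N(t * d_i, sigma^2 I); take log-weights
    # up to an additive constant and normalise them stably.
    sq_dist = np.sum((x - t * data) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma**2)
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    # Given x_1 = d_i, the conditional velocity is (d_i - x) / (1 - t).
    return weights @ (data - x) / sigma


def integrate(x0, data, steps=200):
    """Euler integration of the field from t = 0 up to t = 1."""
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * closed_form_velocity(x, k * dt, data)
    return x


if __name__ == "__main__":
    dataset = np.array([[-1.0, 0.0], [1.0, 0.0]])
    print(integrate([0.5, 0.3], dataset))
```

With a single data point the field reduces to (d - x) / (1 - t), which is a quick sanity check on the weights. With several points, the terminal state is decided by which data point dominates the softmax, so starting points near a boundary between data points can end up at different samples.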

2026-03-02

DeLTa @ International Conference on Learning Representations (poster)

openreview.net

GAM: Hierarchical Graph Memory for LLM-based Agents

Zhaofen Wu

Hanrong Zhang

Fulin Lin

Wujiang Xu

Xinran Xu

Yankai Chen

Henry Peng Zou

Shaowen Chen

Weizhi Zhang

Xue Liu

Philip S. Yu

Hongwei Wang

To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to fluid narrative evolution. To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention. By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a Graph-guided, Multi-factor Retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA benchmarks indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.

2026-03-02

MemAgent @ International Conference on Learning Representations (published)

openreview.net

Latent Action Reparameterization for Efficient Agent Inference

Qingwen Zeng

Wenhao Huang

Zerui Xu

Zijie Guo

Yu Sun

Cheng Yang

Siru Ouyang

Jiri Gesi

Fang Wu

Jiayi Zhang

Bang Liu

Chenglin Wu

Xiangru Tang

Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.

2026-03-02

MemAgent @ International Conference on Learning Representations (published)

openreview.net

Resonant Motion of Echo I: A Case Study of SRP and $J_2$ Coupling

Catherine Massé

Inna Sharf

2026-03-02

Journal of the Astronautical Sciences (published)

doi.org

Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization

Yonghan Yang

Bowei He

Can Chen

Xue Liu

Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose SPADE (Support-Proximity Augmented Diffusion Estimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood

2026-03-02

DeLTa @ International Conference on Learning Representations (poster)

doi.org

openreview.net

Align and Filter: Improving Performance in Asynchronous On-Policy RL

Homayoun Honari

Roger Creus Castanyer

Distributed training and increasing the gradient update frequency are practical strategies to accelerate learning and improve performance, but both exacerbate a central challenge: policy lag, which is the mismatch between the behavior policy generating data and the learning policy being updated. Policy lag can hinder the scaling of on-policy learning algorithms to larger problems. In this paper, we identify the sources of policy lag caused by distributed learning and high update frequency. We use the findings to propose total Variation-based Advantage aligned Constrained policy Optimization as a practical approach to mitigate policy lag. We empirically validate our method and show that it offers better robustness to policy lag in classic RL tasks and a modern RL for LLM math reasoning task.

2026-03-01

arXiv (preprint)

doi.org

arxiv.org
