Publications

Emergent Reasoning via Recursive Latent Reinforcement Pretraining
Large language models (LLMs) often rely on explicit chain-of-thought (CoT) traces to solve multi-step reasoning problems, but these traces increase inference cost, expose brittle prompt dependence, and complicate training objectives. We study an alternative: \emph{latent deliberation} implemented as a small recurrent refinement module that performs multiple internal ``thinking'' steps while keeping the external sequence length fixed. We introduce \textbf{Recursive Latent Reinforcement Pretraining (RLRP)}, a training recipe that augments a base causal LLM with a shared latent head executed for …
Generative Recursive Reasoning Models
We introduce Generative Recursive reAsoning Models (GRAM), a recursion-based generative model that is effective for complex planning and reasoning problems. GRAM reformulates recent latent recursive architectures as a stochastic generative process with probabilistic latent transitions, enabling efficient and stable computation entirely in latent space without relying on token-level sequences as in chain-of-thought (CoT) prompting. We optimize this generative recursion via amortized variational inference, allowing the model to represent and explore multiple plausible latent trajectories conditioned on the input. This formulation supports both conditional reasoning through …
LSD Relaxes Structural Constraints on Brain Dynamics and Default Mode Decoupling Tracks Ego Dissolution
Venkatesh Subramani
Annalisa Pascarella
Jérémy Brunel
Yann Harel
Suresh Muthukumaraswamy
Robin Carhart-Harris
Giulia Lioi
Nicolas Farrugia
Psychedelics profoundly alter conscious experience, yet how they reshape the relationship between brain anatomy and function remains unclear. In particular, it is unknown whether psychedelic states reflect a global disruption of structure–function organization or a frequency- and network-specific reconfiguration of neural dynamics relative to the structural connectome. Here we address this question using source-localized magnetoencephalography mapped onto connectome harmonics to quantify structure–function coupling in humans under lysergic acid diethylamide (LSD) and placebo. LSD induces a robust decoupling of low-frequency (theta, alpha and beta) activity from anatomical constraints, indicating a global loosening of structure-aligned large-scale dynamics. In contrast, high-frequency gamma activity shows selective reorganization rather than uniform disruption. Greater gamma-band decoupling within core default-mode network regions predicts the intensity of ego dissolution across individuals, demonstrating that while LSD broadly alters large-scale dynamics, subjective loss of self is specifically linked to frequency-selective reorganization of the default-mode network. Functional decoding reveals that LSD does not produce indiscriminate disintegration but instead drives system-specific rebalancing, with preferential decoupling of visual and attentional systems and strengthened coupling within auditory networks. Together, these findings provide electrophysiological evidence that psychedelic states emerge from a frequency-dependent relaxation of structural constraints on brain activity and identify default-mode reorganization as a neural correlate of ego dissolution. These results offer a mechanistic framework for understanding how LSD may exert therapeutic effects by transiently relaxing rigid structural constraints and enhancing dynamical flexibility within networks involved in self-related processing.
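For context on the coupling measure described above: connectome harmonics are the eigenvectors of the structural graph Laplacian, and a common decoupling metric compares activity energy carried by smooth (structure-coupled) versus high-frequency (structure-decoupled) harmonics. The sketch below is a generic illustration of that idea, not the paper's MEG pipeline; the function name, the median split of the spectrum, and the energy-ratio definition are our assumptions.

```python
import numpy as np

def structural_decoupling_index(W, X):
    """
    W : (n, n) structural connectivity matrix (symmetric, nonnegative)
    X : (n, t) regional activity time courses
    Returns a per-region ratio of activity energy in high- vs low-frequency
    connectome harmonics (higher = more decoupled from structure).
    """
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian of the connectome
    evals, U = np.linalg.eigh(L)               # harmonics, ordered smooth -> rough
    Xhat = U.T @ X                             # graph spectral coefficients
    k = len(evals) // 2                        # median split of the spectrum
    X_low = U[:, :k] @ Xhat[:k]                # structure-coupled component
    X_high = U[:, k:] @ Xhat[k:]               # structure-decoupled component
    eps = 1e-12
    return np.linalg.norm(X_high, axis=1) / (np.linalg.norm(X_low, axis=1) + eps)
```

A spatially uniform signal lies entirely in the lowest (constant) harmonic, so its decoupling index is near zero; signals that vary sharply across connected regions score high.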
Soft Mellowmax Monte Carlo Planning
Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
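For context, the base mellowmax operator underlying SMM interpolates between the mean and the max of a value set via a temperature omega, which is what makes it attractive as a soft backup in tree search. Soft mellowmax adds a further softmax-weighting refinement not shown here; the sketch below covers only the base operator, and the function name and default omega are our choices.

```python
import numpy as np

def mellowmax(q, omega=5.0):
    """Mellowmax of a value set: (1/omega) * log(mean(exp(omega * q))).

    Computed via logaddexp for numerical stability. As omega -> infinity
    this approaches max(q); as omega -> 0 it approaches mean(q).
    """
    q = np.asarray(q, dtype=float)
    return (np.logaddexp.reduce(omega * q) - np.log(q.size)) / omega
```

In an MCTS backup, the operator would replace the hard max over child Q-values, giving a smoother value estimate that is less sensitive to a single over-optimistic child.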
AMP2026: A Multi-Platform Marine Robotics Dataset for Tracking and Mapping
Shuo Wen
David Widhalm
Zhizun Wang
Junming Shi
Mariana Sosa Guzmán
Kalvik Jakkala
Bennett Carley
Elias Sokolova
Yogesh Girdhar
Monika Roznere
Jason O’Kane
Junaed Sattar
Marine environments present significant challenges for perception and autonomy due to dynamic surfaces, limited visibility, and complex interactions between aerial, surface, and submerged sensing modalities. This paper introduces the Aerial Marine Perception Dataset (AMP2026), a multi-platform marine robotics dataset collected across multiple field deployments designed to support research in two primary areas: multi-view tracking and marine environment mapping. The dataset includes synchronized data from aerial drones, boat-mounted cameras, and submerged robotic platforms, along with associated localization and telemetry information. The goal of this work is to provide a publicly available dataset enabling research in marine perception and multi-robot observation scenarios. This paper describes the data collection methodology, sensor configurations, dataset organization, and intended research tasks supported by the dataset.
Efficient Refusal Ablation in LLM through Optimal Transport
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
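The closed-form Gaussian optimal transport mentioned above is standard: between N(mu_s, Sigma_s) and N(mu_t, Sigma_t), the Monge map is T(x) = mu_t + A(x - mu_s) with A = Sigma_s^{-1/2} (Sigma_s^{1/2} Sigma_t Sigma_s^{1/2})^{1/2} Sigma_s^{-1/2}. A NumPy sketch of just that map follows; the paper's PCA reduction and activation-extraction steps are omitted, and the function names are ours.

```python
import numpy as np

def _sqrtm_psd(M):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_ot_transport(X, mu_s, cov_s, mu_t, cov_t):
    """Map samples X ~ N(mu_s, cov_s) onto N(mu_t, cov_t) with the
    closed-form Monge map T(x) = mu_t + A (x - mu_s)."""
    S = _sqrtm_psd(cov_s)
    S_inv = np.linalg.inv(S)
    A = S_inv @ _sqrtm_psd(S @ cov_t @ S) @ S_inv   # A is symmetric
    return mu_t + (X - mu_s) @ A.T
```

The map sends the source mean exactly to the target mean and rescales deviations so the transported samples carry the target covariance.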
Multimodal Manifold Learning for Clonally Constrained Trajectory Inference
A central goal of single-cell transcriptomics is to reconstruct dynamic cellular processes from static scRNA-seq snapshots, yet most trajectory inference methods rely on transcriptomic similarity as a proxy for developmental linkage, an assumption that frequently fails. While lineage tracing overcomes this limitation, it requires genetic perturbations and specialized longitudinal designs. In adaptive immune cells, T and B cell receptors (adaptive immune receptors, AIRs) naturally encode clonal ancestry and are routinely sequenced alongside the transcriptome, providing lineage information in standard snapshot datasets, but existing trajectory methods are not adapted to exploit this signal. Here, we lay the foundation for incorporating AIR-encoded lineage information into trajectory inference by biasing RNA-based diffusion maps toward AIR-consistent paths, thereby integrating lineage constraints into learned cell-state representations. Across simulations of increasing complexity, our multimodal approach recovers more biologically plausible trajectories than RNA-only baselines. While optimized for lymphocyte differentiation, the framework generalizes to other endogenous lineage barcodes, such as mitochondrial mutations.
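The core idea, biasing an RNA diffusion map toward clone-consistent transitions, can be sketched as re-weighting the affinity kernel before row-normalization. This is a toy illustration only: the kernel form, the constant boost factor, and the eigendecomposition details are our assumptions, not the paper's method.

```python
import numpy as np

def clonal_diffusion_map(X, clone_ids, sigma=1.0, boost=2.0, n_comps=2):
    """
    X : (n, d) expression matrix; clone_ids : (n,) clonal labels.
    Builds an RNA affinity kernel, up-weights same-clone pairs, and returns
    diffusion-map coordinates of the biased transition matrix.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))                 # RNA similarity kernel
    same = clone_ids[:, None] == clone_ids[None, :]
    K = K * np.where(same, boost, 1.0)                 # bias toward clone-consistent paths
    P = K / K.sum(axis=1, keepdims=True)               # row-stochastic transition matrix
    evals, evecs = np.linalg.eig(P)
    order = np.argsort(-evals.real)
    # drop the trivial stationary component (eigenvalue 1)
    idx = order[1:n_comps + 1]
    return evecs.real[:, idx] * evals.real[idx]
```

Cells from the same clone are pulled closer in the diffusion geometry, so random-walk paths (and hence inferred trajectories) preferentially pass through clone-consistent states.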
GAM: Hierarchical Graph Memory for LLM-based Agents
Zhaofen Wu
Hanrong Zhang
Fulin Lin
Wujiang Xu
Xinran Xu
Yankai Chen
Henry Peng Zou
Shaowen Chen
Weizhi Zhang
Xue Liu
Philip S. Yu
Hongwei Wang
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge. Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise. Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to fluid narrative evolution. To address this, we propose \textbf{\textsc{GAM}}, a hierarchical \textbf{G}raph-based \textbf{A}gentic \textbf{M}emory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention. By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a Graph-guided, Multi-factor Retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA benchmarks indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
Latent Action Reparameterization for Efficient Agent Inference
Qingwen Zeng
Zerui Xu
Zijie Guo
Yu Sun
Siru Ouyang
Jiri Gesi
Fang Wu
Jiayi Zhang
Chenglin Wu
Xiangru Tang
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
Resonant Motion of Echo I: A Case Study of SRP and $J_2$ Coupling
Catherine Massé
Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization
Yonghan Yang
Bowei He
Can Chen
Xue Liu
Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose \textbf{SPADE} (\textbf{S}upport-\textbf{P}roximity \textbf{A}ugmented \textbf{D}iffusion \textbf{E}stimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood …
Align and Filter: Improving Performance in Asynchronous On-Policy RL
Distributed training and increasing the gradient update frequency are practical strategies to accelerate learning and improve performance, but both exacerbate a central challenge: \textit{policy lag}, which is the mismatch between the behavior policy generating data and the learning policy being updated. Policy lag can hinder the scaling of on-policy learning algorithms to larger problems. In this paper, we identify the sources of policy lag caused by distributed learning and high update frequency. We use the findings to propose \textit{total Variation-based Advantage aligned Constrained policy Optimization (\methodacronym)} as a practical approach to mitigate policy lag. We empirically validate our method and show that it offers better robustness to policy lag in classic RL tasks and a modern RL for LLM math reasoning task.