Publications
Emergent Reasoning via Recursive Latent Reinforcement Pretraining
Large language models (LLMs) often rely on explicit chain-of-thought (CoT) traces to solve multi-step reasoning problems, but these traces increase inference cost, expose brittle prompt dependence, and complicate training objectives. We study an alternative: \emph{latent deliberation} implemented as a small recurrent refinement module that performs multiple internal ``thinking'' steps while keeping the external sequence length fixed. We introduce \textbf{Recursive Latent Reinforcement Pretraining (RLRP)}, a training recipe that augments a base causal LLM with a shared latent head executed for …
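The paper's code is not shown here, but the fixed-length latent refinement idea can be sketched in a few lines. The function name `refine`, the residual-tanh update, and the step count `K` below are our own illustrative choices, not the authors': a shared head is applied repeatedly to the hidden states while the external sequence length never changes.

```python
import numpy as np

def refine(h, W, K=4):
    """Apply one shared latent head K times (illustrative sketch only).

    h: (seq_len, d) hidden states; the external sequence length is fixed.
    W: (d, d) shared refinement weights reused at every internal step.
    """
    for _ in range(K):
        h = h + np.tanh(h @ W)   # one residual latent "thinking" step
    return h

rng = np.random.default_rng(0)
h0 = rng.normal(size=(8, 16))
W = 0.1 * rng.normal(size=(16, 16))
h4 = refine(h0, W, K=4)          # same shape in, same shape out
```

The point of the sketch is that extra internal computation is added without lengthening the token sequence, in contrast to CoT traces.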
2026-03-04
LLM_Reasoning @ International Conference on Learning Representations (published)
We introduce Generative Recursive reAsoning Models (GRAM), a recursion-based generative model that is effective for complex planning and reasoning problems. GRAM reformulates recent latent recursive architectures as a stochastic generative process with probabilistic latent transitions, enabling efficient and stable computation entirely in latent space without relying on token-level sequences as in chain-of-thought (CoT) prompting. We optimize this generative recursion via amortized variational inference, allowing the model to represent and explore multiple plausible latent trajectories conditioned on the input. This formulation supports both conditional reasoning through …
2026-03-04
RSI @ International Conference on Learning Representations (poster)
Psychedelics profoundly alter conscious experience, yet how they reshape the relationship between brain anatomy and function remains unclear. In particular, it is unknown whether psychedelic states reflect a global disruption of structure–function organization or a frequency– and network-specific reconfiguration of neural dynamics relative to the structural connectome. Here we address this question using source-localized magnetoencephalography mapped onto connectome harmonics to quantify structure–function coupling in humans under lysergic acid diethylamide (LSD) and placebo. LSD induces a robust decoupling of low-frequency (theta, alpha and beta) activity from anatomical constraints, indicating a global loosening of structure-aligned large-scale dynamics. In contrast, high-frequency gamma activity shows selective reorganization rather than uniform disruption. Greater gamma-band decoupling within core default-mode network regions predicts the intensity of ego dissolution across individuals, demonstrating that while LSD broadly alters large-scale dynamics, subjective loss of self is specifically linked to frequency-selective reorganization of the default-mode network. Functional decoding reveals that LSD does not produce indiscriminate disintegration but instead drives system-specific rebalancing, with preferential decoupling of visual and attentional systems and strengthened coupling within auditory networks. Together, these findings provide electrophysiological evidence that psychedelic states emerge from a frequency-dependent relaxation of structural constraints on brain activity and identify default-mode reorganization as a neural correlate of ego dissolution. These results offer a mechanistic framework for understanding how LSD may exert therapeutic effects by transiently relaxing rigid structural constraints and enhancing dynamical flexibility within networks involved in self-related processing.
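The connectome-harmonics idea can be illustrated with a toy computation: eigenmodes of the structural graph's Laplacian form a spatial-frequency basis, and structure–function coupling can be scored as the share of a functional signal's energy captured by the low-frequency modes. This is a deliberately simplified proxy, not the paper's exact measure:

```python
import numpy as np

def connectome_harmonics(A):
    """Eigenmodes of the graph Laplacian of structural adjacency A,
    returned in ascending spatial frequency (eigh sorts ascending)."""
    L = np.diag(A.sum(axis=1)) - A
    w, V = np.linalg.eigh(L)
    return w, V

def coupling(signal, V, k):
    """Share of signal energy in the k lowest-frequency harmonics —
    one simple coupling proxy; the paper's measure may differ."""
    coeffs = V.T @ signal
    return np.sum(coeffs[:k] ** 2) / np.sum(coeffs ** 2)

# Tiny 4-node "connectome" and a mostly-smooth functional signal.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
w, V = connectome_harmonics(A)
smooth = V[:, 0] + 0.1 * V[:, 1]   # dominated by low-frequency modes
c = coupling(smooth, V, k=2)       # close to 1: tightly coupled
```

Under this proxy, "decoupling" corresponds to functional energy migrating into higher-frequency harmonics, lowering the score.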
Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
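Soft mellowmax builds on the mellowmax operator, mm_ω(x) = (1/ω) log((1/n) Σᵢ e^{ω xᵢ}), a smooth, non-expansive alternative to the hard max over action values. A minimal sketch of the base operator follows (the soft-mellowmax variant and the tree-search integration from the paper are not reproduced here):

```python
import numpy as np

def mellowmax(x, omega=5.0):
    """Mellowmax over action values, computed stably via log-sum-exp:
    mm_w(x) = log(mean(exp(w * x))) / w."""
    x = np.asarray(x, float)
    m = np.max(omega * x)                       # shift for stability
    return (m + np.log(np.mean(np.exp(omega * x - m)))) / omega

q = np.array([1.0, 2.0, 3.0])
```

Useful sanity properties: on a constant vector it returns that constant, its value always lies between the mean and the max, and it approaches the hard max as ω grows.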
2026-03-04
RSI @ International Conference on Learning Representations (poster)
Marine environments present significant challenges for perception and autonomy due to dynamic surfaces, limited visibility, and complex interactions between aerial, surface, and submerged sensing modalities. This paper introduces the Aerial Marine Perception Dataset (AMP2026), a multi-platform marine robotics dataset collected across multiple field deployments designed to support research in two primary areas: multi-view tracking and marine environment mapping. The dataset includes synchronized data from aerial drones, boat-mounted cameras, and submerged robotic platforms, along with associated localization and telemetry information. The goal of this work is to provide a publicly available dataset enabling research in marine perception and multi-robot observation scenarios. This paper describes the data collection methodology, sensor configurations, dataset organization, and intended research tasks supported by the dataset.
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
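The closed-form Gaussian optimal transport step is standard: between N(μ₁, Σ₁) and N(μ₂, Σ₂) the optimal map is T(x) = μ₂ + A(x − μ₁), where A = Σ₁^{−1/2}(Σ₁^{1/2} Σ₂ Σ₁^{1/2})^{1/2} Σ₁^{−1/2}. A minimal sketch of that map alone follows; the paper's PCA projection and layer-selection machinery are omitted:

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def gaussian_ot_map(mu1, S1, mu2, S2):
    """Closed-form OT map between Gaussians:
    T(x) = mu2 + A (x - mu1),
    A = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2}."""
    S1h = sym_sqrt(S1)
    S1hi = np.linalg.inv(S1h)
    A = S1hi @ sym_sqrt(S1h @ S2 @ S1h) @ S1hi   # A is symmetric
    return lambda x: mu2 + (x - mu1) @ A.T

mu1, mu2 = np.zeros(2), np.array([1.0, -1.0])
S1 = np.array([[2.0, 0.3], [0.3, 1.0]])
S2 = np.array([[0.5, 0.0], [0.0, 0.8]])
T = gaussian_ot_map(mu1, S1, mu2, S2)

# Pushing samples of N(mu1, S1) through T should yield ~N(mu2, S2).
X = np.random.default_rng(1).multivariate_normal(mu1, S1, size=20000)
Y = T(X)
```

Because A Σ₁ Aᵀ = Σ₂ by construction, the mapped samples match the target mean and covariance, which is exactly the distribution-matching property the abstract exploits.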
A central goal of single-cell transcriptomics is to reconstruct dynamic cellular processes from static scRNA-seq snapshots, yet most trajectory inference methods rely on transcriptomic similarity as a proxy for developmental linkage — an assumption that frequently fails. While lineage tracing overcomes this limitation, it requires genetic perturbations and specialized longitudinal designs. In adaptive immune cells, T and B cell receptors (AIRs) naturally encode clonal ancestry and are routinely sequenced alongside the transcriptome, providing lineage information in standard snapshot datasets, but existing trajectory methods are not adapted to exploit this signal. Here, we lay the foundation for incorporating AIR-encoded lineage information into trajectory inference by biasing RNA-based diffusion maps toward AIR-consistent paths, thereby integrating lineage constraints into learned cell-state representations. Across simulations of increasing complexity, our multimodal approach recovers more biologically plausible trajectories than RNA-only baselines. While optimized for lymphocyte differentiation, the framework generalizes to other endogenous lineage barcodes, such as mitochondrial mutations.
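One plausible way to bias an RNA diffusion map toward clone-consistent paths is to upweight kernel entries between cells that share a receptor clone. The multiplicative (1 + λ·same-clone) bias below is our own illustrative choice, not necessarily the authors' formulation:

```python
import numpy as np

def biased_diffusion_map(X, clone_ids, lam=2.0, eps=1.0, k=2):
    """Diffusion map whose kernel upweights same-clone cell pairs.
    X: (n, d) expression matrix; clone_ids: (n,) receptor-clone labels.
    The (1 + lam * same_clone) bias is a hypothetical instantiation."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                            # RNA affinity
    same = (clone_ids[:, None] == clone_ids[None, :]).astype(float)
    K = K * (1.0 + lam * same)                       # lineage bias
    P = K / K.sum(axis=1, keepdims=True)             # row-stochastic walk
    w, V = np.linalg.eig(P)
    order = np.argsort(-w.real)
    # Drop the trivial eigenvalue-1 mode; keep the next k coordinates.
    return w.real[order][1:k + 1], V.real[:, order][:, 1:k + 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
clones = rng.integers(0, 4, size=30)
evals, coords = biased_diffusion_map(X, clones)
```

Setting `lam=0` recovers a plain RNA-only diffusion map, which makes the lineage bias easy to ablate against the RNA-only baseline the abstract compares to.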
2026-03-03
LMRL @ International Conference on Learning Representations (poster)
GAM: Hierarchical Graph Memory for LLM-based Agents
Zhaofen Wu
Hanrong Zhang
Fulin Lin
Wujiang Xu
Xinran Xu
Yankai Chen
Henry Peng Zou
Shaowen Chen
Weizhi Zhang
Xue Liu
Philip S. Yu
Hongwei Wang
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge.
Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise.
Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to fluid narrative evolution.
To address this, we propose \textbf{\textsc{GAM}}, a hierarchical \textbf{G}raph-based \textbf{A}gentic \textbf{M}emory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention.
By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a Graph-guided, Multi-factor Retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA benchmarks indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
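The decoupling of encoding from consolidation can be caricatured with a toy structure: events accumulate in a working buffer for the current topic, and the whole segment is folded into long-term topic storage only when the topic shifts. All class and method names, and the shift rule, are our own simplification of the described architecture:

```python
from collections import defaultdict

class GraphMemory:
    """Toy sketch: encoding (working buffer) decoupled from
    consolidation (long-term topic storage). Names are illustrative."""
    def __init__(self):
        self.progression = []             # ordered events, current topic
        self.topics = defaultdict(list)   # topic -> consolidated events
        self.current_topic = None

    def encode(self, topic, event):
        if self.current_topic is not None and topic != self.current_topic:
            self.consolidate()            # semantic shift detected
        self.current_topic = topic
        self.progression.append(event)

    def consolidate(self):
        self.topics[self.current_topic].extend(self.progression)
        self.progression = []

mem = GraphMemory()
mem.encode("trip", "booked flight")
mem.encode("trip", "reserved hotel")
mem.encode("work", "filed report")        # shift: 'trip' consolidates
```

The benefit mirrored here is that transient noise inside a topic segment never touches long-term storage until the segment closes.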
2026-03-02
MemAgent @ International Conference on Learning Representations (published)
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
2026-03-02
MemAgent @ International Conference on Learning Representations (published)
Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose \textbf{SPADE} (\textbf{S}upport-\textbf{P}roximity \textbf{A}ugmented \textbf{D}iffusion \textbf{E}stimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood …
2026-03-02
DeLTa @ International Conference on Learning Representations (poster)
Distributed training and increasing the gradient update frequency are practical strategies to accelerate learning and improve performance, but both exacerbate a central challenge: \textit{policy lag}, which is the mismatch between the behavior policy generating data and the learning policy being updated. Policy lag can hinder the scaling of on-policy learning algorithms to larger problems. In this paper, we identify the sources of policy lag caused by distributed learning and high update frequency. We use the findings to propose \textit{total Variation-based Advantage aligned Constrained policy Optimization (\methodacronym)} as a practical approach to mitigate policy lag. We empirically validate our method and show that it offers better robustness to policy lag in classic RL tasks and a modern RL for LLM math reasoning task.
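The total-variation quantity underlying such a constraint is simple to state: for discrete policies at a state s, TV(π_b, π_l) = ½ Σ_a |π_b(a|s) − π_l(a|s)|. A minimal sketch of a TV-based lag guard follows; the threshold δ and the accept/reject rule are our own illustration, not the paper's full constrained update:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    TV(p, q) = 0.5 * sum_a |p(a) - q(a)|, always in [0, 1]."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def within_trust_region(pi_behavior, pi_learner, delta=0.1):
    """Accept an update only if the learner policy stays TV-close to
    the behavior policy that generated the data (a simple lag guard)."""
    return tv_distance(pi_behavior, pi_learner) <= delta

pi_b = np.array([0.5, 0.3, 0.2])   # behavior policy over 3 actions
pi_l = np.array([0.45, 0.35, 0.2]) # slightly lagged learner policy
```

In a distributed setting, the gap measured here grows with the number of updates applied since the data was generated, which is the lag the abstract targets.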