Soft mellowmax (SMM) recently emerged as an alternative operator in Q-learning, achieving impressive performance in games and scientific discovery tasks. Despite SMM's ability to achieve high returns and its enticing robustness, diversity, and sample efficiency characteristics, SMM has not yet been translated into a Monte Carlo tree search algorithm. To address this gap, a soft mellowmax-based Monte Carlo tree search algorithm, SMM-TS, is proposed and theoretically justified. It is empirically demonstrated that SMM-TS converges significantly faster than other tree search methods in synthetic environments, while maintaining competitive performance in games. The fast convergence of SMM-TS makes recursive self-improvement loops more scalable, while the stability gained via planning and the robustness of the operator make SMM-TS more practical for agents operating in uncertain and changing environments.
2026-03-04
RSI @ International Conference on Learning Representations (poster)
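To make the operator named in the SMM-TS abstract concrete, here is a minimal Python sketch of mellowmax and soft mellowmax applied to a vector of action values. The abstract does not state the formulas; the definitions below follow the form commonly given in the Q-learning literature and should be read as an assumption, as should the parameter names omega and alpha. The tree search itself is not shown.

```python
import numpy as np


def _logsumexp(z):
    """Numerically stable log(sum(exp(z))) for a 1-D array."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))


def mellowmax(q, omega):
    """Assumed form: (1/omega) * log(mean(exp(omega * q)))."""
    q = np.asarray(q, dtype=float)
    return (_logsumexp(omega * q) - np.log(q.size)) / omega


def soft_mellowmax(q, omega, alpha):
    """Assumed form: (1/omega) * log(sum_i softmax_alpha(q)_i * exp(omega * q_i)).

    The uniform 1/n weights of mellowmax are replaced by softmax weights
    with inverse temperature alpha; alpha = 0 recovers mellowmax.
    """
    q = np.asarray(q, dtype=float)
    z = alpha * q
    log_w = z - _logsumexp(z)  # log-softmax weights, computed stably
    return _logsumexp(log_w + omega * q) / omega
```

For omega > 0 and alpha >= 0 both values lie between the smallest and largest entry of q, and raising alpha moves the soft mellowmax value toward the maximum.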
Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference. Demo samples are available at https://lucadellalib.github.io/wavslm-web/.
Marine environments present significant challenges for perception and autonomy due to dynamic surfaces, limited visibility, and complex interactions between aerial, surface, and submerged sensing modalities. This paper introduces the Aerial Marine Perception Dataset (AMP2026), a multi-platform marine robotics dataset collected across multiple field deployments designed to support research in two primary areas: multi-view tracking and marine environment mapping. The dataset includes synchronized data from aerial drones, boat-mounted cameras, and submerged robotic platforms, along with associated localization and telemetry information. The goal of this work is to provide a publicly available dataset enabling research in marine perception and multi-robot observation scenarios. This paper describes the data collection methodology, sensor configurations, dataset organization, and intended research tasks supported by the dataset.
Safety-aligned language models refuse harmful requests through learned refusal behaviors encoded in their internal representations. Recent activation-based jailbreaking methods circumvent these safety mechanisms by applying orthogonal projections to remove refusal directions, but these approaches treat refusal as a one-dimensional phenomenon and ignore the rich distributional structure of model activations. We introduce a principled framework based on optimal transport theory that transforms the entire distribution of harmful activations to match harmless ones. By combining PCA with closed-form Gaussian optimal transport, we achieve efficient computation in high-dimensional representation spaces while preserving essential geometric structure. Across six models (Llama-2, Llama-3.1, Qwen-2.5; 7B-32B parameters), our method achieves up to 11% higher attack success rates than state-of-the-art baselines while maintaining comparable perplexity, demonstrating superior preservation of model capabilities. Critically, we discover that layer-selective intervention (applying optimal transport to 1-2 carefully chosen layers at approximately 40-60% network depth) substantially outperforms full-network interventions, revealing that refusal mechanisms may be localized rather than distributed. Our analysis provides new insights into the geometric structure of safety representations and suggests that current alignment methods may be vulnerable to distributional attacks beyond simple direction removal.
We introduce a deep generative framework for high-dimensional Bayesian inference that enables efficient posterior sampling. As telescopes and simulations rapidly expand the volume and resolution of astrophysical data, fast simulation-based inference methods are increasingly needed to extract scientific insights. While diffusion-based approaches offer high-quality generative capabilities, they are hindered by slow sampling speeds. Our method performs posterior sampling an order of magnitude faster than a diffusion baseline. Applied to the problem of CMB delensing, it successfully recovers the unlensed CMB power spectrum from simulated observations. The model also remains robust to shifts in cosmological parameters, demonstrating its potential for out-of-distribution generalization and application to observational cosmological data.
A central goal of single-cell transcriptomics is to reconstruct dynamic cellular processes from static scRNA-seq snapshots, yet most trajectory inference methods rely on transcriptomic similarity as a proxy for developmental linkage, an assumption that frequently fails. While lineage tracing overcomes this limitation, it requires genetic perturbations and specialized longitudinal designs. In adaptive immune cells, T and B cell receptors (AIRs) naturally encode clonal ancestry and are routinely sequenced alongside the transcriptome, providing lineage information in standard snapshot datasets, but existing trajectory methods are not adapted to exploit this signal. Here, we lay the foundation for incorporating AIR-encoded lineage information into trajectory inference by biasing RNA-based diffusion maps toward AIR-consistent paths, thereby integrating lineage constraints into learned cell-state representations. Across simulations of increasing complexity, our multimodal approach recovers more biologically plausible trajectories than RNA-only baselines. While optimized for lymphocyte differentiation, the framework generalizes to other endogenous lineage barcodes, such as mitochondrial mutations.
2026-03-03
LMRL @ International Conference on Learning Representations (poster)
Few-step generative modelling is an open challenge for flow models. Rectified flows tackle it by distilling a pre-trained “teacher” into a few-step “student”, using strong noise–data couplings supplied by the teacher. For a finite dataset and a Gaussian probability path, the probability-flow vector field induced by the empirical distribution is available in closed form, which would allow us to skip training a teacher model. Surprisingly, these couplings turn out to be poor teachers and significantly reduce the performance of the student. We analyse this phenomenon empirically and theoretically, arguing that it stems from intrinsic ambiguity in the induced couplings caused by the strong sensitivity of terminal states to small initialisation perturbations. Under symmetry assumptions, we further prove that the closed-form probability-flow vector field preserves dataset symmetries and induces invariant Voronoi partitions.
2026-03-02
DeLTa @ International Conference on Learning Representations (poster)
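The closed-form vector field mentioned in the rectified-flow abstract can be illustrated with a small Python sketch. It assumes the linear interpolation path x_t = (1 - t) x_0 + t x_1 with x_0 drawn from a standard Gaussian and x_1 drawn uniformly from the dataset; the abstract does not fix this convention, so the path, the function names, and the Euler integrator are all illustrative assumptions. Under that path, the velocity at (x, t) is a softmax-weighted average of (x_i - x) / (1 - t) over data points x_i.

```python
import numpy as np


def empirical_velocity(x, t, data):
    """Closed-form velocity u_t(x) for an empirical target distribution.

    Assumes x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I), so that
    x_t given x_1 = x_i is Gaussian with mean t * x_i and std (1 - t).
    Requires 0 <= t < 1.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    sigma = 1.0 - t
    # Posterior weight of each data point given the current position.
    sq_dist = np.sum((x - t * data) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()  # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum()
    # Conditional velocity toward x_i is (x_i - x) / (1 - t).
    return (w[:, None] * (data - x)).sum(axis=0) / sigma


def integrate(x0, data, steps=200, t_end=0.999):
    """Forward-Euler integration of the field from t = 0 to t_end < 1."""
    x = np.asarray(x0, dtype=float)
    dt = t_end / steps
    for k in range(steps):
        x = x + dt * empirical_velocity(x, k * dt, data)
    return x
```

In a toy one-dimensional dataset such as data = [[-1.0], [1.0]], starting points just either side of zero (for example integrate([1e-3], data) and integrate([-1e-3], data)) should end near different data points, which gives a feel for the sensitivity to initialisation that the abstract describes.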
GAM: Hierarchical Graph Memory for LLM-based Agents
Zhaofen Wu
Hanrong Zhang
Fulin Lin
Wujiang Xu
Xinran Xu
Yankai Chen
Henry Peng Zou
Shaowen Chen
Weizhi Zhang
Xue Liu
Philip S. Yu
Hongwei Wang
To sustain coherent long-term interactions, Large Language Model (LLM) agents must navigate the tension between acquiring new information and retaining prior knowledge.
Current unified stream-based memory systems facilitate context updates but remain vulnerable to interference from transient noise.
Conversely, discrete structured memory architectures provide robust knowledge retention but often struggle to adapt to fluid narrative evolution.
To address this, we propose GAM, a hierarchical Graph-based Agentic Memory framework that explicitly decouples memory encoding from consolidation to effectively resolve the conflict between rapid context perception and stable knowledge retention.
By isolating ongoing dialogue in an event progression graph and integrating it into a topic associative network only upon semantic shifts, our approach minimizes interference while preserving long-term consistency. Additionally, we introduce a Graph-guided, Multi-factor Retrieval strategy to enhance context precision. Experiments on LoCoMo and LongDialQA benchmarks indicate that our method consistently outperforms state-of-the-art baselines in both reasoning accuracy and efficiency.
2026-03-02
MemAgent @ International Conference on Learning Representations (published)
Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.
2026-03-02
MemAgent @ International Conference on Learning Representations (published)
Offline black-box optimization aims to discover novel designs with high property scores using only a static dataset, a task fundamentally challenged by the out-of-distribution (OOD) extrapolation problem. Existing approaches typically bifurcate into inverse methods, which struggle with the ill-posed nature of mapping scores to designs, and forward methods, which often lack the distributional expressivity to quantify uncertainty effectively. In this work, we propose SPADE (Support-Proximity Augmented Diffusion Estimation), a novel framework that reimagines forward surrogate modeling through the lens of conditional generative modeling. SPADE models the forward likelihood
2026-03-02
DeLTa @ International Conference on Learning Representations (poster)
Distributed training and increasing the gradient update frequency are practical strategies to accelerate learning and improve performance, but both exacerbate a central challenge: policy lag, which is the mismatch between the behavior policy generating data and the learning policy being updated. Policy lag can hinder the scaling of on-policy learning algorithms to larger problems. In this paper, we identify the sources of policy lag caused by distributed learning and high update frequency. We use the findings to propose total Variation-based Advantage aligned Constrained policy Optimization (\methodacronym) as a practical approach to mitigate policy lag. We empirically validate our method and show that it offers better robustness to policy lag in classic RL tasks and a modern RL for LLM math reasoning task.
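As a small illustration of the quantity the method's name refers to, the Python sketch below computes the total variation distance between a behavior policy and a learning policy over a discrete action set. The abstract does not say how the distance enters the objective, so this only shows the standard distance itself; the function name and the example probabilities are hypothetical.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between categorical distributions.

    TV(p, q) = 0.5 * sum_a |p(a) - q(a)|, taken over the last axis so
    that a batch of per-state action distributions can be passed at once.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum(axis=-1)


# Hypothetical action probabilities for two states and two actions.
behavior_policy = np.array([[0.5, 0.5], [0.9, 0.1]])  # generated the data
learning_policy = np.array([[0.8, 0.2], [0.9, 0.1]])  # currently being updated

# One value per state; larger values indicate more lag between the policies.
lag_per_state = total_variation(behavior_policy, learning_policy)
```

The distance is 0 when the two policies agree on a state and 1 when they place their probability mass on disjoint actions.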