Publications

Modular Memory is the Key to Continual Learning Agents

Vaggelis Dorovatas

Malte Schwerin

Andrew D. Bagdanov

Lucas Caccia

Antonio Carta

Laurent Charlin

Barbara Hammer

Tyler L. Hayes

Timm Hess

Christopher Kanan

Dhireesha Kudithipudi

Xialei Liu

Vincenzo Lomonaco

Jorge Mendez-Mendez

Darshan Patil

Ameya Prabhu

Elisa Ricci

Tinne Tuytelaars

Gido M. van de Ven

Liyuan Wang

Joost van de Weijer

Jonghyun Choi

Martin Mundt

Rahaf Aljundi

Foundation models have transformed machine learning through large-scale pretraining and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning (IWL), i.e., updating a single model's parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale. We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, charting a practical roadmap toward continually learning agents.

2026-03-01

arXiv (preprint)

doi.org

arxiv.org

Molecule property prediction with molecular orbitals

Yan Zhang

Khang Ngo

Sékou-Oumar Kaba

Daniel T. Levy

Siamak Ravanbakhsh

Aristide Baratin

Kisoo Kwon

MiYoung Jang

Eun Hyun Cho

Sangha Park

Sanghyun Yoo

Young-Seok Kim

Hasup Lee

Molecular orbitals (MOs) describe the distribution of electrons in a molecule and are frequently used by chemists to understand properties of molecules, yet machine learning has neglected them so far. If atom coordinates are obtained through DFT anyway, molecular orbitals can be obtained for free at the same time and are thus a useful source of additional data, particularly when data is scarce. We give an introduction to molecular orbitals for a machine learning audience and propose models to process three different representations of them. Experiments on a dataset with experimental properties show that including MOs significantly improves performance and sample efficiency over a pretrained molecular foundation model on this real-world task.

2026-03-01

AI4Mat @ International Conference on Learning Representations (poster)

openreview.net

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

Valliappan CA

David Meger

Sai Rajeswar

Pietro Mazzaglia

Goal-conditioned reinforcement learning (GCRL) requires agents to learn effective state and goal representations, which represents a challenging problem, especially in high-dimensional vision-based environments, as differences in the observations can be uncorrelated with dynamical distances. Classical deep reinforcement learning techniques often fail to capture the alignment between state and goal spaces, requiring additional representation learning techniques. To address this, we propose

2026-03-01

World Models @ International Conference on Learning Representations (published)

openreview.net

SCOPE: Selective Cross-modal Orchestration of Visual Perception Experts

Tianyu Zhang

Suyuchen Wang

Chao Wang

Juan A. Rodriguez

Ahmed Masry

Xiangru Jian

Yoshua Bengio

Perouz Taslakian

Vision-language models (VLMs) benefit from multiple vision encoders, but naively stacking them yields diminishing returns while multiplying inference costs. We propose SCOPE, a Mixture-of-Encoders (MoEnc) framework that dynamically selects one specialized encoder per image-text pair via instance-level routing, unlike token-level routing in traditional MoE. SCOPE maintains a shared encoder and a pool of routed encoders. A lightweight router uses cross-attention between text prompts and shared visual features to select the optimal encoder from the routed encoders. To train this router, we introduce dual entropy regularization with auxiliary losses to balance dataset-level load distribution with instance-level routing confidence. Remarkably, SCOPE with one shared plus one routed encoder outperforms models using all four extra encoders simultaneously, while reducing compute by 24-49%. This demonstrates that intelligent encoder selection beats brute-force aggregation, challenging the prevailing paradigm in multi-encoder VLMs.

2026-03-01

MM_Intelligence @ International Conference on Learning Representations (poster)

doi.org

openreview.net
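
The routing idea in the SCOPE abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mean-pooling step, the per-encoder scoring vectors (`router_heads`), and the toy inputs are all assumptions made for illustration. It shows text tokens attending over shared visual features, a single encoder being chosen per image-text pair, and the two entropies that a dual entropy regularizer would trade off.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Vector) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route(text: List[Vector], visual: List[Vector], heads: List[Vector]) -> Tuple[int, Vector]:
    """Select one encoder for a whole image-text pair (instance-level routing)."""
    dim = len(visual[0])
    # Cross-attention: every text token attends over the shared visual features.
    attended = []
    for query in text:
        attn = softmax([dot(query, key) / math.sqrt(dim) for key in visual])
        attended.append([sum(a * v[i] for a, v in zip(attn, visual)) for i in range(dim)])
    # Mean-pool over text tokens so the decision is made once per instance
    # (the pooling choice is an assumption, not taken from the abstract).
    pooled = [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]
    probs = softmax([dot(pooled, head) for head in heads])
    choice = max(range(len(probs)), key=lambda k: probs[k])
    return choice, probs


def dual_entropy(batch_probs: List[Vector]) -> Tuple[float, float]:
    """Return (dataset-level entropy, mean instance-level entropy) for a batch."""
    n = len(batch_probs)
    mean_probs = [sum(p[k] for p in batch_probs) / n for k in range(len(batch_probs[0]))]
    load_entropy = entropy(mean_probs)  # high value = balanced encoder load
    confidence_entropy = sum(entropy(p) for p in batch_probs) / n  # low value = confident routing
    return load_entropy, confidence_entropy


# Toy inputs: one text token, two shared visual tokens, two routed encoders.
text_tokens = [[1.0, 0.0]]
shared_visual = [[1.0, 0.0], [0.0, 1.0]]
router_heads = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical scoring vector per routed encoder
choice, probs = route(text_tokens, shared_visual, router_heads)
load_h, conf_h = dual_entropy([[1.0, 0.0], [0.0, 1.0]])
```

In this toy batch each instance routes with full confidence to a different encoder, so the instance-level entropy is zero while the dataset-level entropy is at its maximum for two encoders. That is the regime a regularizer balancing load distribution against routing confidence would favor; how the two terms are weighted in SCOPE is not stated in the abstract.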

Spatial distribution of spinal cord fMRI activity with electrocutaneous stimulation

Sandrine Bédard

Merve Kaptan

Teresa Indriolo

Christine SW Law

Dario Pfyffer

Lindsay Lee

John K Ratliff

Serena S. Hu

Suzanne Tharin

Zachary A. Smith

Gary Glover

Sean C Mackey

Julien Cohen-Adad

Kenneth A. Weber

Sensory organization at the spinal segment level is commonly inferred from dermatomal maps that assume a fixed correspondence between cutaneous regions and spinal segments. However, based on the complexities of spinal neuroanatomy and neurophysiology, the distribution of sensory signals within the cord may be broader and less segment-specific than dermatomal maps suggest, leaving the segment-level localization of sensory-evoked activity in humans uncertain. Spinal cord functional magnetic resonance imaging (fMRI) is currently the only technique capable of noninvasively mapping sensory activity with high spatial resolution in the human spinal cord. However, its application remains technically challenging and is limited by the uncertainty in segmental localization. In this study, we leveraged recent advancements in spinal cord fMRI, including spinal nerve rootlet-based spatial normalization, to investigate how sensory information is represented and distributed within the human spinal cord during electrocutaneous stimulation of the third digit of the right hand (i.e., C7 dermatome). Forty healthy adults were scanned with electrocutaneous stimulation at four individualized intensities across multiple runs to quantify (i) the rostrocaudal distribution of sensory-evoked activity, (ii) intensity-dependent changes in detectability and localization, and (iii) the effect of normalization strategy on segmental localization. Across participants, stimulation produced activation localized in the lower cervical cord (e.g., C6-C8), with the most consistent segmental localization near C7. Stronger stimulation increased detectability and produced more consistent segmental localization across participants. Importantly, normalization that incorporated nerve rootlet landmarks sharpened localization and improved sensitivity relative to conventional intervertebral disc-based alignment.
This highlights the value of functionally relevant anatomical landmarks for group inference in the spinal cord. Responses were strongest in the initial run and attenuated with repetition, suggesting habituation or adaptation that can bias multi-run paradigms if unmodeled. Together, our results define practical acquisition and analysis conditions (e.g., stimulation strength, anatomical alignment strategy, and run structure) under which segment-level spinal sensory responses can be detected, thereby supporting more reliable future basic and translational studies of the human spinal cord, including pain mechanisms, sensory function, and spinal injury.

2026-03-01

medRxiv (preprint)

doi.org

Temporal Representations for Exploration: Learning Complex Exploratory Behavior without Extrinsic Rewards

Faisal Mohamed

Catherine Ji

Benjamin Eysenbach

Glen Berseth

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behaviors in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.

2026-03-01

arXiv (preprint)

doi.org

arxiv.org

The Geometry of Spectral Gradient Descent: Layerwise Criteria for SignSGD vs SpecSGD

Hiroki Naganuma

Laura Gomezjurado

Mahdi Ghaznavi

Ioannis Mitliagkas

Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (

2026-03-01

GRaM @ International Conference on Learning Representations (poster)

openreview.net
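
To make the two update directions named in the abstract above concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the paper's code: the entrywise sign update replaces each gradient entry by its sign, while the spectral sign update replaces the gradient matrix G = U S V^T by U V^T. The sketch approximates U V^T with the classic cubic Newton-Schulz iteration, chosen for simplicity and not necessarily the variant used in Muon or in the paper, and it assumes a nonzero, full-rank gradient. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sign_update(grad: Matrix) -> Matrix:
    """Entrywise sign of the gradient (the SignSGD direction)."""
    return [[float((g > 0) - (g < 0)) for g in row] for row in grad]


def spectral_sign(grad: Matrix, steps: int = 30) -> Matrix:
    """Approximate U V^T for grad = U S V^T (the spectral sign direction).

    Uses the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X, which
    drives every nonzero singular value to 1 when they start in (0, sqrt(3)).
    Assumes grad is nonzero and full rank.
    """
    frobenius = sum(g * g for row in grad for g in row) ** 0.5
    # After this scaling all singular values lie in (0, 1].
    x = [[g / frobenius for g in row] for row in grad]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


# A symmetric positive-definite toy gradient: its spectral sign is the identity,
# while its entrywise sign is the all-ones matrix.
grad = [[2.0, 1.0], [1.0, 2.0]]
entrywise = sign_update(grad)
spectral = spectral_sign(grad)
```

The toy gradient is chosen so the two directions visibly differ: the entrywise update moves every coordinate by the same amount, whereas the spectral update equalizes the singular values and keeps the singular vectors.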

Titanium nanotube arrays promote the activity of anastomotic healing-related cells by increasing fibronectin adsorption and activating the RGD–integrin pathway

Pengyu Chen

Bang Liu

Yijia Li

Yahui Hu

Weihua Fu

The smooth titanium staples of stapling devices cannot reduce the incidence of gastrointestinal anastomotic leakage due to their bioinert nature and lack of active wound-healing promotion capability. This study aims to investigate whether titanium nanotube arrays (TNTs) can enhance the activity of cells involved in gastrointestinal anastomotic healing and further explore the potential mechanisms. TNTs were fabricated on pure titanium sheets via anodic oxidation, and characterized using scanning electron microscopy, roughness analysis, contact angle measurement, and x-ray photoelectron spectroscopy. Cell adhesion, proliferation, spreading, collagen secretion, and integrin expression were evaluated using methods such as CCK-8, immunofluorescence, qPCR, enzyme-linked immunosorbent assay (ELISA), and Western blot. Fibronectin (FN) adsorption and Arg-Gly-Asp tripeptide sequence (RGD domain) exposure were detected via bicinchoninic acid assay, fluorescent staining, and ELISA. The role of the RGD-integrin pathway was further investigated by supplementing serum-reduced medium with exogenous FN and using RGD-specific antagonists. The results showed that TNTs increased the roughness, hydrophilicity, and surface free energy of titanium surfaces. Compared with smooth pure titanium, TNTs promoted the adhesion, proliferation, spreading, and integrin expression of gastric mucosal epithelial cells and fibroblasts, while enhancing the collagen secretion capacity of fibroblasts. Moreover, TNTs adsorbed more FN and exposed more RGD domains, thereby upregulating integrin α5β1 expression. The RGD antagonist could reverse these enhanced cellular responses, confirming the pivotal role of the FN–RGD–integrin pathway.
In conclusion, TNTs enhance the adhesion, proliferation, and functional activity of gastrointestinal anastomosis-related cells by promoting FN adsorption and activating the RGD–integrin pathway, which demonstrates that TNT-modified titanium materials hold significant potential for developing bioactive anastomotic devices and promoting tissue healing.

2026-03-01

Biomedical Materials (published)

doi.org

Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models

Zhilong Zhang

Haoxiang Ren

Yihao Sun

Yifei Sheng

Haonan Wang

Haoxin Lin

Zhichao Wu

Pierre-Luc Bacon

Yang Yu

Vision-Language-Action (VLA) models show strong generalization for robotic control, but finetuning them with reinforcement learning (RL) is constrained by the high cost and safety risks of real-world interaction. Training VLA models in interactive world models avoids these issues but introduces several challenges, including pixel-level world modeling, multi-view consistency, and compounding errors under sparse rewards. Building on recent advances across multimodal models and model-based RL, we propose **VLA-MBPO**, a practical world model-based RL framework to tackle these problems in VLA finetuning. Our approach is guided by three key design choices: (i) adapting *unified multimodal models (UMMs)* to VLA settings, leveraging rich multimodal priors to enable world modeling with limited data; (ii) introducing an *interleaved view decoding* mechanism to enforce consistency across views; and (iii) employing *chunk-level branched rollout* to limit rollout horizons and mitigate error compounding during policy optimization. Our theoretical analysis shows a reduction in value gap of VLA-MBPO, and experiments in both simulated and real-world tasks demonstrate that our method effectively improves policy performance and sample efficiency for VLA finetuning.

2026-03-01

World Models @ International Conference on Learning Representations (published)

doi.org

openreview.net

Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection

Fatemeh Pesaran zadeh

Seyeon Choi

Xing Han Lu

Siva Reddy

Gunhee Kim

Large language models (LLMs) have enabled web agents that follow natural language goals through multi-step browser interactions. However, agents fine-tuned on specific trajectories and domains often struggle to generalize out of domain, and offline training can be compute-inefficient due to noisy, redundant trajectories and long accessibility-tree (AXTree) states. To address both issues, we propose Weasel, a trajectory selection method for offline training of web agents. Weasel selects a fixed-budget subset of trajectory steps by optimizing an objective that balances unary importance with pairwise diversity over states, websites, and interaction patterns, which is solved efficiently with a greedy algorithm. We further improve efficiency with action-centered AXTree pruning that keeps only content around the ground-truth action target, and we mitigate style mismatch for reasoning-native models by replacing expert traces with model-generated, style-consistent rationales. Across AgentTrek and NNetNav training datasets, evaluations in WebArena, WorkArena, and MiniWob, and experiments with Qwen2.5-7B, Gemma3-4B, and Qwen3-8B, Weasel improves out-of-domain performance while reducing training cost, producing roughly 9.7-12.5

2026-03-01

LLA @ International Conference on Learning Representations (poster)

openreview.net
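
The selection step described in the Weasel abstract above (a fixed budget, unary importance, pairwise diversity, a greedy solver) can be sketched as follows. This is a generic greedy importance-diversity selector, not Weasel's actual objective: the scoring rule `importance - diversity_weight * redundancy`, the similarity function, and the toy data are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


def greedy_select(
    importance: Sequence[float],
    similarity: Callable[[int, int], float],
    budget: int,
    diversity_weight: float = 1.0,
) -> List[int]:
    """Greedily pick up to `budget` items by marginal gain.

    Assumed marginal gain of item i given the already selected set S:
        importance[i] - diversity_weight * sum(similarity(j, i) for j in S)
    """
    selected: List[int] = []
    remaining = set(range(len(importance)))
    # Running redundancy of each candidate against the selected set.
    redundancy = [0.0] * len(importance)
    while remaining and len(selected) < budget:
        best = max(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda i: importance[i] - diversity_weight * redundancy[i],
        )
        selected.append(best)
        remaining.remove(best)
        for i in remaining:
            redundancy[i] += similarity(best, i)
    return selected


# Toy data: three trajectory steps; steps 0 and 1 come from the same
# (hypothetical) website, so selecting both is penalized as redundant.
step_importance = [0.9, 0.8, 0.5]
step_site = ["shop", "shop", "maps"]


def same_site(i: int, j: int) -> float:
    return 1.0 if step_site[i] == step_site[j] else 0.0


picked = greedy_select(step_importance, same_site, budget=2)
```

With a budget of two, the selector takes the most important step first and then prefers the less important step from a different website over the near-duplicate, which is the trade-off between importance and diversity that the abstract describes.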

Which Heads Matter for Reasoning? RL-Guided KV Cache Compression

Wenjie Du

Li Jiang

Keda Tao

Xue Liu

Huan Wang

Reasoning large language models exhibit complex reasoning behaviors via extended chain-of-thought generation that are highly fragile to information loss during decoding, creating critical challenges for KV cache compression. Existing token-dropping methods directly disrupt reasoning chains by removing intermediate steps, while head-reallocation methods, designed for retrieval tasks, fail to preserve the heads essential for generative reasoning. However, no existing method can identify which attention heads genuinely maintain reasoning consistency and control generation termination. To address this, we propose RLKV, which uses reinforcement learning as a probe to discover which heads contribute to reasoning quality by directly optimizing their cache usage against actual generation outcomes. This discovery naturally leads to an efficient compression strategy: we allocate full KV cache to reasoning-critical heads while aggressively compressing others with constant-size KV cache. Experiments reveal that a fraction of heads proves essential for reasoning, enabling 20-50% cache reduction with near-lossless performance across diverse tasks and models.

2026-03-01

LIT @ International Conference on Learning Representations (accepted)

doi.org

openreview.net
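
The compression strategy in the RLKV abstract above (full KV cache for reasoning-critical heads, a constant-size cache for the rest) implies simple cache arithmetic, sketched below. The head scores here are made-up numbers standing in for whatever importance signal the RL probe produces, and the function names, `keep_fraction`, and `const_size` are hypothetical. The sketch counts cached tokens only; it does not model which tokens a compressed head retains.

```python
from typing import List, Sequence


def allocate_cache(head_scores: Sequence[float], keep_fraction: float) -> List[bool]:
    """Mark the top `keep_fraction` of heads (by score) as full-cache heads."""
    n_keep = round(len(head_scores) * keep_fraction)
    ranked = sorted(range(len(head_scores)), key=lambda h: head_scores[h], reverse=True)
    full_heads = set(ranked[:n_keep])
    return [h in full_heads for h in range(len(head_scores))]


def cached_tokens(is_full: Sequence[bool], seq_len: int, const_size: int) -> int:
    """Total cached tokens summed over heads.

    Full heads keep all `seq_len` tokens; compressed heads keep at most
    `const_size` tokens regardless of sequence length.
    """
    return sum(seq_len if full else min(seq_len, const_size) for full in is_full)


# Toy setting: four heads with hypothetical importance scores.
scores = [0.9, 0.1, 0.7, 0.2]
mask = allocate_cache(scores, keep_fraction=0.5)
baseline = cached_tokens([True] * len(scores), seq_len=1000, const_size=100)
compressed = cached_tokens(mask, seq_len=1000, const_size=100)
reduction = 1 - compressed / baseline  # fraction of cache saved
```

In this toy setting half of the heads keep the full 1000-token cache and the other half keep 100 tokens each, for a saving of about 45%. Because the compressed heads have a fixed cost, the saving grows as sequences get longer, which is why a constant-size cache suits long chain-of-thought generation.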

Chromatin landscape and enhancer-gene interaction differences between three cardiac cell types

Yan Zhu

Jean‐Christophe Grenier

Raphaël Poujol

Olivier Tastet

Caroline Lee

Svenja Koslowski

Marouane Benzaki

Talal Fawaz

Julie Hussin

Roger Foo

Chukwuemeka George Anene-Nzelu

Matthew Ackers-Johnson

2026-02-28

Journal of Molecular and Cellular Cardiology Plus (published)

doi.org
