Publications

Generalizable, real-time neural decoding with hybrid state-space models

Avery Hee-Woon Ryoo

Nanda H Krishna

Mehdi Azabou

Eva L Dyer

Matthew G Perich

Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to strict latency constraints. Traditional methods, including simple recurrent neural networks, are fast and lightweight but often struggle to generalize to unseen data. In contrast, recent Transformer-based approaches leverage large-scale pretraining for strong generalization performance, but typically have much larger computational requirements and are not always suitable for low-resource or real-time settings. To address these shortcomings, we present POSSM, a novel hybrid architecture that combines individual spike tokenization via a cross-attention module with a recurrent state-space model (SSM) backbone to enable (1) fast and causal online prediction on neural activity and (2) efficient generalization to new sessions, individuals, and tasks through multi-dataset pretraining. We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects. Notably, we demonstrate that pretraining on monkey motor-cortical recordings improves decoding performance on the human handwriting task, highlighting the exciting potential for cross-species transfer. In all of these tasks, we find that POSSM achieves decoding accuracy comparable to state-of-the-art Transformers, at a fraction of the inference cost (up to 9x faster on GPU). These results suggest that hybrid SSMs are a promising approach to bridging the gap between accuracy, inference speed, and generalization when training neural decoders for real-time, closed-loop applications.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net
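As a rough illustration of the POSSM pipeline described above, the sketch below compresses a variable number of spike tokens per time chunk into a fixed-size vector with single-head cross-attention, then feeds that vector to a diagonal linear state-space recurrence. It is a minimal sketch under stated assumptions, not the authors' implementation: spikes are assumed to be already embedded as vectors (each token serves as both key and value), and the names `cross_attend`, `ssm_step`, `decode_stream` and the parameters `a`, `b`, `c` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cross_attend(query: Sequence[float], tokens: Sequence[Sequence[float]]) -> Vector:
    """Single-head scaled dot-product cross-attention.

    One latent query attends over a variable number of spike tokens (used as
    both keys and values) and returns a fixed-size summary vector.
    """
    dim = len(query)
    if not tokens:
        return [0.0] * dim  # no spikes in this time chunk
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * k for q, k in zip(query, tok)) for tok in tokens]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens)) / total for d in range(dim)]


def ssm_step(h: Vector, x: Vector, a: Vector, b: Vector) -> Vector:
    """Diagonal linear SSM update: h_t = a * h_{t-1} + b * x_t, element-wise."""
    return [ai * hi + bi * xi for ai, hi, bi, xi in zip(a, h, b, x)]


def decode_stream(chunks, query: Vector, a: Vector, b: Vector, c: Vector) -> List[float]:
    """Causally decode one scalar per chunk with the readout y_t = c . h_t."""
    h = [0.0] * len(query)
    outputs = []
    for tokens in chunks:
        x = cross_attend(query, tokens)  # variable-length spikes -> fixed vector
        h = ssm_step(h, x, a, b)         # recurrent state carries the history
        outputs.append(sum(ci * hi for ci, hi in zip(c, h)))
    return outputs


# Hypothetical stream: 1 spike, then an empty chunk, then 2 spikes.
chunks = [[[1.0, 0.0]], [], [[0.0, 1.0], [1.0, 1.0]]]
print(decode_stream(chunks, query=[1.0, 0.5], a=[0.9, 0.5], b=[1.0, 1.0], c=[1.0, -1.0]))
```

Because each output depends only on the chunks seen so far, appending a chunk never changes earlier predictions. Each step costs one fixed-size state update plus attention over that chunk's spikes only, which is the property that makes a recurrent backbone attractive for online decoding.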

GreenHyperSpectra: A multi-source hyperspectral dataset for global vegetation trait prediction

Eya Cherif

Arthur Ouaknine

Luke A. Brown

Phuong D. Dao

Kyle R. Kovach

Bing Lu

Daniel Mederer

Hannes Feilhauer

Teja Kattenborn

David Rolnick

Plant traits such as leaf carbon content and leaf mass are essential variables in the study of biodiversity and climate change. However, conventional field sampling cannot feasibly cover trait variation at ecologically meaningful spatial scales. Machine learning represents a valuable solution for plant trait prediction across ecosystems, leveraging hyperspectral data from remote sensing. Nevertheless, trait prediction from hyperspectral data is challenged by label scarcity and substantial domain shifts (e.g., across sensors and ecological distributions), requiring robust cross-domain methods. Here, we present GreenHyperSpectra, a pretraining dataset encompassing real-world cross-sensor and cross-ecosystem samples designed to benchmark trait prediction with semi- and self-supervised methods. We adopt an evaluation framework encompassing in-distribution and out-of-distribution scenarios. We successfully leverage GreenHyperSpectra to pretrain label-efficient multi-output regression models that outperform the state-of-the-art supervised baseline. Our empirical analyses demonstrate substantial improvements in learning spectral representations for trait prediction, establishing a comprehensive methodological framework to catalyze research at the intersection of representation learning and plant functional trait assessment. All code and data are available at: https://github.com/echerif18/HyspectraSSL.

2025-09-17

NeurIPS 2025, Datasets and Benchmarks Track (poster)

doi.org

openreview.net

How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda

Santhoshi Ravichandran

Emiliano Penaloza

Hadi Nekoei

Megh Thakkar

Thibault Le Sellier De Chezelles

Nicolas Gontier

Miguel Muñoz-Mármol

Sahar Omidi Shayegan

Stefania Raimondo

Xue Liu

Alexandre Drouin

Alexandre Piché

Alexandre Lacoste

Massimo Caccia

LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address these challenges, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net
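The study above mentions bootstrapping over sampled configurations to estimate effective hyperparameters. One simple way to do this, sketched below as an assumption rather than the paper's exact procedure, is to resample the runs with replacement and record which value of a given hyperparameter the best run in each resample uses. The win frequencies then approximate how reliably each value leads to top performance. The function name `bootstrap_best_value` and the example runs are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def bootstrap_best_value(
    runs: List[Tuple[Hashable, float]], n_boot: int = 1000, seed: int = 0
) -> Dict[Hashable, float]:
    """Estimate how often each value of one hyperparameter belongs to the
    best-scoring run when the sampled runs are resampled with replacement.

    runs: (hyperparameter value, final score) pairs, one per sampled configuration.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    wins: Counter = Counter()
    for _ in range(n_boot):
        resample = [rng.choice(runs) for _ in range(len(runs))]
        best_value = max(resample, key=lambda run: run[1])[0]
        wins[best_value] += 1
    return {value: count / n_boot for value, count in wins.items()}


# Hypothetical runs: (learning rate, task success rate).
runs = [(1e-5, 0.10), (1e-5, 0.20), (1e-4, 0.90), (1e-4, 0.80)]
print(bootstrap_best_value(runs))
```

Reporting a win probability rather than the single best run makes the recommendation less sensitive to one lucky configuration, which matters when, as the abstract notes, results are highly sensitive to hyperparameters.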

Increasing the Utility of Synthetic Images through Chamfer Guidance

Nicola Dall'Asen

Xiaofeng Zhang

Reyhane Askari-Hemmat

Melissa Hall

Jakob Verbeek

Adriana Romero-Soriano

Michal Drozdzal

Conditional image generative models hold considerable promise to produce infinite amounts of synthetic training data. Yet, recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, the (implicit or explicit) utility functions often disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach which leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that by leveraging the proposed Chamfer Guidance, we can boost the diversity of the generations w.r.t. a dataset of real images while maintaining or improving the generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 exemplar real images, obtaining 96.4% in terms of precision, and 86.4% in terms of distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of the Chamfer Guidance generation by training downstream image classifiers on synthetic data, achieving an accuracy boost over the baselines of up to 15% in distribution and up to 16% out of distribution. Furthermore, our approach does not require using the unconditional model, and thus obtains a 31% reduction in FLOPs w.r.t. classifier-free-guidance-based approaches at sampling time.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net
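Chamfer Guidance characterizes quality and diversity with respect to a few real exemplars. The sketch below computes only the two directional terms of a Chamfer distance in a feature space, under the assumption that quality is measured by how close each synthetic sample is to its nearest real exemplar and coverage by how close each real exemplar is to its nearest synthetic sample. It does not implement the guidance step itself, and the feature vectors and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_distance(p: Point, pool: List[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in pool."""
    return min(math.dist(p, q) for q in pool)


def chamfer_terms(synthetic: List[Point], real: List[Point]) -> Tuple[float, float]:
    """Return (quality_term, coverage_term); lower is better for both.

    quality_term:  mean distance from each synthetic feature to its nearest real exemplar
    coverage_term: mean distance from each real exemplar to its nearest synthetic feature
    Their sum is a symmetric, mean-based Chamfer distance.
    """
    quality = sum(nearest_distance(s, real) for s in synthetic) / len(synthetic)
    coverage = sum(nearest_distance(r, synthetic) for r in real) / len(real)
    return quality, coverage


# Hypothetical 2-D features: both synthetic samples collapse onto the first exemplar.
real = [(0.0, 0.0), (2.0, 0.0)]
collapsed = [(0.0, 0.0), (0.0, 0.0)]
print(chamfer_terms(collapsed, real))
```

In this example the collapsed samples score perfectly on the quality term, but the coverage term exposes the missing second exemplar, which is the kind of diversity loss the abstract describes.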

Know Thyself by Knowing Others: Learning Neuron Identity from Population Context

Vinam Arora

Divyansha Lachi

Ian J. Knight

Mehdi Azabou

Blake Richards

Cole Hurwitz

Joshua H. Siegle

Eva L. Dyer

Identifying the functional identity of individual neurons is essential for interpreting circuit dynamics, yet it remains a major challenge in large-scale in vivo recordings where anatomical and molecular labels are often unavailable. Here we introduce NuCLR, a self-supervised framework that learns context-aware representations of neuron identity by modeling each neuron's role within the broader population. NuCLR employs a spatio-temporal transformer that captures both within-neuron dynamics and across-neuron interactions. It is trained with a sample-wise contrastive objective that encourages temporally stable and discriminative embeddings. Across multiple open-access datasets, NuCLR outperforms prior methods in both cell type and brain region classification. Critically, it exhibits strong zero-shot generalization to entirely new populations, without any retraining or access to stimulus labels. Furthermore, we demonstrate that our framework scales effectively with data size. Overall, our results demonstrate that modeling population context is crucial for understanding neuron identity and that rich signal for cell-typing and neuron localization is present in neural activity alone. Code is available at: https://github.com/nerdslab/nuclr.

2025-09-17

NeurIPS 2025 (poster)

openreview.net
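NuCLR is trained with a sample-wise contrastive objective whose exact form is not given here, so the sketch below substitutes a generic InfoNCE loss: embeddings of the same neuron from two views (for example, two time windows) form a positive pair, and every other neuron in the batch acts as a negative. The function names, the cosine similarity and the temperature value are assumptions, and embeddings must be non-zero vectors.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; u and v must be non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(
    view_a: List[Sequence[float]], view_b: List[Sequence[float]], temperature: float = 0.1
) -> float:
    """Mean InfoNCE loss where view_a[i] and view_b[i] embed the same neuron.

    For each anchor i, the positive is view_b[i]; every other view_b[j] is a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        peak = max(logits)  # log-sum-exp with max subtraction for stability
        log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_partition - logits[i]  # -log softmax probability of the positive
    return total / n


# Hypothetical embeddings of two neurons seen in two time windows.
window_1 = [[1.0, 0.0], [0.0, 1.0]]
window_2 = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(window_1, window_2), info_nce(window_1, window_2[::-1]))
```

With matching neurons aligned across windows the loss is close to 0; swapping the neurons in the second window pushes it to roughly 10, which is the pressure toward temporally stable, discriminative embeddings that the abstract describes.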

Learning to Solve Complex Problems via Dataset Decomposition

Wanru Zhao

Lucas Caccia

Zhengyan Shi

Minseon Kim

Weijia Xu

Xingdi Yuan

Alessandro Sordoni

Marc-Alexandre Côté

2025-09-17

NeurIPS 2025 (poster)

openreview.net

Learning Task-Agnostic Representations through Multi-Teacher Distillation

Philippe Formont

Maxime Darrin

Banafsheh Karimian

Eric Granger

Jackie CK Cheung

Ismail Ben Ayed

Mohammadhadi Shateri

Pablo Piantanida

Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differences in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a "majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Jiashun Liu

Zihao Wu

Johan Obando-Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss, which impairs their ability to adapt to new data and learn continually. A common method to quantify and address this issue is the τ-dormant neuron ratio, which uses activation statistics to measure the expressive ability of neurons. While effective for simple MLP-based agents, this approach loses statistical power in more complex architectures. To address this, we argue that in advanced RL agents, maintaining a neuron's learning capacity, its ability to adapt via gradient updates, is more critical than preserving its expressive ability. Based on this insight, we shift the statistical objective from activations to gradients, and introduce GraMa (Gradient Magnitude Neural Activity Metric), a lightweight, architecture-agnostic metric for quantifying neuron-level learning capacity. We show that GraMa effectively reveals persistent neuron inactivity across diverse architectures, including residual networks, diffusion models, and agents with varied activation functions. Moreover, resetting neurons guided by GraMa (ReGraMa) consistently improves learning performance across multiple deep RL algorithms and benchmarks, such as MuJoCo and the DeepMind Control Suite.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net
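To make the shift from activations to gradients concrete, the sketch below scores each neuron by its mean absolute value over a batch, divided by the layer average, and flags neurons whose score is at most τ, the normalization used by the τ-dormant ratio. Feeding it activation magnitudes gives a τ-dormant-style check; feeding it gradient magnitudes gives a GraMa-style check. Treating GraMa as exactly this formula is an assumption (the paper's normalization may differ), and the inputs are toy values.

```python
from typing import List, Sequence


def normalized_scores(magnitudes: Sequence[Sequence[float]]) -> List[float]:
    """Score each neuron by its mean |value| over a batch, divided by the layer average.

    magnitudes[b][i] is the activation (tau-dormant style) or the gradient
    (GraMa style, assumed) of neuron i on batch sample b.
    """
    n_samples, n_neurons = len(magnitudes), len(magnitudes[0])
    per_neuron = [sum(abs(row[i]) for row in magnitudes) / n_samples for i in range(n_neurons)]
    layer_mean = sum(per_neuron) / n_neurons
    if layer_mean == 0.0:
        return [0.0] * n_neurons  # the whole layer is silent
    return [p / layer_mean for p in per_neuron]


def inactive_neurons(magnitudes: Sequence[Sequence[float]], tau: float = 0.1) -> List[int]:
    """Indices of neurons whose normalized score is at most tau."""
    return [i for i, s in enumerate(normalized_scores(magnitudes)) if s <= tau]


# Toy batch of 2 samples x 3 neurons: equal activations, but neuron 1 gets no gradient.
activations = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
gradients = [[0.5, 0.0, 0.5], [0.5, 0.0, 0.5]]
print(inactive_neurons(activations))  # → []
print(inactive_neurons(gradients))    # → [1]
```

Every neuron in the example has the same activation, so the activation-based check flags nothing, yet neuron 1 receives no gradient and therefore cannot learn; only the gradient-based score catches it. A reset scheme in the spirit of ReGraMa would reinitialize the flagged indices.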

Meta-World+: An Improved, Standardized, RL Benchmark

Reginald McLean

Evangelos Chatzaroulas

Luc McCutcheon

Frank Röder

Tianhe Yu

Zhanpeng He

K.R. Zentner

Ryan Julian

Jordan Terry

Isaac Woungang

Nariman Farsad

Pablo Samuel Castro

Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills simultaneously. Since its introduction, however, there have been numerous undocumented changes that prevent a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release an open-source version of Meta-World that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.

2025-09-17

NeurIPS 2025, Datasets and Benchmarks Track (poster)

openreview.net

Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning

Ghada Sokar

Pablo Samuel Castro

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net

MiNT: Multi-Network Transfer Benchmark for Temporal Graph Learning

Kiarash Shamsi

Tran Gia Bao Ngo

Poupak Azad

Baris Coskunuzer

Cuneyt Gurcan Akcora

Temporal Graph Learning (TGL) aims to discover patterns in evolving networks or temporal graphs and leverage these patterns to predict future interactions. However, most existing research focuses on learning from a single network in isolation, leaving the challenges of within-domain and cross-domain generalization largely unaddressed. In this study, we introduce a new benchmark of 84 real-world temporal transaction networks and propose Temporal Multi-network Transfer (MiNT), a pre-training framework designed to capture transferable temporal dynamics across diverse networks. We train MiNT models on up to 64 transaction networks and evaluate their generalization ability on 20 held-out, unseen networks. Our results show that MiNT consistently outperforms individually trained models, revealing a strong relation between the number of pre-training networks and transfer performance. These findings highlight scaling trends in temporal graph learning and underscore the importance of network diversity in improving generalization. This work establishes the first large-scale benchmark for studying transferability in TGL and lays the groundwork for developing Temporal Graph Foundation Models. Our code is available at https://github.com/benjaminnNgo/ScalingTGNs.

2025-09-17

NeurIPS 2025 (poster)

openreview.net

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Sangmin Bae

Yujin Kim

Reza Bayat

Sungnyun Kim

Jiyoun Ha

Tal Schuster

Adam Fisch

Hrayr Harutyunyan

Ziwei Ji

Aaron Courville

Se-Young Yun

Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to further decrease memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.

2025-09-17

NeurIPS 2025 (poster)

doi.org

openreview.net
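The token-level recursion in MoR can be illustrated with a toy loop: one shared block is applied repeatedly, and after each pass a router decides, per token, whether that token continues to the next recursion step. This is a simplified sketch, not the paper's routing scheme. The states are scalars, the block and the threshold router are hypothetical functions, and attention among the still-active tokens and KV caching are omitted. The returned call count stands in for compute.

```python
from typing import Callable, List, Tuple


def mixture_of_recursions(
    tokens: List[float],
    shared_block: Callable[[float], float],
    router: Callable[[float], bool],
    max_depth: int,
) -> Tuple[List[float], List[int], int]:
    """Apply one shared block up to max_depth times, letting a router stop tokens early.

    Returns final token states, the recursion depth each token reached, and the
    total number of block applications.
    """
    states = list(tokens)
    depths = [0] * len(states)
    active = list(range(len(states)))  # every token takes at least one step
    calls = 0
    for _ in range(max_depth):
        if not active:
            break
        for i in active:  # only still-active tokens are processed (same weights each step)
            states[i] = shared_block(states[i])
            depths[i] += 1
            calls += 1
        active = [i for i in active if router(states[i])]  # per-token continue/exit
    return states, depths, calls


# Hypothetical block and router: keep recursing while a token's state is below 3.
print(mixture_of_recursions([0.0, 2.0, 5.0], lambda x: x + 1.0, lambda x: x < 3.0, max_depth=4))
# → ([3.0, 3.0, 6.0], [3, 1, 1], 5)
```

Without routing, three tokens at depth 4 would need 12 block applications; here the router stops two tokens after one step, so only 5 are performed while the parameter count stays that of a single block.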
