Publications

Know Thyself by Knowing Others: Learning Neuron Identity from Population Context
Vinam Arora
Ian J. Knight
Mehdi Azabou
Cole Hurwitz
Joshua H. Siegle
Eva L. Dyer
Identifying the functional identity of individual neurons is essential for interpreting circuit dynamics, yet it remains a major challenge i… (see more)n large-scale _in vivo_ recordings where anatomical and molecular labels are often unavailable. Here we introduce NuCLR, a self-supervised framework that learns context-aware representations of neuron identity by modeling each neuron's role within the broader population. NuCLR employs a spatio-temporal transformer that captures both within-neuron dynamics and across-neuron interactions. It is trained with a sample-wise contrastive objective that encourages temporally-stable and discriminative embeddings. Across multiple open-access datasets, NuCLR outperforms prior methods in both cell type and brain region classification. Critically, it exhibits strong zero-shot generalization to entirely new populations, without any retraining or access to stimulus labels. Furthermore, we demonstrate that our framework scales effectively with data size. Overall, our results demonstrate that modeling population context is crucial for understanding neuron identity and that rich signal for cell-typing and neuron localization is present in neural activity alone.Code available at: https://github.com/nerdslab/nuclr.
Learning to Solve Complex Problems via Dataset Decomposition
WANRU ZHAO
Lucas Caccia
Zhengyan Shi
Minseon Kim
Weijia Xu
Xingdi Yuan
Learning Task-Agnostic Representations through Multi-Teacher Distillation
Eric Granger
Jackie CK Cheung
Ismail Ben Ayed
Mohammadhadi Shateri
Casting complex inputs into tractable representations is a critical step across various fields. Diverse embedding models emerge from differe… (see more)nces in architectures, loss functions, input modalities and datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations but often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a ``majority vote" objective function. We demonstrate that this function is bounded by the mutual information between student and teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision models, and molecular modeling show that our method effectively leverages teacher diversity, resulting in representations enabling better performance for a wide range of downstream tasks such as classification, clustering, or regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance in various modalities.
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
Jiashun Liu
Zihao Wu
Johan Obando-Ceron
Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss, which impairs their ability to adapt to new data and … (see more)learn continually. A common method to quantify and address this issue is the tau-dormant neuron ratio, which uses activation statistics to measure the expressive ability of neurons. While effective for simple MLP-based agents, this approach loses statistical power in more complex architectures. To address this, we argue that in advanced RL agents, maintaining a neuron's learning capacity, its ability to adapt via gradient updates, is more critical than preserving its expressive ability. Based on this insight, we shift the statistical objective from activations to gradients, and introduce GraMa (Gradient Magnitude Neural Activity Metric), a lightweight, architecture-agnostic metric for quantifying neuron-level learning capacity. We show that GraMa effectively reveals persistent neuron inactivity across diverse architectures, including residual networks, diffusion models, and agents with varied activation functions. Moreover, resetting neurons guided by GraMa (ReGraMa) consistently improves learning performance across multiple deep RL algorithms and benchmarks, such as MuJoCo and the DeepMind Control Suite.
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
Zhanpeng He
K.R. Zentner
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Meta-World is widely used for evaluating multi-task and meta-reinforcement learning agents, which are challenged to master diverse skills si… (see more)multaneously. Since its introduction however, there have been numerous undocumented changes which inhibit a fair comparison of algorithms. This work strives to disambiguate these results from the literature, while also leveraging the past versions of Meta-World to provide insights into multi-task and meta-reinforcement learning benchmark design. Through this process we release an open-source version of Meta-World that has full reproducibility of past results, is more technically ergonomic, and gives users more control over the tasks that are included in a task set.
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Ghada Sokar
MiNT: Multi-Network Transfer Benchmark for Temporal Graph Learning
Kiarash Shamsi
Tran Gia Bao Ngo
Poupak Azad
Baris Coskunuzer
Cuneyt Gurcan Akcora
Temporal Graph Learning (TGL) aims to discover patterns in evolving networks or temporal graphs and leverage these patterns to predict futur… (see more)e interactions. However, most existing research focuses on learning from a single network in isolation, leaving the challenges of within-domain and cross-domain generalization largely unaddressed. In this study, we introduce a new benchmark of 84 real-world temporal transaction networks and propose **Temporal Multi-network Transfer (MiNT)**, a pre-training framework designed to capture transferable temporal dynamics across diverse networks. We train MiNT models on up to 64 transaction networks and evaluate their generalization ability on 20 held-out, unseen networks. Our results show that MiNT consistently outperforms individually trained models, revealing a strong relation between the number of pre-training networks and transfer performance. These findings highlight scaling trends in temporal graph learning and underscore the importance of network diversity in improving generalization. This work establishes the first large-scale benchmark for studying transferability in TGL and lays the groundwork for developing Temporal Graph Foundation Models. Our code is available at https://github.com/benjaminnNgo/ScalingTGNs
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Sangmin Bae
Yujin Kim
Sungnyun Kim
Jiyoun Ha
Tal Schuster
Adam Fisch
Hrayr Harutyunyan
Ziwei Ji
Se-Young Yun
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deploy… (see more)ment expensive. Existing efficiency efforts typically target either parameter sharing or adaptive computation, leaving open the question of how to attain both simultaneously. We introduce Mixture-of-Recursions (MoR), a unified framework that combines the two axes of efficiency inside a single Recursive Transformer. MoR reuses a shared stack of layers across recursion steps to achieve parameter efficiency, while lightweight routers enable adaptive token-level thinking by dynamically assigning different recursion depths to individual tokens. This allows MoR to focus quadratic attention computation only among tokens still active at a given recursion depth, further improving memory access efficiency by selectively caching only their key-value pairs. Beyond these core mechanisms, we also propose a KV sharing variant that reuses KV pairs from the first recursion, specifically designed to further decrease memory footprint. Across model scales ranging from 135M to 1.7B parameters, MoR forms a new Pareto frontier: at equal training FLOPs and smaller model sizes, it significantly lowers validation perplexity and improves few-shot accuracy, while delivering higher throughput compared with vanilla and existing recursive baselines. These gains demonstrate that MoR is an effective path towards large-model quality without incurring large-model cost.
NAVIX: Scaling Minigrid Environments with JAX
Eduardo Pignatelli
Jarek Liesen
Robert Tjarko Lange
Chris Lu
Laura Toni
As Deep Reinforcement Learning (Deep RL) research moves towards solving large-scale worlds, efficient environment simulations become crucial… (see more) for rapid experimentation. However, most existing environments struggle to scale to high throughput, setting back meaningful progress. Interactions are typically computed on the CPU, limiting training speed and throughput, due to slower computation and communication overhead when distributing the task across multiple machines. Ultimately, Deep RL training is CPU-bound, and developing batched, fast, and scalable environments has become a frontier for progress. Among the most used Reinforcement Learning (RL) environments, MiniGrid is at the foundation of several studies on exploration, curriculum learning, representation learning, diversity, meta-learning, credit assignment, and language-conditioned RL, and still suffers from the limitations described above. In this work, we introduce NAVIX, a re-implementation of MiniGrid in JAX. NAVIX achieves over
Object-centric Binding in Contrastive Language-Image Pretraining
Pietro Astolfi
Adriana Romero-Soriano
Recent advances in vision language models (VLM) have been driven by contrastive models such as CLIP, which learn to associate visual informa… (see more)tion with their corresponding text descriptions. However, these models have limitations in understanding complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that diverges from commonly used strategies, which rely on the design of hard-negative augmentations. Instead, our work focuses on integrating inductive biases into pre-trained CLIP-like models to improve their compositional understanding without using any additional hard-negatives. To that end, we introduce a binding module that connects a scene graph, derived from a text description, with a slot-structured image representation, facilitating a structured similarity assessment between the two modalities. We also leverage relationships as text-conditioned visual constraints, thereby capturing the intricate interactions between objects and their contextual relationships more effectively. Our resulting model not only enhances the performance of CLIP-based models in multi-object compositional understanding but also paves the way towards more accurate and sample-efficient image-text matching of complex scenes.
Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring
Nico Lang
B. Christian Schmidt
Yves Basset
Sara Beery
Maxim Larrivée
Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are … (see more)changing. Indeed, some 90% of Earth's species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training -- the problem of open-set recognition (OSR) -- limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.
OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Christina Kassab
Martin Büchner
Matias Mattamala
Abhinav Valada
Maurice Fallon
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, at pre… (see more)sent the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymical object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases, and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io/.