Publications

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection

Denoising score matching (DSM) provides a way to learn data distributions by training a neural network to recover the score function, defined as the gradient of the log density, from noise-corrupted samples. Once trained, the score magnitude at a test point reflects how consistent that point is with the learned distribution, making it a natural anomaly signal. The key practical challenge is selecting the perturbation scale: too little noise yields unstable score estimates in sparse regions, while too much erases local structure and weakens anomaly sensitivity. This is compounded by the difficulty of hyperparameter tuning when anomalies are unknown and no validation set is available. We introduce kurtosis-based noise scaling (K-DSM), a per-feature scheme that sets noise levels from the shape of each marginal distribution, improving coverage of low-density regions and precision in high-density regions without extra model complexity. Contrary to prior claims that multi-scale or noise-conditioned training is necessary, we find that a carefully trained single-scale model is already a strong anomaly detector. On standard tabular anomaly detection benchmarks, K-DSM achieves state-of-the-art performance in the semi-supervised setting. When combined with a lightweight EMA-teacher filtering rule that removes low-density training points before each gradient step, it also achieves strong performance in the fully unsupervised (contaminated) setting, suggesting that simple, data-adaptive noise scaling enables robust anomaly detection while reducing reliance on hyperparameter tuning.

2026-05-06

arXiv (preprint)

doi.org

arxiv.org

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

Tatsuhiro Nakamori

Laura Gomezjurado Gonzalez

Ganesh Talluri

Ansh Tiwari

Hideyuki Kawashima

Ioannis Mitliagkas

Guillaume Rabusseau

Hiroki Naganuma

Low-rank gradient compression reduces communication in distributed training by representing updates with rank-…

2026-05-06

arXiv (preprint)

doi.org

arxiv.org

Revisiting Adam for Streaming Reinforcement Learning

Florin Gogianu

Adrian Catalin Lutu

Razvan Pascanu

Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through eligibility traces and modifications to the optimisation routine, resulting in the StreamQ algorithm. In this work we take a step back, investigating the efficacy of established updates, such as those implemented by DQN and C51 within this online setting. Not only do we find that they perform well, but through analysing how the optimisation algorithm generally, and Adam in particular, interacts with these updates, we contend that two properties are essential for robust performance: i) the derivative of the objective is to be bounded and ii) weight updates are variance-adjusted. Rigorous and exhaustive experimentation demonstrates that C51, which exhibits both characteristics, is competitive with StreamQ across a subset of 55 Atari games. Using these insights, we derive a variance-adjusted algorithm based on eligibility traces, termed Adaptive Q

2026-05-06

arXiv (preprint)

doi.org

arxiv.org

Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization

Andy Zeyi Liu

Elliot Paquette

John Sous

Training loss and throughput can hide distinct internal representation in language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we introduce an empirical protocol based on activation covariance and per-sample gradient SVD spectra. This dual-view reveals three empirical findings and one mechanistic explanation. First, batch size acts as a latent determinant of representation geometry: runs that reach equal loss settle into systematically distinct activation spectra. Second, the activation covariance tail measured early in training reliably forecasts downstream token efficiency. Third, movement of the activation spectrum head (leading modes), together with gradient spectra, characterizes underlying learning-dynamics changes, separating learning-side architectural improvements from primarily execution-side gains. These predictive and diagnostic signals persist across the 12-, 36-, and 48-layer model tiers. Finally, a mechanistic model proves the main observations and explains how activation covariance spectra correlate with task-aligned feature learning.
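As a rough illustration of what an activation covariance spectrum measurement can look like, the sketch below computes the eigenvalues of the covariance of a matrix of activations and summarizes the tail with a log-log slope. The function names and the slope-based tail summary (`tail_slope`, `start_frac`) are assumptions introduced here; the abstract does not specify how the authors quantify the tail or the head of the spectrum.

```python
import numpy as np

def activation_covariance_spectrum(acts):
    """Eigenvalues (descending) of the covariance of activations.

    acts: array of shape (n_tokens, d), one row per token.
    """
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (acts.shape[0] - 1)
    eig = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return np.clip(eig, 0.0, None)       # remove tiny negative round-off

def tail_slope(eig, start_frac=0.5):
    """Hypothetical tail summary: slope of log(eigenvalue) against
    log(rank) over the last (1 - start_frac) fraction of ranks."""
    ranks = np.arange(1, len(eig) + 1)
    start = int(len(eig) * start_frac)
    keep = eig[start:] > 0
    x = np.log(ranks[start:][keep])
    y = np.log(eig[start:][keep])
    slope, _ = np.polyfit(x, y, 1)
    return float(slope)
```

In practice `acts` would be collected from a chosen layer over a batch of tokens; comparing `tail_slope` across runs that reach the same loss is one simple way to probe the kind of differences the abstract describes.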

2026-05-06

arXiv (preprint)

doi.org

arxiv.org

No Triangulation Without Representation: Generalization in Topological Deep Learning

Johannes S. Schmidt

Martin Carrasco

Ernst Röell

Guy Wolf

Nello Blaser

Bastian Rieck

Despite an ever-increasing interest in topological deep learning models that target higher-order datasets, there is no consensus on how to evaluate such models. This is exacerbated by the fact that topological objects permit operations, such as structural refinements, that are not appropriate for graph data. In this work, we extend MANTRA, a benchmark dataset containing manifold triangulations, to a larger class of manifolds with more diverse homeomorphism types. We show that, unlike prior claims, both graph neural networks (GNNs) and higher-order message passing (HOMP) methods can saturate the benchmark. However, we find that this is contingent on the right representation and feature assignment, emphasizing their importance in baseline models. We thus provide a novel evaluation protocol based on representational diversity and triangulation refinement. Surprisingly, we find no indication that existing models are capable of generalizing beyond the combinatorial structure of the data. This points towards a research gap in developing models that understand topological structure independent of scale. Our work thus provides the necessary scaffolding to evaluate future models and enable the development of topology-aware inductive biases.

2026-05-06

arXiv (preprint)

doi.org

arxiv.org

Copy number variants reveal divergent genetic and diagnostic cortical signatures across psychiatric disorders

Kuldeep Kumar

Zhijie Liao

Jakub Kopal

Clara Moreau

Christopher Ching

Claudia Modenato

Will Snyder

Sayeh Kazem

Charles-Olivier Martin

Anne-Marie Bélanger

Valerie Fontaine

Khadije Jizi

Guillaume Huguet

Rune Boen

Leila Kushan

Ana Silva

Marianne van den Bree

David Linden

Michael Owen

Jeremy Hall

Sarah Lippé

Guillaume Dumas

Bogdan Draganski

Laura Almasy

Sophia Thomopoulos

Neda Jahanshad

Ida Sønderby

Ole Andreassen

David Glahn

Armin Raznahan

Carrie Bearden

Tomáš Paus

Paul Thompson

Sébastien Jacquemont

2026-05-05

Research Square (accepted)

doi.org

Decision Problems in Multilevel Linear Programming

Nagisa Sugishita

Margarida Carvalho

We study the computational complexity of decision problems in …

2026-05-05

arXiv (preprint)

doi.org

arxiv.org

Dissecting and steering cell dynamics using spatially-informed RNA velocity with veloAgent

Vishvak Raghavan

Brent Yoon

Gregory J Fonseca

Yue Li

Jun Ding

RNA velocity enables inference of cell state transitions from single-cell transcriptomics by modeling transcriptional dynamics from spliced and unspliced mRNA. However, existing methods overlook spatial context and struggle to scale to large datasets, limiting insights into tissue organization and dynamic processes. We introduce veloAgent, a deep generative and agent-based framework that estimates gene- and cell-specific transcriptional kinetics while integrating spatial information through agent-based simulations of local microenvironments. By leveraging both molecular and spatial cues, veloAgent improves velocity accuracy and achieves sublinear memory scaling, enabling efficient analysis of large and multi-batch spatial datasets. A distinctive feature of veloAgent is its in silico perturbation module, which allows targeted manipulation of spatial velocity vectors to simulate regulatory interventions and predict their impact on cell fate dynamics. These capabilities position veloAgent as a scalable and versatile framework for dissecting spatially resolved cellular dynamics and guiding cell fate manipulation across diverse biological processes.

2026-05-05

Molecular Systems Biology (published)

doi.org

The utility of herbarium collections for genetic monitoring

Isaac Eckert

Lucas Eckert

Olivia Rahn

Cameron So

Simon Joly

Laura J. Pollock

Despite growing evidence of widespread genetic responses to anthropogenic activity, data shortfalls constrain genetic monitoring efforts and preclude the widespread use of genetic data to inform conservation. For flora, one option is to leverage the wealth of genetic material preserved in Earth’s vast herbarium collections, but the extent to which herbarium specimens can supply the population-level data required to monitor genetic change remains unclear. Using the Essential Biodiversity Variable (EBV) framework developed to monitor population-level genetic change, we show that digitized herbarium specimens could be used to quantify ∼162 K measures of genetic EBVs representing over 41 K species, 86% of regions on Earth, and spanning the past 250 years of global change. As such, we find that herbarium collections offer an invaluable source of historical genetic data, the mobilization of which could transform global efforts to monitor and conserve plant diversity.

2026-05-05

BioScience (published)

doi.org

Exploring Entropy-based Active Learning for Fair Brain Segmentation

Ghazal Danaee

Melanie Gaillochet

Christian Desrosiers

Hervé Lombaert

Sylvain Bouix

Active learning (AL) has emerged as a crucial strategy for reducing the prohibitive costs associated with medical image segmentation. However, standard uncertainty-based AL methods typically focus on maximizing performance metrics, ignoring performance disparities or fairness across groups with sensitive attributes. While fair active learning has been explored in classification tasks, its intersection with medical image segmentation remains unaddressed. In this work, we introduced a fairness-aware active learning framework with a Weighted Entropy selection strategy that modulates uncertainty based on current group-specific performance estimates on the labeled set. To decouple true epistemic uncertainty from anatomical volume variances, we further utilized a masked, scaled entropy restricted to the region of interest. The framework was evaluated on synthetic T1-weighted brain MRIs with controlled left caudate bias in both strong and weak bias settings. A 3D U-Net was trained to segment the left caudate under several AL strategies, starting from both demographically balanced and strongly imbalanced initial labeled sets. Experiments demonstrated that our method markedly reduces performance disparities between groups compared to random sampling and standard uncertainty sampling. By prioritizing poorly segmented subgroups during the AL cycles, our method consistently achieved the highest equity-scaled performance and reduced the disparity metric by 75% (strong bias) and 86% (weak bias) relative to standard entropy at the final budget. Overall, this work is among the first studies on fair AL for medical image segmentation, offering an efficient strategy to train more equitable models in resource-constrained environments.
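The selection idea described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: the weighting `1 - group_dice[g]`, the foreground threshold used to define the region of interest, and all function names are assumptions made for this example.

```python
import numpy as np

def masked_mean_entropy(prob_fg, threshold=0.5, eps=1e-8):
    """Mean binary entropy over voxels predicted as foreground.

    Averaging over the region of interest (rather than summing over the
    whole volume) keeps the score from growing with structure size.
    """
    p = np.clip(prob_fg, eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    roi = prob_fg >= threshold
    if not roi.any():
        return float(entropy.mean())  # no predicted foreground: use the whole volume
    return float(entropy[roi].mean())

def weighted_entropy_scores(probs, groups, group_dice):
    """Hypothetical weighting: scale each sample's uncertainty by
    (1 - Dice) of its group, so poorly segmented groups rank higher."""
    return np.array([
        masked_mean_entropy(p) * (1.0 - group_dice[g])
        for p, g in zip(probs, groups)
    ])

def select_batch(scores, budget):
    """Indices of the `budget` highest-scoring unlabeled samples."""
    return np.argsort(scores)[::-1][:budget]
```

Here `probs` would be the model's foreground probability maps for the unlabeled pool, `groups` the sensitive-attribute label of each sample, and `group_dice` a per-group Dice estimate computed on the currently labeled set.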

2026-05-04

Medical Imaging with Deep Learning (published)

doi.org

proceedings.mlr.press

One Sequence to Segment Them All: Efficient Data Augmentation for CT and MRI Cross-Domain 3D Spine Segmentation

Nathan Molinier

Hendrik Möller

Thomas Dagonneau

Anna Curto-Vilalta

Robert Graf

Matan Atad

Daniel Rueckert

Jan S. Kirschke

Julien Cohen-Adad

Deep learning-based medical image segmentation is increasingly used to support clinical diagnosis and develop new treatment strategies. However, model performance remains limited by the scarcity of high-quality annotated data and insufficient generalization across imaging protocols. This limitation is particularly evident in MRI and CT, where models are typically trained on a single acquisition sequence and exhibit reduced robustness when applied to unseen sequences or contrasts. Although data augmentation is widely used to improve general robustness on medical images, its impact on cross-modality generalization has not been quantitatively explored. In this work, we study a targeted set of data augmentation techniques designed to improve cross-modality transfer. We train three spine segmentation models, each on a single-modality/sequence dataset, and evaluate them across seven out-of-distribution datasets (spanning CT and MRI), reflecting a realistic single-sequence training and multi-sequence/contrast/modality deployment scenario. Our results demonstrate substantial performance gains on unseen domains (average Dice gain of 155 %) while preserving in-domain accuracy (average Dice decrease of 0.008 %), including effective transfer between CT and MRI. To mitigate the computational cost typically associated with strong data augmentation, we implement GPU-optimized augmentations that maintain, and even improve, training efficiency by approximately 10 %. We release our approach as an open-source toolbox, enabling seamless integration into commonly used frameworks such as nnUNet and MONAI. These augmentations significantly enhance robustness to heterogeneous clinical imaging scenarios without compromising training speed.

2026-05-03

arXiv (preprint)

doi.org

arxiv.org

Cycles upon cycles - Temperature Scaling of Medaka Development

Sapna Chhabra

Victoria Mochulska

Carina B. Vibe

Anubhuti Anushree

Kristina S. Stapornwongkul

Thomas Thumberger

Joachim Wittbrodt

Paul François

Alexander Aulehla

How organisms develop in dynamic environmental conditions is a fundamental question. We asked how day-night temperature cycles impact embryonic axis elongation and segmentation, itself a cyclic process linked to the segmentation clock, using the Japanese rice fish medaka. We developed an unbiased dimensional reduction approach, based on Singular Value Decomposition (SVD), to reliably identify the dynamic modes of segmentation clock oscillations across all temperature conditions. We reveal that the two major dynamic modes show opposite temperature sensitivities: while the temporal oscillation (mode 1) varies strongly with temperature, the spatial phase gradient (mode 2) appears largely temperature invariant. In addition, we found developmental parameters with intermediate, sub-scaled temperature responses, such as axis elongation. We used theoretical modeling to understand how dynamic modes emerge from the underlying local oscillation dynamics and axis elongation. We then exposed embryos to circadian and ultradian temperature cycles to reveal dynamic response patterns of oscillations and axis elongation, and found how these responses are integrated into morphological features. Combined, our theoretical-experimental results support a model in which the dynamic integration of temporal (i.e. segmentation clock related) and spatial (i.e. axis elongation) processes, in particular their sub-scaled temperature response patterns, quantitatively compensate each other to yield a robust, temperature-invariant axis patterning outcome.

2026-05-01

bioRxiv (preprint)

doi.org

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad
