Publications

Kurtosis-Guided Denoising Score Matching for Tabular Anomaly Detection
Denoising score matching (DSM) provides a way to learn data distributions by training a neural network to recover the score function, defined as the gradient of the log density, from noise-corrupted samples. Once trained, the score magnitude at a test point reflects how consistent that point is with the learned distribution, making it a natural anomaly signal. The key practical challenge is selecting the perturbation scale: too little noise yields unstable score estimates in sparse regions, while too much erases local structure and weakens anomaly sensitivity. This is compounded by the difficulty of hyperparameter tuning when anomalies are unknown and no validation set is available. We introduce kurtosis-based noise scaling (K-DSM), a per-feature scheme that sets noise levels from the shape of each marginal distribution, improving coverage of low-density regions and precision in high-density regions without extra model complexity. Contrary to prior claims that multi-scale or noise-conditioned training is necessary, we find that a carefully trained single-scale model is already a strong anomaly detector. On standard tabular anomaly detection benchmarks, K-DSM achieves state-of-the-art performance in the semi-supervised setting. When combined with a lightweight EMA-teacher filtering rule that removes low-density training points before each gradient step, it also achieves strong performance in the fully unsupervised (contaminated) setting, suggesting that simple, data-adaptive noise scaling enables robust anomaly detection while reducing reliance on hyperparameter tuning.
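As a rough illustration of the idea in the abstract above, the following sketch computes a per-feature noise scale from the marginal excess kurtosis and evaluates a single-scale DSM loss. The mapping from kurtosis to noise level, the `base_sigma` parameter, and all function names are assumptions made for illustration only; the abstract does not specify the rule the authors use.

```python
import numpy as np


def excess_kurtosis(x):
    """Per-feature excess kurtosis (0 for a Gaussian marginal)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    var = (centered ** 2).mean(axis=0)
    m4 = (centered ** 4).mean(axis=0)
    return m4 / np.maximum(var, 1e-12) ** 2 - 3.0


def kurtosis_noise_scales(x, base_sigma=0.1):
    """Hypothetical rule: heavier-tailed features receive more noise."""
    x = np.asarray(x, dtype=float)
    k = np.maximum(excess_kurtosis(x), 0.0)
    return base_sigma * x.std(axis=0) * np.sqrt(1.0 + k / 3.0)


def dsm_loss(score_fn, x, sigma, rng):
    """Single-scale DSM loss with per-feature noise levels `sigma`.

    For x_noisy = x + sigma * eps, the regression target for the score
    is -eps / sigma. Residuals are weighted by sigma, a common choice.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    residual = score_fn(x_noisy) + eps / sigma
    return 0.5 * float(np.mean(np.sum((sigma * residual) ** 2, axis=1)))
```

At test time, the norm of `score_fn(x)` would serve as the anomaly score; `score_fn` stands in for whatever trained network is used.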
Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization
Tatsuhiro Nakamori
Laura Gomezjurado Gonzalez
Ganesh Talluri
Ansh Tiwari
Hideyuki Kawashima
Low-rank gradient compression reduces communication in distributed training by representing updates with rank-…
Revisiting Adam for Streaming Reinforcement Learning
Florin Gogianu
Adrian Catalin Lutu
Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through eligibility traces and modifications to the optimisation routine, resulting in the StreamQ algorithm. In this work we take a step back, investigating the efficacy of established updates, such as those implemented by DQN and C51 within this online setting. Not only do we find that they perform well, but through analysing how the optimisation algorithm generally, and Adam in particular, interacts with these updates, we contend that two properties are essential for robust performance: i) the derivative of the objective is to be bounded and ii) weight updates are variance-adjusted. Rigorous and exhaustive experimentation demonstrates that C51, which exhibits both characteristics, is competitive with StreamQ across a subset of 55 Atari games. Using these insights, we derive a variance-adjusted algorithm based on eligibility traces, termed Adaptive Q
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
Andy Zeyi Liu
John Sous
Training loss and throughput can hide distinct internal representations in language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we introduce an empirical protocol based on activation covariance and per-sample gradient SVD spectra. This dual view reveals three empirical findings and one mechanistic explanation. First, batch size acts as a latent determinant of representation geometry: runs that reach equal loss settle into systematically distinct activation spectra. Second, the activation covariance tail measured early in training reliably forecasts downstream token efficiency. Third, movement of the activation spectrum head (leading modes), together with gradient spectra, characterizes underlying learning-dynamics changes, separating learning-side architectural improvements from primarily execution-side gains. These predictive and diagnostic signals persist across the 12-, 36-, and 48-layer model tiers. Finally, a mechanistic model proves the main observations and explains how activation covariance spectra correlate with task-aligned feature learning.
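To make the notion of an activation covariance spectrum concrete, here is a minimal sketch that computes the eigenvalues of the covariance of a batch of activations and a simple head/tail split. The `tail_mass` summary and the `head_size` cutoff are hypothetical choices for illustration; the abstract does not state how the authors quantify the tail or the head.

```python
import numpy as np


def activation_covariance_spectrum(acts):
    """Eigenvalues of the activation covariance, largest first.

    acts: array of shape (n_samples, hidden_dim).
    """
    acts = np.asarray(acts, dtype=float)
    centered = acts - acts.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(acts.shape[0] - 1, 1)
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return np.clip(eigenvalues, 0.0, None)  # remove tiny negative round-off


def tail_mass(eigenvalues, head_size):
    """Hypothetical summary: fraction of variance outside the leading modes."""
    total = eigenvalues.sum()
    return float(eigenvalues[head_size:].sum() / total) if total > 0 else 0.0
```

Comparing such spectra across runs that reach the same loss would be one way to look for the kind of differences the abstract describes.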
No Triangulation Without Representation: Generalization in Topological Deep Learning
Johannes S. Schmidt
Martin Carrasco
Ernst Röell
Nello Blaser
Bastian Rieck
Despite an ever-increasing interest in topological deep learning models that target higher-order datasets, there is no consensus on how to evaluate such models. This is exacerbated by the fact that topological objects permit operations, such as structural refinements, that are not appropriate for graph data. In this work, we extend MANTRA, a benchmark dataset containing manifold triangulations, to a larger class of manifolds with more diverse homeomorphism types. We show that, unlike prior claims, both graph neural networks (GNNs) and higher-order message passing (HOMP) methods can saturate the benchmark. However, we find that this is contingent on the right representation and feature assignment, emphasizing their importance in baseline models. We thus provide a novel evaluation protocol based on representational diversity and triangulation refinement. Surprisingly, we find no indication that existing models are capable of generalizing beyond the combinatorial structure of the data. This points towards a research gap in developing models that understand topological structure independent of scale. Our work thus provides the necessary scaffolding to evaluate future models and enable the development of topology-aware inductive biases.
Copy number variants reveal divergent genetic and diagnostic cortical signatures across psychiatric disorders
Kuldeep Kumar
Zhijie Liao
Clara Moreau
Christopher Ching
Claudia Modenato
Will Snyder
Sayeh Kazem
Charles-Olivier Martin
Anne-Marie Bélanger
Valerie Fontaine
Khadije Jizi
Rune Boen
Leila Kushan
Ana Silva
Marianne van den Bree
David Linden
Michael Owen
Jeremy Hall
Sarah Lippé
Bogdan Draganski
Laura Almasy
Sophia Thomopoulos
Neda Jahanshad
Ida Sønderby
Ole Andreassen
David Glahn
Armin Raznahan
Carrie Bearden
Tomáš Paus
Paul Thompson
Sébastien Jacquemont
Decision Problems in Multilevel Linear Programming
We study the computational complexity of decision problems in …
Dissecting and steering cell dynamics using spatially-informed RNA velocity with veloAgent
Brent Yoon
Gregory J Fonseca
RNA velocity enables inference of cell state transitions from single-cell transcriptomics by modeling transcriptional dynamics from spliced and unspliced mRNA. However, existing methods overlook spatial context and struggle to scale to large datasets, limiting insights into tissue organization and dynamic processes. We introduce veloAgent, a deep generative and agent-based framework that estimates gene- and cell-specific transcriptional kinetics while integrating spatial information through agent-based simulations of local microenvironments. By leveraging both molecular and spatial cues, veloAgent improves velocity accuracy and achieves sublinear memory scaling, enabling efficient analysis of large and multi-batch spatial datasets. A distinctive feature of veloAgent is its in silico perturbation module, which allows targeted manipulation of spatial velocity vectors to simulate regulatory interventions and predict their impact on cell fate dynamics. These capabilities position veloAgent as a scalable and versatile framework for dissecting spatially resolved cellular dynamics and guiding cell fate manipulation across diverse biological processes.
The utility of herbarium collections for genetic monitoring
Isaac Eckert
Lucas Eckert
Olivia Rahn
Cameron So
Simon Joly
Despite growing evidence of widespread genetic responses to anthropogenic activity, data shortfalls constrain genetic monitoring efforts and preclude the widespread use of genetic data to inform conservation. For flora, one option is to leverage the wealth of genetic material preserved in Earth’s vast herbarium collections, but the extent to which herbarium specimens can supply the population-level data required to monitor genetic change remains unclear. Using the Essential Biodiversity Variable (EBV) framework developed to monitor population-level genetic change, we show that digitized herbarium specimens could be used to quantify ∼162 K measures of genetic EBVs representing over 41 K species, 86% of regions on Earth, and spanning the past 250 years of global change. As such, we find that herbarium collections offer an invaluable source of historical genetic data, the mobilization of which could transform global efforts to monitor and conserve plant diversity.
Exploring Entropy-based Active Learning for Fair Brain Segmentation
Ghazal Danaee
Christian Desrosiers
Sylvain Bouix
Active learning (AL) has emerged as a crucial strategy for reducing the prohibitive costs associated with medical image segmentation. However, standard uncertainty-based AL methods typically focus on maximizing performance metrics, ignoring performance disparities or fairness across groups with sensitive attributes. While fair active learning has been explored in classification tasks, its intersection with medical image segmentation remains unaddressed. In this work, we introduced a fairness-aware active learning framework with a Weighted Entropy selection strategy that modulates uncertainty based on current group-specific performance estimates on the labeled set. To decouple true epistemic uncertainty from anatomical volume variances, we further utilized a masked, scaled entropy restricted to the region of interest. The framework was evaluated on synthetic T1-weighted brain MRIs with controlled left caudate bias in both strong and weak bias settings. A 3D U-Net was trained to segment the left caudate under several AL strategies, starting from both demographically balanced and strongly imbalanced initial labeled sets. Experiments demonstrated that our method markedly reduces performance disparities between groups compared to random sampling and standard uncertainty sampling. By prioritizing poorly segmented subgroups during the AL cycles, our method consistently achieved the highest equity-scaled performance and reduced the disparity metric by 75% (strong bias) and 86% (weak bias) relative to standard entropy at the final budget. Overall, this work is among the first studies on fair AL for medical image segmentation, offering an efficient strategy to train more equitable models in resource-constrained environments.
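The selection strategy described above can be sketched as follows: entropy is averaged only inside a region-of-interest mask, so the score does not grow with structure volume, and is then weighted by how poorly the sample's group is currently segmented. The weight `1 - Dice`, the way the mask is supplied, and all function names are assumptions for illustration; the abstract does not give the exact formulation.

```python
import numpy as np


def masked_mean_entropy(prob, mask):
    """Mean binary entropy of a foreground probability map inside `mask`."""
    p = np.clip(np.asarray(prob, dtype=float), 1e-7, 1.0 - 1e-7)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    roi = np.asarray(mask, dtype=bool)
    if not roi.any():
        return 0.0
    return float(entropy[roi].mean())


def weighted_entropy_scores(probs, masks, groups, group_dice):
    """Hypothetical weighting: lower current group Dice gives a larger weight."""
    scores = []
    for prob, mask, group in zip(probs, masks, groups):
        weight = 1.0 - group_dice[group]
        scores.append(weight * masked_mean_entropy(prob, mask))
    return np.array(scores)


def select_batch(scores, k):
    """Indices of the k unlabeled samples with the highest scores."""
    return np.argsort(-scores)[:k]
```

With two equally uncertain volumes, the one belonging to the group with the lower current Dice is queried first, which is the behaviour the abstract attributes to prioritizing poorly segmented subgroups.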
One Sequence to Segment Them All: Efficient Data Augmentation for CT and MRI Cross-Domain 3D Spine Segmentation
Hendrik Möller
Anna Curto-Vilalta
Robert Graf
Matan Atad
Daniel Rueckert
Jan S. Kirschke
Deep learning-based medical image segmentation is increasingly used to support clinical diagnosis and develop new treatment strategies. However, model performance remains limited by the scarcity of high-quality annotated data and insufficient generalization across imaging protocols. This limitation is particularly evident in MRI and CT, where models are typically trained on a single acquisition sequence and exhibit reduced robustness when applied to unseen sequences or contrasts. Although data augmentation is widely used to improve general robustness on medical images, its impact on cross-modality generalization has not been quantitatively explored. In this work, we study a targeted set of data augmentation techniques designed to improve cross-modality transfer. We train three spine segmentation models, each on a single-modality/sequence dataset, and evaluate them across seven out-of-distribution datasets (spanning CT and MRI), reflecting a realistic single-sequence training and multi-sequence/contrast/modality deployment scenario. Our results demonstrate substantial performance gains on unseen domains (average Dice gain of 155 %) while preserving in-domain accuracy (average Dice decrease of 0.008 %), including effective transfer between CT and MRI. To mitigate the computational cost typically associated with strong data augmentation, we implement GPU-optimized augmentations that maintain, and even improve, training efficiency by approximately 10 %. We release our approach as an open-source toolbox, enabling seamless integration into commonly used frameworks such as nnUNet and MONAI. These augmentations significantly enhance robustness to heterogeneous clinical imaging scenarios without compromising training speed.
Cycles upon cycles - Temperature Scaling of Medaka Development
Sapna Chhabra
Carina B. Vibe
Anubhuti Anushree
Kristina S. Stapornwongkul
Thomas Thumberger
Joachim Wittbrodt
Alexander Aulehla
How organisms develop in dynamic environmental conditions is a fundamental question. We asked how day-night temperature cycles impact embryonic axis elongation and segmentation, itself a cyclic process linked to the segmentation clock, using the Japanese rice fish medaka. We developed an unbiased dimensional reduction approach, based on Singular Value Decomposition (SVD), to reliably identify the dynamic modes of segmentation clock oscillations across all temperature conditions. We reveal that the two major dynamic modes show opposite temperature sensitivities: while the temporal oscillation (mode 1) varies strongly with temperature, the spatial phase gradient (mode 2) appears largely temperature invariant. In addition, we found developmental parameters with intermediate, sub-scaled temperature responses, such as axis elongation. We used theoretical modeling to understand how dynamic modes emerge from the underlying local oscillation dynamics and axis elongation. We then exposed embryos to circadian and ultradian temperature cycles to reveal dynamic response patterns of oscillations and axis elongation, and found how these responses are integrated into morphological features. Combined, our theoretical-experimental results support a model in which the dynamic integration of temporal (i.e. segmentation clock related) and spatial (i.e. axis elongation) processes, in particular their sub-scaled temperature response patterns, quantitatively compensate each other to yield a robust, temperature-invariant axis patterning outcome.
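For readers unfamiliar with SVD-based mode extraction, the sketch below decomposes a space-time intensity matrix (a kymograph) into its leading modes, each consisting of a temporal profile and a spatial profile. The input layout, the function name, and the choice of two modes are assumptions for illustration; how the authors preprocess their data and interpret each mode is not reproduced here.

```python
import numpy as np


def dynamic_modes(kymograph, n_modes=2):
    """Leading SVD modes of a (n_timepoints, n_positions) matrix.

    Returns temporal profiles scaled by their singular values, the
    matching spatial profiles, and the variance fraction of each mode.
    """
    data = np.asarray(kymograph, dtype=float)
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    temporal = u[:, :n_modes] * s[:n_modes]
    spatial = vt[:n_modes]
    return temporal, spatial, explained[:n_modes]


# Synthetic travelling wave: cos(w t - k x) is exactly rank 2.
t = np.linspace(0.0, 10.0, 200)
x = np.linspace(0.0, 1.0, 50)
wave = np.cos(2.0 * np.pi * 0.5 * t[:, None] - 3.0 * x[None, :])
temporal, spatial, explained = dynamic_modes(wave)
```

Because a pure travelling wave has rank two, the product `temporal @ spatial` reconstructs `wave` up to round-off; real oscillation data would spread some variance into higher modes.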