Justin Szeto

Meritxell Bach Cuadra

Ujjwal Baid

Bhakti Baheti

Jaume Banús

Kamil Barbierik

Christoph Brune

步岩松

Baptiste Callard

Yuhan Chen

Cornelius Crijnen

Corentin Dancette

Peter Drotár

Prasad Dutande

Nils D. Forkert

Saurabh K. Garg†

Jakub Gazda

Matej Gazda

Benoît Gérin

Partha Ghosh

Weikang Gong

Pedro M. Gordaliza

Sam Hashemi

Tobias Heimann

Fucang Jia

Jiexin Jiang

Emily Kaczmarek

Chris Kang

Seung Kwan Kang

Mohammad Khazaei

Julien Khlaut

Petros Koutsouvelis

Jae Sung Lee

Yuchong Li

Mengye Lyu

Mingchen Ma

Anant Madabhushi

Klaus H. Maier-Hein

Pierre Manceron

Andrés Martínez Mora

Moona Mazher

Felix Meister

Nataliia Molchanova

Steven A. Niederer

Leonard Nürnberg

Jinah Park

Abdul Qayyum

Jonas Richiardi

Antoine Saporta

Branislav Setlak

Ning Shen

Constantin Ulrich

Puru Vaish

Vibujithan Vigneshwaran

Leroy Volmer

Zihao Wang

Siqi Wei

Anthony Winder

Jelmer M. Wolterink

Maxence Wynen

Chang YANG

Si Young Yie

Mostafa Mehdipour Ghazi

Akshay Pai

Espen Jimenez‐Solem

Sebastian Nørgaard Llambias

Mikael Boesen

Michael Eriksen Benros

Juan Eugenio Iglesias

Mads Nielsen

Clinical deployment of automated brain MRI analysis faces a fundamental challenge: clinical data is heterogeneous and noisy, and high-quality labels are prohibitively costly to obtain. Self-supervised learning (SSL) can address this by leveraging the vast amounts of unlabeled data produced in clinical workflows to train robust foundation models that adapt out-of-domain with minimal supervision. However, the development of foundation models for brain MRI has been limited by small pretraining datasets and by in-domain benchmarking focused on high-quality, research-grade data. To address this gap, we organized the FOMO25 challenge as a satellite event at MICCAI 2025. FOMO25 provided participants with a large pretraining dataset, FOMO60K, and evaluated models on data sourced directly from clinical workflows in few-shot and out-of-domain settings. Tasks covered infarct classification, meningioma segmentation, and brain age regression, and considered both models trained on FOMO60K (method track) and models trained on any data (open track). Nineteen foundation models from sixteen teams were evaluated using a standardized containerized pipeline. Results show that (a) self-supervised pretraining improves generalization on clinical data under domain shift, with the strongest models trained out-of-domain surpassing supervised baselines trained in-domain; (b) no single pretraining objective benefits all tasks: MAE favors segmentation, while hybrid reconstruction-contrastive objectives favor classification; and (c) strong performance was achieved by small pretrained models, while scaling model size and training duration did not yield reliable benefits.

2026-04-12

arXiv (preprint)

Exposing and Mitigating Calibration Biases and Demographic Unfairness in MLLM Few-Shot In-Context Learning for Medical Image Classification

Xing Shen

Mingyang Li

Hengguan Huang

2025-09-19

Lecture Notes in Computer Science (published)

Building a General SimCLR Self-Supervised Foundation Model Across Neurological Diseases to Advance 3D Brain MRI Diagnoses

3D structural Magnetic Resonance Imaging (MRI) brain scans are commonly acquired in clinical settings to monitor a wide range of neurological conditions, including neurodegenerative disorders and stroke. While deep learning models have shown promising results analyzing 3D MRI across a number of brain imaging tasks, most are highly tailored for specific tasks with limited labeled data, and are not able to generalize across tasks and/or populations. The development of self-supervised learning (SSL) has enabled the creation of large medical foundation models that leverage diverse, unlabeled datasets ranging from healthy to diseased data, showing significant success in 2D medical imaging applications. However, even the very few foundation models for 3D brain MRI that have been developed remain limited in resolution, scope, or accessibility. In this work, we present a general, high-resolution SimCLR-based SSL foundation model for 3D brain structural MRI, pre-trained on 18,759 patients (44,958 scans) from 11 publicly available datasets spanning diverse neurological diseases. We compare our model to Masked Autoencoders (MAE), as well as two supervised baselines, on four diverse downstream prediction tasks in both in-distribution and out-of-distribution settings. Our fine-tuned SimCLR model outperforms all other models across all tasks. Notably, our model still achieves superior performance when fine-tuned using only 20% of labeled training samples for predicting Alzheimer's disease. We use publicly available code and data, and release our trained model at https://github.com/emilykaczmarek/3D-Neuro-SimCLR, contributing a broadly applicable and accessible foundation model for clinical brain MRI analysis.

2025-09-11

arXiv (preprint)
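SimCLR, the framework the model above builds on, trains an encoder by pulling together the embeddings of two augmented views of the same scan and pushing them away from the other views in the batch, using the normalized temperature-scaled cross-entropy (NT-Xent) loss. A minimal, dependency-free sketch of that loss follows; the function names, the plain-list embeddings, and the temperature of 0.1 are illustrative assumptions, not the settings or code of the released model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent(z1, z2, temperature=0.1):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]).

    Sketch only: temperature=0.1 is an illustrative default, not the
    value used by the released model.
    """
    n = len(z1)
    z = list(z1) + list(z2)  # 2N embeddings; partner of i is (i + N) mod 2N
    losses = []
    for i in range(2 * n):
        j = (i + n) % (2 * n)
        # Scaled similarities to every other embedding (self excluded).
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # -log(exp(s_ij) / sum_k exp(s_ik)) = log_denom - s_ij
        losses.append(log_denom - sims[j])
    return sum(losses) / len(losses)
```

For example, nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]) is close to zero because each view's most similar neighbour is its own positive, while swapping the rows of the second batch raises the loss to roughly 10.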

SSL-AD: Spatiotemporal Self-Supervised Learning for Generalizability and Adaptability Across Alzheimer's Prediction Tasks and Datasets

Alzheimer's disease is a progressive, neurodegenerative disorder that causes memory loss and cognitive decline. While there has been extensive research in applying deep learning models to Alzheimer's prediction tasks, these models remain limited by the lack of available labeled data, poor generalization across datasets, and inflexibility to varying numbers of input scans and time intervals between scans. In this study, we adapt three state-of-the-art temporal self-supervised learning (SSL) approaches for 3D brain MRI analysis, and add novel extensions designed to handle variable-length inputs and learn robust spatial features. We aggregate four publicly available datasets comprising 3,161 patients for pre-training, and show the performance of our model across multiple Alzheimer's prediction tasks, including diagnosis classification, conversion detection, and future conversion prediction. Importantly, our SSL model implemented with temporal order prediction and contrastive learning outperforms supervised learning on six out of seven downstream tasks. It demonstrates adaptability and generalizability across tasks and across numbers of input images with varying time intervals, highlighting its capacity for robust performance across clinical applications. We release our code and model publicly at https://github.com/emilykaczmarek/SSL-AD.

2025-09-11

arXiv (preprint)
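The abstract above names temporal order prediction as one of its pretext tasks. One common way to set up such a task, shown here only as a sketch under stated assumptions, is to build examples from a patient's scans that are either in chronological order (label 1) or have two scans swapped (label 0); because each example uses however many scans the patient has, sequence length can vary. The (time, scan_id) tuple format, the 50/50 sampling, and the swap-based corruption are hypothetical choices, not the SSL-AD implementation.

```python
import random


def make_order_example(scans, rng):
    """Build one hypothetical order-prediction example.

    scans: list of (acquisition_time, scan_id) tuples with distinct times
           (an assumed format). Any length is accepted.
    Returns (sequence, label): label 1 if the sequence is chronological,
    label 0 if two scans were swapped.
    """
    seq = sorted(scans, key=lambda s: s[0])  # chronological by acquisition time
    # A single scan has no order to corrupt, so it is always a positive.
    if len(seq) < 2 or rng.random() < 0.5:
        return seq, 1
    # Swap two distinct positions; with distinct times this always breaks order.
    i, j = rng.sample(range(len(seq)), 2)
    seq[i], seq[j] = seq[j], seq[i]
    return seq, 0
```

Swapping two positions, rather than fully reshuffling, keeps negatives close to the true order, which is one plausible way to make the pretext task non-trivial; other corruption schemes are equally possible.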

Spatio-Temporal Conditional Diffusion Models for Forecasting Future Multiple Sclerosis Lesion Masks Conditioned on Treatments

Gian Mario Favero

Ge Ya Luo

Douglas Arnold

Christopher Pal

2025-08-08

arXiv (preprint)

Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis

Changjian Shui

Raghav Mehta

Douglas L. Arnold

2023-10-07

OpenReview (published)

openreview.net

Rethinking Generalization: The Impact of Annotation Style on Medical Image Segmentation

Brennan Nichyporuk

Jillian Cardinell

Raghav Mehta

Jean-Pierre R. Falet

Douglas L. Arnold

Sotirios A. Tsaftaris

Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real-world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the "ground-truth" label annotations. This is particularly important in the context of medical image segmentation of pathological structures (e.g. lesions), where the annotation process is much more subjective and is affected by a number of underlying factors, including the annotation protocol, rater education/experience, and clinical aims, among others. In this paper, we show that modeling annotation biases, rather than ignoring them, offers a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets in order to permit their effective aggregation, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. Next, we present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially making detection biases easier to identify.

2022-10-30

arXiv (preprint)

Cohort Bias Adaptation in Aggregated Datasets for Lesion Segmentation

Brennan Nichyporuk

Jillian Cardinell

Raghav Mehta

Sotirios Tsaftaris

Douglas L. Arnold

Many automatic machine learning models developed for focal pathology (e.g. lesions, tumours) detection and segmentation perform well, but do not generalize as well to new patient cohorts, impeding their widespread adoption into real clinical contexts. One strategy to create a more diverse, generalizable training set is to naively pool datasets from different cohorts. Surprisingly, training on this big data does not necessarily increase, and may even reduce, overall performance and model generalizability, due to the existence of cohort biases that affect label distributions. In this paper, we propose a generalized affine conditioning framework to learn and account for cohort biases across multi-source datasets, which we call Source-Conditioned Instance Normalization (SCIN). Through extensive experimentation on three different large-scale, multi-scanner, multi-centre Multiple Sclerosis (MS) clinical trial MRI datasets, we show that our cohort bias adaptation method (1) improves performance of the network on pooled datasets relative to naively pooling datasets and (2) can quickly adapt to a new cohort by fine-tuning the instance normalization parameters, thus learning the new cohort bias with only 10 labelled samples.

2021-09-20

Domain Adaptation and Representation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health (published)
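SCIN, as described in the abstract above, combines instance normalization with affine parameters selected by the data source. A minimal single-channel sketch of that idea follows, assuming one scalar scale (gamma) and shift (beta) per source; in a real network these would be per-channel learnable tensors inside each normalization layer. The class name, the source keys such as "trial_A", and the parameter values are hypothetical.

```python
import math


class SourceConditionedInstanceNorm:
    """Hypothetical single-channel sketch of source-conditioned instance norm.

    Each instance is normalized by its own mean and variance, then scaled
    and shifted by affine parameters looked up by source (cohort) key.
    """

    def __init__(self, eps=1e-5):
        self.eps = eps
        self.gamma = {}  # source key -> scale
        self.beta = {}   # source key -> shift

    def add_source(self, source, gamma=1.0, beta=0.0):
        """Register affine parameters for a (possibly new) cohort."""
        self.gamma[source] = gamma
        self.beta[source] = beta

    def __call__(self, x, source):
        """Normalize one flattened channel x (list of floats) for a given source."""
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        std = math.sqrt(var + self.eps)
        g, b = self.gamma[source], self.beta[source]
        return [g * (v - mean) / std + b for v in x]


norm = SourceConditionedInstanceNorm()
norm.add_source("trial_A")
norm.add_source("trial_B", gamma=2.0, beta=0.5)
x = [1.0, 2.0, 3.0, 4.0]
out_a = norm(x, "trial_A")  # zero mean, roughly unit variance
out_b = norm(x, "trial_B")  # same shape, rescaled and shifted by trial_B's parameters
```

Adapting to a new cohort, as the abstract describes, would then correspond to registering a new source key and fitting only its gamma and beta while the rest of the network stays frozen.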

Optimizing Operating Points for High Performance Lesion Detection and Segmentation Using Lesion Size Reweighting

Brennan Nichyporuk

Douglas Arnold

There are many clinical contexts that require accurate detection and segmentation of all focal pathologies (e.g. lesions, tumours) in patient images. In cases where there is a mix of small and large lesions, standard binary cross-entropy loss will result in better segmentation of large lesions at the expense of missing small ones. Adjusting the operating point to accurately detect all lesions generally leads to oversegmentation of large lesions. In this work, we propose a novel reweighting strategy to eliminate this performance gap, increasing small pathology detection performance while maintaining segmentation accuracy. We show that our reweighting strategy vastly outperforms competing strategies in experiments on a large-scale, multi-scanner, multi-center dataset of Multiple Sclerosis patient images.

2021-05-10

MIDL.io/2021/Conference/Short (poster)

openreview.net

Accounting for Variance in Machine Learning Benchmarks

Mirko Bronzi

Naz Sepah

Edward Raff

Kanika Madan

Vikram Voleti

Samira Ebrahimi Kahou

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization, and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result: adding more sources of variation to an imperfect estimator brings it closer to the ideal estimator, at a 51-fold reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.

2020-12-31

MLSys (published)
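The abstract above argues for comparing algorithms across trials that re-randomize several sources of variation. A minimal sketch of one such comparison follows: it estimates the probability that A outperforms B from paired trials (each pair assumed to share a data split, seed, and hyperparameter draw) and attaches a percentile bootstrap interval. The pairing scheme, the bootstrap settings, the function names, and the example scores are illustrative assumptions, not the paper's exact procedure or results.

```python
import random


def prob_outperform(scores_a, scores_b):
    """Fraction of paired trials in which A scores strictly higher than B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    return wins / len(scores_a)


def bootstrap_ci(scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for prob_outperform, resampling whole pairs."""
    rng = random.Random(seed)
    n = len(scores_a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample trial pairs
        stats.append(prob_outperform([scores_a[i] for i in idx],
                                     [scores_b[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical accuracies from four paired trials (not data from the paper).
acc_a = [0.90, 0.80, 0.70, 0.60]
acc_b = [0.50, 0.85, 0.60, 0.65]
p = prob_outperform(acc_a, acc_b)
interval = bootstrap_ci(acc_a, acc_b)
```

Resampling whole pairs, rather than A and B scores independently, preserves the shared randomness within each trial, which is the point of pairing.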