Publications

A Multimodal and Multi-centric Head and Neck Cancer Dataset for Tumor Segmentation and Outcome Prediction
Numan Saeed
Salma Hassan
Shahad Hardan
Ahmed Aly
Darya Taratynova
Umair Nawaz
Ufaq Khan
Muhammad Ridzuan
Vincent Andrearczyk
Adrien Depeursinge
Yutong Xie
Thomas Eugene
Raphael Metz
Mélanie Doré
G. Delpon
V. Papineni
K. Wahid
Cem Dede
A. M. Ali
Carlos Sjogreen
Mohamed A. Naser
Clifton D Fuller
Valentin Oreiller
Mario Jreige
J. Prior
Catherine Cheze Le Rest
Olena Tankyevych
P. Decazes
Su Ruan
Stephanie Tanadini-Lang
Hesham M. Elhalawani
R. Abgral
R. Floch
K. Kerleguer
Ulrike Schick
M. Mauguen
D. Bourhis
J. Leclère
Arman Rahmim
Mathieu Hatt
Mohammad Yaqub
We describe a publicly available multimodal dataset of annotated Positron Emission Tomography/Computed Tomography (PET/CT) studies for head and neck cancer research. The dataset includes 1123 FDG-PET/CT studies from patients with histologically confirmed head and neck cancer, acquired from 10 international medical centers. All examinations consist of co-registered PET/CT scans with varying acquisition protocols, reflecting real-world clinical diversity across institutions. Primary gross tumor volumes (GTVp) and involved lymph nodes (GTVn) were manually segmented by experienced radiation oncologists and radiologists following standardized guidelines and quality-control measures. We provide anonymized NIfTI files of all studies, along with expert-annotated segmentation masks, radiotherapy dose distributions for a subset of patients, and comprehensive clinical metadata. The metadata include TNM staging, HPV status, demographics (age and gender), long-term follow-up outcomes, survival times, censoring indicators, and treatment information. We demonstrate how the dataset can be used for three key clinical tasks: automated tumor segmentation, recurrence-free survival prediction, and HPV status classification, and provide benchmark results using state-of-the-art deep learning models, including UNet, SegResNet, and multimodal prognostic frameworks.
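Segmentation benchmarks like the one described in this abstract are typically scored with the Dice similarity coefficient. A generic numpy sketch, not the paper's evaluation code; the toy masks and the `eps` smoothing term are illustrative assumptions:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks.

    pred, gt: arrays of the same shape (e.g. a 3-D GTV mask).
    eps avoids division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))

# Toy check on a 4x4 slice: two 8-voxel bands sharing one row (4 voxels).
a = np.zeros((4, 4), dtype=bool); a[:2, :] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, :] = True
print(round(dice_coefficient(a, b), 3))  # → 0.5
```

The same function applies unchanged to the 3-D GTVp/GTVn masks once they are loaded as boolean volumes.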
Scalable Option Learning in High-Throughput Environments
Mikael Henaff
Michael Matthews
Michael G. Rabbat
Hierarchical reinforcement learning (RL) has the potential to enable effective decision-making over long timescales. Existing approaches, while promising, have yet to realize the benefits of large-scale training. In this work, we identify and solve several key challenges in scaling hierarchical RL to high-throughput environments. We propose Scalable Option Learning (SOL), a highly scalable hierarchical RL algorithm that achieves 25x higher throughput than existing hierarchical methods. We train our hierarchical agents on 20 billion frames of experience in the complex game of NetHack, significantly surpassing flat agents and demonstrating positive scaling trends. We also validate our algorithm on the MiniHack and MuJoCo environments, showcasing its general applicability. Our code is open-sourced at github.com/facebookresearch/sol.
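SOL itself is not reproduced here, but the hierarchical control pattern it scales, a high-level policy picking options that each execute several primitive steps, can be sketched as a toy loop. All names and the fixed option length are illustrative assumptions, not details from the paper:

```python
def run_episode(env_step, high_level_policy, options, horizon=12, option_len=3):
    """Toy hierarchical control loop.

    env_step(action) -> reward for one primitive step.
    high_level_policy(t) -> index of the option to run next.
    Each option is a callable mapping the timestep to a primitive action;
    once selected, it runs for `option_len` primitive steps.
    """
    total_reward, t = 0.0, 0
    while t < horizon:
        option = options[high_level_policy(t)]
        for _ in range(option_len):  # the option runs for its fixed length
            if t >= horizon:
                break
            total_reward += env_step(option(t))
            t += 1
    return total_reward

# Reward 1 per step for action 1; always picking option 1 collects it every step.
print(run_episode(lambda a: float(a == 1), lambda t: 1,
                  [lambda t: 0, lambda t: 1]))  # → 12.0
```

Real option-learning methods additionally learn termination conditions and train both policy levels; this sketch only shows the temporal abstraction that makes the high-level decision frequency lower than the environment's.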
A Transparent and Generalizable Deep Learning Framework for Genomic Ancestry Prediction
Raphaël Poujol
Jean-Christophe Grenier
Julie G Hussin
Accurately capturing genetic ancestry is critical for ensuring reproducibility and fairness in genomic studies and downstream health research. This study addresses the prediction of ancestry from genetic data using deep learning, with a focus on generalizability across datasets with diverse populations and on explainability to improve model transparency. We adapt the Diet Network, a deep learning architecture proven effective on high-dimensional data, to learn population ancestry from single nucleotide polymorphism (SNP) data using the Thousand Genomes Project dataset. Our results highlight the model's ability to generalize to diverse populations in the CARTaGENE and Montreal Heart Institute biobanks, and show that predictions remain robust to high levels of missing SNPs. We show that, despite the lack of North African populations in the training dataset, the model learns latent representations that reflect meaningful population structure for North African individuals in the biobanks. To improve model transparency, we apply the Saliency Maps, DeepLift, GradientShap, and Integrated Gradients attribution techniques and evaluate their performance in identifying the SNPs leveraged by the model. Using DeepLift, we show that the model's predictions are driven by population-specific signals consistent with those identified by traditional population genetics metrics. This work presents a generalizable and interpretable deep learning framework for genetic ancestry inference in large-scale biobanks with genetic data. By enabling more widespread characterization of genomic ancestry in these cohorts, this study contributes practical tools for integrating genetic data into downstream biomedical applications, supporting more inclusive and equitable healthcare solutions.
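Of the attribution techniques the abstract lists, Integrated Gradients has a compact generic form: average the gradient along the straight path from a baseline to the input, then scale by the input-baseline difference. A Riemann-sum sketch, independent of the paper's Diet Network and its actual tooling:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    grad_fn(z) returns the gradient of the model output w.r.t. input z.
    Attribution_i = (x_i - baseline_i) * average gradient along the path.
    """
    alphas = (np.arange(steps) + 0.5) / steps           # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)  # (steps, n_features)
    avg_grad = np.mean([grad_fn(z) for z in path], axis=0)
    return (x - baseline) * avg_grad

# For a linear model f(x) = w @ x the gradient is constant, so the
# attributions are exactly (x - baseline) * w.
w = np.array([0.5, -2.0, 0.0])
x = np.array([1.0, 1.0, 3.0])
ig = integrated_gradients(lambda z: w, x, np.zeros(3))
print(ig.tolist())  # → [0.5, -2.0, 0.0]
```

A useful sanity check is the completeness axiom: the attributions sum to the difference in model output between the input and the baseline.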
Assessing the exposure of buildings to long-term sea level rise across the Global South
M. Willard-Stepan
N. Gomez
Jeffrey A. Cardille
E. D. Galbraith
E. M. Bennett
Distributed Combined Space Partitioning and Network Flow Optimization: an Optimal Transport Approach (Extended Version)
Théo Laurentin
Patrick Coirault
Emmanuel Moulay
Jerome Le Ny
Aperiodic and Periodic EEG Component Lifespan Trajectories: Monotonic Decrease versus Growth-then-Decline
Min Li
Ying Wang
Yaqi Chen
Adrien E. E. Dubois
Gangyong Jia
Ying Wang
Maria L. Bringas-Vega
Pedro A. Valdés-Sosa
Unraveling the lifespan trajectories of human brain development is critical for understanding brain health and disease. Recent research demonstrates that electroencephalography signals are composed of periodic and aperiodic components reflecting distinct physiological substrates. This dissociation raises the possibility that they follow different developmental tendencies. Here, we delineate the lifespan trajectories of aperiodic and periodic neural oscillations using a large international cohort (N=1,563, ages 5–95, resting state, eyes closed). We reveal two fundamental developmental patterns: a monotonic decrease in aperiodic activity and a growth-then-decline pattern for periodic activity. Both components have inflections around age 20 and transition to a stable senescent phase around age 40. Spatially, anterior regions mainly exhibit aperiodic activity, while periodic activity concentrates in posterior regions, and these patterns remain stable throughout life. Crucially, multimodal analysis shows that these trajectories map onto distinct biological substrates. The periodic component's growth-then-decline trajectory aligns with GABAergic function and myelination. In contrast, the monotonically decreasing trajectory of aperiodic activity mirrors fundamental biomarkers of biological aging, such as DNA methylation and telomere length. Transforming age to a logarithmic scale simplifies these nonlinear trajectories into a linearly decreasing model for the aperiodic component and a piecewise concave linear model for the periodic one. This form provides a robust and parsimonious framework for quantifying maturation and identifying neurological deviations.
We delineate distinct lifespan trajectories of aperiodic and periodic neural activity in a large-scale international cohort (N=1,563, ages 5–95). Aperiodic activity undergoes a monotonic decrease with age, whereas periodic activity follows a growth-then-decline trajectory, peaking in early adulthood. Both trajectories feature a critical transition around age 20 and stabilize into a protracted senescent phase from approximately age 40 onward. These neural trajectories map onto distinct biological substrates: periodic activity tracks integrative functions (myelination, GABAergic function), while aperiodic decline mirrors fundamental aging processes (DNA methylation). A stable pattern observed throughout the lifespan is the spatial segregation of neural activity, with aperiodic signals dominant in anterior regions and periodic signals concentrated in posterior ones. Logarithmically transforming age linearizes the developmental trajectories, yielding a monotonic decline for the aperiodic component and a concave piecewise-linear trajectory for the periodic one. This establishes robust linear norms for the personalized assessment of brain dysfunction.
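The log-age linearization described above can be illustrated on synthetic data: if aperiodic activity declines linearly in log(age), an ordinary least-squares fit on the transformed axis recovers the decline rate. The slope, intercept, and noise level below are made up for illustration, not the cohort's fitted values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "aperiodic activity" that is linear in log(age), plus noise.
age = rng.uniform(5, 95, size=500)
aperiodic = 3.0 - 0.8 * np.log(age) + rng.normal(0, 0.01, size=500)

# Fitting on log(age) turns the nonlinear lifespan curve into a line,
# so its slope directly quantifies the rate of decline.
slope, intercept = np.polyfit(np.log(age), aperiodic, deg=1)
print(round(slope, 2), round(intercept, 2))
```

The periodic component would instead need a piecewise (concave) fit with a breakpoint near age 20, per the trajectory described in the abstract.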
R3Mem: Bridging Memory Retention and Retrieval via Reversible Compression
Xiaoqiang Wang
Yun Zhu
Predictive Performance Precision Analysis in Medicine: Identification of low-confidence predictions at patient and profile levels (MED3pa I)
Félix Camirand Lemyre
Jean-François Ethier
Lyna Hiba Chikouche
Ludmila Amriou
Artificial intelligence models are increasingly used in healthcare, yet global performance metrics can mask variations in reliability across individual patients or subgroups with shared attributes, called patient profiles. This study introduces MED3pa, a method that identifies when models are less reliable, allowing clinicians to better assess model limitations. We propose a framework that estimates predictive confidence using three combined approaches: Individualized (IPC), Aggregated (APC), and Mixed Predictive Confidence (MPC). IPC estimates confidence for each patient, APC assesses it across profiles, and MPC combines both. We evaluate our method on four datasets: one simulated, two public, and one private clinical dataset. Metrics by Declaration Rate (MDR) curves show how performance changes when only the most confident predictions are retained, while interpretable decision trees reveal profiles with higher or lower model confidence. We demonstrate our method in internal, temporal, and external validation settings, as well as through a clinical example. In internal validation, limiting predictions to the 93% most confident cases improved sensitivity by 14.3% and the AUC by 5.1%. In the clinical example, MED3pa identified a patient profile with high misclassification risk, demonstrating its potential for safer deployment. By identifying low-confidence predictions, our framework improves model reliability in clinical settings and can be integrated into decision support systems to help clinicians make more informed decisions. Confidence thresholds help balance model performance against the proportion of patients for whom predictions are considered reliable. Better leveraging confidence in model predictions could improve reliability and trustworthiness, supporting safer and more effective use in healthcare.
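The MDR-curve idea, recomputing a metric while declaring only the most confident predictions and abstaining on the rest, can be sketched generically. Accuracy stands in for the paper's metrics here, and the IPC/APC/MPC confidence estimators themselves are not reproduced:

```python
import numpy as np

def metrics_by_declaration_rate(confidence, correct, rates):
    """Accuracy on the top-`rate` fraction of most confident predictions.

    confidence: per-sample confidence scores (higher = more confident).
    correct: 1 if the prediction was right, 0 otherwise.
    rates: declaration rates in (0, 1]; the rest of the cohort is abstained on.
    """
    order = np.argsort(-np.asarray(confidence))      # most confident first
    correct = np.asarray(correct, float)[order]
    out = []
    for r in rates:
        k = max(1, int(round(r * len(correct))))     # how many to declare
        out.append(float(correct[:k].mean()))
    return out

conf = np.array([0.99, 0.95, 0.90, 0.60, 0.55])
corr = np.array([1, 1, 1, 0, 1])
print(metrics_by_declaration_rate(conf, corr, [0.6, 1.0]))  # → [1.0, 0.8]
```

Sweeping `rates` from 0 to 1 traces the full MDR curve, making the trade-off between coverage and performance explicit.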
Source-free Domain Adaptation Requires Penalized Diversity
While neural networks can achieve human-like performance on many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus increasing data privacy. Diversity in representation space can be vital to a model's adaptability in varied and difficult domains. In unsupervised SFDA, diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, unconstrained mutual information (MI) maximization may amplify weak hypotheses. We therefore introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD), where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation under covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.
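The MI maximization mentioned above is, in many SFDA methods, the entropy of the average prediction minus the average per-sample entropy: maximizing it pushes individual predictions to be confident while keeping the marginal class distribution diverse. A generic numpy sketch, not the paper's PD/WHP implementation:

```python
import numpy as np

def info_max_objective(probs, eps=1e-12):
    """MI surrogate over a batch of softmax outputs `probs` (n_samples, n_classes):
    H(mean prediction) - mean per-sample entropy. Higher is better."""
    probs = np.asarray(probs, float)
    marginal = probs.mean(axis=0)
    h_marginal = -(marginal * np.log(marginal + eps)).sum()       # diversity term
    h_conditional = -(probs * np.log(probs + eps)).sum(axis=1).mean()  # confidence term
    return h_marginal - h_conditional
```

Confident and balanced predictions score near log(n_classes); both the uniform batch and a batch collapsed onto one class score near zero, which is why unconstrained maximization can amplify whatever hypothesis happens to be confident, the failure mode WHP is designed to penalize.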
Uncovering executive function profiles within interindividual variability: A data driven clustering exploration of design fluency in school-aged children
Myriam Sahraoui
Karim Jerbi
Vanessa Hadid
Bruno Gauthier
Communication Efficient LLM Pre-training with SparseLoCo
Amir M. Sarfi
Joel Lidin
Low-dimensional embeddings of high-dimensional data
Cyril de Bodt
Alex Diaz-Papkovich
Michael Bleher
Kerstin Bunte
Corinna Coupette
Sebastian Damrich
Fred Hamprecht
Emőke-Ágnes Horvát
Dhruv Kohli
John A. Lee
Boudewijn P. F. Lelieveldt
Leland McInnes
Ian T. Nabney
Maximilian Noichl
Pavlin G. Poličar
Bastian Rieck
Gal Mishne
Dmitry Kobak
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
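As a minimal concrete instance of the kind of embedding algorithm such a review covers, PCA projects centered data onto its top principal components. This linear baseline is illustrative only and far simpler than the nonlinear methods (t-SNE, UMAP, and relatives) that dominate the field:

```python
import numpy as np

def pca_embedding(X: np.ndarray, dim: int = 2) -> np.ndarray:
    """Project centered data onto its top `dim` principal components via SVD."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # S is sorted descending
    return Xc @ Vt[:dim].T                           # (n_samples, dim)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
X[:, 0] *= 10.0            # one dominant direction of variance
Y = pca_embedding(X)
print(Y.shape)  # → (100, 2)
```

By construction the first embedding axis captures the most variance, which makes PCA a useful sanity check before reaching for the nonlinear methods the review benchmarks.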