Matthew Scicluna

scShapeBench: Discovering geometry from high dimensional scRNAseq data

Andrew J. Steindl

João Felipe Rocha

Brian Tshilengi Di Bassinga

Zachary Warren

Matthew Scicluna

César Miguel Valdez Cordova

Shabarni Gupta

Leire Torices

Daniel Neumann

Timothy J. Mann

Ihuan Gunawan

Dhananjay Bhaskar

John G. Lock

Christine L. Chaffer

Guy Wolf

Smita Krishnaswamy

High-dimensional point cloud data arise across many scientific domains, especially single-cell biology. The shapes or topologies of these datasets determine the types of information that can be extracted. For example, clustered data supports cell-type identification, trajectory structures support transition analysis, and archetypal structures capture continua of cellular behaviors. Existing analysis pipelines often assume a specific shape. The standard Seurat pipeline combines UMAP visualization with Louvain clustering and therefore assumes clustered data, while tools such as Monocle and SPADE assume tree-like structures, and flow-based models such as MIOFlow and Conditional Flow Matching target trajectories. Choosing which pipeline to apply is therefore often left to bioinformaticians who visually inspect datasets before selecting an analysis strategy. With the rise of agentic AI scientists, automating shape detection is increasingly important for selecting downstream analysis pipelines. To address this problem, we introduce scShapeBench, a benchmark dataset for shape detection containing both synthetic and expert-annotated single-cell datasets. Synthetic datasets are sampled from ground-truth skeleton graphs with controlled variance. Real single-cell datasets are curated from diverse sources and annotated by experts into four categories: clusters, single trajectory, multi-branching, and archetypal. We additionally introduce scReebTower, a baseline method that uses diffusion geometry to extract Reeb graphs and connect visualization with pipeline selection. We provide topology-aware evaluation metrics and compare scReebTower against PAGA and Mapper on synthetic and real data. Our results indicate that scReebTower outperforms existing baselines. Overall, our contributions span benchmarks, evaluation metrics, and a baseline for automated shape detection in single-cell data.
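As a rough illustration of what "sampled from ground-truth skeleton graphs with controlled variance" could look like, the sketch below draws points along the edges of a small graph and perturbs them with isotropic Gaussian noise. This is not the scShapeBench generator: the function name `sample_from_skeleton`, the input format, the length-proportional edge sampling, and the 2D Y-shaped example are all assumptions made for illustration.

```python
import numpy as np

def sample_from_skeleton(nodes, edges, n_points, sigma, seed=0):
    """Sample points along the edges of a skeleton graph, then add noise.

    nodes: (k, d) array-like of node coordinates (hypothetical input format).
    edges: list of (i, j) index pairs into nodes.
    sigma: standard deviation of the isotropic Gaussian noise
           (standing in for the "controlled variance").
    Returns the noisy points and the index of the edge each point came from.
    """
    rng = np.random.default_rng(seed)
    nodes = np.asarray(nodes, dtype=float)
    edges = np.asarray(edges, dtype=int)
    # Choose edges with probability proportional to their length so that
    # the sampling density is roughly uniform along the skeleton.
    lengths = np.linalg.norm(nodes[edges[:, 0]] - nodes[edges[:, 1]], axis=1)
    chosen = rng.choice(len(edges), size=n_points, p=lengths / lengths.sum())
    # Uniform position along each chosen edge.
    t = rng.uniform(0.0, 1.0, size=(n_points, 1))
    start, end = nodes[edges[chosen, 0]], nodes[edges[chosen, 1]]
    points = (1.0 - t) * start + t * end
    # Perturb each point off the skeleton.
    noise = rng.normal(0.0, sigma, size=points.shape)
    return points + noise, chosen

# A Y-shaped (single branching) skeleton in 2D, chosen only for illustration.
nodes = [[0, 0], [1, 0], [2, 1], [2, -1]]
edges = [(0, 1), (1, 2), (1, 3)]
X, edge_ids = sample_from_skeleton(nodes, edges, n_points=500, sigma=0.05)
```

Raising `sigma` blurs the branches into each other, which is one plausible way to vary the difficulty of recovering the underlying graph.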

2026-05-11

arXiv (preprint)

Measure Before You Look: Grounding Embeddings Through Manifold Metrics

César Miguel Valdez Cordova

Simon Gravel

2025-09-22

NeurIPS.cc/2025/Workshop/UniReps (published)

A Transparent and Generalizable Deep Learning Framework for Genomic Ancestry Prediction

Camille Rochefort-Boulanger

Matthew Scicluna

Raphaël Poujol

Jean-Christophe Grenier

Pierre Luc Carrier

Sébastien Lemieux

Julie G Hussin

Accurately capturing genetic ancestry is critical for ensuring reproducibility and fairness in genomic studies and downstream health research. This study aims to address the prediction of ancestry from genetic data using deep learning, with a focus on generalizability across datasets with diverse populations and on explainability to improve model transparency. We adapt the Diet Network, a deep learning architecture proven effective in handling high-dimensional data, to learn population ancestry from single nucleotide polymorphism (SNP) data using the populational Thousand Genomes Project dataset. Our results highlight the model’s ability to generalize to diverse populations in the CARTaGENE and Montreal Heart Institute biobanks and show that predictions remain robust to high levels of missing SNPs. We show that, despite the lack of North African populations in the training dataset, the model learns latent representations that reflect meaningful population structure for North African individuals in the biobanks. To improve model transparency, we apply Saliency Maps, DeepLift, GradientShap and Integrated Gradients attribution techniques and evaluate their performance in identifying SNPs leveraged by the model. Using DeepLift, we show that the model’s predictions are driven by population-specific signals consistent with those identified by traditional population genetics metrics. This work presents a generalizable and interpretable deep learning framework for genetic ancestry inference in large-scale biobanks with genetic data. By enabling more widespread genomic ancestry characterization in these cohorts, this study contributes practical tools for integrating genetic data into downstream biomedical applications, supporting more inclusive and equitable healthcare solutions.

2025-08-29

bioRxiv (preprint)

Toward computing attributions for dimensionality reduction techniques

Matthew Scicluna

Jean-Christophe Grenier

Raphaël Poujol

Sébastien Lemieux

Julie G. Hussin

We describe the problem of computing local feature attributions for dimensionality reduction methods. We use one such method that is well established within the context of supervised classification (using the gradients of target outputs with respect to the inputs) on the popular dimensionality reduction technique t-SNE, widely used in analyses of biological data. We provide an efficient implementation for the gradient computation for this dimensionality reduction technique. We show that our explanations identify significant features using a novel validation methodology on synthetic datasets and the popular MNIST benchmark dataset. We then demonstrate the practical utility of our algorithm by showing that it can produce explanations that agree with domain knowledge on a SARS-CoV-2 sequence dataset. Throughout, we provide a road map so that similar explanation methods could be applied to other dimensionality reduction techniques to rigorously analyze biological datasets. We have created a Python package that can be installed using the following command: pip install interpretable_tsne. All code used can be found at github.com/MattScicluna/interpretable_tsne.
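The supervised-classification idea the abstract builds on, attributing a prediction to input features through the gradient of the target output with respect to the inputs, can be sketched on a toy linear softmax classifier. This is only an illustration of that general idea: it is not the paper's t-SNE gradient computation and does not use the `interpretable_tsne` API. The function name `gradient_attribution` and the random weights are assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def gradient_attribution(W, b, x, target):
    """Gradient of the target-class probability with respect to the input x
    for a linear softmax classifier p = softmax(W @ x + b).

    Uses the closed form d p_c / d x = p_c * (W_c - sum_k p_k * W_k).
    """
    p = softmax(W @ x + b)
    return p[target] * (W[target] - p @ W)

# Toy classifier with 3 classes and 5 input features (random, for illustration).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

# One attribution score per input feature for class 1.
attr = gradient_attribution(W, b, x, target=1)
```

Features with large absolute gradient are those to which the target output is locally most sensitive; for deeper models the same quantity is usually obtained by automatic differentiation rather than a closed form.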

2022-12-31

Bioinformatics Advances (published)
