Jun Ding

Affiliate Member
Assistant professor, McGill University, Department of Medicine
Research Topics
Computational Biology
Medical Machine Learning
Representation Learning

Biography

Jun Ding is an assistant professor in the Department of Medicine of the Faculty of Medicine and Health Sciences at McGill University.

Alongside his team, he is dedicated to employing machine learning techniques to decipher the complex dynamics of cells in various diseases, such as developmental disorders, pulmonary diseases and cancers. The diverse and intricate nature of these conditions necessitates innovative approaches, prompting the use of state-of-the-art single-cell technologies to meticulously profile individual cell states. The result is a rich source of data for the team's machine learning models.

These technologies present unprecedented opportunities to advance understanding, particularly in fields like developmental and cancer biology. However, the challenge is to develop computational models capable of linking this intricate biomedical data to potential discoveries.

Ding’s primary focus lies in the development and refinement of machine learning methodologies, especially probabilistic graphical models, to effectively analyze, model and visualize both single-cell and bulk omics data, often featuring longitudinal or spatial dimensions. The goal is to harness these advanced machine learning techniques to deepen the comprehension of cellular dynamics, and so develop groundbreaking diagnostic and therapeutic strategies that can significantly benefit public health.

Current Students

PhD - McGill University
Principal supervisor :

Publications

Dissecting and steering cell dynamics using spatially-informed RNA velocity with veloAgent
RNA velocity enables inference of cell state transitions from single-cell transcriptomics by modeling transcriptional dynamics from spliced and unspliced mRNA. However, existing methods overlook spatial context and struggle to scale to large datasets, limiting insights into tissue organization and dynamic processes. We introduce veloAgent, a deep generative and agent-based framework that estimates gene- and cell-specific transcriptional kinetics while integrating spatial information through agent-based simulations of local microenvironments. By leveraging both molecular and spatial cues, veloAgent improves velocity accuracy and achieves sublinear memory scaling, enabling efficient analysis of large and multi-batch spatial datasets. A distinctive feature of veloAgent is its in silico perturbation module, which allows targeted manipulation of spatial velocity vectors to simulate regulatory interventions and predict their impact on cell fate dynamics. These capabilities position veloAgent as a scalable and versatile framework for dissecting spatially resolved cellular dynamics and guiding cell fate manipulation across diverse biological processes.
SIDISH integrates single-cell and bulk transcriptomics to identify high-risk cells and guide precision therapeutics through in silico perturbation
Yasmin Jolasun
Kailu Song
Yumin Zheng
Jingtao Wang
Gregory Fonseca
David H. Eidelman
Single-cell RNA sequencing (scRNA-seq) provides high-resolution insights into cellular heterogeneity but remains costly, restricting its use to small cohorts that often lack comprehensive clinical data, reducing translational relevance. In contrast, bulk RNA sequencing is scalable and cost-effective but obscures critical single-cell insights. We introduce SIDISH, a neural network framework that integrates the granularity of scRNA-seq with the scalability of bulk RNA-seq. Using a variational autoencoder, deep Cox regression, and transfer learning, SIDISH identifies high-risk cell populations while enabling robust clinical predictions from large-cohort data. Its in silico perturbation module identifies therapeutic targets by simulating interventions that reduce high-risk cells associated with adverse outcomes. SIDISH also generalizes to spatial transcriptomics, identifying high-risk cells and mapping them within their native tissue microenvironment. Applied across diverse diseases, SIDISH establishes the link between cellular dynamics and clinical phenotypes, facilitating biomarker discovery and precision medicine. By unifying single-cell insights with large-scale clinical data, SIDISH advances computational tools for disease risk assessment and therapeutic prioritization, offering an integrative and scalable approach to precision medicine. SIDISH integrates single-cell and bulk RNA sequencing data using deep learning to identify high-risk cell populations and prognostic biomarkers, enabling in silico perturbations that could guide precision therapeutics and advance personalized medicine.
scGALA advances graph link prediction-based cell alignment for comprehensive data integration and harmonization
Guo Jiang
Kailu Song
Gregory J. Fonseca
Darcy E. Wagner
Iain C. Clark
Hui Wang
Single-cell technologies have transformed our understanding of cellular heterogeneity through multimodal data acquisition. However, robust cell alignment remains a major challenge for data integration and harmonization, including batch correction, label transfer, and multi-omics integration. Many existing methods constrain alignment based on rigid feature-wise distance metrics, limiting their ability to capture accurate cell correspondence across diverse cell populations and conditions. We introduce scGALA, a graph-based learning framework that redefines cell alignment by combining graph attention networks with a score-driven, task-independent optimization strategy. scGALA constructs enriched graphs of cell-cell relationships by integrating gene expression profiles with auxiliary information, such as spatial coordinates, and iteratively refines alignment via self-supervised graph link prediction, where a deep neural network is trained to identify and reinforce high-confidence correspondences across datasets. In extensive benchmarks, scGALA identifies over 25 percent more high-confidence alignments without compromising accuracy. By improving the core step of cell alignment, scGALA serves as a versatile enhancer for a wide range of single-cell data integration tasks.
DENetwork unveils non-differentially expressed genes with functional relevance across conditions through information flow perturbation
Bowen Zhao
Ting-Yi Su
Jingtao Wang
Quazi S. Islam
Kailu Song
Steven K. Huang
Matthieu Allez
Gregory J. Fonseca
Carolyn J. Baglole
Differential gene expression (DE) analysis of RNA-sequencing (RNA-seq) data is a standard approach for identifying phenotypic differences between conditions. However, traditional DE methods such as DESeq2 focus on expression changes alone, often overlooking non-differentially expressed (non-DE) genes that may play key regulatory roles. This limits their ability to identify upstream drivers of transcriptomic variation. To address this gap, we introduce DENetwork, a network-based approach that prioritizes genes based on their influence on global information flow. Each gene is scored using an in silico knockout strategy that quantifies its impact across the inferred gene network, capturing both DE and non-DE genes with potential functional relevance. DENetwork deciphers intricate regulatory and signaling networks driving transcriptomic variations between conditions with distinct phenotypes. Across simulated and disease-relevant RNA-seq datasets, DENetwork identifies non-DE regulators enriched in known pathways and phenotypic associations, providing mechanistic insights missed by standard DE analysis, with implications for target discovery and intervention.
SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics
Qingtian Zhu
Yumin Zheng
Yuling Sang
Yifan Zhan
Ziyan Zhu
Yinqiang Zheng
Spatial Transcriptomics (ST) is a method that captures gene expression profiles aligned with spatial coordinates. The discrete spatial distribution and the super-high dimensional sequencing results make ST data challenging to be modeled effectively. In this paper, we manage to model ST in a continuous and compact manner by the proposed tool, SUICA, empowered by the great approximation capability of Implicit Neural Representations (INRs) that can enhance both the spatial density and the gene expression. Concretely within the proposed SUICA, we incorporate a graph-augmented Autoencoder to effectively model the context information of the unstructured spots and provide informative embeddings that are structure-aware for spatial mapping. We also tackle the extremely skewed distribution in a regression-by-classification fashion and enforce classification-based loss functions for the optimization of SUICA. By extensive experiments of a wide range of common ST platforms under varying degradations, SUICA outperforms both conventional INR variants and SOTA methods regarding numerical fidelity, statistical correlation, and bio-conservation. The prediction by SUICA also showcases amplified gene signatures that enriches the bio-conservation of the raw data and benefits subsequent analysis. The code is available at https://github.com/Szym29/SUICA.
Inhibition of epithelial cell YAP-TEAD/LOX signaling attenuates pulmonary fibrosis in preclinical models
Darcy Elizabeth Wagner
Hani N. Alsafadi
Nilay Mitash
Aurelien Justet
Qianjiang Hu
Ricardo Pineda
Claudia Staab-Weijnitz
Martina Korfei
Nika Gvazava
Kristin Wannemo
Ugochi Onwuka
Molly Mozurak
Adriana Estrada-Bernal
Juan Cala Garcia
Katrin Mutze
Rita Costa
Deniz Bölükbas
John Stegmayr
Wioletta Skronska-Wasek
Stephan Klee
Chiharu Ota
Hoeke A. Baarsma
Jingtao Wang
John Sembrat
Anne Hilgendorff
Andreas Günther
Rachel Chambers
Ivan O Rosas
Stijn de Langhe
Naftali Kaminski
Mareike Lehmann
Oliver Eickelberg
Melanie Königshoff
Idiopathic pulmonary fibrosis (IPF) is a progressive and lethal disease characterized by excessive extracellular matrix deposition. Current IPF therapies slow disease progression but do not stop or reverse it. The (myo)fibroblasts are thought to be the main cellular contributors to excessive extracellular matrix production in IPF. Here we show that fibrotic alveolar type II cells regulate production and crosslinking of extracellular matrix via the co-transcriptional activator YAP. YAP leads to increased expression of lysyl oxidase (LOX) and subsequent LOX-mediated crosslinking by fibrotic alveolar type II cells. Pharmacological YAP inhibition via verteporfin reverses fibrotic alveolar type II cell reprogramming and LOX expression in experimental lung fibrosis in vivo and in human fibrotic tissue ex vivo. We thus identify YAP-TEAD/LOX inhibition in alveolar type II cells as a promising potential therapy for IPF patients.
Computational Tracking of Cell Origins Using CellSexID from Single-Cell Transcriptomes
Huilin Tai
Qian Li
Jingtao Wang
Jiahui Tan
Bowen Zhao
Ryann Lang
Basil J. Petrof
Cell tracking in chimeric models is essential yet challenging, particularly in developmental biology, regenerative medicine, and transplantation research. Existing methods such as fluorescent labeling and genetic barcoding are technically demanding, costly, and often impractical for dynamic or heterogeneous tissues. Here, we introduce CellSexID, a computational framework that leverages sex as a surrogate marker for cell origin inference. Using a machine learning model trained on single-cell transcriptomic data, CellSexID accurately predicts the sex of individual cells, enabling in silico distinction between donor and recipient cells in sex-mismatched settings. The model identifies minimal sex-linked gene sets through ensemble feature selection and has been validated using both public datasets and experimental flow sorting, confirming the biological relevance of predicted populations. We further demonstrate CellSexID’s applicability beyond chimeric models, including organ transplantation and multiplexed sample demultiplexing. As a scalable and cost-effective alternative to physical labeling, CellSexID facilitates precise cell tracking and supports diverse biomedical applications involving mixed cellular origins.
DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads
Kailu Song
Yumin Zheng
Bowen Zhao
David H. Eidelman
Harnessing agent-based frameworks in CellAgentChat to unravel cell-cell interactions from single-cell and spatial transcriptomics