
Jun Ding

Affiliate Member
Assistant professor, McGill University, Department of Medicine
Research Topics
Computational Biology
Medical Machine Learning
Representation Learning

Biography

Jun Ding is an assistant professor in the Department of Medicine of the Faculty of Medicine and Health Sciences at McGill University.

Alongside his team, he is dedicated to employing machine learning techniques to decipher the complex dynamics of cells in various diseases, such as developmental disorders, pulmonary diseases and cancers. The diverse and intricate nature of these conditions necessitates innovative approaches, prompting the use of state-of-the-art single-cell technologies to meticulously profile individual cell states. The result is a rich source of data for the team's machine learning models.

These technologies present unprecedented opportunities to advance understanding, particularly in fields like developmental and cancer biology. However, the challenge is to develop computational models capable of linking this intricate biomedical data to potential discoveries.

Ding’s primary focus lies in the development and refinement of machine learning methodologies, especially probabilistic graphical models, to effectively analyze, model and visualize both single-cell and bulk omics data, often featuring longitudinal or spatial dimensions. The goal is to harness these advanced machine learning techniques to deepen the comprehension of cellular dynamics, and so develop groundbreaking diagnostic and therapeutic strategies that can significantly benefit public health.

Current Students

PhD - McGill University
Principal supervisor :

Publications

Dissecting and steering cell dynamics using spatially-informed RNA velocity with veloAgent
RNA velocity enables inference of cell state transitions from single-cell transcriptomics by modeling transcriptional dynamics from spliced and unspliced mRNA. However, existing methods overlook spatial context and struggle to scale to large datasets, limiting insights into tissue organization and dynamic processes. We introduce veloAgent, a deep generative and agent-based framework that estimates gene- and cell-specific transcriptional kinetics while integrating spatial information through agent-based simulations of local microenvironments. By leveraging both molecular and spatial cues, veloAgent improves velocity accuracy and achieves sublinear memory scaling, enabling efficient analysis of large and multi-batch spatial datasets. A distinctive feature of veloAgent is its in silico perturbation module, which allows targeted manipulation of spatial velocity vectors to simulate regulatory interventions and predict their impact on cell fate dynamics. These capabilities position veloAgent as a scalable and versatile framework for dissecting spatially resolved cellular dynamics and guiding cell fate manipulation across diverse biological processes.
SIDISH integrates single-cell and bulk transcriptomics to identify high-risk cells and guide precision therapeutics through in silico perturbation
Yasmin Jolasun
Kailu Song
Yumin Zheng
Jingtao Wang
Gregory Fonseca
David H. Eidelman
Single-cell RNA sequencing (scRNA-seq) provides high-resolution insights into cellular heterogeneity but remains costly, restricting its use to small cohorts that often lack comprehensive clinical data, reducing translational relevance. In contrast, bulk RNA sequencing is scalable and cost-effective but obscures critical single-cell insights. We introduce SIDISH, a neural network framework that integrates the granularity of scRNA-seq with the scalability of bulk RNA-seq. Using a variational autoencoder, deep Cox regression, and transfer learning, SIDISH identifies high-risk cell populations while enabling robust clinical predictions from large-cohort data. Its in silico perturbation module identifies therapeutic targets by simulating interventions that reduce high-risk cells associated with adverse outcomes. SIDISH also generalizes to spatial transcriptomics, identifying high-risk cells and mapping them within their native tissue microenvironment. Applied across diverse diseases, SIDISH establishes the link between cellular dynamics and clinical phenotypes, facilitating biomarker discovery and precision medicine. By unifying single-cell insights with large-scale clinical data, SIDISH advances computational tools for disease risk assessment and therapeutic prioritization, offering an integrative and scalable approach to precision medicine. SIDISH integrates single-cell and bulk RNA sequencing data using deep learning to identify high-risk cell populations and prognostic biomarkers, enabling in silico perturbations that could guide precision therapeutics and advance personalized medicine.
scGALA advances graph link prediction-based cell alignment for comprehensive data integration and harmonization
Guo Jiang
Kailu Song
Gregory J. Fonseca
Darcy E. Wagner
Iain C. Clark
Hui Wang
Single-cell technologies have transformed our understanding of cellular heterogeneity through multimodal data acquisition. However, robust cell alignment remains a major challenge for data integration and harmonization, including batch correction, label transfer, and multi-omics integration. Many existing methods constrain alignment based on rigid feature-wise distance metrics, limiting their ability to capture accurate cell correspondence across diverse cell populations and conditions. We introduce scGALA, a graph-based learning framework that redefines cell alignment by combining graph attention networks with a score-driven, task-independent optimization strategy. scGALA constructs enriched graphs of cell-cell relationships by integrating gene expression profiles with auxiliary information, such as spatial coordinates, and iteratively refines alignment via self-supervised graph link prediction, where a deep neural network is trained to identify and reinforce high-confidence correspondences across datasets. In extensive benchmarks, scGALA identifies over 25 percent more high-confidence alignments without compromising accuracy. By improving the core step of cell alignment, scGALA serves as a versatile enhancer for a wide range of single-cell data integration tasks.
DENetwork unveils non-differentially expressed genes with functional relevance across conditions through information flow perturbation
Bowen Zhao
Ting-Yi Su
Jingtao Wang
Quazi S. Islam
Kailu Song
Steven K. Huang
Matthieu Allez
Gregory J. Fonseca
Carolyn J. Baglole
Differential gene expression (DE) analysis of RNA-sequencing (RNA-seq) data is a standard approach for identifying phenotypic differences between conditions. However, traditional DE methods such as DESeq2 focus on expression changes alone, often overlooking non-differentially expressed (non-DE) genes that may play key regulatory roles. This limits their ability to identify upstream drivers of transcriptomic variation. To address this gap, we introduce DENetwork, a network-based approach that prioritizes genes based on their influence on global information flow. Each gene is scored using an in silico knockout strategy that quantifies its impact across the inferred gene network, capturing both DE and non-DE genes with potential functional relevance. DENetwork deciphers intricate regulatory and signaling networks driving transcriptomic variations between conditions with distinct phenotypes. Across simulated and disease-relevant RNA-seq datasets, DENetwork identifies non-DE regulators enriched in known pathways and phenotypic associations, providing mechanistic insights missed by standard DE analysis, with implications for target discovery and intervention.
SUICA: Learning Super-high Dimensional Sparse Implicit Neural Representations for Spatial Transcriptomics
Qingtian Zhu
Yumin Zheng
Yuling Sang
Yifan Zhan
Ziyan Zhu
Yinqiang Zheng
Spatial Transcriptomics (ST) is a method that captures gene expression profiles aligned with spatial coordinates. The discrete spatial distribution and the super-high dimensional sequencing results make ST data challenging to model effectively. In this paper, we model ST in a continuous and compact manner with the proposed tool, SUICA, empowered by the great approximation capability of Implicit Neural Representations (INRs) that can enhance both the spatial density and the gene expression. Concretely, within the proposed SUICA, we incorporate a graph-augmented Autoencoder to effectively model the context information of the unstructured spots and provide informative embeddings that are structure-aware for spatial mapping. We also tackle the extremely skewed distribution in a regression-by-classification fashion and enforce classification-based loss functions for the optimization of SUICA. In extensive experiments on a wide range of common ST platforms under varying degradations, SUICA outperforms both conventional INR variants and SOTA methods regarding numerical fidelity, statistical correlation, and bio-conservation. The prediction by SUICA also showcases amplified gene signatures that enrich the bio-conservation of the raw data and benefit subsequent analysis. The code is available at https://github.com/Szym29/SUICA.
CellSexID: Sex-Based Computational Tracking of Cellular Origins in Chimeric Models
Huilin Tai
Qian Li
Jingtao Wang
Jiahui Tan
Bowen Zhao
Ryann Lang
Basil J. Petrof
Cell tracking in chimeric models is essential yet challenging, particularly in developmental biology, regenerative medicine, and transplantation studies. Existing methods, such as fluorescent labeling and genetic barcoding, are technically demanding, costly, and often impractical for dynamic, heterogeneous tissues. To address these limitations, we propose a computational framework that leverages sex as a surrogate marker for cell tracking. Our approach uses a machine learning model trained on single-cell transcriptomic data to predict cell sex with high accuracy, enabling clear distinction between donor (male) and recipient (female) cells in sex-mismatched chimeric models. The model identifies specific genes critical for sex prediction and has been validated using public datasets and experimental flow sorting, confirming the biological relevance of the identified cell populations. Applied to skeletal muscle macrophages, our method revealed distinct transcriptional profiles associated with cellular origins. This pipeline offers a robust, cost-effective solution for cell tracking in chimeric models, advancing research in regenerative medicine and immunology by providing precise insights into cellular origins and therapeutic outcomes.
Inhibition of epithelial cell YAP-TEAD/LOX signaling attenuates pulmonary fibrosis in preclinical models
Darcy Elizabeth Wagner
Hani N. Alsafadi
Nilay Mitash
Aurelien Justet
Qianjiang Hu
Ricardo Pineda
Claudia Staab-Weijnitz
Martina Korfei
Nika Gvazava
Kristin Wannemo
Ugochi Onwuka
Molly Mozurak
Adriana Estrada-Bernal
Juan Cala Garcia
Katrin Mutze
Rita Costa
Deniz Bölükbas
John Stegmayr
Wioletta Skronska-Wasek
Stephan Klee
Chiharu Ota
Hoeke A. Baarsma
Jingtao Wang
John Sembrat
Anne Hilgendorff
Andreas Günther
Rachel Chambers
Ivan O Rosas
Stijn de Langhe
Naftali Kaminski
Mareike Lehmann
Oliver Eickelberg
Melanie Königshoff
Idiopathic pulmonary fibrosis (IPF) is a progressive and lethal disease characterized by excessive extracellular matrix deposition. Current IPF therapies slow disease progression but do not stop or reverse it. The (myo)fibroblasts are thought to be the main cellular contributors to excessive extracellular matrix production in IPF. Here we show that fibrotic alveolar type II cells regulate production and crosslinking of extracellular matrix via the co-transcriptional activator YAP. YAP leads to increased expression of lysyl oxidase (LOX) and subsequent LOX-mediated crosslinking by fibrotic alveolar type II cells. Pharmacological YAP inhibition via verteporfin reverses fibrotic alveolar type II cell reprogramming and LOX expression in experimental lung fibrosis in vivo and in human fibrotic tissue ex vivo. We thus identify YAP-TEAD/LOX inhibition in alveolar type II cells as a promising potential therapy for IPF patients.
DOLPHIN advances single-cell transcriptomics beyond gene level by leveraging exon and junction reads
Kailu Song
Yumin Zheng
Bowen Zhao
David H. Eidelman
The advent of single-cell sequencing has revolutionized the study of cellular dynamics, providing unprecedented resolution into the molecular states and heterogeneity of individual cells. However, the rich potential of exon-level information and junction reads within single cells remains underutilized. Conventional gene-count methods overlook critical exon and junction data, limiting the quality of cell representation and downstream analyses such as subpopulation identification and alternative splicing detection. We introduce DOLPHIN, a deep learning method that integrates exon-level and junction read data, representing genes as graph structures. These graphs are processed by a variational graph autoencoder to improve cell embeddings. DOLPHIN not only demonstrates superior performance in cell clustering, biomarker discovery, and alternative splicing detection but also provides a distinct capability to detect subtle transcriptomic differences at the exon level that are often masked in gene-level analyses. By examining cellular dynamics with enhanced resolution, DOLPHIN provides new insights into disease mechanisms and potential therapeutic targets.
Harnessing agent-based frameworks in CellAgentChat to unravel cell–cell interactions from single-cell and spatial transcriptomics
Understanding cell–cell interactions (CCIs) is essential yet challenging owing to the inherent intricacy and diversity of cellular dynamics. Existing approaches often analyze global patterns of CCIs using statistical frameworks, missing the nuances of individual cell behavior owing to their focus on aggregate data. This makes them insensitive in complex environments where the detailed dynamics of cell interactions matter. We introduce CellAgentChat, an agent-based model (ABM) designed to decipher CCIs from single-cell RNA sequencing and spatial transcriptomics data. This approach models biological systems as collections of autonomous agents governed by biologically inspired principles and rules. Validated across eight diverse single-cell data sets, CellAgentChat demonstrates its effectiveness in detecting intricate signaling events across different cell populations. Moreover, CellAgentChat offers the ability to generate animated visualizations of single-cell interactions and provides flexibility in modifying agent behavior rules, facilitating thorough exploration of both close and distant cellular communications. Furthermore, CellAgentChat leverages ABM features to enable intuitive in silico perturbations via agent rule modifications, facilitating the development of novel intervention strategies. This ABM method unlocks an in-depth understanding of cellular signaling interactions across various biological contexts, thereby enhancing in silico studies for cellular communication–based therapies.
A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases
Yumin Zheng
Jonas C. Schupp
Taylor Adams
Geremy Clair
Aurelien Justet
Farida Ahangari
Xiting Yan
Paul Hansen
Marianne Carlon
Emanuela Cortesi
Marie Vermant
Robin Vos
Laurens J. De Sadeleer
Ivan O. Rosas
Ricardo Pineda
John Sembrat
Melanie Königshoff
John E. McDonough
Bart M. Vanaudenaerde
Wim A. Wuyts
Naftali Kaminski
Human diseases are characterized by intricate cellular dynamics. Single-cell transcriptomics provides critical insights, yet a persistent gap remains in computational tools for detailed disease progression analysis and targeted in silico drug interventions. Here we introduce UNAGI, a deep generative neural network tailored to analyse time-series single-cell transcriptomic data. This tool captures the complex cellular dynamics underlying disease progression, enhancing drug perturbation modelling and screening. When applied to a dataset from patients with idiopathic pulmonary fibrosis, UNAGI learns disease-informed cell embeddings that sharpen our understanding of disease progression, leading to the identification of potential therapeutic drug candidates. Validation using proteomics reveals the accuracy of UNAGI’s cellular dynamics analysis, and the use of the fibrotic cocktail-treated human precision-cut lung slices confirms UNAGI’s predictions that nifedipine, an antihypertensive drug, may have anti-fibrotic effects on human tissues. UNAGI’s versatility extends to other diseases, including COVID, demonstrating adaptability and confirming its broader applicability in decoding complex cellular dynamics beyond idiopathic pulmonary fibrosis, amplifying its use in the quest for therapeutic solutions across diverse pathological landscapes.
Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis
Taylor S Adams
Jonas C Schupp
Agshin Balayev
Johad Khoury
Aurelien Justet
Fadi Nikola
Laurens J De Sadeleer
Juan Cala-Garcia
Marta Zapata-Ortega
Panayiotis V Benos
John E McDonough
Farida Ahangari
Melanie Königshoff
Robert J Homer
Ivan Rosas
Xiting Yan
Bart M Vanaudenaerde
Wim A Wuyts
Naftali Kaminski
Acute and repetitive lung epithelial injury can lead to irreversible and even progressive pulmonary fibrosis; Idiopathic pulmonary fibrosis (IPF) is a fatal disease and quintessential example of this phenomenon. The composition of epithelial cells in human pulmonary fibrosis – irrespective of disease etiology – is marked by the presence of Aberrant Basaloid cells: an abnormal cell phenotype with pro-fibrotic and senescent features, localized to the surface of fibrotic lesions. Despite their relevance to human pulmonary fibrosis, the exotic molecular profile of Aberrant Basaloid cells has obscured their etiology, preventing insights into how or why these cells emerge with fibrosis. Here we identify cellular intermediaries between Aberrant Basaloid and normal alveolar epithelial cells in human IPF tissue. We track the emergence of Aberrant Basaloid cells from alveolar epithelial cells ex vivo and uncover a role for similar cells in epithelial regeneration under normal conditions. Lastly, we characterize the epigenetic changes that distinguish Aberrant Basaloid cells from their progenitors and identify hallmarks of AP-1 injury memory retention. This study elucidates the phenomenon of maladaptive epithelial plasticity and regeneration in pulmonary fibrosis and re-contextualizes therapeutic strategies for epithelial dysfunction.
Advancing global antifungal development to combat invasive fungal infection
Xiu-Li Wang
Koon Ho Wong
Chen Ding
Chang-Bin Chen
Wen-Juan Wu
Ningning Liu