
Jun Ding

Affiliate Member
Assistant professor, McGill University, Department of Medicine
Research Topics
Computational Biology
Medical Machine Learning
Representation Learning

Biography

Jun Ding is an assistant professor in the Department of Medicine of the Faculty of Medicine and Health Sciences at McGill University.

Alongside his team, he is dedicated to employing machine learning techniques to decipher the complex dynamics of cells in various diseases, such as developmental disorders, pulmonary diseases and cancers. The diverse and intricate nature of these conditions necessitates innovative approaches, prompting the use of state-of-the-art single-cell technologies to meticulously profile individual cell states. The result is a rich source of data for the team's machine learning models.

These technologies present unprecedented opportunities to advance understanding, particularly in fields like developmental and cancer biology. However, the challenge is to develop computational models capable of linking this intricate biomedical data to potential discoveries.

Ding’s primary focus lies in the development and refinement of machine learning methodologies, especially probabilistic graphical models, to effectively analyze, model and visualize both single-cell and bulk omics data, often featuring longitudinal or spatial dimensions. The goal is to harness these advanced machine learning techniques to deepen the comprehension of cellular dynamics, and so develop groundbreaking diagnostic and therapeutic strategies that can significantly benefit public health.

Current Students

Master's Research - McGill University
Principal supervisor :

Publications

scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization
Bowen Zhao
Kailu Song
Dong-Qing Wei
Yi Xiong
DTPSP: A Deep Learning Framework for Optimized Time Point Selection in Time-Series Single-Cell Studies
Michel Hijazin
Pumeng Shi
Jingtao Wang
Time-series studies are critical for uncovering dynamic biological processes, but achieving comprehensive profiling and resolution across multiple time points and modalities (multi-omics) remains challenging due to cost and scalability constraints. Current methods for studying temporal dynamics, whether at the bulk or single-cell level, often require extensive sampling, making it impractical to deeply profile all time points and modalities. To overcome these limitations, we present DTPSP, a deep learning framework designed to identify the most informative time points in any time-series study, enabling resource-efficient and targeted analyses. DTPSP models temporal gene expression patterns using readily obtainable data, such as bulk RNA-seq, to select time points that capture key system dynamics. It also integrates a deep generative module to infer data for non-sampled time points based on the selected time points, reconstructing the full temporal trajectory. This dual capability enables DTPSP to prioritize key time points for in-depth profiling, such as single-cell sequencing or multi-omics analyses, while filling gaps in the temporal landscape with high fidelity. We apply DTPSP to developmental and disease-associated time courses, demonstrating its ability to optimize experimental designs across bulk and single-cell studies. By reducing costs, enabling strategic multi-omics profiling, and enhancing biological insights, DTPSP provides a scalable and generalized solution for investigating dynamic systems.
CellSexID: Sex-Based Computational Tracking of Cellular Origins in Chimeric Models
Huilin Tai
Qian Li
Jingtao Wang
Jiahui Tan
Ryann Lang
Basil J. Petrof
Cell tracking in chimeric models is essential yet challenging, particularly in developmental biology, regenerative medicine, and transplantation studies. Existing methods, such as fluorescent labeling and genetic barcoding, are technically demanding, costly, and often impractical for dynamic, heterogeneous tissues. To address these limitations, we propose a computational framework that leverages sex as a surrogate marker for cell tracking. Our approach uses a machine learning model trained on single-cell transcriptomic data to predict cell sex with high accuracy, enabling clear distinction between donor (male) and recipient (female) cells in sex-mismatched chimeric models. The model identifies specific genes critical for sex prediction and has been validated using public datasets and experimental flow sorting, confirming the biological relevance of the identified cell populations. Applied to skeletal muscle macrophages, our method revealed distinct transcriptional profiles associated with cellular origins. This pipeline offers a robust, cost-effective solution for cell tracking in chimeric models, advancing research in regenerative medicine and immunology by providing precise insights into cellular origins and therapeutic outcomes.
MATES: A Deep Learning-Based Model for Locus-specific Quantification of Transposable Elements in Single Cell
Ruohan Wang
Yumin Zheng
Zijian Zhang
Kailu Song
Erxi Wu
Xiaopeng Zhu
Tao P. Wu
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at sub-family levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a novel deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development enables exploring single-cell heterogeneity and gene regulation through the lens of TEs, offering a transformative tool for the single-cell genomics community.
scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information
Yingfu Wu
Zhenqi Shi
Xiangfei Zhou
Pengyu Zhang
Xiuhui Yang
Hao Wu
scCross: A Deep Generative Model for Unifying Single-cell Multi-omics with Seamless Integration, Cross-modal Generation, and In-silico Exploration
Xiuhui Yang
Koren K. Mann
Hao Wu
Single-cell multi-omics illuminate intricate cellular states, yielding transformative insights into cellular dynamics and disease. Yet, while the potential of this technology is vast, the integration of its multifaceted data presents challenges. Some modalities have not reached the robustness or clarity of established scRNA-seq. Coupled with data scarcity for newer modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross: a tool adeptly engineered using variational autoencoder, generative adversarial network principles, and the Mutual Nearest Neighbors (MNN) technique for modality alignment. This synergy ensures seamless integration of varied single-cell multi-omics data. Beyond its foundational prowess in multi-omics data integration, scCross excels in single-cell cross-modal data generation, multi-omics data simulation, and profound in-silico cellular perturbations. Armed with these capabilities, scCross is set to transform the field of single-cell research, establishing itself in the nuanced integration, generation, and simulation of complex multi-omics data.
scSemiProfiler: Advancing Large-scale Single-cell Studies through Semi-profiling with Deep Generative Models and Active Learning
Jingtao Wang
Gregory Fonseca
GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling
Yimin Fan
Adrien Osakwe
Shi Han
Yu Li
RAMEN Unveils Clinical Variable Networks for COVID-19 Severity and Long COVID Using Absorbing Random Walks and Genetic Algorithms
Yiwei Xiong
Jingtao Wang
Xiaoxiao Shang
Tingting Chen
Douglas D. Fraser
Gregory Fonseca
Simon Rousseau
The COVID-19 pandemic has significantly altered global socioeconomic structures and individual lives. Understanding the disease mechanisms and facilitating diagnosis requires comprehending the complex interplay among clinical factors like demographics, symptoms, comorbidities, treatments, lab results, complications, and other metrics, and their relation to outcomes such as disease severity and long term outcomes (e.g., post-COVID-19 condition/long COVID). Conventional correlational methods struggle with indirect and directional connections among these factors, while standard graphical methods like Bayesian networks are computationally demanding for extensive clinical variables. In response, we introduced RAMEN, a methodology that integrates Genetic Algorithms with random walks for efficient Bayesian network inference, designed to map the intricate relationships among clinical variables. Applying RAMEN to the Biobanque québécoise de la COVID-19 (BQC19) dataset, we identified critical markers for long COVID and varying disease severity. The Bayesian Network, corroborated by existing literature and supported through multi-omics analyses, highlights significant clinical variables linked to COVID-19 outcomes. RAMEN’s ability to accurately map these connections contributes substantially to developing early and effective diagnostics for severe COVID-19 and long COVID.
An enhanced wideband tracking method for characteristic modes
Chao Huang
Chenjiang Guo
Xia Ma
Yi Yuan
An enhanced wideband tracking method for characteristic modes (CMs) is investigated in this paper. The method consists of three stages, and its core tracking stage (CTS) is based on a classical eigenvector correlation-based algorithm. To decrease the tracking time and eliminate the crossing avoidance (CRA), we append a commonly used eigenvalue filter (EF) as the preprocessing stage and a novel postprocessing stage to the CTS. The proposed postprocessing stage can identify all CRA mode pairs by analyzing their trajectory and correlation characteristics. Subsequently, it can predict corresponding CRA frequencies and correct problematic qualities rapidly. Considering potential variations in eigenvector numbers at consecutive frequency samples caused by the EF, a new execution condition for the adaptive frequency adjustment in the CTS is introduced. Finally, CMs of a conductor plate and a fractal structure are investigated to demonstrate the performance of the proposed method, and the obtained results are discussed.