Jun Ding

Affiliate Member
Assistant professor, McGill University, Department of Medicine
Research Topics
Computational Biology
Medical Machine Learning
Representation Learning

Biography

Jun Ding is an assistant professor in the Department of Medicine of the Faculty of Medicine and Health Sciences at McGill University.

Alongside his team, he is dedicated to employing machine learning techniques to decipher the complex dynamics of cells in various diseases, such as developmental disorders, pulmonary diseases and cancers. The diverse and intricate nature of these conditions necessitates innovative approaches, prompting the use of state-of-the-art single-cell technologies to meticulously profile individual cell states. The result is a rich source of data for the team's machine learning models.

These technologies present unprecedented opportunities to advance understanding, particularly in fields like developmental and cancer biology. However, the challenge is to develop computational models capable of linking this intricate biomedical data to potential discoveries.

Ding’s primary focus lies in the development and refinement of machine learning methodologies, especially probabilistic graphical models, to effectively analyze, model and visualize both single-cell and bulk omics data, often featuring longitudinal or spatial dimensions. The goal is to harness these advanced machine learning techniques to deepen the comprehension of cellular dynamics, and so develop groundbreaking diagnostic and therapeutic strategies that can significantly benefit public health.

Current Students

PhD - McGill University
Principal supervisor :

Publications

DTractor enhances cell type deconvolution in spatial transcriptomics by integrating deep neural networks, transfer learning, and matrix factorization
Yong Jin Kweon
Chenyu Liu
Gregory Fonseca
Spatial transcriptomics (ST) captures gene expression with spatial context but lacks single-cell resolution. Single-cell RNA sequencing (scRNA-seq) offers high-resolution profiles without spatial information. Accurate spot-level decomposition requires effective integration of both. We present DTractor, a deep learning-based framework that improves cell-type deconvolution in ST data through spatial constraints and transfer learning. DTractor achieves dual utilization of scRNA-seq reference data by incorporating both a cell-type-specific gene expression matrix and learned latent embeddings into a unified matrix factorization model. This joint modeling enables accurate estimation of cell-type proportions and cell-type-resolved gene expression within each spatial spot, while preserving biological and spatial coherence. DTractor further applies spatial regularization to maintain local tissue structure. Across multiple ST platforms and tissue types, DTractor demonstrates improved decomposition accuracy, robustness, and interpretability compared to existing methods. The results from DTractor support downstream applications such as spatial domain analysis and the study of spatially organized cellular behaviors.
Efficient and scalable construction of clinical variable networks for complex diseases with RAMEN.
Yiwei Xiong
Jingtao Wang
Tingting Chen
Douglas D. Fraser
Gregory Fonseca
Simon Rousseau
scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization
Bowen Zhao
Kailu Song
Dong-Qing Wei
Yi Xiong
Single-Cell Multi-Omics Profiling of Immune Cells Isolated from Atherosclerotic Plaques in Male ApoE Knockout Mice Exposed to Arsenic
Kiran Makhani
Xiuhui Yang
France Dierick
Nivetha Subramaniam
Natascha Gagnon
Talin Ebrahimian
Stephanie Lehoux
Hao Wu
Koren K. Mann
Millions worldwide are exposed to elevated levels of arsenic that significantly increase their risk of developing atherosclerosis, a pathology primarily driven by immune cells. While the impact of arsenic on immune cell populations in atherosclerotic plaques has been broadly characterized, cellular heterogeneity is a substantial barrier to in-depth examinations of the cellular dynamics for varying immune cell populations. This study aimed to conduct single-cell multi-omics profiling of atherosclerotic plaques in apolipoprotein E knockout (ApoE–/–) mice to elucidate transcriptomic and epigenetic changes in immune cells induced by arsenic exposure. The ApoE–/– mice were fed a high-fat diet and were exposed to either 200 ppb arsenic in drinking water or a tap water control, and single-cell multi-omics profiling was performed on atherosclerotic plaque-resident immune cells. Transcriptomic and epigenetic changes in immune cells were analyzed within the same cell to understand the effects of arsenic exposure. Our data revealed that the transcriptional profile of macrophages from arsenic-exposed mice was significantly different from that of control mice and that differences were subtype specific and associated with cell–cell interaction and cell fates. Additionally, our data suggest that differences in arsenic-mediated changes in chromosome accessibility in arsenic-exposed mice were statistically more likely to be due to factors other than random variation compared to their effects on the transcriptome, revealing markers of arsenic exposure and potential targets for intervention. These findings in mice provide insights into how arsenic exposure impacts immune cell types in atherosclerosis, highlighting the importance of considering cellular heterogeneity in studying such effects.
The identification of subtype-specific differences and potential intervention targets underscores the significance of understanding the molecular mechanisms underlying arsenic-induced atherosclerosis. Further research is warranted to validate these findings and explore therapeutic interventions targeting immune cell dysfunction in arsenic-exposed individuals. https://doi.org/10.1289/EHP14285
DTPSP: A Deep Learning Framework for Optimized Time Point Selection in Time-Series Single-Cell Studies
Michel Hijazin
Pumeng Shi
Jingtao Wang
Time-series studies are critical for uncovering dynamic biological processes, but achieving comprehensive profiling and resolution across multiple time points and modalities (multi-omics) remains challenging due to cost and scalability constraints. Current methods for studying temporal dynamics, whether at the bulk or single-cell level, often require extensive sampling, making it impractical to deeply profile all time points and modalities. To overcome these limitations, we present DTPSP, a deep learning framework designed to identify the most informative time points in any time-series study, enabling resource-efficient and targeted analyses. DTPSP models temporal gene expression patterns using readily obtainable data, such as bulk RNA-seq, to select time points that capture key system dynamics. It also integrates a deep generative module to infer data for non-sampled time points based on the selected time points, reconstructing the full temporal trajectory. This dual capability enables DTPSP to prioritize key time points for in-depth profiling, such as single-cell sequencing or multi-omics analyses, while filling gaps in the temporal landscape with high fidelity. We apply DTPSP to developmental and disease-associated time courses, demonstrating its ability to optimize experimental designs across bulk and single-cell studies. By reducing costs, enabling strategic multi-omics profiling, and enhancing biological insights, DTPSP provides a scalable and generalized solution for investigating dynamic systems.
MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell
Ruohan Wang
Yumin Zheng
Zijian Zhang
Kailu Song
Erxi Wu
Xiaopeng Zhu
Tao P. Wu
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at the subfamily levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information
Yingfu Wu
Zhenqi Shi
Xiangfei Zhou
Pengyu Zhang
Xiuhui Yang
Hao Wu
scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration
Xiuhui Yang
Koren K. Mann
Hao Wu
Single-cell multi-omics data reveal complex cellular states, providing significant insights into cellular dynamics and disease. Yet, integration of multi-omics data presents challenges. Some modalities have not reached the robustness or clarity of established transcriptomics. Coupled with data scarcity for less established modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross, a tool leveraging variational autoencoders, generative adversarial networks, and the mutual nearest neighbors (MNN) technique for modality alignment. By enabling single-cell cross-modal data generation, multi-omics data simulation, and in silico cellular perturbations, scCross enhances the utility of single-cell multi-omics studies. The online version contains supplementary material available at 10.1186/s13059-024-03338-z.
scSemiProfiler: Advancing Large-scale Single-cell Studies through Semi-profiling with Deep Generative Models and Active Learning
Jingtao Wang
Gregory Fonseca
Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases. Its prohibitive cost, however, hampers its application in expansive biomedical studies. Traditional cellular deconvolution approaches can infer cell type proportions from more affordable bulk sequencing data, yet they fall short in providing the detailed resolution required for single-cell-level analyses. To overcome this challenge, we introduce “scSemiProfiler”, an innovative computational framework that marries deep generative models with active learning strategies. This method adeptly infers single-cell profiles across large cohorts by fusing bulk sequencing data with targeted single-cell sequencing from a few rigorously chosen representatives. Extensive validation across heterogeneous datasets verifies the precision of our semi-profiling approach, aligning closely with true single-cell profiling data and empowering refined cellular analyses. Originally developed for extensive disease cohorts, “scSemiProfiler” is adaptable for broad applications. It provides a scalable, cost-effective solution for single-cell profiling, facilitating in-depth cellular investigation in various biological domains.
RAMEN Unveils Clinical Variable Networks for COVID-19 Severity and Long COVID Using Absorbing Random Walks and Genetic Algorithms
Yiwei Xiong
Jingtao Wang
Tingting Chen
Douglas D. Fraser
Gregory Fonseca
Simon Rousseau
The COVID-19 pandemic has significantly altered global socioeconomic structures and individual lives. Understanding the disease mechanisms and facilitating diagnosis requires comprehending the complex interplay among clinical factors like demographics, symptoms, comorbidities, treatments, lab results, complications, and other metrics, and their relation to outcomes such as disease severity and long-term outcomes (e.g., post-COVID-19 condition/long COVID). Conventional correlational methods struggle with indirect and directional connections among these factors, while standard graphical methods like Bayesian networks are computationally demanding for extensive clinical variables. In response, we introduced RAMEN, a methodology that integrates Genetic Algorithms with random walks for efficient Bayesian network inference, designed to map the intricate relationships among clinical variables. Applying RAMEN to the Biobanque québécoise de la COVID-19 (BQC19) dataset, we identified critical markers for long COVID and varying disease severity. The Bayesian Network, corroborated by existing literature and supported through multi-omics analyses, highlights significant clinical variables linked to COVID-19 outcomes. RAMEN’s ability to accurately map these connections contributes substantially to developing early and effective diagnostics for severe COVID-19 and long COVID.
An enhanced wideband tracking method for characteristic modes
Chao Huang
Chenjiang Guo
Xia Ma
Yi Yuan
An enhanced wideband tracking method for characteristic modes (CMs) is investigated in this paper. The method consists of three stages, and its core tracking stage (CTS) is based on a classical eigenvector correlation-based algorithm. To decrease the tracking time and eliminate the crossing avoidance (CRA), we append a commonly used eigenvalue filter (EF) as the preprocessing stage and a novel postprocessing stage to the CTS. The proposed postprocessing stage can identify all CRA mode pairs by analyzing their trajectory and correlation characteristics. Subsequently, it can predict corresponding CRA frequencies and correct problematic qualities rapidly. Considering potential variations in eigenvector numbers at consecutive frequency samples caused by the EF, a new execution condition for the adaptive frequency adjustment in the CTS is introduced. Finally, CMs of a conductor plate and a fractal structure are investigated to demonstrate the performance of the proposed method, and the obtained results are discussed.
Author Correction: BCG immunization induces CX3CR1hi effector memory T cells to provide cross-protection via IFN-γ-mediated trained immunity.
Kim A. Tran
Erwan Pernet
Mina Sadeghi
Jeffrey Downey
Julia Chronopoulos
Elizabeth Lapshina
Oscar Tsai
Eva Kaufmann
Maziar Divangahi