Jun Ding

Efficient and scalable construction of clinical variable networks for complex diseases with RAMEN.

Yiwei Xiong

Jingtao Wang

Xiaoxiao Shang

Tingting Chen

Douglas D. Fraser

Gregory Fonseca

Simon Rousseau

Jun Ding

2025-04-01

Cell Reports Methods (published)

doi.org

scCobra allows contrastive cell embedding learning with domain adaptation for single cell data integration and harmonization

Bowen Zhao

Kailu Song

Dong-Qing Wei

Yi Xiong

Jun Ding

2025-02-13

Communications Biology (published)

doi.org

DTPSP: A Deep Learning Framework for Optimized Time Point Selection in Time-Series Single-Cell Studies

Michel Hijazin

Pumeng Shi

Jingtao Wang

Jun Ding

Time-series studies are critical for uncovering dynamic biological processes, but achieving comprehensive profiling and resolution across multiple time points and modalities (multi-omics) remains challenging due to cost and scalability constraints. Current methods for studying temporal dynamics, whether at the bulk or single-cell level, often require extensive sampling, making it impractical to deeply profile all time points and modalities. To overcome these limitations, we present DTPSP, a deep learning framework designed to identify the most informative time points in any time-series study, enabling resource-efficient and targeted analyses. DTPSP models temporal gene expression patterns using readily obtainable data, such as bulk RNA-seq, to select time points that capture key system dynamics. It also integrates a deep generative module to infer data for non-sampled time points based on the selected time points, reconstructing the full temporal trajectory. This dual capability enables DTPSP to prioritize key time points for in-depth profiling, such as single-cell sequencing or multi-omics analyses, while filling gaps in the temporal landscape with high fidelity. We apply DTPSP to developmental and disease-associated time courses, demonstrating its ability to optimize experimental designs across bulk and single-cell studies. By reducing costs, enabling strategic multi-omics profiling, and enhancing biological insights, DTPSP provides a scalable and generalized solution for investigating dynamic systems.

2024-12-20

bioRxiv (preprint)

doi.org

CellSexID: Sex-Based Computational Tracking of Cellular Origins in Chimeric Models

Huilin Tai

Qian Li

Jingtao Wang

Jiahui Tan

Ryann Lang

Basil J. Petrof

Jun Ding

Cell tracking in chimeric models is essential yet challenging, particularly in developmental biology, regenerative medicine, and transplantation studies. Existing methods, such as fluorescent labeling and genetic barcoding, are technically demanding, costly, and often impractical for dynamic, heterogeneous tissues. To address these limitations, we propose a computational framework that leverages sex as a surrogate marker for cell tracking. Our approach uses a machine learning model trained on single-cell transcriptomic data to predict cell sex with high accuracy, enabling clear distinction between donor (male) and recipient (female) cells in sex-mismatched chimeric models. The model identifies specific genes critical for sex prediction and has been validated using public datasets and experimental flow sorting, confirming the biological relevance of the identified cell populations. Applied to skeletal muscle macrophages, our method revealed distinct transcriptional profiles associated with cellular origins. This pipeline offers a robust, cost-effective solution for cell tracking in chimeric models, advancing research in regenerative medicine and immunology by providing precise insights into cellular origins and therapeutic outcomes.

2024-12-05

bioRxiv (preprint)

doi.org

MATES: A Deep Learning-Based Model for Locus-specific Quantification of Transposable Elements in Single Cell

Ruohan Wang

Yumin Zheng

Zijian Zhang

Kailu Song

Erxi Wu

Xiaopeng Zhu

Tao P. Wu

Jun Ding

Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align multi-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at sub-family levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a novel deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development enables exploring single-cell heterogeneity and gene regulation through the lens of TEs, offering a transformative tool for the single-cell genomics community.

2024-10-11

Nature Communications (published)

doi.org

scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information

Yingfu Wu

Zhenqi Shi

Xiangfei Zhou

Pengyu Zhang

Xiuhui Yang

Jun Ding

Hao Wu

2024-07-31

Communications Biology (published)

doi.org

scCross: A Deep Generative Model for Unifying Single-cell Multi-omics with Seamless Integration, Cross-modal Generation, and In-silico Exploration

Xiuhui Yang

Koren K. Mann

Hao Wu

Jun Ding

Single-cell multi-omics illuminate intricate cellular states, yielding transformative insights into cellular dynamics and disease. Yet, while the potential of this technology is vast, the integration of its multifaceted data presents challenges. Some modalities have not reached the robustness or clarity of established scRNA-seq. Coupled with data scarcity for newer modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross: a tool adeptly engineered using variational autoencoder, generative adversarial network principles, and the Mutual Nearest Neighbors (MNN) technique for modality alignment. This synergy ensures seamless integration of varied single-cell multi-omics data. Beyond its foundational prowess in multi-omics data integration, scCross excels in single-cell cross-modal data generation, multi-omics data simulation, and profound in-silico cellular perturbations. Armed with these capabilities, scCross is set to transform the field of single-cell research, establishing itself in the nuanced integration, generation, and simulation of complex multi-omics data.

2024-07-29

Genome Biology (published)

doi.org

scSemiProfiler: Advancing Large-scale Single-cell Studies through Semi-profiling with Deep Generative Models and Active Learning

Jingtao Wang

Gregory Fonseca

Jun Ding

2024-07-16

Nature Communications (published)

doi.org

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Yimin Fan

Shi Han

2024-05-17

Lecture Notes in Computer Science (published)

doi.org

RAMEN Unveils Clinical Variable Networks for COVID-19 Severity and Long COVID Using Absorbing Random Walks and Genetic Algorithms

Yiwei Xiong

Jingtao Wang

Xiaoxiao Shang

Tingting Chen

Douglas D. Fraser

Gregory Fonseca

Simon Rousseau

Jun Ding

The COVID-19 pandemic has significantly altered global socioeconomic structures and individual lives. Understanding the disease mechanisms and facilitating diagnosis requires comprehending the complex interplay among clinical factors like demographics, symptoms, comorbidities, treatments, lab results, complications, and other metrics, and their relation to outcomes such as disease severity and long term outcomes (e.g., post-COVID-19 condition/long COVID). Conventional correlational methods struggle with indirect and directional connections among these factors, while standard graphical methods like Bayesian networks are computationally demanding for extensive clinical variables. In response, we introduced RAMEN, a methodology that integrates Genetic Algorithms with random walks for efficient Bayesian network inference, designed to map the intricate relationships among clinical variables. Applying RAMEN to the Biobanque québécoise de la COVID-19 (BQC19) dataset, we identified critical markers for long COVID and varying disease severity. The Bayesian Network, corroborated by existing literature and supported through multi-omics analyses, highlights significant clinical variables linked to COVID-19 outcomes. RAMEN’s ability to accurately map these connections contributes substantially to developing early and effective diagnostics for severe COVID-19 and long COVID.

2024-02-27

bioRxiv (preprint)

doi.org
