Yue Li

Yixuan Li

Doctorat - McGill

Superviseur⋅e principal⋅e :

Archer Yang

Dylan Mann-Krzisnik

Doctorat - McGill

Marshall Meng

Maîtrise recherche - McGill

Superviseur⋅e principal⋅e :

Doctorat - McGill

Maîtrise recherche - McGill

Co-superviseur⋅e :

Jun Ding

Ziyang Song

Doctorat - McGill

Wilbur Wang

Maîtrise recherche - McGill

Shadi Zabad

Doctorat - McGill

Caiya Zhang

Maîtrise recherche - McGill

He Zhu

Doctorat - McGill

Publications

FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting

Mike He Zhu

Jun Bai

Na Li

Xiaoxiao Li

Dianbo Liu

2025-05-17

NPJ Digital Medicine (publié)

Harnessing agent-based frameworks in CellAgentChat to unravel cell-cell interactions from single-cell and spatial transcriptomics

Vishvak Raghavan

Yumin Zheng

Jun Ding

2025-05-02

Genome Research (publié)

Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms.

Shadi Zabad

Chirayu Anant Haryan

Simon Gravel

Sanchit Misra

2025-05-01

American Journal of Human Genetics (publié)

ECLARE: multi-teacher contrastive learning via ensemble distillation for diagonal integration of single-cell multi-omic data

Dylan Mann-Krzisnik

Integrating multimodal single-cell data, such as scRNA-seq and scATAC-seq, is key for decoding gene regulatory networks but remains challeng… (voir plus)ing due to issues like feature harmonization and limited quantity of paired data. To address these challenges, we introduce ECLARE, a novel framework combining multi-teacher ensemble knowledge distillation with contrastive learning for diagonal integration of single-cell multi-omic data. ECLARE trains teacher models on paired datasets to guide a student model for unpaired data, leveraging a refined contrastive objective and transport-based loss for precise cross-modality alignment. Experiments demonstrate ECLARE’s competitive performance in cell pairing accuracy, multimodal integration and biological structure preservation, indicating that multi-teacher knowledge distillation provides an effective mean to improve a diagonal integration model beyond its zero-shot capabilities. Additionally, we validate ECLARE’s applicability through a case study on major depressive disorder (MDD) data, illustrating its capability to reveal gene regulatory insights from unpaired nuclei. While current results highlight the potential of ensemble distillation in multi-omic analyses, future work will focus on optimizing model complexity, dataset scalability, and exploring applications in diverse multi-omic contexts. ECLARE establishes a robust foundation for biologically informed single-cell data integration, facilitating advanced downstream analyses and scaling multi-omic data for training advanced machine learning models.

2025-01-27

bioRxiv (prépublication)

scGraphETM: Graph-Based Deep Learning Approach for Unraveling Cell Type-Specific Gene Regulatory Networks from Single-Cell Multi-Omics Data

Wenqi Dong

Manqi Zhou

Boyu Han

Yi Wang

2025-01-27

bioRxiv (prépublication)

SpaTM: Topic Models for Inferring Spatially Informed Transcriptional Programs

Adrien Osakwe

Wenqi Dong

Qihuang Zhang

Robert Sladek

Spatial transcriptomics has revolutionized our ability to characterize tissues and diseases by contextualizing gene expression with spatial … (voir plus)organization. Available methods require researchers to either train a model using histology-based annotations or use annotation-free clustering approaches to uncover spatial domains. However, few methods provide researchers with a way to jointly analyze spatial data from both annotation-free and annotation-guided perspectives using consistent inductive biases and levels of interpretability. A single framework with consistent inductive biases ensures coherence and transferability across tasks, reducing the risks of conflicting assumptions. To this end, we propose the Spatial Topic Model (SpaTM), a topic-modeling framework capable of annotation-guided and annotation-free analysis of spatial transcriptomics data. SpaTM can be used to learn gene programs that represent histology-based annotations while providing researchers with the ability to infer spatial domains with an annotation-free approach if manual annotations are limited or noisy. We demonstrate SpaTM’s interpretability with its use of topic mixtures to represent cell states and transcriptional programs and how its intuitive framework facilitates the integration of annotation-guided and annotation-free analyses of spatial data with downstream analyses such as cell type deconvolution. Finally, we demonstrate how both approaches can be used to extend the analysis of large-scale snRNA-seq atlases with the inference of cell proximity and spatial annotations in human brains with Major Depressive Disorder.

2025-01-27

bioRxiv (prépublication)

Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms

Shadi Zabad

Chirayu Anant Haryan

Simon Gravel

Sanchit Misra

2025-01-22

bioRxiv (prépublication)

Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting

Ziyang Song

Qincheng Lu

Hao Xu

Mike He Zhu

2024-12-16

Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (publié)

arxiv.org

MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling

Ruohan Wang

Zilong Wang

Ziyang Song

Automatic subphenotyping from electronic health records (EHRs)provides numerous opportunities to understand diseases with unique subgroups a… (voir plus)nd enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer sub-phenotype topics from thousands of disease using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by the expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset consisting of over 38 thousand patients from intensive care unit (ICU) from Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive for disease progression and severity. Consequently, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts. Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. For longitudinal analysis, MixEHR-Nest identified subphenotypes of distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.

2024-12-16

Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (publié)

arxiv.org

scMoE: single-cell mixture of experts for learning hierarchical, cell-type-specific, and interpretable representations from heterogeneous scRNA-seq data

Michael Huang

2024-12-16

Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (publié)

MiRGraph: A hybrid deep learning approach to identify microRNA-target interactions by integrating heterogeneous regulatory network and genomic sequences

Pei Liu

Yong Liu

Jiawei Luo

2024-12-03

2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (publié)

Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning

Ziyang Song

Qincheng Lu

Mike He Zhu