Yue Li

Associate Academic Member
Assistant Professor, McGill University, School of Computer Science
Research Topics
Computational Biology

Biography

I completed my PhD degree in computer science and computational biology at the University of Toronto in 2014. Prior to joining McGill University, I was a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT (2015–2018).

In general, my research program covers three main research areas that involve applied machine learning in computational genomics and health. More specifically, it focuses on developing interpretable probabilistic and deep learning models of genetic, epigenetic, electronic health record (EHR), and single-cell genomic data.

By systematically integrating multimodal and longitudinal data, I aim to have impactful applications in computational medicine, including building intelligent clinical recommender systems, forecasting patient health trajectories, making personalized polygenic risk predictions, characterizing multi-trait functional genetic mutations, and dissecting cell-type-specific regulatory elements that underpin complex traits and diseases in humans.

Publications

ECLARE: multi-teacher contrastive learning via ensemble distillation for diagonal integration of single-cell multi-omic data
Dylan Mann-Krzisnik
Integrating multimodal single-cell data, such as scRNA-seq and scATAC-seq, is key to decoding gene regulatory networks but remains challenging due to issues such as feature harmonization and the limited quantity of paired data. To address these challenges, we introduce ECLARE, a novel framework combining multi-teacher ensemble knowledge distillation with contrastive learning for diagonal integration of single-cell multi-omic data. ECLARE trains teacher models on paired datasets to guide a student model for unpaired data, leveraging a refined contrastive objective and a transport-based loss for precise cross-modality alignment. Experiments demonstrate ECLARE’s competitive performance in cell pairing accuracy, multimodal integration and biological structure preservation, indicating that multi-teacher knowledge distillation provides an effective means to improve a diagonal integration model beyond its zero-shot capabilities. Additionally, we validate ECLARE’s applicability through a case study on major depressive disorder (MDD) data, illustrating its capability to reveal gene regulatory insights from unpaired nuclei. While current results highlight the potential of ensemble distillation in multi-omic analyses, future work will focus on optimizing model complexity, improving dataset scalability, and exploring applications in diverse multi-omic contexts. ECLARE establishes a robust foundation for biologically informed single-cell data integration, facilitating advanced downstream analyses and scaling multi-omic data for training advanced machine learning models.
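ECLARE combines a contrastive objective with multi-teacher ensemble distillation. The abstract does not give the exact losses, so the sketch below is a hypothetical illustration of the two generic ingredients, not ECLARE's implementation: a symmetric InfoNCE-style contrastive loss that treats cell i in the RNA embedding and cell i in the ATAC embedding as a positive pair, and a distillation term that pulls the student's cross-modal matching distribution toward the equally weighted average of several teachers' distributions via KL divergence. The function names, the temperature of 0.1, and the equal teacher weights are all assumptions; the transport-based loss is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_logits(rna: Matrix, atac: Matrix, temperature: float = 0.1) -> Matrix:
    """Row i holds temperature-scaled similarities of RNA cell i to every ATAC cell."""
    return [[cosine(r, a) / temperature for a in atac] for r in rna]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(logits: Matrix) -> float:
    """Symmetric InfoNCE: the diagonal entries (cell i with cell i) are the positives."""
    n = len(logits)
    rna_to_atac = sum(-math.log(softmax(logits[i])[i]) for i in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    atac_to_rna = sum(-math.log(softmax(columns[i])[i]) for i in range(n)) / n
    return 0.5 * (rna_to_atac + atac_to_rna)


def kl_divergence(p: List[float], q: List[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_distillation(student: Matrix, teachers: List[Matrix]) -> float:
    """Mean KL(teacher ensemble || student) over the rows of the similarity matrices."""
    loss = 0.0
    for i, student_row in enumerate(student):
        teacher_probs = [softmax(t[i]) for t in teachers]
        # Hypothetical choice: every teacher gets the same weight in the ensemble.
        ensemble = [sum(col) / len(teachers) for col in zip(*teacher_probs)]
        loss += kl_divergence(ensemble, softmax(student_row))
    return loss / len(student)
```

In this sketch the contrastive term is small when each cell's true partner is its most similar cross-modal neighbour, and the distillation term is zero only when the student reproduces the teachers' averaged matching distribution.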
scGraphETM: Graph-Based Deep Learning Approach for Unraveling Cell Type-Specific Gene Regulatory Networks from Single-Cell Multi-Omics Data
Wenqi Dong
Manqi Zhou
Boyu Han
Yi Wang
SpaTM: Topic Models for Inferring Spatially Informed Transcriptional Programs
Adrien Osakwe
Wenqi Dong
Qihuang Zhang
Robert Sladek
Spatial transcriptomics has revolutionized our ability to characterize tissues and diseases by contextualizing gene expression with spatial organization. Available methods require researchers to either train a model using histology-based annotations or use annotation-free clustering approaches to uncover spatial domains. However, few methods provide researchers with a way to jointly analyze spatial data from both annotation-free and annotation-guided perspectives using consistent inductive biases and levels of interpretability. A single framework with consistent inductive biases ensures coherence and transferability across tasks, reducing the risk of conflicting assumptions. To this end, we propose the Spatial Topic Model (SpaTM), a topic-modeling framework capable of annotation-guided and annotation-free analysis of spatial transcriptomics data. SpaTM can be used to learn gene programs that represent histology-based annotations while also allowing researchers to infer spatial domains with an annotation-free approach when manual annotations are limited or noisy. We demonstrate SpaTM’s interpretability through its use of topic mixtures to represent cell states and transcriptional programs, and show how its intuitive framework facilitates the integration of annotation-guided and annotation-free analyses of spatial data with downstream analyses such as cell type deconvolution. Finally, we demonstrate how both approaches can be used to extend the analysis of large-scale snRNA-seq atlases with the inference of cell proximity and spatial annotations in human brains with Major Depressive Disorder.
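SpaTM represents cell states and transcriptional programs as topic mixtures. As a hedged illustration of that generic topic-model building block only (not SpaTM's spatial model or its annotation-guided training, which the abstract does not specify), the sketch below computes a spot's expected gene distribution as its topic proportions `theta` mixed over topic-specific gene distributions `beta`, and scores observed counts with a multinomial log-likelihood. The function names and the two-topic setup are assumptions made for illustration.

```python
import math
from typing import List


def expected_gene_distribution(theta: List[float], beta: List[List[float]]) -> List[float]:
    """Mix topic-specific gene distributions by one spot's topic proportions.

    theta[k]   : proportion of topic k in the spot (sums to 1)
    beta[k][g] : probability of gene g under topic k (each row sums to 1)
    """
    n_genes = len(beta[0])
    return [sum(t * row[g] for t, row in zip(theta, beta)) for g in range(n_genes)]


def spot_log_likelihood(counts: List[int], theta: List[float], beta: List[List[float]]) -> float:
    """Multinomial log-likelihood of one spot's gene counts, dropping the constant coefficient."""
    probs = expected_gene_distribution(theta, beta)
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
```

Under this reading, a spot whose counts are dominated by the genes of one transcriptional program is better explained by a mixture that places most of its weight on the corresponding topic, which is what makes the topic proportions interpretable as cell-state or program activity.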
Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms
Shadi Zabad
Chirayu Anant Haryan
Simon Gravel
Sanchit Misra
Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting
Ziyang Song
Qincheng Lu
Hao Xu
Mike He Zhu
MiRGraph: A hybrid deep learning approach to identify microRNA-target interactions by integrating heterogeneous regulatory network and genomic sequences
Pei Liu
Ying Liu
Jiawei Luo
Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records through both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.
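The abstract describes next-token and previous-token prediction in alternating transformer layers. The sketch below is a hypothetical, model-free illustration of that bookkeeping, not BiTimelyGPT's code: it assigns directions to layers (the choice that even-indexed layers run forward is our assumption), builds the matching attention masks (lower-triangular for forward, upper-triangular for backward), and lists which position predicts which target value in each direction. All function names are illustrative.

```python
from typing import List, Tuple


def layer_directions(n_layers: int) -> List[str]:
    """Alternate forward (next-token) and backward (previous-token) layers.

    Assumption: layer 0 is a forward layer.
    """
    return ["forward" if i % 2 == 0 else "backward" for i in range(n_layers)]


def attention_mask(seq_len: int, direction: str) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    if direction == "forward":
        # Causal: position i sees itself and everything before it.
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    # Anti-causal: position i sees itself and everything after it.
    return [[j >= i for j in range(seq_len)] for i in range(seq_len)]


def prediction_pairs(series: List[float], direction: str) -> List[Tuple[int, float]]:
    """(position whose representation makes the prediction, target value) pairs."""
    if direction == "forward":
        return [(t, series[t + 1]) for t in range(len(series) - 1)]
    return [(t, series[t - 1]) for t in range(1, len(series))]
```

One way to read the "full-rank" remark: with either mask, a softmax attention matrix is triangular with strictly positive diagonal entries, so its determinant (the product of the diagonal) is nonzero.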
Abstract 4142894: Multimorbidity Trajectories Across the Lifespan in Patients with Congenital Heart Disease
Chao Li
Aihua Liu
Solomon Bendayan
Liming Guo
Judith Therrien
Robyn Tamblyn
Jay Brophy
Ariane Marelli
Background: Benefiting from advances in medical care, patients with congenital heart disease (CHD) now survive to adulthood but face elevated risks of both cardiac and non-cardiac complications. Understanding the trajectories of comorbidity development over a patient's lifespan is a cornerstone of optimizing care, which is expected to improve long-term health outcomes. Research Aim: This study aims to investigate the temporal sequences and evolution of comorbidities in CHD patients across their lifespan. We hypothesize that multimorbidity trajectories in CHD patients are linked to CHD lesion severity and age at onset of specific comorbidities. Methods: Using the Quebec CHD database, which comprised data on outpatient visits, hospitalization records and vital status from 1983 to 2017, we designed a longitudinal cohort study evaluating the development of 39 comorbidities coded using ICD-9/10. Temporal sequences were mapped using median age of onset. Associations between disease pairs were quantified by hazard ratios from Cox proportional hazards models adjusting for age, sex, genetic syndrome and competing risks of death, and taking into account the time-varying nature of the predictor diseases. Results: The cohort included 9,764 individuals with severe and 127,729 with non-severe CHD lesions. In severe CHD patients, most comorbidities developed between ages 25 and 40. Comorbidity progression began with childhood cardiovascular diseases, followed by systemic diseases such as diabetes, liver and kidney diseases, and advanced to heart failure and dementia in middle adulthood. In addition, mental disorders emerged in early adulthood and were associated with subsequent development of kidney diseases and dementia. Different trajectories were observed in non-severe CHD patients, with disease onset 2 to 3 decades later and no difference in onset timing between cardiovascular and systemic complications (Figure).
Conclusions: Distinct multimorbidity trajectories were observed in CHD patients by CHD lesion severity. In patients with severe CHD lesions, early systemic diseases significantly influenced subsequent complications. These findings highlight the need for well-timed surveillance guidelines and interventions to improve health outcomes.
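The Methods quantify disease-pair associations as hazard ratios from Cox proportional hazards models. As a minimal, hedged sketch of where such a hazard ratio comes from, the code below fits the coefficient β of a single fixed binary predictor (for example, whether an earlier comorbidity is present) by Newton's method on the Cox partial log-likelihood; the hazard ratio is then exp(β). Unlike the study, it ignores covariate adjustment, competing risks and time-varying predictors. Tied event times fall back to the Breslow approximation because each event uses its full risk set. There is also no safeguard for perfectly separated data, where β diverges. The record layout and function names are assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (follow-up time, event observed?, predictor value x).
Record = Tuple[float, bool, float]


def partial_log_likelihood(beta: float, data: List[Record]) -> float:
    """Cox partial log-likelihood for one covariate (Breslow handling of ties)."""
    ll = 0.0
    for t_i, event_i, x_i in data:
        if not event_i:
            continue  # censored records only contribute through risk sets
        risk = [x_j for t_j, _, x_j in data if t_j >= t_i]  # still at risk at t_i
        ll += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk))
    return ll


def fit_beta(data: List[Record], iterations: int = 25) -> float:
    """Newton-Raphson on the (concave) partial log-likelihood, starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, information = 0.0, 0.0
        for t_i, event_i, x_i in data:
            if not event_i:
                continue
            risk = [x_j for t_j, _, x_j in data if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, risk)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, risk)) / total
            score += x_i - mean                  # first derivative of the log-likelihood
            information += mean_sq - mean * mean  # negative second derivative
        beta += score / information
    return beta
```

Usage, under the same assumptions: `hr = math.exp(fit_beta(records))` gives the estimated hazard ratio, where values above 1 indicate that the predictor disease is associated with a higher hazard of the outcome disease.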
scMoE: single-cell mixture of experts for learning hierarchical, cell-type-specific, and interpretable representations from heterogeneous scRNA-seq data
Michael Huang
Advancements in single-cell transcriptomics methods have resulted in a wealth of single-cell RNA sequencing (scRNA-seq) data. Methods to learn cell representations from atlas-level scRNA-seq data across diverse tissues can shed light on cell functions implicated in diseases such as cancer. However, integrating large-scale and heterogeneous scRNA-seq data is challenging due to the disparity of cell types and batch effects. We present single-cell Mixture of Experts (scMoE), a hierarchical mixture-of-experts single-cell topic model. Our key contributions are the cell-type-specific experts, which explicitly align topics with cell types, and the integration of hierarchical cell-type lineages and domain knowledge. scMoE is both transferable and highly interpretable. We benchmarked scMoE’s performance on 9 single-cell RNA-seq datasets for clustering and 3 simulated spatial datasets for spatial deconvolution. We additionally show that our model, using single-cell references, yields meaningful biological results by deconvolving 3 cancer bulk RNA-seq datasets and 2 spatial transcriptomics datasets. scMoE is able to identify cell types of survival importance, find cancer-subtype-specific deconvolution patterns, and capture meaningful spatially distinct cell-type distributions.
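scMoE pairs cell-type-specific experts with hierarchical cell-type lineages. The abstract does not give the gating equations, so the sketch below shows a generic two-level hierarchical mixture of experts, assumed for illustration rather than taken from scMoE: a lineage-level gate and a cell-type gate within each lineage are multiplied to give each expert's weight, and the output is the weighted sum of the expert outputs. Reading the expert outputs as per-cell-type topic proportions is a hypothetical interpretation, and all names are illustrative.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_gate(lineage_scores: List[float],
                      type_scores: List[List[float]]) -> List[List[float]]:
    """Weight of expert (lineage l, cell type c) = P(l) * P(c | l).

    lineage_scores[l]   : unnormalized score for lineage l
    type_scores[l][c]   : unnormalized score for cell type c within lineage l
    """
    p_lineage = softmax(lineage_scores)
    return [[p_l * p_c for p_c in softmax(scores_l)]
            for p_l, scores_l in zip(p_lineage, type_scores)]


def mixture_output(gates: List[List[float]],
                   expert_outputs: List[List[List[float]]]) -> List[float]:
    """Gate-weighted sum of expert output vectors (e.g. hypothetical topic proportions)."""
    dim = len(expert_outputs[0][0])
    out = [0.0] * dim
    for gate_row, outputs_row in zip(gates, expert_outputs):
        for g, vec in zip(gate_row, outputs_row):
            for d in range(dim):
                out[d] += g * vec[d]
    return out
```

Because the lineage probabilities and the within-lineage probabilities each sum to one, the expert weights also sum to one. Cell types that share a lineage therefore compete for that lineage's share of the weight, which is one way a lineage hierarchy can be encoded in the gating.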
ConvNTC: Convolutional neural tensor completion for predicting the disease-related miRNA pairs and cell-related drug pairs
Pei Liu
Xiao Liang
Jiawei Luo