Yue Li

Associate Academic Member

Assistant Professor, McGill University, School of Computer Science

Biography

I completed my PhD degree in computer science and computational biology at the University of Toronto in 2014. Prior to joining McGill University, I was a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT (2015–2018).

My research program covers three main areas of applied machine learning in computational genomics and health. More specifically, it focuses on developing interpretable probabilistic learning models and deep learning models for genetic, epigenetic, electronic health record and single-cell genomic data.

By systematically integrating multimodal and longitudinal data, I aim to have impactful applications in computational medicine, including building intelligent clinical recommender systems, forecasting patient health trajectories, making personalized polygenic risk predictions, characterizing multi-trait functional genetic mutations, and dissecting cell-type-specific regulatory elements that underpin complex traits and diseases in humans.

Current Students

Adrien Osakwe

PhD - McGill University

adrien.osakwe@mila.quebec

Github

Google Scholar

Alain Han

Undergraduate - McGill University

boyu.han@mila.quebec

Caiya Zhang

Master's Research - McGill University

caiya.zhang@mila.quebec

Doruk Cakmakci

PhD - McGill University

doruk.cakmakci@mila.quebec

Dylan Mann-Krzisnik

PhD - McGill University

dylan.mann-krzisnik@mila.quebec

He Zhu

PhD - McGill University

he.zhu@mila.quebec

Shadi Zabad

PhD - McGill University

shadi.zabad@mila.quebec

Github

Vicky Dong

Master's Research - McGill University

vicky.dong@mila.quebec

Vishvak Raghavan

Master's Research - McGill University

Co-supervisor: Jun Ding

vishvak.raghavan@mila.quebec

Github

Yixuan Li

PhD - McGill University

Principal supervisor: Archer Yang

yixuan.li@mila.quebec

Ziyang Song

PhD - McGill University

ziyang.song@mila.quebec

Publications

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Yimin Fan

Yu Li

Jun Ding

Yue Li

2024-03-09

bioRxiv (preprint)

doi.org

Multi-ancestry polygenic risk scores using phylogenetic regularization

Elliot Layne

Shadi Zabad

Yue Li

Mathieu Blanchette

2024-02-17

bioRxiv (preprint)

doi.org

Bidirectional Generative Pre-training for Improving Time Series Representation Learning

Ziyang Song

Qincheng Lu

He Zhu

Yue Li

2024-02-14

ArXiv (preprint)

doi.org

arxiv.org

Machine Learning Informed Diagnosis for Congenital Heart Disease in Large Claims Data Source

Ariane J. Marelli

Chao Li

Aihua Liu

Hanh Nguyen

Harry Moroz

James M. Brophy

Liming Guo

2024-02-01

JACC: Advances (published)

doi.org

MiRGraph: A transformer-based feature learning approach to identify microRNA-target interactions by integrating heterogeneous graph network and sequence information

Pei Liu

Ying Liu

Jiawei Luo

Yue Li

MicroRNAs (miRNAs) play a crucial role in the regulation of gene expression by targeting specific mRNAs. They can function as both tumor suppressors and oncogenes depending on the specific miRNA and its target genes. Detecting miRNA-target interactions (MTIs) is critical for unraveling the complex mechanisms of gene regulation and identifying therapeutic targets and diagnostic markers. There is currently no MTI prediction method that simultaneously performs feature learning on heterogeneous graph network and sequence information. To improve MTI prediction performance, we present a novel transformer-based multi-view feature learning method, named MiRGraph. It consists of two main modules for learning the sequence and heterogeneous graph network, respectively. For learning the sequence-based feature embedding, we utilize the mature miRNA sequence and the complete 3’UTR sequence of the target mRNAs to encode sequence features. Specifically, a transformer-based CNN (TransCNN) module is designed for miRNAs and genes respectively to extract their personalized sequence features. For learning the network-based feature embedding, we utilize a heterogeneous graph transformer (HGT) module to extract the relational and structural information in a heterogeneous graph consisting of miRNA-miRNA, gene-gene and miRNA-target interactions. We learn the TransCNN and HGT modules end-to-end by utilizing a feedforward network, which takes the combined embedded features of the miRNA-gene pair to predict MTIs. Comparisons with other existing MTI prediction methods illustrate the superiority of MiRGraph under standard criteria. In a case study on breast cancer, we identified plausible target genes of an oncomir hsa-MiR-122-5p and plausible miRNAs that regulate the oncogene BRCA1.

2024-01-26

bioRxiv (preprint)

doi.org

Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting

Ziyang Song

Qincheng Lu

Hao Xu

David Buckeridge

Yue Li

2023-11-29

ArXiv (preprint)

arxiv.org

Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder

Anjali Chawla

Doruk Cakmakci

Wenmin Zhang

Malosree Maitra

Reza Rahimian

Haruka Mitsuhashi

MA Davoli

Jenny Yang

Gary Gang Chen

Ryan Denniston

Deborah Mash

Naguib Mechawar

Matthew Suderman

Yue Li

Corina Nagy

Gustavo Turecki

2023-10-03

bioRxiv (preprint)

doi.org

GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Lakshmipuram Seshadri Swapna

Michael Huang

Yue Li

2023-08-18

Genome Biology (published)

doi.org

Guided-topic modelling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Lakshmipuram Seshadri Swapna

Michael Huang

Yue Li

Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data as a guide to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.

2023-07-03

bioRxiv (preprint)

doi.org

Biomedical discovery through the integrative biomedical knowledge hub (iBKH)

Chang Su

Yu Hou

Manqi Zhou

Suraj Rajendran

Jacqueline R.M. A. Maasch

Zehra Abedi

Haotan Zhang

Zilong Bai

Anthony Cuturrufo

Winston Guo

Fayzan F. Chaudhry

Gregory Ghahramani

Jian Tang

Feixiong Cheng

Yue Li

Rui Zhang

Steven T. DeKosky

Jiang Bian

Fei Wang

2023-04-01

iScience (published)

doi.org

Single-cell multi-omic topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures

Manqi Zhou

Hao Zhang

Zilong Bai

Dylan Mann-Krzisnik

Fei Wang

Yue Li

The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Most of the existing computational methods for single-cell data analysis are either limited to a single modality or lack flexibility and interpretability. In this study, we propose an interpretable deep learning method called multi-omic embedded topic model (moETM) to effectively perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder for efficient variational inference and then employs multiple linear decoders to learn the multi-omic signatures of the gene regulatory programs. Through comprehensive experiments on public single-cell transcriptome and chromatin accessibility data (i.e., scRNA+scATAC), as well as scRNA and proteomic data (i.e., CITE-seq), moETM demonstrates superior performance compared with six state-of-the-art single-cell data analysis methods on seven publicly available datasets. By applying moETM to the scRNA+scATAC data in human bone marrow mononuclear cells (BMMCs), we identified sequence motifs corresponding to the transcription factors that regulate immune gene signatures. Applying moETM analysis to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omic biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.

2023-01-31

bioRxiv (preprint)

doi.org

Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model

Yuesong Zou

Ahmad Pesaranghader

Ziyang Song

Aman Verma

David Buckeridge

Yue Li

2022-10-25

Scientific Reports (published)

doi.org
