Yue Li

Vicky Dong

PhD - McGill University

Claris Gu

Master's Research - McGill University

Eric Huang

Master's Research - McGill University

Yixuan Li

PhD - McGill University

Principal supervisor :

Dylan Mann-Krzisnik

PhD - McGill University

Marshall Meng

Master's Research - McGill University

Principal supervisor :

Adrien Osakwe

PhD - McGill University

Vishvak Raghavan

PhD - McGill University

Co-supervisor :

Jun Ding

Jack Song

Master's Research - McGill University

Co-supervisor :

Wilbur Wang

Master's Research - McGill University

Kunpeng Xu

Postdoctorate - McGill University

Co-supervisor :

Publications

Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning

Ziyang Song

Qincheng Lu

Mike He Zhu

2024-01-01

MLHC (published)

proceedings.mlr.press

Bidirectional Generative Pre-training for Improving Time Series Representation Learning

Ziyang Song

Qincheng Lu

Mike He Zhu

2024-01-01

arXiv.org (preprint)

MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

Yixuan Li

Ariane Marelli

Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge.
Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

2023-12-20

ArXiv (preprint)

TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare

Ziyang Song

Qincheng Lu

Hao Xu

Mike He Zhu

2023-11-29

ArXiv (preprint)

MDFD: Study of Distributed Non-IID Scenarios and Frechet Distance-Based Evaluation

Wei Wang

Mingwei Zhang

Ziwen Wu

Qianxi Chen

With the development of distributed machine learning and federated learning, the solution to the data island problem is promoted. People use computer clusters to train machine learning models on data distributed in different regions. In the early stage of research, researchers usually assume that the data sets of each node are independent and identically distributed (IID), but this is a strong assumption, which is challenging to meet in practical applications. Therefore, research on non-IID has become a hot spot in recent years. However, there is no uniform standard for designing and evaluating non-IID scenarios. This paper proposes a Frechet distance-independent non-IID distribution dataset metric MDFD. We also conducted experiments on different types of distributed machine-learning methods in different non-IID scenarios to verify the effectiveness of MDFD.

2023-10-08

International Conference on Information Photonics (published)

SDWD: Style Diversity Weighted Distance Evaluates the Intra-Class Data Diversity of Distributed GANs

Wei Wang

Ziwen Wu

Mingwei Zhang

2023-10-08

2023 IEEE International Conference on Image Processing (ICIP) (published)

Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder

Anjali Chawla

Doruk Cakmakci

Wenmin Zhang

Malosree Maitra

Reza Rahimian

Haruka Mitsuhashi

MA Davoli

Jenny Yang

Gary Gang Chen

Ryan Denniston

Deborah Mash

Naguib Mechawar

Matthew Suderman

Corina Nagy

Gustavo Turecki

2023-10-03

bioRxiv (preprint)

GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Lakshmipuram Seshadri Swapna

Michael Huang

2023-08-18

Genome Biology (published)

Guided-topic modelling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Lakshmipuram Seshadri Swapna

Michael Huang

Cell-type composition is an important indicator of health. We present Guided Topic Model for deconvolution (GTM-decon) to automatically infer cell-type-specific gene topic distributions from single-cell RNA-seq data for deconvolving bulk transcriptomes. GTM-decon performs competitively on deconvolving simulated and real bulk data compared with the state-of-the-art methods. Moreover, as demonstrated in deconvolving disease transcriptomes, GTM-decon can infer multiple cell-type-specific gene topic distributions per cell type, which captures sub-cell-type variations. GTM-decon can also use phenotype labels from single-cell or bulk data as a guide to infer phenotype-specific gene distributions. In a nested-guided design, GTM-decon identified cell-type-specific differentially expressed genes from bulk breast cancer transcriptomes.

2023-07-03

bioRxiv (preprint)

Biomedical discovery through the integrative biomedical knowledge hub (iBKH)

Chang Su

Yufang Hou

Manqi Zhou

Suraj Rajendran

Jacqueline R.M. A. Maasch

Zehra Abedi

Haotan Zhang

Zilong Bai

Anthony Cuturrufo

Winston Guo

Fayzan F. Chaudhry

Gregory Ghahramani

Jian Tang

Feixiong Cheng

Rui Zhang

Steven T. DeKosky

Jiang Bian

Yi Wang

2023-04-01

iScience (published)

Single-cell multi-omic topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures

Manqi Zhou

Hao Zhang

Zilong Bai

Dylan Mann-Krzisnik

Yi Wang

The advent of single-cell multi-omics sequencing technology makes it possible for researchers to leverage multiple modalities for individual cells and explore cell heterogeneity. However, the high dimensional, discrete, and sparse nature of the data makes the downstream analysis particularly challenging. Most of the existing computational methods for single-cell data analysis are either limited to single modality or lack flexibility and interpretability. In this study, we propose an interpretable deep learning method called multi-omic embedded topic model (moETM) to effectively perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder for efficient variational inference and then employs multiple linear decoders to learn the multi-omic signatures of the gene regulatory programs. Through comprehensive experiments on public single-cell transcriptome and chromatin accessibility data (i.e., scRNA+scATAC), as well as scRNA and proteomic data (i.e., CITE-seq), moETM demonstrates superior performance compared with six state-of-the-art single-cell data analysis methods on seven publicly available datasets. By applying moETM to the scRNA+scATAC data in human bone marrow mononuclear cells (BMMCs), we identified sequence motifs corresponding to the transcription factors that regulate immune gene signatures. Applying moETM analysis to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omic biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.

2023-01-31

bioRxiv (preprint)

Modeling electronic health record data using an end-to-end knowledge-graph-informed topic model

Yuesong Zou

Ahmad Pesaranghader

Ziyang Song

Aman Verma

2022-10-25

Scientific Reports (published)