Yue Li

Yixuan Li

PhD - McGill

Principal supervisor:

Dylan Mann-Krzisnik

PhD - McGill

Marshall Meng

Research Master's - McGill

Principal supervisor:

PhD - McGill

PhD - McGill

Co-supervisor:

Jun Ding

Jack Song

Research Master's - McGill

Co-supervisor:

Wilbur Wang

Research Master's - McGill

Mona Wang Wang

PhD - McGill

Kunpeng Xu

Postdoctorate - McGill

Co-supervisor:

Shadi Zabad

PhD - McGill

Publications

TrajGPT: Healthcare Time-Series Representation Learning for Trajectory Prediction

Ziyang Song

Qincheng Lu

Mike He Zhu

David Buckeridge

In many domains, such as healthcare, time-series data is irregularly sampled with varying intervals between observations. This creates challenges for classical time-series models that require equally spaced data. To address this, we propose a novel time-series Transformer called **Trajectory Generative Pre-trained Transformer (TrajGPT)**. It introduces a data-dependent decay mechanism that adaptively forgets irrelevant information based on clinical context. By interpreting TrajGPT as ordinary differential equations (ODEs), our approach captures continuous dynamics from sparse and irregular time-series data. Experimental results show that TrajGPT, with its time-specific inference approach, accurately predicts trajectories without requiring task-specific fine-tuning.

2024-10-10

NeurIPS.cc/2024/Workshop/TSALM (published)

openreview.net

TrajGPT: Irregular Time-Series Representation Learning for Health Trajectory Analysis

Ziyang Song

Qincheng Lu

Mike He Zhu

David Buckeridge

2024-10-03

ArXiv (preprint)

openreview.net

MiRGraph: A hybrid deep learning approach to identify microRNA-target interactions by integrating heterogeneous regulatory network and genomic sequences

Pei Liu

Ying Liu

Jiawei Luo

MicroRNAs (miRNAs) mediate gene expression regulation by targeting specific messenger RNAs (mRNAs) in the cytoplasm. They can function as both tumor suppressors and oncogenes depending on the specific miRNA and its target genes. Detecting miRNA-target interactions (MTIs) is critical for unraveling the complex mechanisms of gene regulation and holds promise for RNA therapy for cancer. There is currently a lack of MTI prediction methods that simultaneously perform feature learning from a heterogeneous gene regulatory network (GRN) and genomic sequences. To improve MTI prediction performance, we present a novel transformer-based multiview feature learning method, MiRGraph, which consists of two main modules for learning the sequence-based and GRN-based feature embeddings. For the former, we utilize the mature miRNA sequences and the complete 3’UTR sequence of the target mRNAs to encode sequence features using a hybrid transformer and convolutional neural network (CNN) (TransCNN) architecture. For the latter, we utilize a heterogeneous graph transformer (HGT) module to extract the relational and structural information from the GRN consisting of miRNA-miRNA, gene-gene and miRNA-target interactions. The TransCNN and HGT modules can be learned end-to-end to predict experimentally validated MTIs from MiRTarBase. MiRGraph outperforms existing methods not only in recapitulating the true MTIs but also in predicting the strength of the MTIs based on in vitro measurements of miRNA transfections. In a case study on breast cancer, we identified plausible target genes of an oncomir.

2024-10-02

bioRxiv (preprint)

MiRGraph: A transformer-based feature learning approach to identify microRNA-target interactions by integrating heterogeneous graph network and sequence information

Pei Liu

Yang Liu

Jiawei Luo

MicroRNAs (miRNAs) play a crucial role in the regulation of gene expression by targeting specific mRNAs. They can function as both tumor suppressors and oncogenes depending on the specific miRNA and its target genes. Detecting miRNA-target interactions (MTIs) is critical for unraveling the complex mechanisms of gene regulation and identifying therapeutic targets and diagnostic markers. There is currently a lack of MTI prediction methods that simultaneously perform feature learning on heterogeneous graph network and sequence information. To improve MTI prediction performance, we present a novel transformer-based multi-view feature learning method, named MiRGraph. It consists of two main modules for learning the sequence and heterogeneous graph network, respectively. For learning the sequence-based feature embedding, we utilize the mature miRNA sequence and the complete 3’UTR sequence of the target mRNAs to encode sequence features. Specifically, a transformer-based CNN (TransCNN) module is designed for miRNAs and genes respectively to extract their personalized sequence features. For learning the network-based feature embedding, we utilize a heterogeneous graph transformer (HGT) module to extract the relational and structural information in a heterogeneous graph consisting of miRNA-miRNA, gene-gene and miRNA-target interactions. We learn the TransCNN and HGT modules end-to-end by utilizing a feedforward network, which takes the combined embedded features of the miRNA-gene pair to predict MTIs. Comparisons with other existing MTI prediction methods illustrate the superiority of MiRGraph under standard criteria. In a case study on breast cancer, we identified plausible target genes of an oncomir hsa-MiR-122-5p and plausible miRNAs that regulate the oncogene BRCA1.

2024-10-02

bioRxiv (preprint)

Cell ontology guided transcriptome foundation model

Manqi Zhou

Boyu Han

Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve the learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present **s**ingle **c**ell, **Cell**-**o**ntology guided TFM (scCello). We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. The novel loss components guide scCello to learn the cell-type-specific representation and the structural relation between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses. Source code and model weights are available at https://github.com/DeepGraphLearning/scCello.

2024-09-25

NeurIPS.cc/2024/Conference (spotlight)

openreview.net

GFETM: Genome Foundation-based Embedded Topic Model for scATAC-seq Modeling

Yimin Fan

Shi Han

2024-05-17

Lecture Notes in Computer Science (published)

Supervised latent factor modeling isolates cell-type-specific transcriptomic modules that underlie Alzheimer’s disease progression

Liam Hodgson

Yasser Iturria-Medina

Jo Anne Stratton

Guy Wolf

Smita Krishnaswamy

David A. Bennett

Danilo Bzdok

2024-05-17

Communications Biology (published)

Protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique

Manqi Zhou

Hao Zhang

Zilong Bai

Dylan Mann-Krzisnik

Yi Wang

2024-05-14

STAR Protocols (published)

MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

Yixuan Li

Ariane Marelli

Survival models can help medical practitioners evaluate the prognostic importance of clinical variables to patient outcomes such as mortality or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high-dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are threefold: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge.
Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

2024-05-01

Journal of Biomedical Informatics (published)

arxiv.org
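
The Cox proportional hazards likelihood mentioned in the MixEHR-SurG abstract can be illustrated with a minimal sketch of the standard negative partial log-likelihood. This is the textbook Cox objective, not the MixEHR-SurG implementation: the function name `cox_neg_partial_log_likelihood` and its arguments are hypothetical, the per-subject risk scores are assumed to be supplied by some upstream model, and event times are assumed to have no ties.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, assuming no tied event times.

    risk_scores: log-hazard ratio per subject (hypothetical model output)
    times:       observed follow-up time per subject
    events:      1 if the event (e.g. death) was observed, 0 if censored
    """
    risk_scores = np.asarray(risk_scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    # Sort by descending time so that, at position i, all subjects at
    # positions 0..i are still at risk when subject i's time is reached.
    order = np.argsort(-times, kind="stable")
    eta = risk_scores[order]
    observed = events[order]

    # Running log-sum-exp of the risk scores over each risk set.
    log_risk_set = np.logaddexp.accumulate(eta)

    # Only subjects with an observed event contribute a term.
    return -np.sum((eta - log_risk_set)[observed])
```

With three subjects at times 1, 2, 3, equal risk scores, and the last subject censored, the two events have risk sets of size 3 and 2, so the value is log 3 + log 2 = log 6.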

Multi-ancestry polygenic risk scores using phylogenetic regularization

2024-02-17

bioRxiv (preprint)

Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning

Ziyang Song

Qincheng Lu

He Zhu

David Buckeridge

Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.

2024-02-14

ArXiv (preprint)

arxiv.org

Machine Learning Informed Diagnosis for Congenital Heart Disease in Large Claims Data Source

Ariane Marelli

Chao Li

Aihua Liu

Hanh Nguyen

Harry Moroz

James M. Brophy

Liming Guo

2024-02-01

JACC: Advances (published)