Yue Li

PhD - McGill University

Vicky Dong

PhD - McGill University

Neda Esfehani

Master's Research - McGill University

Claris Gu

Master's Research - McGill University

Eric Huang

Master's Research - McGill University

Yixuan Li

PhD - McGill University

Principal supervisor :

Dylan Mann-Krzisnik

PhD - McGill University

Marshall Meng

Master's Research - McGill University

Principal supervisor :

Adrien Osakwe

PhD - McGill University

Vishvak Raghavan

PhD - McGill University

Co-supervisor :

Jun Ding

Jack Song

Master's Research - McGill University

Co-supervisor :

Bo-Hong Wang

Master's Research - McGill University

Kunpeng Xu

Postdoctorate - McGill University

Co-supervisor :

Publications

TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare

Ziyang Song

Qincheng Lu

Hao Xu

Ziqi Yang

Mike He Zhu

Motivation: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in Natural Language Processing and Computer Vision domains. However, the development of PTMs on healthcare time-series data is lagging behind. This underscores the limitations of the existing transformer-based architectures, particularly their scalability to handle large-scale time series and ability to capture long-term temporal dependencies. Methods: In this study, we present Timely Generative Pre-trained Transformer (TimelyGPT). TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations. It also integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies. Materials: We evaluated TimelyGPT on two large-scale healthcare time series datasets corresponding to continuous biosignals and irregularly-sampled time series, respectively: (1) the Sleep EDF dataset consisting of over 1.2 billion timesteps; (2) the longitudinal healthcare administrative database PopHR, comprising 489,000 patients randomly sampled from the Montreal population. Results: In forecasting continuous biosignals, TimelyGPT achieves accurate extrapolation up to 6,000 timesteps of body temperature during the sleep stage transition, given a short look-up window (i.e., prompt) containing only 2,000 timesteps. For irregularly-sampled time series, TimelyGPT with a proposed time-specific inference demonstrates high top recall scores in predicting future diagnoses using early diagnostic records, effectively handling irregular intervals between clinical records. Together, we envision TimelyGPT to be useful in various health domains, including long-term patient health state forecasting and patient risk trajectory prediction. Availability: The open-sourced code is available on GitHub.

2025-10-14

Health Information Science and Systems (published)

arxiv.org
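
The xPos embedding named in the TimelyGPT abstract above can be illustrated with a small sketch. It assumes the general xPos idea of rotating queries and keys by a position-dependent angle while scaling them in opposite directions, so that the attention score depends only on the distance between positions. The function name `xpos_score` and the values of `theta` and `zeta` are hypothetical choices for illustration and are not taken from the TimelyGPT code.

```python
import cmath
import math

def xpos_score(q, k, n, m, theta=0.3, zeta=0.98):
    """Score between a query at position n and a key at position m.

    q and k are complex numbers standing in for one 2-D feature pair.
    theta (rotation rate) and zeta (decay rate) are illustrative values.
    """
    # Rotate the query by theta * n and scale it by zeta ** n.
    q_n = q * (zeta ** n) * cmath.exp(1j * theta * n)
    # Rotate the key by theta * m and scale it by the inverse factor.
    k_m = k * (zeta ** -m) * cmath.exp(1j * theta * m)
    # The product carries zeta ** (n - m) and a rotation by theta * (n - m).
    return (q_n * k_m.conjugate()).real

q, k = complex(1.0, 0.5), complex(0.8, -0.2)
near = xpos_score(q, k, 5, 3)       # offset of 2 early in the sequence
far = xpos_score(q, k, 105, 103)    # same offset of 2, much later
print(math.isclose(near, far, rel_tol=1e-9))
```

Because the two scores agree whenever the offset is the same, a pattern learned on short windows keeps the same form at positions beyond the training length, which is the property the abstract relies on for extrapolation.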

TrajGPT: Irregular Time-Series Representation Learning of Health Trajectory

Ziyang Song

Qincheng Lu

Mike He Zhu

In the healthcare domain, time-series data are often irregularly sampled with varying intervals through outpatient visits, posing challenges for existing models designed for equally spaced sequential data. To address this, we propose Trajectory Generative Pre-trained Transformer (TrajGPT) for representation learning on irregularly-sampled healthcare time series. TrajGPT introduces a novel Selective Recurrent Attention (SRA) module that leverages a data-dependent decay to adaptively filter irrelevant past information. As a discretized ordinary differential equation (ODE) framework, TrajGPT captures underlying continuous dynamics and enables a time-specific inference for forecasting arbitrary target timesteps without auto-regressive prediction. Experimental results based on the longitudinal EHR data PopHR from Montreal health system and eICU from PhysioNet showcase TrajGPT's superior zero-shot performance in disease forecasting, drug usage prediction, and sepsis detection. The inferred trajectories of diabetic and cardiac patients reveal meaningful comorbidity conditions, underscoring TrajGPT as a useful tool for forecasting patient health evolution.

2025-10-13

IEEE Journal of Biomedical and Health Informatics (published)
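
The data-dependent decay described in the TrajGPT abstract above can be pictured with a generic decayed recurrence. This is a minimal sketch and not the published SRA module: the function name, the sigmoid gate, scalar values, and the choice to raise the decay to the power of the elapsed time are all assumptions made for illustration.

```python
import math

def decayed_recurrent_attention(queries, keys, values, gates, gaps):
    """Run a decayed linear-attention recurrence over one sequence.

    queries, keys: lists of d-dimensional lists; values: list of floats.
    gates: per-step logits; sigmoid(gate) is a decay factor in (0, 1).
    gaps: time elapsed since the previous observation (assumed handling
          of irregular sampling, not taken from the paper).
    """
    d = len(keys[0])
    state = [0.0] * d  # running summary of past key * value products
    outputs = []
    for q, k, v, g, dt in zip(queries, keys, values, gates, gaps):
        # Input-dependent decay; a longer gap forgets more of the past.
        decay = (1.0 / (1.0 + math.exp(-g))) ** dt
        state = [decay * s + ki * v for s, ki in zip(state, k)]
        outputs.append(sum(qi * s for qi, s in zip(q, state)))
    return outputs

ones = [[1.0], [1.0]]
regular = decayed_recurrent_attention(ones, ones, [2.0, 3.0], [0.0, 0.0], [1, 1])
delayed = decayed_recurrent_attention(ones, ones, [2.0, 3.0], [0.0, 0.0], [1, 2])
print(regular, delayed)
```

In the second call the first observation contributes less to the final output because twice as much time has passed before the second visit, which is the kind of behaviour an irregularly sampled record needs.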

Single-nucleus chromatin accessibility profiling identifies cell types and functional variants contributing to major depression

Anjali Chawla

Laura M. Fiori

Wenmin Zang

Malosree Maitra

Jennie Yang

Dariusz Żurawek

Gabriella Frosi

Reza Rahimian

Haruka Mitsuhashi

Maria Antonietta Davoli

Ryan Denniston

Gary Gang Chen

Volodymyr Yerko

Deborah Mash

Kiran Girdhar

Schahram Akbarian

Naguib Mechawar

Matthew Suderman

Yue Li

Corina Nagy

Gustavo Turecki

2025-08-05

Nature Genetics (published)

Toward whole-genome inference of polygenic scores with fast and memory-efficient algorithms

Shadi Zabad

Chirayu Anant Haryan

Simon Gravel

Sanchit Misra

2025-07-03

American Journal of Human Genetics (published)

Harnessing agent-based frameworks in CellAgentChat to unravel cell-cell interactions from single-cell and spatial transcriptomics

Vishvak Raghavan

Yumin Zheng

Jun Ding

2025-07-01

Genome Research (published)

FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting

Mike He Zhu

Jun Bai

Na Li

Xiaoxiao Li

Dianbo Liu

2025-05-17

NPJ Digital Medicine (published)

ECLARE: multi-teacher contrastive learning via ensemble distillation for diagonal integration of single-cell multi-omic data

Dylan Mann-Krzisnik

Integrating multimodal single-cell data, such as scRNA-seq and scATAC-seq, is key for decoding gene regulatory networks but remains challenging due to issues like feature harmonization and limited quantity of paired data. To address these challenges, we introduce ECLARE, a novel framework combining multi-teacher ensemble knowledge distillation with contrastive learning for diagonal integration of single-cell multi-omic data. ECLARE trains teacher models on paired datasets to guide a student model for unpaired data, leveraging a refined contrastive objective and transport-based loss for precise cross-modality alignment. Experiments demonstrate ECLARE’s competitive performance in cell pairing accuracy, multimodal integration and biological structure preservation, indicating that multi-teacher knowledge distillation provides an effective means to improve a diagonal integration model beyond its zero-shot capabilities. Additionally, we validate ECLARE’s applicability through a case study on major depressive disorder (MDD) data, illustrating its capability to reveal gene regulatory insights from unpaired nuclei. While current results highlight the potential of ensemble distillation in multi-omic analyses, future work will focus on optimizing model complexity, dataset scalability, and exploring applications in diverse multi-omic contexts. ECLARE establishes a robust foundation for biologically informed single-cell data integration, facilitating advanced downstream analyses and scaling multi-omic data for training advanced machine learning models.

2025-01-27

bioRxiv (preprint)
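
The cross-modality contrastive alignment mentioned in the ECLARE abstract above can be sketched with a standard symmetric InfoNCE loss over paired cell embeddings. This is a generic textbook objective, not ECLARE's refined contrastive objective or its transport-based loss, and the function names, the temperature, and the toy embeddings are assumptions for illustration.

```python
import math

def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching each anchor to its same-index target."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / temperature for t in targets]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_norm - logits[i]
    return loss / len(anchors)

def symmetric_contrastive_loss(rna, atac, temperature=0.1):
    """Average the RNA-to-ATAC and ATAC-to-RNA matching losses."""
    return 0.5 * (info_nce(rna, atac, temperature)
                  + info_nce(atac, rna, temperature))

# Toy embeddings: row i of each list is assumed to come from the same cell.
rna = [[1.0, 0.0], [0.0, 1.0]]
atac = [[0.9, 0.1], [0.1, 0.9]]
aligned = symmetric_contrastive_loss(rna, atac)
mismatched = symmetric_contrastive_loss(rna, atac[::-1])
print(aligned < mismatched)
```

The loss is low when each cell's two modality embeddings are closest to each other and high when the pairing is scrambled, which is why paired datasets are needed to train this kind of objective.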

scGraphETM: Graph-Based Deep Learning Approach for Unraveling Cell Type-Specific Gene Regulatory Networks from Single-Cell Multi-Omics Data

Wenqi Dong

Manqi Zhou

Boyu Han

Yi Wang

2025-01-27

bioRxiv (preprint)