David Buckeridge

2025-10-14

Health Information Science and Systems (published)

www.ncbi.nlm.nih.gov

TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare

Ziyang Song

Qincheng Lu

Hao Xu

Ziqi Yang

Mike He Zhu

Motivation: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in Natural Language Processing and Computer Vision domains. However, the development of PTMs on healthcare time-series data is lagging behind. This underscores the limitations of existing transformer-based architectures, particularly their scalability to large-scale time series and their ability to capture long-term temporal dependencies. Methods: In this study, we present Timely Generative Pre-trained Transformer (TimelyGPT). TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations. It also integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies. Materials: We evaluated TimelyGPT on two large-scale healthcare time-series datasets corresponding to continuous biosignals and irregularly-sampled time series, respectively: (1) the Sleep EDF dataset, consisting of over 1.2 billion timesteps; (2) the longitudinal healthcare administrative database PopHR, comprising 489,000 patients randomly sampled from the Montreal population. Results: In forecasting continuous biosignals, TimelyGPT achieves accurate extrapolation of up to 6,000 timesteps of body temperature during the sleep stage transition, given a short look-up window (i.e., prompt) containing only 2,000 timesteps. For irregularly-sampled time series, TimelyGPT with a proposed time-specific inference demonstrates high top recall scores in predicting future diagnoses from early diagnostic records, effectively handling irregular intervals between clinical records. Together, we envision TimelyGPT being useful in various health domains, including long-term patient health state forecasting and patient risk trajectory prediction. Availability: The open-source code is available on GitHub.

2025-10-14

Health Information Science and Systems (published)

arxiv.org
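The TimelyGPT abstract names an extrapolatable position (xPos) embedding without spelling it out. The sketch below follows the general xPos recipe: a rotary rotation of each 2-D feature pair combined with a per-pair exponential scale, where queries are scaled up and keys scaled down. It uses assumed defaults `gamma=0.4` and `scale_base=512`. It is not TimelyGPT's code, and the function names are hypothetical. It shows why the attention logit depends only on the offset between query and key positions, which is what lets the embedding extrapolate beyond the training length.

```python
import numpy as np

def xpos_rotate(x, pos, inverse_scale=False, gamma=0.4, scale_base=512.0):
    """Apply an xPos-style rotary embedding to vector x at integer position pos.

    gamma and scale_base are assumed illustrative defaults, not TimelyGPT settings.
    """
    half = x.shape[-1] // 2
    i = np.arange(half)
    theta = 10000.0 ** (-i / half)              # rotation frequency per 2-D pair
    zeta = (i / half + gamma) / (1.0 + gamma)   # per-pair decay base in (0, 1)
    scale = zeta ** (pos / scale_base)
    if inverse_scale:                           # keys use the reciprocal scale
        scale = 1.0 / scale
    x1, x2 = x[..., 0::2], x[..., 1::2]         # interleaved feature pairs
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)
    out = np.empty_like(x, dtype=float)
    out[..., 0::2] = (x1 * cos - x2 * sin) * scale
    out[..., 1::2] = (x1 * sin + x2 * cos) * scale
    return out

def xpos_score(q, k, q_pos, k_pos):
    """Attention logit between a query at q_pos and a key at k_pos."""
    return float(xpos_rotate(q, q_pos) @ xpos_rotate(k, k_pos, inverse_scale=True))

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Shifting both positions by the same amount leaves the logit unchanged.
print(np.isclose(xpos_score(q, k, 100, 40), xpos_score(q, k, 600, 540)))  # True
```

Because the scales multiply to ζ^((n - m)/scale_base) with ζ < 1, the logit also shrinks smoothly as the query-key distance grows.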

TrajGPT: Irregular Time-Series Representation Learning of Health Trajectory

Ziyang Song

Qincheng Lu

Mike He Zhu

In the healthcare domain, time-series data are often irregularly sampled with varying intervals through outpatient visits, posing challenges for existing models designed for equally spaced sequential data. To address this, we propose Trajectory Generative Pre-trained Transformer (TrajGPT) for representation learning on irregularly-sampled healthcare time series. TrajGPT introduces a novel Selective Recurrent Attention (SRA) module that leverages a data-dependent decay to adaptively filter irrelevant past information. As a discretized ordinary differential equation (ODE) framework, TrajGPT captures underlying continuous dynamics and enables a time-specific inference for forecasting arbitrary target timesteps without auto-regressive prediction. Experimental results on the longitudinal EHR data PopHR from the Montreal health system and eICU from PhysioNet showcase TrajGPT's superior zero-shot performance in disease forecasting, drug usage prediction, and sepsis detection. The inferred trajectories of diabetic and cardiac patients reveal meaningful comorbidity conditions, underscoring TrajGPT as a useful tool for forecasting patient health evolution.

2025-10-13

IEEE Journal of Biomedical and Health Informatics (published)
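The TrajGPT abstract describes Selective Recurrent Attention only at a high level: a recurrent, ODE-like update whose decay depends on the data. The sketch below is a generic decayed linear-attention recurrence over irregularly spaced records, not the published SRA module. The parameterization is a hypothetical choice for illustration. The decay rate is `softplus(x_t @ w_decay)`, and the running state is multiplied by `exp(-rate * dt)` over the gap `dt` since the previous record.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def selective_recurrent_attention(X, times, Wq, Wk, Wv, w_decay):
    """Decayed linear-attention recurrence over irregularly spaced inputs.

    X: (T, d_in) observations, T >= 1; times: (T,) non-decreasing timestamps.
    Hypothetical parameterization, not TrajGPT's exact SRA formulation.
    """
    T = X.shape[0]
    state = np.zeros((Wk.shape[1], Wv.shape[1]))
    outputs = np.zeros((T, Wv.shape[1]))
    prev_t = times[0]
    for t in range(T):
        x = X[t]
        dt = times[t] - prev_t                 # irregular gap since the previous record
        rate = softplus(x @ w_decay)           # data-dependent, non-negative decay rate
        state = np.exp(-rate * dt) * state     # forget more of the past for long gaps or high rates
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        state = state + np.outer(k, v)         # write the current key-value pair
        outputs[t] = q @ state                 # read with the current query
        prev_t = times[t]
    return outputs
```

Because the decay is evaluated at the actual elapsed time rather than per step, the same recurrence can, in principle, be queried at an arbitrary later timestamp. This loosely mirrors the time-specific inference the abstract mentions.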

The impact of statistical adjustment for assay performance on inferences from SARS-CoV-2 serological surveillance studies

Jiacheng Chen

Yuan Yu

Sheila F O’Brien

Carmen Charlton

Steven J. Drews

Jane M Heffernan

Amber M Smith

Yu Nakagama

Yasutoshi Kido

W Alton Russell

Choice of immunoassay influences population seroprevalence estimates. Post-hoc adjustments for assay performance could improve comparability of estimates across studies and enable pooled analyses. We assessed post-hoc adjustment methods using data from 2021–2023 SARS-CoV-2 serosurveillance studies in Alberta, Canada: one that tested 124,008 blood donations using Roche immunoassays (SARS-CoV-2 nucleocapsid total antibody and anti-SARS-CoV-2 S) and another that tested 214,780 patient samples using Abbott immunoassays (SARS-CoV-2 IgG and anti-SARS-CoV-2 S). Comparing datasets, seropositivity for antibodies against nucleocapsid (anti-N) diverged after May 2022 due to differential loss of sensitivity as a function of time since infection. The commonly used Rogan-Gladen adjustment did not reduce this divergence. Regression-based adjustments using the assays’ semi-quantitative results produced more similar estimates of anti-N seroprevalence and rolling incidence proportion (proportion of individuals infected in recent months). Seropositivity for antibodies targeting SARS-CoV-2 spike protein was similar without adjustment, and concordance was not improved when applying an alternative, functional threshold. These findings suggest that assay performance substantially impacted population inferences from SARS-CoV-2 serosurveillance studies in the Omicron period. Unlike methods that ignore time-varying assay sensitivity, regression-based methods using the semi-quantitative assay results increased concordance in estimated anti-N seropositivity and rolling incidence between cohorts using different assays.

2025-07-22

American Journal of Epidemiology (published)
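The assay-adjustment abstract contrasts regression-based methods with the Rogan-Gladen adjustment. For reference, the standard Rogan-Gladen estimator corrects an apparent (test-positive) proportion using a single, fixed sensitivity and specificity. Because those inputs are constants, it cannot represent sensitivity that wanes with time since infection, which is the source of the anti-N divergence the abstract describes. The inputs in the usage line are made-up values, not figures from the study.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence, clipped to [0, 1]."""
    youden = sensitivity + specificity - 1.0   # Youden's J; must be positive
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1.0) / youden
    return min(max(estimate, 0.0), 1.0)        # raw estimate can fall outside [0, 1]

# Hypothetical inputs: 30% test-positive, 90% sensitivity, 99% specificity.
print(round(rogan_gladen(0.30, 0.90, 0.99), 4))  # 0.3258
```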

The impact of statistical adjustment for assay performance on inferences from SARS-CoV-2 serological surveillance studies

Jiacheng Chen

Yuan Yu

Sheila F O’Brien

Carmen Charlton

Steven J. Drews

Jane M Heffernan

Amber M Smith

Yu Nakagama

Yasutoshi Kido

W Alton Russell

2025-07-22

American Journal of Epidemiology (published)

Sociodemographic characteristics of SARS-CoV-2 serosurveillance studies with diverse recruitment strategies, Canada, 2020 to 2023

Matthew J Knight

Yuan Yu

Jiacheng Chen

Sheila F O’Brien

Carmen Charlton

W Alton Russell

2025-06-03

BMC Public Health (published)

FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting

Mike He Zhu

Jun Bai

Na Li

Xiaoxiao Li

Dianbo Liu

2025-05-17

NPJ Digital Medicine (published)

A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions

Dirk Douwes-Schultz

Alexandra M. Schmidt

Yannan Shen

2025-03-01

The Annals of Applied Statistics (published)

arxiv.org

Characterizing co-purchased food products with soda, fresh fruits, and fresh vegetables using loyalty card purchasing data in Montréal, Canada, 2015–2017

Hiroshi Mamiya

Kody Crowell

Catherine L. Mah

Amélie Quesnel-Vallée

Aman Verma

2025-02-17

The International Journal of Behavioral Nutrition and Physical Activity (published)

Sociodemographic characteristics of SARS-CoV-2 serosurveillance studies with diverse recruitment strategies, Canada, 2020 to 2023

Matthew J Knight

Yuan Yu

Jiacheng Chen

Sheila F O’Brien

Carmen Charlton

W Alton Russell

Background. Serological testing was a key component of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) surveillance. Social distancing interventions, resource limitations, and the need for timely data led to serosurveillance studies using a range of recruitment strategies, which likely influenced study representativeness. Characterizing representativeness in surveillance is crucial to identify gaps in sampling coverage and to assess health inequities. Methods. We retrospectively analyzed three pre-existing longitudinal cohorts, two convenience samples using residual blood, and one de novo probabilistic survey conducted in Canada between April 2020 and November 2023. We calculated study specimen counts by age, sex, urbanicity, race/ethnicity, and neighborhood deprivation quintiles. We derived a 'representation ratio' as a simple metric to assess generalizability to a target population and various sociodemographic strata. Results. The six studies included 1,321,675 specimens. When stratifying by age group and sex, 65% of racialized minority subgroups were moderately underrepresented (representation ratio <0.75). Representation was generally higher for older Canadians, urban neighborhoods, and neighborhoods with low material deprivation. Rural representation was highest in a study that used outpatient laboratory blood specimens. Racialized minority representation was highest in a de novo probabilistic survey cohort. Conclusion. While no study had adequate representation of all subgroups, less traditional recruitment strategies were more representative of some population dimensions. Understanding demographic representativeness and barriers to recruitment are important considerations when designing population health surveillance studies.

2024-12-31

medRxiv (preprint)
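The serosurveillance abstract introduces a 'representation ratio' but defines it only as a simple generalizability metric. One natural reading, assumed here rather than taken from the paper, is a stratum's share of study specimens divided by its share of the target population. Under that reading, 1.0 means proportional representation and values under 0.75 would fall in the moderately underrepresented range the abstract mentions. The function name and the counts are invented for illustration.

```python
from typing import Dict

def representation_ratios(study_counts: Dict[str, int],
                          population_counts: Dict[str, int]) -> Dict[str, float]:
    """Study share divided by population share for each stratum (assumed definition)."""
    study_total = sum(study_counts.values())
    pop_total = sum(population_counts.values())
    ratios = {}
    for stratum, pop in population_counts.items():
        # (n_s / N_study) / (p_s / N_pop), computed with integer products to limit rounding
        ratios[stratum] = (study_counts.get(stratum, 0) * pop_total) / (pop * study_total)
    return ratios

# Invented counts for two strata.
study = {"urban": 900, "rural": 100}
population = {"urban": 800, "rural": 200}
print(representation_ratios(study, population))  # {'urban': 1.125, 'rural': 0.5}
```

Strata absent from the study receive a ratio of 0.0, which flags complete gaps in sampling coverage.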

Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting

Ziyang Song

Qincheng Lu

Hao Xu

Mike He Zhu

2024-12-16

Proceedings of the 15th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (published)