David Buckeridge

Associate Academic Member
Full Professor, McGill University, Department of Epidemiology, Biostatistics and Occupational Health
Research Topics
Medical Machine Learning

Biography

David Buckeridge is a professor at the School of Population and Global Health at McGill University, as well as chief digital health officer for the McGill University Health Centre and executive scientific director of the Public Health Agency of Canada.

A Tier 1 Canada Research Chair in Health Informatics and Data Science, Buckeridge has projected health system demand for the Canadian province of Quebec, led data management and analytics for the Canadian Immunity Task Force, and supported the World Health Organization in monitoring global immunity to SARS-CoV-2. He has an MD from Queen's University, an MSc in epidemiology from the University of Toronto and a PhD in biomedical informatics from Stanford University. He is a Fellow of the Royal College of Physicians of Canada.

Current Students

PhD - McGill University
Master's Research - McGill University
Master's Research - McGill University
Master's Research - McGill University

Publications

TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare
Qincheng Lu
Hao Xu
Ziqi Yang
Mike He Zhu
Motivation: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in Natural Language Processing and Computer Vision domains. However, the development of PTMs on healthcare time-series data is lagging behind. This underscores the limitations of the existing transformer-based architectures, particularly their scalability to handle large-scale time series and ability to capture long-term temporal dependencies. Methods: In this study, we present Timely Generative Pre-trained Transformer (TimelyGPT). TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations. It also integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies. Materials: We evaluated TimelyGPT on two large-scale healthcare time series datasets corresponding to continuous biosignals and irregularly-sampled time series, respectively: (1) the Sleep EDF dataset consisting of over 1.2 billion timesteps; (2) the longitudinal healthcare administrative database PopHR, comprising 489,000 patients randomly sampled from the Montreal population. Results: In forecasting continuous biosignals, TimelyGPT achieves accurate extrapolation up to 6,000 timesteps of body temperature during the sleep stage transition, given a short look-up window (i.e., prompt) containing only 2,000 timesteps. For irregularly-sampled time series, TimelyGPT with a proposed time-specific inference demonstrates high top recall scores in predicting future diagnoses using early diagnostic records, effectively handling irregular intervals between clinical records. Together, we envision TimelyGPT to be useful in various health domains, including long-term patient health state forecasting and patient risk trajectory prediction. Availability: The open-sourced code is available at GitHub.
TrajGPT: Irregular Time-Series Representation Learning of Health Trajectory
Qincheng Lu
Mike He Zhu
In the healthcare domain, time-series data are often irregularly sampled with varying intervals through outpatient visits, posing challenges for existing models designed for equally spaced sequential data. To address this, we propose Trajectory Generative Pre-trained Transformer (TrajGPT) for representation learning on irregularly-sampled healthcare time series. TrajGPT introduces a novel Selective Recurrent Attention (SRA) module that leverages a data-dependent decay to adaptively filter irrelevant past information. As a discretized ordinary differential equation (ODE) framework, TrajGPT captures underlying continuous dynamics and enables a time-specific inference for forecasting arbitrary target timesteps without auto-regressive prediction. Experimental results based on the longitudinal EHR data PopHR from the Montreal health system and eICU from PhysioNet showcase TrajGPT's superior zero-shot performance in disease forecasting, drug usage prediction, and sepsis detection. The inferred trajectories of diabetic and cardiac patients reveal meaningful comorbidity conditions, underscoring TrajGPT as a useful tool for forecasting patient health evolution.
The impact of statistical adjustment for assay performance on inferences from SARS-CoV-2 serological surveillance studies
Jiacheng Chen
Yuan Yu
Sheila F O’Brien
Carmen Charlton
Steven J. Drews
Jane M Heffernan
Amber M Smith
Yu Nakagama
Yasutoshi Kido
W Alton Russell
Choice of immunoassay influences population seroprevalence estimates. Post-hoc adjustments for assay performance could improve comparability of estimates across studies and enable pooled analyses. We assessed post-hoc adjustment methods using data from 2021–2023 SARS-CoV-2 serosurveillance studies in Alberta, Canada: one that tested 124,008 blood donations using Roche immunoassays (SARS-CoV-2 nucleocapsid total antibody and anti-SARS-CoV-2 S) and another that tested 214,780 patient samples using Abbott immunoassays (SARS-CoV-2 IgG and anti-SARS-CoV-2 S). Comparing datasets, seropositivity for antibodies against nucleocapsid (anti-N) diverged after May 2022 due to differential loss of sensitivity as a function of time since infection. The commonly used Rogan-Gladen adjustment did not reduce this divergence. Regression-based adjustments using the assays’ semi-quantitative results produced more similar estimates of anti-N seroprevalence and rolling incidence proportion (proportion of individuals infected in recent months). Seropositivity for antibodies targeting SARS-CoV-2 spike protein was similar without adjustment, and concordance was not improved when applying an alternative, functional threshold. These findings suggest that assay performance substantially impacted population inferences from SARS-CoV-2 serosurveillance studies in the Omicron period. Unlike methods that ignore time-varying assay sensitivity, regression-based methods using the semi-quantitative assay resulted in increased concordance in estimated anti-N seropositivity and rolling incidence between cohorts using different assays.
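For readers unfamiliar with the adjustment named in this abstract, the following is a minimal sketch of the standard Rogan-Gladen estimator, which corrects an apparent (test-positive) proportion using one fixed sensitivity and one fixed specificity. The function name, the clamping step, and the example numbers are illustrative assumptions and are not taken from the study.

```python
def rogan_gladen(apparent_prevalence: float,
                 sensitivity: float,
                 specificity: float) -> float:
    """Adjust an apparent seropositive proportion for imperfect assay accuracy.

    Standard Rogan-Gladen form:
        adjusted = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        # The estimator is undefined for an assay no better than chance.
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (apparent_prevalence + specificity - 1.0) / denominator
    # Clamp to [0, 1]; sampling noise can push the raw estimate outside it.
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs: 20% test positive, 90% sensitivity, 95% specificity.
estimate = rogan_gladen(0.20, sensitivity=0.90, specificity=0.95)
print(round(estimate, 4))
```

Because this form takes a single sensitivity value, it has no way to represent sensitivity that declines with time since infection, which is consistent with the abstract's report that the adjustment did not reduce the divergence between assays.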
FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting
Mike He Zhu
Na Li
Xiaoxiao Li
Dianbo Liu
A three-state coupled Markov switching model for COVID-19 outbreaks across Quebec based on hospital admissions
Dirk Douwes-Schultz
Alexandra M. Schmidt
Characterizing co-purchased food products with soda, fresh fruits, and fresh vegetables using loyalty card purchasing data in Montréal, Canada, 2015–2017
Hiroshi Mamiya
Kody Crowell
Catherine L. Mah
Amélie Quesnel-Vallée
Aman Verma
Sociodemographic characteristics of SARS-CoV-2 serosurveillance studies with diverse recruitment strategies, Canada, 2020 to 2023
Matthew J Knight
Yuan Yu
Jiacheng Chen
Sheila F O’Brien
Carmen Charlton
W Alton Russell
Background. Serological testing was a key component of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) surveillance. Social distancing interventions, resource limitations, and the need for timely data led to serosurveillance studies using a range of recruitment strategies, which likely influenced study representativeness. Characterizing representativeness in surveillance is crucial to identify gaps in sampling coverage and to assess health inequities. Methods. We retrospectively analyzed three pre-existing longitudinal cohorts, two convenience samples using residual blood, and one de novo probabilistic survey conducted in Canada between April 2020 and November 2023. We calculated study specimen counts by age, sex, urbanicity, race/ethnicity, and neighborhood deprivation quintiles. We derived a 'representation ratio' as a simple metric to assess generalizability to a target population and various sociodemographic strata. Results. The six studies included 1,321,675 specimens. When stratifying by age group and sex, 65% of racialized minority subgroups were moderately underrepresented (representation ratio 0.75). Representation was generally higher for older Canadians, urban neighborhoods, and neighborhoods with low material deprivation. Rural representation was highest in a study that used outpatient laboratory blood specimens. Racialized minority representation was highest in a de novo probabilistic survey cohort. Conclusion. While no study had adequate representation of all subgroups, less traditional recruitment strategies were more representative of some population dimensions. Understanding demographic representativeness and barriers to recruitment are important considerations when designing population health surveillance studies.
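The abstract names a 'representation ratio' but does not give its formula. The sketch below assumes one plausible definition: a stratum's share of study specimens divided by its share of the target population, so that values below 1 indicate underrepresentation. The function name, arguments, and counts are hypothetical and may differ from the definition used in the paper.

```python
def representation_ratio(study_count: int, study_total: int,
                         population_count: int, population_total: int) -> float:
    """Assumed definition: (stratum share of study) / (stratum share of population)."""
    if study_total <= 0 or population_total <= 0 or population_count <= 0:
        raise ValueError("totals and the population stratum count must be positive")
    study_share = study_count / study_total
    population_share = population_count / population_total
    return study_share / population_share


# Hypothetical stratum: 6% of specimens but 10% of the target population.
ratio = representation_ratio(study_count=60, study_total=1_000,
                             population_count=100, population_total=1_000)
print(round(ratio, 2))
```

Under this assumed definition a ratio of 1 means the stratum appears in the study in proportion to its size in the population.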
Extrapolatable Transformer Pre-training for Ultra Long Time-Series Forecasting
Qincheng Lu
Hao Xu
Mike He Zhu