David Buckeridge

Associate Academic Member
Full Professor, McGill University, Department of Epidemiology, Biostatistics and Occupational Health

Research Topics

Medical Machine Learning

Biography

David Buckeridge is a professor at the School of Population and Global Health at McGill University, as well as chief digital health officer for the McGill University Health Centre and executive scientific director of the Public Health Agency of Canada.

A Tier 1 Canada Research Chair in Health Informatics and Data Science, Buckeridge has projected health system demand for the Canadian province of Quebec, led data management and analytics for the Canadian Immunity Task Force, and supported the World Health Organization in monitoring global immunity to SARS-CoV-2. He has an MD from Queen's University, an MSc in epidemiology from the University of Toronto and a PhD in biomedical informatics from Stanford University. He is a Fellow of the Royal College of Physicians of Canada.

Current Students

Master's Research - McGill University
PhD - McGill University
Master's Research - McGill University
Master's Research - McGill University
Master's Research - McGill University

Publications

Characterizing co-purchased food products with soda, fresh fruits, and fresh vegetables using loyalty card purchasing data in Montréal, Canada, 2015–2017
Hiroshi Mamiya
Kody Crowell
Catherine L. Mah
Amélie Quesnel-Vallée
Aman Verma
Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited in either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.
Evaluating the effectiveness of the Smart About Meds (SAM) mobile application among patients discharged from hospital: protocol of a randomised controlled trial
Robyn Tamblyn
Bettina Habib
Daniala L Weir
Elizaveta Frolova
Rolan Alattar
Jessica Rogozinsky
Caroline Beauchamp
Rosalba Pupo
Susan J Bartlett
Emily McDonald
MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling
Ruohan Wang
Zilong Wang
Ziyang Song
Automatic subphenotyping from electronic health records (EHRs) provides numerous opportunities to understand diseases with unique subgroups and enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer sub-phenotype topics from thousands of diseases using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by the expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset consisting of over 38 thousand patients from intensive care unit (ICU) from Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive for disease progression and severity. Consequently, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts. Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. For longitudinal analysis, MixEHR-Nest identified subphenotypes of distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.
TrajGPT: Healthcare Time-Series Representation Learning for Trajectory Prediction
Ziyang Song
Qincheng Lu
Mike He Zhu
In many domains, such as healthcare, time-series data is irregularly sampled with varying intervals between observations. This creates challenges for classical time-series models that require equally spaced data. To address this, we propose a novel time-series Transformer called **Trajectory Generative Pre-trained Transformer (TrajGPT)**. It introduces a data-dependent decay mechanism that adaptively forgets irrelevant information based on clinical context. By interpreting TrajGPT as ordinary differential equations (ODEs), our approach captures continuous dynamics from sparse and irregular time-series data. Experimental results show that TrajGPT, with its time-specific inference approach, accurately predicts trajectories without requiring task-specific fine-tuning.
TrajGPT: Irregular Time-Series Representation Learning for Health Trajectory Analysis
Ziyang Song
Qincheng Lu
Mike He Zhu
Correction: Economic evaluation of the effect of needle and syringe programs on skin, soft tissue, and vascular infections in people who inject drugs: a microsimulation modelling approach
Jihoon Lim
W Alton Russell
Mariam El-Sheikh
Dimitra Panagiotoglou
Development of a Framework for Establishing 'Gold Standard' Outbreak Data from Submitted SARS-CoV-2 Genome Samples
Yannan Shen
Russell Steele
Submitted genomic data for respiratory viruses reflect the emergence and spread of new variants. Although delays in submission limit the utility of these data for prospective surveillance, they may be useful for evaluating other surveillance sources. However, few studies have investigated the use of these data for evaluating aberration detection in surveillance systems. Our study used a Bayesian online change point detection algorithm (BOCP) to detect increases in the number of submitted genome samples as a means of establishing 'gold standard' dates of outbreak onset in multiple countries. We compared models using different data transformations and parameter values. BOCP detected change points that were not sensitive to different parameter settings. We also found data transformations were essential prior to change point detection. Our study presents a framework for using global genomic submission data to develop 'gold standard' dates about the onset of outbreaks due to new viral variants.
Canada's provincial COVID-19 pandemic modelling efforts: A review of mathematical models and their impacts on the responses.
Yiqing Xia
Jorge Luis Flores Anato
Caroline Colijn
Naveed Janjua
Mike Irvine
Tyler Williamson
Marie B. Varughese
Michael Li
Nathaniel D. Osgood
David J. D. Earn
Beate Sander
Lauren E. Cipriano
Kumar Murty
Fanyu Xiu
Arnaud Godin
Amy Hurford
Sharmistha Mishra
Mathieu Maheu-Giroux
SETTING: Mathematical modelling played an important role in the public health response to COVID-19 in Canada. Variability in epidemic trajectories, modelling approaches, and data infrastructure across provinces provides a unique opportunity to understand the factors that shaped modelling strategies. INTERVENTION: Provinces implemented stringent pandemic interventions to mitigate SARS-CoV-2 transmission, considering evidence from epidemic models. This study aimed to summarize provincial COVID-19 modelling efforts. We identified modelling teams working with provincial decision-makers, through referrals and membership in Canadian modelling networks. Information on models, data sources, and knowledge translation were abstracted using standardized instruments. OUTCOMES: We obtained information from six provinces. For provinces with sustained community transmission, initial modelling efforts focused on projecting epidemic trajectories and healthcare demands, and evaluating impacts of proposed interventions. In provinces with low community transmission, models emphasized quantifying importation risks. Most of the models were compartmental and deterministic, with projection horizons of a few weeks. Models were updated regularly or replaced by new ones, adapting to changing local epidemic dynamics, pathogen characteristics, vaccines, and requests from public health. Surveillance datasets for cases, hospitalizations and deaths, and serological studies were the main data sources for model calibration. Access to data for modelling and the structure for knowledge translation differed markedly between provinces. IMPLICATION: Provincial modelling efforts during the COVID-19 pandemic were tailored to local contexts and modulated by available resources. Strengthening Canadian modelling capacity, developing and sustaining collaborations between modellers and governments, and ensuring earlier access to linked and timely surveillance data could help improve pandemic preparedness.
Canada's Provincial Covid-19 Pandemic Modelling Efforts: A Review of Mathematical Models and Their Impacts on the Responses
Yiqing Xia
Jorge Luis Flores Anato
Caroline Colijn
Naveed Janjua
Michael Otterstatter
Mike Irvine
Tyler Williamson
Marie B. Varughese
Michael Li
Nathaniel Osgood
David J. D. Earn
Beate Sander
Lauren E. Cipriano
Kumar Murty
Fanyu Xiu
Arnaud Godin
Amy Hurford
Sharmistha Mishra
Mathieu Maheu-Giroux
Canada's approach to SARS-CoV-2 sero-surveillance: Lessons learned for routine surveillance and future pandemics.
Sheila F. O’Brien
Michael Asamoah-Boaheng
Brian Grunau
Mel Krajden
David M. Goldfarb
Maureen Anderson
Marc Germain
Patrick Brown
Derek R. Stein
Kami Kandola
Graham Tipples
Philip Awadalla
Amanda Lang
Lesley Behl
Tiffany Fitzpatrick
Steven J. Drews
SETTING: In Canada's federated healthcare system, 13 provincial and territorial jurisdictions have independent responsibility to collect data to inform health policies. During the COVID-19 pandemic (2020-2023), national and regional sero-surveys mostly drew upon existing infrastructure to quickly test specimens and collect data but required cross-jurisdiction coordination and communication. INTERVENTION: There were 4 national and 7 regional general population SARS-CoV-2 sero-surveys. Survey methodologies varied by participant selection approaches, assay choices, and reporting structures. We analyzed Canadian pandemic sero-surveillance initiatives to identify key learnings to inform future pandemic planning. OUTCOMES: Over a million samples were tested for SARS-CoV-2 antibodies from 2020 to 2023 but siloed in 11 distinct datasets. Most national sero-surveys had insufficient sample size to estimate regional prevalence; differences in methodology hampered cross-regional comparisons of regional sero-surveys. Only four sero-surveys included questionnaires. Sero-surveys were not directly comparable due to different assays, sampling methodologies, and time-frames. Linkage to health records occurred in three provinces only. Dried blood spots permitted sample collection in remote populations and during stay-at-home orders. IMPLICATIONS: To provide timely, high-quality information for public health decision-making, routine sero-surveillance systems must be adaptable, flexible, and scalable. National capability planning should include consortiums for assay design and validation, defined mechanisms to improve test capacity, base documents for data linkage and material transfer across jurisdictions, and mechanisms for real-time communication of data. Lessons learned will inform incorporation of a robust sero-survey program into routine surveillance with strategic sampling and capacity to adapt and scale rapidly as a part of a comprehensive national pandemic response plan.