Alexis Nolin-Lapalme

Foundation models for electrocardiogram interpretation: clinical implications

Alexis Nolin-Lapalme

Achille Sowa

Jacques Delfrate

Olivier Tastet

Denis Corbin

Merve Kulbay

Derman Ozdemir

Marie-Jeanne Noël

François-Christophe Marois-Blanchet

François Harvey

Surbhi Sharma

Minhaj Ansari

I-Min Chiu

Valentina D'souza

Sam F. Friedman

Michael Chassé

Brian J. Potter

Jonathan Afilalo

Pierre Adil Elias

Gilbert Jabbour

Mourad Bahani

Marie-Pierre Dubé

Patrick M. Boyle

Neal A. Chatterjee

Joshua Barrios

Geoffrey H. Tison

David Ouyang

Mahnaz Maddah

Shaan Khurshid

Julia Cadrin-Tourigny

Rafik Tadros

Julie Hussin

Robert Avram

The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiac diagnostics, yet existing artificial intelligence (AI) solutions for automated interpretation often lack generalizability, remain closed source, and are primarily trained using supervised learning (SL), which requires extensive labelled datasets and may limit adaptability across diverse clinical settings. Self-supervised learning (SSL) can potentially overcome these limitations by learning robust representations from unlabelled data. To address these challenges, this study developed and compared two open-source foundational ECG models: DeepECG-SL, a supervised multilabel ECG model, and DeepECG-SSL, a self-supervised model. Both models were trained on over 1 million ECGs using a standardized preprocessing pipeline and automated free-text extraction from ECG reports to predict 77 cardiac conditions. DeepECG-SSL leveraged unlabelled data through self-supervised contrastive learning and masked lead modelling before fine-tuning for downstream tasks, while DeepECG-SL was trained directly on labelled diagnostic data in an end-to-end fashion. Performance was evaluated across seven private, multilingual healthcare systems and four public ECG repositories, with assessment of fairness by age and sex, and investigation of privacy vulnerabilities as well as memory and compute requirements. DeepECG-SSL achieved micro-averaged area under the receiver operating characteristic curves (AUROCs) across all 77 cardiac conditions for ECG interpretation of 0.990 [95% confidence interval (CI): 0.990, 0.990] on the internal dataset (MHI-ds), 0.981 (95% CI: 0.981, 0.981) on external public datasets (UKB, CLSA, MIMIC-IV and PTB), and 0.983 (95% CI: 0.983, 0.983) on external private datasets (UW, UCSF, JGH, NYP, MGH, CSH and CHUM), while DeepECG-SL demonstrated AUROCs of 0.992 (95% CI: 0.992, 0.992), 0.980 (95% CI: 0.980, 0.980), and 0.983 (95% CI: 0.983, 0.984), respectively.
Fairness analyses revealed minimal disparities (true-positive rate and false-positive rate difference <0.1) across age and sex groups for both models. DeepECG-SSL demonstrated superior performance on limited-data digital biomarker tasks, with the largest improvements in long QT syndrome (LQTS) genotype classification (AUROC 0.931 vs 0.850, P = 0.026, n = 127 ECGs) and 5-year atrial fibrillation risk prediction (AUROC 0.742 vs 0.734, P < 0.001, n = 132 050 ECGs), while achieving superior performance in left ventricular ejection fraction ≤40% classification (AUROC 0.926 vs 0.917, P < 0.001, n = 25 252 ECGs) and comparable performance in LQTS detection (AUROC 0.767 vs 0.735, P = 0.117, n = 934 ECGs). This study establishes SSL as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics. By releasing model weights, preprocessing tools, and validation code, this work aims to support robust, data-efficient AI diagnostics across diverse clinical environments and questions.

2026-01-21

European Heart Journal (published)

doi.org
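The abstract above reports micro-averaged AUROCs across 77 conditions. As a rough illustration of what micro-averaging means (not the authors' evaluation code), the sketch below pools every (ECG, condition) pair into one binary problem and computes a single rank-based AUROC. The function names `auroc` and `micro_auroc` and the toy data are assumptions made for this example.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC from the Mann-Whitney U statistic, with average ranks for ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_auroc(label_matrix: Sequence[Sequence[int]],
                score_matrix: Sequence[Sequence[float]]) -> float:
    """Micro-average: flatten all (sample, condition) cells, then score once."""
    flat_labels = [label for row in label_matrix for label in row]
    flat_scores = [score for row in score_matrix for score in row]
    return auroc(flat_labels, flat_scores)


# Hypothetical toy data: 2 ECGs x 3 conditions.
toy_labels = [[1, 0, 0], [0, 1, 0]]
toy_scores = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.6]]
print(micro_auroc(toy_labels, toy_scores))  # 0.875
```

Because all cells are pooled, frequent conditions dominate a micro-average; a macro-average would instead compute one AUROC per condition and take the mean.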

Foundation models for generalizable electrocardiogram interpretation: comparison of supervised and self-supervised electrocardiogram foundation models

Alexis Nolin-Lapalme

Achille Sowa

Jacques Delfrate

Olivier Tastet

Denis Corbin

Merve Kulbay

Derman Ozdemir

Marie-Jeanne Noël

François-Christophe Marois-Blanchet

François Harvey

Surbhi Sharma

Minhaj Ansari

I-Min Chiu

Valentina Dsouza

Sam F. Friedman

Michael Chassé

Brian J. Potter

Jonathan Afilalo

Pierre Adil Elias

Gilbert Jabbour

Mourad Bahani

Marie-Pierre Dubé

Patrick M. Boyle

Neal A. Chatterjee

Joshua Barrios

Geoffrey H. Tison

David Ouyang

Mahnaz Maddah

Shaan Khurshid

Julia Cadrin-Tourigny

Rafik Tadros

Julie Hussin

Robert Avram

The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiac diagnostics, yet existing artificial intelligence (AI) solutions for automated interpretation often lack generalizability, remain closed-source, and are primarily trained using supervised learning, limiting their adaptability across diverse clinical settings. To address these challenges, we developed and compared two open-source foundational ECG models: DeepECG-SSL, a self-supervised learning model, and DeepECG-SL, a supervised learning model. Both models were trained on over 1 million ECGs using a standardized preprocessing pipeline and automated free-text extraction from ECG reports to predict 77 cardiac conditions. DeepECG-SSL was pretrained using self-supervised contrastive learning and masked lead modeling. The models were evaluated on six multilingual private healthcare systems and four public datasets for ECG interpretation across 77 diagnostic categories. Fairness analyses assessed disparities in performance across age and sex groups, and resource utilization was also investigated. DeepECG-SSL achieved AUROCs of 0.990 (95% CI 0.990, 0.990) on the internal dataset, 0.981 (95% CI 0.981, 0.981) on external public datasets, and 0.983 (95% CI 0.983, 0.983) on external private datasets, while DeepECG-SL demonstrated AUROCs of 0.992 (95% CI 0.992, 0.992), 0.980 (95% CI 0.980, 0.980), and 0.983 (95% CI 0.983, 0.983), respectively. Fairness analyses revealed minimal disparities (true positive rate and false positive rate difference <0.010) across age and sex groups. For digital biomarker prediction with limited labeled data (long QT syndrome (LQTS) classification, 5-year atrial fibrillation prediction, and left ventricular ejection fraction (LVEF) classification), DeepECG-SSL outperformed DeepECG-SL in predicting 5-year atrial fibrillation risk (N=132,050; AUROC 0.742 vs. 0.720; Δ=0.022; P<0.001), identifying reduced LVEF ≤40% (N=25,252; 0.928 vs. 0.900; Δ=0.028; P<0.001), and classifying LQTS subtypes (N=127; 0.931 vs. 0.853; Δ=0.078; P=0.026). By releasing model weights, preprocessing tools, and validation code, we aim to support robust, data-efficient AI diagnostics across diverse clinical environments. This study establishes self-supervised learning as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics. Can self-supervised learning (SSL) yield ECG-based AI foundational models with enhanced performance, fairness, privacy, and generalizability compared to traditional supervised learning (SL) approaches? Our evaluation of DeepECG-SL and DeepECG-SSL across seven external health center datasets and four international publicly accessible datasets demonstrated that while both models achieve comparable diagnostic accuracy for ECG interpretation, SSL outperforms SL on novel tasks with smaller datasets. We validated DeepECG-SL and DeepECG-SSL across public and private datasets and demonstrated that the SSL model had superior generalizability. By addressing fairness, privacy, and efficiency, and by open-sourcing our models, we advance ethical, adaptable AI for equitable, real-world ECG diagnostics. Graphical abstract: DeepECG-SL and DeepECG-SSL, two open-source AI models for 12-lead ECG interpretation, were trained on over 1 million ECGs. DeepECG-SSL, utilizing self-supervised contrastive learning and masked lead modeling, outperformed DeepECG-SL in utilizing digital biomarkers to predict atrial fibrillation risk, reduced LVEF, and long QT syndrome subtypes, while both models achieved high diagnostic accuracy with minimal fairness disparities across age and sex. Validated on ten external datasets, our work provides a robust, reproducible framework for equitable, efficient ECG-based cardiac diagnostics.

2025-03-04

medRxiv (preprint)

doi.org
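The abstract above summarizes fairness as a difference in true positive rate and false positive rate across age and sex groups, without stating how that difference was computed. One plausible reading, sketched here under that assumption, is the gap between the highest and lowest per-group rate at a fixed decision threshold. The names `tpr_fpr` and `fairness_gaps` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def tpr_fpr(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """True positive rate and false positive rate for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("each group needs at least one positive and one negative")
    return tp / (tp + fn), fp / (fp + tn)


def fairness_gaps(labels: Sequence[int], preds: Sequence[int],
                  groups: Sequence[Hashable]) -> Tuple[float, float]:
    """Max-minus-min TPR and FPR across groups (assumed definition of 'difference')."""
    by_group = defaultdict(lambda: ([], []))
    for y, p, g in zip(labels, preds, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(p)
    per_group = {g: tpr_fpr(ys, ps) for g, (ys, ps) in by_group.items()}
    tprs = [tpr for tpr, _ in per_group.values()]
    fprs = [fpr for _, fpr in per_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Hypothetical toy data: four ECGs per sex group, already thresholded to 0/1.
toy_labels = [1, 1, 0, 0, 1, 1, 0, 0]
toy_preds = [1, 1, 0, 0, 1, 0, 1, 0]
toy_groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(fairness_gaps(toy_labels, toy_preds, toy_groups))  # (0.5, 0.5)
```

A gap near zero on both rates means the classifier makes errors at similar rates in every group, which is the property the abstract describes as minimal disparity.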

Prediction of incident atrial fibrillation using deep learning, clinical models, and polygenic scores

Gilbert Jabbour

Alexis Nolin-Lapalme

Olivier Tastet

Denis Corbin

Paloma Jordà

Achille Sowa

Jacques Delfrate

David Busseuil

Julie G. Hussin

Marie-Pierre Dubé

Jean-Claude Tardif

Léna Rivard

Laurent Macle

Julia Cadrin-Tourigny

Paul Khairy

Robert Avram

Rafik Tadros

Deep learning applied to electrocardiograms (ECG-AI) is an emerging approach for predicting atrial fibrillation or flutter (AF). This study introduces an ECG-AI model developed and tested at a tertiary cardiac centre, comparing its performance with clinical models and an AF polygenic score (PGS). Electrocardiograms in sinus rhythm from the Montreal Heart Institute were analysed, excluding those from patients with pre-existing AF. The primary outcome was incident AF at 5 years. An ECG-AI model was developed by splitting patients into non-overlapping data sets: 70% for training, 10% for validation, and 20% for testing. The performance of ECG-AI, clinical models, and PGS was assessed in the test data set. The ECG-AI model was externally validated in the Medical Information Mart for Intensive Care-IV (MIMIC-IV) hospital data set. A total of 669 782 ECGs from 145 323 patients were included. Mean age was 61 ± 15 years, and 58% were male. The primary outcome was observed in 15% of patients, and the ECG-AI model showed an area under the receiver operating characteristic (AUC-ROC) curve of .78. In time-to-event analysis including the first ECG, ECG-AI inference of high risk identified 26% of the population with a 4.3-fold increased risk of incident AF (95% confidence interval: 4.02–4.57). In a subgroup analysis of 2301 patients, ECG-AI outperformed CHARGE-AF (AUC-ROC = .62) and PGS (AUC-ROC = .59). Adding PGS and CHARGE-AF to ECG-AI improved goodness of fit (likelihood ratio test P .001), with minimal changes to the AUC-ROC (.76–.77). In the external validation cohort (mean age 59 ± 18 years, 47% male, median follow-up 1.1 years), ECG-AI model performance remained consistent (AUC-ROC = .77). ECG-AI provides an accurate tool to predict new-onset AF in a tertiary cardiac centre, surpassing clinical models and PGS.

2024-08-31

European Heart Journal (published)

doi.org
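The abstract above describes splitting patients, not individual ECGs, into non-overlapping 70%/10%/20% training, validation, and test sets. Splitting at the patient level matters because one patient contributes several ECGs, and letting them fall on both sides of the split would leak information into the test set. The sketch below illustrates the idea only; the function name `split_patients`, the seed, and the toy identifiers are assumptions, not the study's code.

```python
import random
from typing import Hashable, Iterable, Set, Tuple


def split_patients(patient_ids: Iterable[Hashable],
                   fractions: Tuple[float, float, float] = (0.7, 0.1, 0.2),
                   seed: int = 0) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Shuffle unique patient IDs and cut them into train/validation/test sets."""
    unique = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    n_train = round(len(unique) * fractions[0])
    n_val = round(len(unique) * fractions[1])
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Hypothetical toy data: 100 patients with 3 ECGs each.
ecg_patient_ids = [f"p{i}" for i in range(100)] * 3
train, val, test = split_patients(ecg_patient_ids)

# Each ECG inherits the split of its patient, so no patient spans two sets.
train_ecgs = [pid for pid in ecg_patient_ids if pid in train]
print(len(train), len(val), len(test), len(train_ecgs))  # 70 10 20 210
```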
