
Marc-André Legault

Associate Academic Member
Assistant Professor, Université de Montréal, Pharmacogenetics
Research Topics
Causality
Computational Biology
Medical Machine Learning

Biography

Marc-André Legault obtained his Ph.D. in bioinformatics from Université de Montréal and the Montreal Heart Institute, where he developed and applied new computational methods for drug target validation. He subsequently completed his postdoctoral training at McGill University and Mila - Quebec Artificial Intelligence Institute, working on instrumental variable estimation and machine learning for genetic epidemiology more broadly.

He is now an Assistant Professor in pharmacogenetics at the Université de Montréal's Faculty of Pharmacy and a researcher at the CHU Sainte-Justine Azrieli Research Centre. His research program aims to develop and use computational approaches for drug target validation to better understand treatment heterogeneity and improve our ability to anticipate the on-target effect of new drug classes. He is also an Associate Academic Member at Mila - Quebec Artificial Intelligence Institute.

Current Students

Postdoctorate - Université de Montréal

Publications

Do machine learning methods make better predictions than conventional ones in pharmacoepidemiology? A systematic review, meta-analysis, and network meta-analysis.
Ana Paula Bruno Pena-Gralle
Mireille E. Schnitzer
Sofia-Nada Boureguaa
Félix Morin
Caroline Sirois
Alice Dragomir
Lucie Blais
PheCode-guided multi-modal topic modeling of electronic health records improves disease incidence prediction and GWAS discovery from UK Biobank
Ziqi Yang
Ziyang Song
Phenome-wide association studies rely on disease definitions derived from diagnostic codes, often failing to leverage the full richness of electronic health records (EHR). We present MixEHR-SAGE, a PheCode-guided multi-modal topic model that integrates diagnoses, procedures, and medications to enhance phenotyping from large-scale EHRs. By combining expert-informed priors with probabilistic inference, MixEHR-SAGE identifies over 1000 interpretable phenotype topics from UK Biobank data. Applied to 350 000 individuals with high-quality genetic data, MixEHR-SAGE-derived risk scores accurately predict incident type 2 diabetes (T2D) and leukemia diagnoses. Subsequent genome-wide association studies using these continuous risk scores uncovered novel disease-associated loci, including PPP1R15A for T2D and JMJD6/SRSF2 for leukemia, that were missed by traditional binary case definitions. These results highlight the potential of probabilistic phenotyping from multi-modal EHRs to improve genetic discovery. The MixEHR-SAGE software is publicly available at: https://github.com/li-lab-mcgill/MixEHR-SAGE.
Reply to comment on "medication-based mortality prediction in COPD using machine learning and conventional statistical methods".
Ana Paula Pena-Gralle
Amélie Forget
Yohann Moanahere Chiu
M. Beauchesne
Lucie Blais
Medication-based mortality prediction in COPD using machine learning and conventional statistical methods.
Ana Paula Pena-Gralle
Amélie Forget
Yohann Chiu
M. Beauchesne
Lucie Blais
Genetic contribution to asthma informs acute chest syndrome pathophysiology and risk stratification
Sara El Aouhel
Vanessa Bellegarde
Stennio Da Silva Faria
Tristan St-Laurent
Estelle Lecluze
Anne-Laure Pham Hung d’Alexandry d’Orengiani
F. Galactéros
Pablo Bartolucci
Guillaume Lettre
Thomas Pincez
A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition
Jason Hartford
Benoit J. Arsenault
Archer Y. Yang
Association Between Circulating Vitamin K Levels, Gut Microbiome, and Type 1 Diabetes: A Mendelian Randomization Study
Samuel De La Barrera
Benjamin De La Barrera
Isabel Gamache
Despoina Manousaki
Background/Objectives: Nutritional deficiencies have been proposed as possible etiological causes for autoimmune diseases, among which type 1 diabetes (T1D). Vitamin K (VK) has potentially positive effects on type 2 diabetes, but its role on T1D in humans remains largely unknown. We aimed to examine the presence of a causal association between VK and T1D using a Mendelian randomization (MR) approach. Methods: Genetic variants from a genome-wide association study (GWAS) for VK (N = 2138 Europeans) were used as instruments in our two-sample MR study to investigate whether circulating VK levels are causally associated with the risk of T1D in a large European T1D GWAS cohort (18,942 cases/520,580 controls). Through a multivariable MR (MVMR), the effects of both VK and specific gut microbiota on T1D were investigated given that the gut microbiome synthesizes VK. Results: We found that changes in levels of circulating VK did not affect T1D risk in our univariate two-sample MR, but this study had limited power to detect small effects of VK (OR for T1D of less than 0.8). However, our MVMR indicated a suggestive association of VK with the risk of T1D adjusting for two different gut microbiome populations. Conclusions: In conclusion, VK levels are unlikely to significantly affect the risk of T1D, but small effects cannot be excluded, and the role of gut microbiome in this association should be further investigated.
Do machine learning methods make better predictions in pharmacoepidemiology?
Ana Paula Pena-Gralle
Mireille E. Schnitzer
Sofia-Nada Boureguaa
Félix Morin
Caroline Sirois
Alice Dragomir
Lucie Blais
Predicting Five-Year All-Cause Mortality in COPD Patients Using Machine Learning
Ana Paula Pena-Gralle
Amélie Forget
Sofia-Nada Boureguaa
Lucie Blais
Diet Networks: Thin Parameters for Fat Genomics
Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
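
The parametrization described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the feature embeddings are provided rather than learned (so they are not counted as free parameters) and that the parameter prediction network is a single linear map without bias. All names and sizes (`n_features`, `embed_dim`, `n_hidden`, the helper functions) are hypothetical and chosen only for illustration.

```python
import random


def naive_first_layer_params(n_features, n_hidden):
    """Free parameters in a dense first layer: one weight per (feature, hidden unit)."""
    return n_features * n_hidden


def predicted_first_layer_params(embed_dim, n_hidden):
    """Free parameters when a linear parameter prediction network (assumed here)
    maps each feature embedding of size embed_dim to n_hidden weights."""
    return embed_dim * n_hidden


def predict_first_layer(embeddings, pred_weights):
    """Map every feature embedding to that feature's row of first-layer weights."""
    n_hidden = len(pred_weights[0])
    return [
        [sum(e[k] * pred_weights[k][j] for k in range(len(e))) for j in range(n_hidden)]
        for e in embeddings
    ]


def hidden_preactivations(x, first_layer):
    """Pre-activations of the classifier's hidden units for one input vector x."""
    n_hidden = len(first_layer[0])
    return [sum(x[i] * first_layer[i][j] for i in range(len(x))) for j in range(n_hidden)]


# Toy sizes (hypothetical); real SNP data would have millions of features.
rng = random.Random(0)
n_features, embed_dim, n_hidden = 1000, 8, 4

# Provided (fixed) per-feature embeddings and the trainable prediction weights.
embeddings = [[rng.gauss(0, 1) for _ in range(embed_dim)] for _ in range(n_features)]
pred_weights = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(embed_dim)]

# One patient's ternary SNP encoding (0, 1 or 2 copies of an allele).
x = [rng.choice([0, 1, 2]) for _ in range(n_features)]

first_layer = predict_first_layer(embeddings, pred_weights)
h = hidden_preactivations(x, first_layer)

print(naive_first_layer_params(n_features, n_hidden))     # 1000 * 4 = 4000
print(predicted_first_layer_params(embed_dim, n_hidden))  # 8 * 4 = 32
```

Under these assumptions the first layer still has one weight row per feature, but those rows are computed from the embeddings, so the number of trainable weights depends on the embedding size rather than on the number of input features.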