Marc-André Legault

Genetic contribution to asthma informs acute chest syndrome pathophysiology and risk stratification

Sara El Aouhel

Vanessa Bellegarde

Stennio Da Silva Faria

Tristan St-Laurent

Estelle Lecluze

Anne-Laure Pham Hung d’Alexandry d’Orengiani

F. Galactéros

Pablo Bartolucci

Marc-André Legault

Guillaume Lettre

Thomas Pincez

2025-10-02

medRxiv (preprint)

doi.org

A flexible machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoît J. Arsenault

Archer Yang

Joelle Pineau

2025-05-01

American Journal of Human Genetics (published)

doi.org

Association Between Circulating Vitamin K Levels, Gut Microbiome, and Type 1 Diabetes: A Mendelian Randomization Study

Samuel De La Barrera

Benjamin De La Barrera

Marc-André Legault

Isabel Gamache

Despoina Manousaki

2024-11-01

Nutrients (published)

doi.org

Do machine learning methods make better predictions in pharmacoepidemiology?

Ana Paula Pena-Gralle

Mireille E. Schnitzer

Sofia-Nada Boureguaa

Félix Morin

Marc-André Legault

Caroline Sirois

Alice Dragomir

Lucie Blais

2024-09-01

Annals of Epidemiology (published)

doi.org

Predicting Five-Year All-Cause Mortality in COPD Patients Using Machine Learning

Ana Paula Pena-Gralle

Amélie Forget

Sofia-Nada Boureguaa

Marc-André Legault

Lucie Blais

2024-09-01

Annals of Epidemiology (published)

doi.org

A novel and efficient machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoît J. Arsenault

Archer Y. Yang

Joelle Pineau

Mendelian Randomization (MR) enables estimation of causal effects while controlling for unmeasured confounding factors. However, traditional MR's reliance on strong parametric assumptions can introduce bias if these are violated. We introduce a new machine learning MR estimator named Quantile Instrumental Variable (IV) that achieves low estimation error in a wide range of plausible MR scenarios. Quantile IV is distinctive in its ability to estimate nonlinear and heterogeneous causal effects and offers a flexible approach for subgroup analysis. Applying Quantile IV, we investigate the impact of circulating sclerostin levels on heel bone mineral density, osteoporosis, and cardiovascular outcomes in the UK Biobank. Employing various MR estimators and colocalization techniques that allow multiple causal variants, our analysis reveals that a genetically predicted reduction in sclerostin levels significantly increases heel bone mineral density and reduces the risk of osteoporosis, while showing no discernible effect on ischemic cardiovascular diseases. Quantile IV contributes to the advancement of MR methodology, and the case study on the impact of circulating sclerostin modulation contributes to our understanding of the on-target effects of sclerostin inhibition.

2024-01-31

medRxiv (preprint)

doi.org
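The abstract above notes that MR estimates causal effects despite unmeasured confounding, under parametric assumptions. As a minimal sketch of that idea, the following simulation uses the classical linear Wald ratio estimator (not Quantile IV, which the abstract introduces as a more flexible alternative). The data-generating model, the effect sizes, and the function names `simulate`, `slope`, and `wald_ratio` are all illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate(n=50_000, true_effect=0.5, seed=0):
    """Simulate a genotype g, exposure x, and outcome y (hypothetical model)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # allele count: 0, 1 or 2
    u = rng.normal(size=n)                          # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)            # exposure depends on g and u
    y = true_effect * x + u + rng.normal(size=n)    # outcome depends on x and u
    return g, x, y


def slope(a, b):
    """Least-squares slope of b regressed on a, with an intercept."""
    a_centered = a - a.mean()
    return float(a_centered @ (b - b.mean()) / (a_centered @ a_centered))


def wald_ratio(g, x, y):
    """Linear IV estimate: (effect of g on y) / (effect of g on x)."""
    return slope(g, y) / slope(g, x)


g, x, y = simulate()
ols_estimate = slope(x, y)        # biased upward by the confounder u
iv_estimate = wald_ratio(g, x, y)  # uses g as the instrument
print(f"OLS: {ols_estimate:.2f}  IV: {iv_estimate:.2f}  truth: 0.50")
```

In this simulated setting the naive regression of `y` on `x` absorbs the confounder's contribution, while the ratio estimator stays close to the simulated effect. The estimator is only valid here because the simulated relationships are linear and the genotype affects the outcome solely through the exposure, which is the kind of assumption the abstract says can fail in practice.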

Deep interpretability for GWAS

Deepak Sharma

Audrey Durand

Marc-André Legault

Louis-Philippe Lemieux Perreault

Audrey Lemaçon

Marie-Pierre Dubé

Joelle Pineau

Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In these studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient-based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.

2020-07-03

ArXiv (preprint)

arxiv.org

Diet Networks: Thin Parameters for Fat Genomics

Adriana Romero Soriano

Marie-Pierre Dubé

Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in medical research, more specifically in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer (number of input features times number of hidden units): each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed in data), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation (based on the feature's identity, not its value) to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). This approach views the problem of producing the parameters associated with each feature as a multi-task learning problem.
We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.

2017-01-01

ICLR.cc/2017/conference (poster)

openreview.net
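The parametrization described in the Diet Networks abstract above can be sketched in a few lines: a small network shared across all features maps each feature's embedding to that feature's row of first-layer weights. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation. The layer sizes, the randomly generated placeholder embeddings, the single tanh hidden layer, and the names `feature_embeddings`, `A`, `B`, and `predict_first_layer` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 8, 10_000  # few examples, many SNP inputs
d_embed, n_hidden = 16, 32         # embedding size, classifier hidden units
pp_hidden = 24                     # hidden units of the parameter prediction net

# Ternary genotype inputs (0, 1 or 2), simulated here for illustration.
X = rng.integers(0, 3, size=(n_samples, n_features)).astype(np.float32)

# One embedding per feature, tied to the feature's identity rather than its
# value. These are "provided" (random placeholders); if they were learned
# instead, they would add n_features * d_embed parameters.
feature_embeddings = rng.normal(size=(n_features, d_embed)).astype(np.float32)

# Parameter prediction network: a small MLP whose weights are shared by
# every feature (assumed architecture).
A = rng.normal(scale=0.1, size=(d_embed, pp_hidden)).astype(np.float32)
B = rng.normal(scale=0.1, size=(pp_hidden, n_hidden)).astype(np.float32)


def predict_first_layer(embeddings):
    """Map each feature embedding to that feature's row of first-layer weights."""
    return np.tanh(embeddings @ A) @ B  # shape: (n_features, n_hidden)


W1 = predict_first_layer(feature_embeddings)
hidden = np.maximum(X @ W1, 0.0)  # classifier's first hidden layer (ReLU)

naive_params = n_features * n_hidden  # free first-layer weights, naive approach
shared_params = A.size + B.size       # free weights when W1 is predicted
print(naive_params, shared_params)
```

With these assumed sizes, the naive first layer would hold 320,000 free weights, while the prediction network holds 1,152. The saving grows with the number of input features because the prediction network's size does not depend on it.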

Diet Networks: Thin Parameters for Fat Genomics

Adriana Romero Soriano

M. Dubé

Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even when using known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.

2016-11-04

ArXiv (preprint)

arxiv.org
