
Thibaud Godon

Alumni

Publications

Extracting a COVID-19 signature from a multi-omic dataset
Baptiste Bauvin
Guillaume Bachelot
Claudia Carpentier
Riikka Huusaari
Maxime Déraspe
Juho Rousu
Caroline Quach
The complexity of COVID-19 requires approaches that extend beyond symptom-based descriptors. Multi-omic data, combining clinical, proteomic, and metabolomic information, offer a more detailed view of disease mechanisms and biomarker discovery. As part of a large-scale Quebec initiative, we collected extensive datasets from COVID-19 positive and negative patient samples. Using a multi-view machine learning framework with ensemble methods, we integrated thousands of features across clinical, proteomic, and metabolomic domains to classify COVID-19 status. We further applied a novel feature relevance methodology to identify condensed signatures. Our models achieved a balanced accuracy of 89% ± 5% despite the high-dimensional nature of the data. Feature selection yielded 12- and 50-feature signatures that improved classification accuracy by at least 3% compared to the full feature set. These signatures were both accurate and interpretable. This work demonstrates that multi-omic integration, combined with advanced machine learning, enables the extraction of robust COVID-19 signatures from complex datasets. The condensed biomarker sets provide a practical path toward improved diagnosis and precision medicine, representing a significant advancement in COVID-19 biomarker discovery.
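For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, which keeps a majority class from dominating the score. The sketch below is a minimal illustration only: the function name and the toy labels are assumptions made for this example and do not come from the study's data or code.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of the samples whose true class is c.
        idx = [i for i, label in enumerate(y_true) if label == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels (1 = COVID-19 positive, 0 = negative), not study data.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

# Recall is 3/4 for class 1 and 1/2 for class 0, so the score is 0.625.
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Plain accuracy on the same toy labels would be 4/6, so the two metrics diverge as soon as the classes are imbalanced.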
On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.
Metabolomics, the study of small molecules within biological systems, offers insights into metabolic processes and, consequently, holds great promise for advancing health outcomes. Biomarker discovery in metabolomics represents a significant challenge, notably due to the high dimensionality of the data. Recent work has addressed this problem by analyzing the most important variables in machine learning models. Unfortunately, this approach relies on prior hypotheses about the structure of the data and may overlook simple patterns. To assess the true usefulness of machine learning methods, we evaluate them on a collection of 835 metabolomics data sets. This effort provides valuable insights for metabolomics researchers regarding where and when to use machine learning. It also establishes a benchmark for the evaluation of future methods. Nonetheless, the results emphasize the high diversity of data sets in metabolomics and the complexity of finding biologically relevant biomarkers. As a result, we propose a novel approach applicable across all data sets, offering guidance for future analyses. This method involves directly comparing univariate and multivariate models. We demonstrate through selected examples how this approach can guide data analysis across diverse data set structures, representative of the observed variability. Code and data are available for research purposes.
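The idea of directly comparing a univariate model with a multivariate one can be sketched in a few lines. Everything below is an assumption made for illustration: a single-feature threshold rule stands in for the univariate model, an unscaled nearest-centroid classifier stands in for the multivariate model, and the tiny data set is invented. It is not the authors' released code, and the abstract does not say which models they compare.

```python
def fit_stump(X, y):
    """Univariate model: the single (feature, threshold, polarity) rule
    with the fewest training errors."""
    best = None  # (errors, feature, threshold, polarity)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [int(polarity * (row[j] - t) > 0) for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best[1:]


def predict_stump(model, X):
    j, t, polarity = model
    return [int(polarity * (row[j] - t) > 0) for row in X]


def fit_centroids(X, y):
    """Multivariate model: one mean vector per class over all features."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, X):
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in X]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Invented toy data: feature 0 carries the signal, features 1 and 2 are
# large-scale noise (hypothetical values, not from any real data set).
X_train = [[1.0, 50.0, 10.0], [2.0, 10.0, 40.0], [1.5, 30.0, 20.0],
           [4.0, 20.0, 30.0], [5.0, 40.0, 10.0], [4.5, 15.0, 45.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.2, 20.0, 40.0], [1.8, 45.0, 15.0],
          [4.2, 45.0, 12.0], [4.8, 12.0, 35.0]]
y_test = [0, 0, 1, 1]

uni_acc = accuracy(y_test, predict_stump(fit_stump(X_train, y_train), X_test))
multi_acc = accuracy(
    y_test, predict_centroids(fit_centroids(X_train, y_train), X_test))
print(f"univariate: {uni_acc:.2f}  multivariate: {multi_acc:.2f}")
```

On this contrived data the single-feature rule classifies every test sample correctly while the multivariate stand-in is misled by the noisy features, which is the kind of simple pattern a side-by-side comparison is meant to surface. On other data sets the ordering can just as easily be reversed.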
Invariant Causal Set Covering Machines
RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data
Pier-Luc Plante
Baptiste Bauvin
Élina Francovic-Fontaine
Background: Understanding the relationship between omics data and the phenotype is a central problem in precision medicine. The high dimensionality of metabolomics data challenges learning algorithms in terms of scalability and generalization. Most learning algorithms do not produce interpretable models. Method: We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules. Results: Applications on metabolomics data show that it produces models that achieve high predictive performance. The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
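To make "an ensemble of conjunctions of decision rules" concrete, here is a deliberately simplified sketch. It is not the RandomSCM implementation: the greedy rule search, the stratified bootstrap, the majority vote, and every function name and parameter (`fit_conjunction`, `fit_ensemble`, `n_estimators`, `n_features`, `max_rules`) are assumptions chosen for this illustration, and only conjunctions are covered, not disjunctions.

```python
import random


def rule_fires(rule, row):
    """A rule is (feature, threshold, direction); direction +1 means x > t."""
    j, t, direction = rule
    return direction * (row[j] - t) > 0


def predict_conjunction(rules, row):
    """Predict 1 only if every rule fires (an empty conjunction predicts 1)."""
    return int(all(rule_fires(r, row) for r in rules))


def candidate_rules(X, features):
    """Thresholds are midpoints between consecutive distinct feature values."""
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            for direction in (1, -1):
                yield (j, (lo + hi) / 2, direction)


def fit_conjunction(X, y, features, max_rules=3):
    """Greedily add the rule that most reduces training errors."""
    def errors(rs):
        return sum(predict_conjunction(rs, row) != label
                   for row, label in zip(X, y))

    rules, best_err = [], errors([])
    for _ in range(max_rules):
        scored = [(errors(rules + [r]), r) for r in candidate_rules(X, features)]
        if not scored:
            break
        err, rule = min(scored)
        if err >= best_err:  # stop when no rule improves the fit
            break
        rules.append(rule)
        best_err = err
    return rules


def fit_ensemble(X, y, n_estimators=25, n_features=2, seed=0):
    rng = random.Random(seed)
    by_class = {c: [i for i, label in enumerate(y) if label == c] for c in (0, 1)}
    ensemble = []
    for _ in range(n_estimators):
        # Stratified bootstrap: resample with replacement within each class.
        idx = [rng.choice(members)
               for members in by_class.values() for _ in members]
        # Each base model sees only a random subset of the features.
        features = rng.sample(range(len(X[0])), n_features)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        ensemble.append(fit_conjunction(Xb, yb, features))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over the conjunctions."""
    votes = sum(predict_conjunction(rules, row) for rules in ensemble)
    return int(2 * votes > len(ensemble))


# Invented toy data: features 0 and 1 separate the classes, feature 2 is noise.
X = [[1.0, 2.0, 5.0], [2.0, 3.0, 1.0], [3.0, 1.0, 9.0], [2.0, 2.0, 4.0],
     [7.0, 8.0, 2.0], [8.0, 9.0, 8.0], [9.0, 7.0, 5.0], [8.0, 8.0, 3.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = fit_ensemble(X, y)
print([predict_ensemble(ensemble, row) for row in X])
print(ensemble[0])  # each base model is a short, human-readable rule list
```

Because each base model is a short list of threshold rules on named features, the features that recur across the ensemble can be read off directly, which is what makes this family of models attractive for biomarker discovery. The toy data is separable by a single rule, so each conjunction here ends up holding one rule; real omics data would generally need more.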