Mathieu Blanchette

Associate Academic Member
Director and Associate Professor, McGill University, School of Computer Science
Research Topics
Deep Learning
Computational Biology
Graph Neural Networks

Biography

Mathieu Blanchette is an associate professor and the director of the School of Computer Science at McGill University.

After completing a PhD (University of Washington, 2002) and a postdoctoral fellowship (University of California, Santa Cruz, 2003), he joined McGill University's School of Computer Science and founded the Computational Genomics Lab. The research carried out by his outstanding team has led to more than 70 publications. Recently elected to the College of New Scholars, Artists and Scientists of the Royal Society of Canada, he was a Sloan Fellow (2009) and received the Outstanding Young Computer Scientist Researcher prize from the Canadian Association of Computer Science (2012) as well as the Chris Overton Prize (2006). He loves teaching and supervising students, and received the Leo Yaffe Award for teaching (2008).

Current Students

Research Master's - McGill
Co-supervisor:
Research Master's - McGill
Research Master's - McGill
PhD - McGill

Publications

Telomere-to-telomere assembly detects genomic diversity in Canadian strains of Borrelia burgdorferi
Atia B. Amin
Ana Victoria Ibarra Meneses
Simon Gagnon
Georgi Merhi
Martin Olivier
Momar Ndao
Christopher Fernandez-Prada
David Langlais
Genomic Flexibility Through Extrachromosomal Amplifications: A Leishmania Survival Strategy
Atia Amin
Ana Victoria Ibarra Meneses
Christopher Fernandez-Prada
David Langlais
Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations
Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.
RobusTAD: reference panel based annotation of nested topologically associating domains
Yanlin Zhang
Rola Dali
Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary information: the online version contains supplementary material available at 10.1186/s13059-025-03568-9.
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold
Xi Zhang
Brandon Amos
Leo J. Lee
Kirill Neklyudov
Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g., the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.
R3Design: deep tertiary structure-based RNA sequence design and beyond
Cheng Tan
Zhangyang Gao
Hanqun Cao
Siyuan Li
Siqi Ma
Stan Z. Li
The rational design of ribonucleic acid (RNA) molecules is crucial for advancing therapeutic applications, synthetic biology, and understanding the fundamental principles of life. Traditional RNA design methods have predominantly focused on secondary structure-based sequence design, often neglecting the intricate and essential tertiary interactions. We introduce R3Design, a tertiary structure-based RNA sequence design method that shifts the paradigm to prioritize tertiary structure in the RNA sequence design. R3Design significantly enhances sequence design on native RNA backbones, achieving high sequence recovery and Macro-F1 score, and outperforming traditional secondary structure-based approaches by substantial margins. We demonstrate that R3Design can design RNA sequences that fold into the desired tertiary structures by validating these predictions using advanced structure prediction models. This method, which is available through standalone software, provides a comprehensive toolkit for designing, folding, and evaluating RNA at the tertiary level. Our findings demonstrate R3Design’s superior capability in designing RNA sequences, which achieves around …
Polaris: a universal tool for chromatin loop annotation in bulk and single-cell Hi-C data
Yusen Hou
Audrey Baguette
Yanlin Zhang
Annotating chromatin loops is essential for understanding the 3D genome’s role in gene regulation, but current methods struggle with low coverage, particularly in single-cell datasets. Chromatin loops are kilo- to mega-range structures that exhibit broader features, such as co-occurring loops, stripes, and domain boundaries along axial directions of Hi-C contact maps. However, existing tools primarily focus on detecting localized, highly concentrated interactions. Furthermore, the wide variety of available chromatin conformation datasets is rarely utilized in developing effective loop callers. Here, we present Polaris, a universal tool that integrates axial attention with a U-shaped backbone to accurately detect loops across different 3D genome assays. By leveraging extensive Hi-C contact maps in a pretrain-finetune paradigm, Polaris achieves consistent performance across various datasets. We compare Polaris against existing tools in loop annotation from both bulk and single-cell data and find that Polaris outperforms other programs across different cell types, species, sequencing depths, and assays.
TULIPS decorate the three-dimensional genome of PFA ependymoma
Michael J. Johnston
John J.Y. Lee
Bo Hu
Ana Nikolic
Elham Hasheminasabgorji
Audrey Baguette
Seungil Paik
Haifen Chen
Sachin Kumar
Carol C.L. Chen
Selin Jessa
Polina Balin
Vernon Fong
Melissa Zwaig
Kulandaimanuvel Antony Michealraj
Xun Chen
Yanlin Zhang
Srinidhi Varadharajan
Pierre Billon
Nikoleta Juretic
Craig Daniels
Amulya Nageswara Rao
Caterina Giannini
Eric M. Thompson
Miklos Garami
Peter Hauser
Timea Pocza
Young Shin Ra
Byung-Kyu Cho
Seung-Ki Kim
Kyu-Chang Wang
Ji Yeoun Lee
Wieslawa Grajkowska
Marta Perek-Polnik
Sameer Agnihotri
Stephen Mack
Benjamin Ellezam
Alex Weil
Jeremy Rich
Guillaume Bourque
Jennifer A. Chan
V. Wee Yong
Mathieu Lupien
Jiannis Ragoussis
Claudia Kleinman
Jacek Majewski
Nada Jabado
Michael D. Taylor
Marco Gallo
Improving microbial phylogeny with citizen science within a mass-market video game
Roman Sarrazin-Gendron
Parham Ghasemloo Gheidari
Alexander Butyaev
Timothy Keding
Eddie Cai
Renata Mutalova
Julien Mounthanyvong
Yuxue Zhu
Elena Nazarova
Chrisostomos Drogaris
Kornél Erhart
David Bélanger
Michael Bouffard
Joshua Davidson
Mathieu Falaise
Vincent Fiset
Steven Hebert
Dan Hewitt
Jonathan Huot
Seung Kim
Jonathan Moreau-Genest
David Najjab
Steve Prince
Ludger Saintélien
Amélie Brouillette
Gabriel Richard
Randy Pitchford
Sébastien Caisse
Daniel McDonald
Rob Knight
Attila Szantner
Jérôme Waldispühl
Citizen science video games are designed primarily for users already inclined to contribute to science, which severely limits their accessibility for an estimated community of 3 billion gamers worldwide. We created Borderlands Science (BLS), a citizen science activity that is seamlessly integrated within a popular commercial video game played by tens of millions of gamers. This integration is facilitated by a novel game-first design of citizen science games, in which the game design aspect has the highest priority, and a suitable task is then mapped to the game design. BLS crowdsources a multiple alignment task of 1 million 16S ribosomal RNA sequences obtained from human microbiome studies. Since its initial release on 7 April 2020, over 4 million players have solved more than 135 million science puzzles, a task unsolvable by a single individual. Leveraging these results, we show that our multiple sequence alignment simultaneously improves microbial phylogeny estimations and UniFrac effect sizes compared to state-of-the-art computational methods. This achievement demonstrates that hyper-gamified scientific tasks attract massive crowds of contributors and offers invaluable resources to the scientific community.
Posterior inference of Hi-C contact frequency through sampling
Yanlin Zhang
Christopher J. F. Cameron
Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.
Multi-ancestry polygenic risk scores using phylogenetic regularization
Accurately predicting phenotype using genotype across diverse ancestry groups remains a significant challenge in human genetics. Many state-of-the-art polygenic risk score models are known to have difficulty generalizing to genetic ancestries that are not well represented in their training set. To address this issue, we present a novel machine learning method for fitting genetic effect sizes across multiple ancestry groups simultaneously, while leveraging prior knowledge of the evolutionary relationships among them. We introduce DendroPRS, a machine learning model where SNP effect sizes are allowed to evolve along the branches of the phylogenetic tree capturing the relationship among populations. DendroPRS outperforms existing approaches at two important genotype-to-phenotype prediction tasks: expression QTL analysis and polygenic risk scores. We also demonstrate that our method can be useful for multi-ancestry modelling, both by fitting population-specific effect sizes and by more accurately accounting for covariate effects across groups. We additionally find a subset of genes where there is strong evidence that an ancestry-specific approach improves eQTL modelling.
PERFUMES: pipeline to extract RNA functional motifs and exposed structures
Arnaud Chol
Roman Sarrazin-Gendron
Éric Lécuyer
Jérôme Waldispühl
Motivation: Up to 75% of the human genome encodes RNAs. The function of many non-coding RNAs relies on their ability to fold into 3D structures. Specifically, nucleotides inside secondary structure loops form non-canonical base pairs that help stabilize complex local 3D structures. These RNA 3D motifs can promote specific interactions with other molecules or serve as catalytic sites. Results: We introduce PERFUMES, a computational pipeline to identify 3D motifs that can be associated with observable features. Given a set of RNA sequences with associated binary experimental measurements, PERFUMES searches for RNA 3D motifs using BayesPairing2 and extracts those that are over-represented in the set of positive sequences. It also conducts a thermodynamics analysis of the structural context that can support the interpretation of the predictions. We illustrate PERFUMES’ usage on the SNRPA protein binding site, for which the tool retrieved both previously known binder motifs and new ones. Availability and implementation: PERFUMES is an open-source Python package (https://jwgitlab.cs.mcgill.ca/arnaud_chol/perfumes).