Mathieu Blanchette

PhD - McGill

Website

Google Scholar

Nicole Zhang

PhD - McGill

Co-supervisor:

Publications

Telomere-to-telomere assembly detects genomic diversity in Canadian strains of Borrelia burgdorferi

Atia B. Amin

Ana Victoria Ibarra Meneses

Simon Gagnon

Georgi Merhi

Martin Olivier

Momar Ndao

Christopher Fernandez-Prada

David Langlais

2026-01-31

Cell Reports (published)

h-MINT: Modeling Pocket-Ligand Binding with Hierarchical Molecular Interaction Network

Yanru Qu

Wenjuan Tan

Xiangzhe Kong

Xiangxin Zhou

Chaoran Cheng

Jiaxuan You

Ge Liu

Accurate molecular representations are critical for drug discovery, and a central challenge lies in capturing the chemical environment of molecular fragments, as key interactions, such as H-bonds and π stacking, occur only under specific local conditions. Most existing approaches represent molecules as atom-level graphs; however, individual atoms cannot express stereochemistry, lone pairs, conjugation, and other complex features. Fragment-based methods (e.g., principal subgraph or functional group libraries) fail to preserve essential information such as chirality, aromatic bond integrity, and ionic states. This work addresses these limitations from two aspects. (i) **OverlapBPE tokenization**. We propose a novel data-driven molecule tokenization method. Unlike existing approaches, our method allows overlapping fragments, reflecting the inherently fuzzy boundaries of small-molecule substructures and, together with enriched chemical information at the token level, preserving a more complete chemical context. (ii) **h-MINT model**. We develop a hierarchical molecular interaction network capable of jointly modeling drug–target interactions at both atom and fragment levels. By supporting fragment overlaps, the model naturally accommodates the many-to-many atom–fragment mappings introduced by the OverlapBPE scheme. Extensive evaluation against state-of-the-art methods shows our method improves binding affinity prediction by 2-4% in Pearson/Spearman correlation on PDBBind and LBA, enhances virtual screening by 1-3% in key metrics on DUD-E and LIT-PCBA, and achieves the best overall HTS performance on PubChem assays. Further analysis demonstrates that our method effectively captures interactive information while maintaining good generalization.

2025-12-31

International Conference on Learning Representations (accepted, poster)

openreview.net

Genomic Flexibility Through Extrachromosomal Amplifications: A Leishmania Survival Strategy

Atia Amin

Ana Victoria Ibarra Meneses

Christopher Fernandez-Prada

David Langlais

2025-12-13

bioRxiv (preprint)

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Jason Hartford

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about the environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle the causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.

2025-07-25

Transactions on Machine Learning Research (accepted)

openreview.net

RobusTAD: reference panel based annotation of nested topologically associating domains

Yanlin Zhang

Rola Dali

Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary material is available online at 10.1186/s13059-025-03568-9.

2025-05-18

Genome Biology (published)

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Leo J. Lee

Yoshua Bengio

Alexander Tong

Kirill Neklyudov

Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g., the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level: they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine, where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.

2025-01-21

International Conference on Learning Representations (poster)

openreview.net

dyAb: Flow Matching for Flexible Antibody Design with AlphaFold-driven Pre-binding Antigen

Cheng Tan

Zhangyang Gao

Yufei Huang

Haitao Lin

Lirong Wu

Fandi Wu

Stan Z. Li

The development of therapeutic antibodies heavily relies on accurate predictions of how antigens will interact with antibodies. Existing computational methods in antibody design often overlook crucial conformational changes that antigens undergo during the binding process, significantly impacting the reliability of the resulting antibodies. To bridge this gap, we introduce dyAb, a flexible framework that incorporates AlphaFold2-driven predictions to model pre-binding antigen structures and specifically addresses the dynamic nature of antigen conformation changes. Our dyAb model leverages a unique combination of coarse-grained interface alignment and fine-grained flow matching techniques to simulate the interaction dynamics and structural evolution of the antigen-antibody complex, providing a realistic representation of the binding process. Extensive experiments show that dyAb significantly outperforms existing models in antibody design involving changing antigen conformations. These results highlight dyAb's potential to streamline the design process for therapeutic antibodies, promising more efficient development cycles and improved outcomes in clinical applications.

2024-12-31

AAAI Conference on Artificial Intelligence (published)

arxiv.org

R3Design: deep tertiary structure-based RNA sequence design and beyond

Cheng Tan

Zhangyang Gao

Hanqun Cao

Siyuan Li

Siqi Ma

Stan Z. Li

The rational design of ribonucleic acid (RNA) molecules is crucial for advancing therapeutic applications, synthetic biology, and understanding the fundamental principles of life. Traditional RNA design methods have predominantly focused on secondary structure-based sequence design, often neglecting the intricate and essential tertiary interactions. We introduce R3Design, a tertiary structure-based RNA sequence design method that shifts the paradigm to prioritize tertiary structure in the RNA sequence design. R3Design significantly enhances sequence design on native RNA backbones, achieving high sequence recovery and Macro-F1 score, and outperforming traditional secondary structure-based approaches by substantial margins. We demonstrate that R3Design can design RNA sequences that fold into the desired tertiary structures by validating these predictions using advanced structure prediction models. This method, which is available through standalone software, provides a comprehensive toolkit for designing, folding, and evaluating RNA at the tertiary level. Our findings demonstrate R3Design’s superior capability in designing RNA sequences, which achieves around …

2024-12-31

Briefings in Bioinformatics (published)

Polaris: a universal tool for chromatin loop annotation in bulk and single-cell Hi-C data

Yusen Hou

Audrey Baguette

Yanlin Zhang

Annotating chromatin loops is essential for understanding the 3D genome’s role in gene regulation, but current methods struggle with low coverage, particularly in single-cell datasets. Chromatin loops are kilo- to mega-range structures that exhibit broader features, such as co-occurring loops, stripes, and domain boundaries along axial directions of Hi-C contact maps. However, existing tools primarily focus on detecting localized, highly concentrated interactions. Furthermore, the wide variety of available chromatin conformation datasets is rarely utilized in developing effective loop callers. Here, we present Polaris, a universal tool that integrates axial attention with a U-shaped backbone to accurately detect loops across different 3D genome assays. By leveraging extensive Hi-C contact maps in a pretrain-finetune paradigm, Polaris achieves consistent performance across various datasets. We compare Polaris against existing tools in loop annotation from both bulk and single-cell data and find that Polaris outperforms other programs across different cell types, species, sequencing depths, and assays.

2024-12-23

bioRxiv (preprint)

TULIPS decorate the three-dimensional genome of PFA ependymoma

Michael J. Johnston

John J.Y. Lee

Bo Hu

Ana Nikolic

Elham Hasheminasabgorji

Audrey Baguette

Seungil Paik

Haifen Chen

Sachin Kumar

Carol C.L. Chen

Selin Jessa

Polina Balin

Vernon Fong

Melissa Zwaig

Kulandaimanuvel Antony Michealraj

Xun Chen

Yanlin Zhang

Srinidhi Varadharajan

Pierre Billon

Nikoleta Juretic

Craig Daniels

Amulya Nageswara Rao

Caterina Giannini

Eric M. Thompson

Miklos Garami

Peter Hauser

Timea Pocza

Young Shin Ra

Byung-Kyu Cho

Seung-Ki Kim

Kyu-Chang Wang

Ji Yeoun Lee

Wieslawa Grajkowska

Marta Perek-Polnik

Sameer Agnihotri

Stephen Mack

Benjamin Ellezam

Alex Weil

Jeremy Rich

Guillaume Bourque

Jennifer A. Chan

V. Wee Yong

Mathieu Lupien

Jiannis Ragoussis

Claudia Kleinman

Jacek Majewski

Nada Jabado

Michael D. Taylor

Marco Gallo

2024-08-31

Cell (published)

Improving microbial phylogeny with citizen science within a mass-market video game

Roman Sarrazin-Gendron

Parham Ghasemloo Gheidari

Alexander Butyaev

Timothy Keding

Eddie Cai

Jiayue Zheng

Renata Mutalova

Julien Mounthanyvong

Yuxue Zhu

Elena Nazarova

Chrisostomos Drogaris

Kornél Erhart

David Bélanger

Michael Bouffard

Joshua Davidson

Mathieu Falaise

Vincent Fiset

Steven Hébert

Dan Hewitt

Jonathan Huot

Seung Kim

Jonathan Moreau-Genest

David Najjab

Steve Prince

Ludger Saintélien

Amélie Brouillette

Gabriel Richard

Randy Pitchford

Sébastien Caisse

Daniel McDonald

Rob Knight

Attila Szantner

Jérôme Waldispühl

Citizen science video games are designed primarily for users already inclined to contribute to science, which severely limits their accessibility for an estimated community of 3 billion gamers worldwide. We created Borderlands Science (BLS), a citizen science activity that is seamlessly integrated within a popular commercial video game played by tens of millions of gamers. This integration is facilitated by a novel game-first design of citizen science games, in which the game design aspect has the highest priority, and a suitable task is then mapped to the game design. BLS crowdsources a multiple alignment task of 1 million 16S ribosomal RNA sequences obtained from human microbiome studies. Since its initial release on 7 April 2020, over 4 million players have solved more than 135 million science puzzles, a task unsolvable by a single individual. Leveraging these results, we show that our multiple sequence alignment simultaneously improves microbial phylogeny estimations and UniFrac effect sizes compared to state-of-the-art computational methods. This achievement demonstrates that hyper-gamified scientific tasks attract massive crowds of contributors and offers invaluable resources to the scientific community.

2024-04-14

Nature Biotechnology (published)

Posterior inference of Hi-C contact frequency through sampling

Yanlin Zhang

Christopher J. F. Cameron

Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimates. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.

2024-02-21

Frontiers in Bioinformatics (published)