Mathieu Blanchette

Telomere-to-telomere assembly detects genomic diversity in Canadian strains of Borrelia burgdorferi

Atia B. Amin

Ana Victoria Ibarra Meneses

Simon Gagnon

Georgi Merhi

Martin Olivier

Momar Ndao

Christopher Fernandez-Prada

David Langlais

2025-07-20

bioRxiv (preprint)

RobusTAD: reference panel based annotation of nested topologically associating domains

Yanlin Zhang

Rola Dali

Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary information: the online version contains supplementary material available at 10.1186/s13059-025-03568-9.

2025-05-19

Genome Biology (published)

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Leo J Lee

Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.

2025-01-22

ICLR.cc/2025/Conference (poster)

Sparsity regularization via tree-structured environments for disentangled representations

Jason Hartford

Many causal systems, such as biological processes in cells, can only be observed indirectly via measurements such as gene expression. Causal representation learning -- the task of correctly mapping low-level observations to latent causal variables -- could advance scientific understanding by enabling inference of latent variables such as pathway activation. In this paper, we develop methods for inferring latent variables from multiple related datasets (environments) and tasks. As a running example, we consider the task of predicting a phenotype from gene expression, where we often collect data from multiple cell types or organisms that are related in known ways. The key insight is that the mapping from latent variables driven by gene expression to the phenotype of interest changes sparsely across closely related environments. To model sparse changes, we introduce Tree-Based Regularization (TBR), an objective that both minimizes prediction error and regularizes closely related environments to learn similar predictors. We prove that under assumptions about the degree of sparse changes, TBR identifies the true latent variables up to some simple transformations. We evaluate the theory empirically with both simulations and ground-truth gene expression data. We find that TBR recovers the latent causal variables better than related methods across these settings, even under settings that violate some assumptions of the theory.

2025-01-01

Trans. Mach. Learn. Res. (published)

Polaris: a universal tool for chromatin loop annotation in bulk and single-cell Hi-C data

Yusen Hou

Audrey Baguette

Yanlin Zhang

Annotating chromatin loops is essential for understanding the 3D genome’s role in gene regulation, but current methods struggle with low coverage, particularly in single-cell datasets. Chromatin loops are kilo- to mega-range structures that exhibit broader features, such as co-occurring loops, stripes, and domain boundaries along axial directions of Hi-C contact maps. However, existing tools primarily focus on detecting localized, highly concentrated interactions. Furthermore, the wide variety of available chromatin conformation datasets is rarely utilized in developing effective loop callers. Here, we present Polaris, a universal tool that integrates axial attention with a U-shaped backbone to accurately detect loops across different 3D genome assays. By leveraging extensive Hi-C contact maps in a pretrain-finetune paradigm, Polaris achieves consistent performance across various datasets. We compare Polaris against existing tools in loop annotation from both bulk and single-cell data and find that Polaris outperforms other programs across different cell types, species, sequencing depths, and assays.

2024-12-24

bioRxiv (preprint)

ARGV: 3D genome structure exploration using augmented reality

Chrisostomos Drogaris

Yanlin Zhang

Éric Zhang

Elena Nazarova

Roman Sarrazin-Gendron

Sélik Wilhelm-Landry

Yan Cyr

Jacek Majewski

Jérôme Waldispühl

2024-08-27

BMC Bioinformatics (published)

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Leo J Lee

Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.

2024-08-26

arXiv (preprint)

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Leo J Lee

Numerous biological and physical processes can be modeled as systems of interacting samples evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Flow-based models allow for learning these dynamics at the population level --- they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We propose

2024-06-17

ICML.cc/2024/Workshop/GRaM (published)

Sparsity regularization via tree-structured environments for disentangled representations

Elliot Layne

Dhanya Sridhar

Jason Hartford

2024-05-30

arXiv (preprint)

Improving microbial phylogeny with citizen science within a mass-market video game

Roman Sarrazin-Gendron

Parham Ghasemloo Gheidari

Alexander Butyaev

Timothy Keding

Eddie Cai

Jiayue Zheng

Renata Mutalova

Julien Mounthanyvong

Yuxue Zhu

Elena Nazarova

Chrisostomos Drogaris

Kornél Erhart

Amélie Brouillette

Gabriel Richard

David Bélanger

Randy Pitchford

Michael Bouffard

Joshua Davidson

Sébastien Caisse

Mathieu Falaise

Daniel McDonald

Vincent Fiset

Steven Hebert

Rob Knight

Attila Szantner

Dan Hewitt

Jérôme Waldispühl

Jonathan Huot

Seung Kim

Jonathan Moreau-Genest

David Najjab

Steve Prince

Ludger Saintélien

2024-04-15

Nature Biotechnology (published)

Posterior inference of Hi-C contact frequency through sampling

Yanlin Zhang

Christopher J. F. Cameron

Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.

2024-02-22

Frontiers in Bioinformatics (published)