Mathieu Blanchette

Associate Academic Member
Director and Associate Professor, McGill University, School of Computer Science
Research Topics
Deep Learning
Computational Biology
Graph Neural Networks

Biography

Mathieu Blanchette is an associate professor and the director of the School of Computer Science at McGill University.

After completing a PhD (University of Washington, 2002) and a postdoctoral fellowship (University of California, Santa Cruz, 2003), he joined McGill University's School of Computer Science and founded the Computational Genomics Lab. The research carried out by his outstanding team has led to more than 70 publications. Recently elected a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada, he was a Sloan Fellow (2009) and received the Outstanding Young Computer Scientist Researcher prize from the Canadian Association of Computer Science (2012) as well as the Chris Overton Prize (2006). He loves teaching and supervising students, and received the Leo Yaffe Award for teaching (2008).

Current Students

Research Master's - McGill
Research Master's - McGill
Research Master's - McGill
PhD - McGill
PhD - McGill
Co-supervisor:

Publications

H3K27me3 spreading organizes canonical PRC1 chromatin architecture to regulate developmental programs
Brian Krug
Bo Hu
Haifen Chen
Adam Ptack
Xiao Chen
Kristjan H. Gretarsson
Shriya Deshmukh
Nisha Kabir
Augusto Faria Andrade
Elias Jabbour
Ashot S. Harutyunyan
John J. Y. Lee
Maud Hulswit
Damien Faury
Caterina Russo
Xinjing Xu
Michael Johnston
Audrey Baguette
Nathan A. Dahl
Alexander G. Weil
Benjamin Ellezam
Rola Dali
Khadija Wilson
Benjamin A. Garcia
Rajesh Kumar Soni
Marco Gallo
Michael D. Taylor
Claudia Kleinman
Jacek Majewski
Nada Jabado
Chao Lu
Player-Guided AI outperforms standard AI in Sequence Alignment Puzzles
Renata Mutalova
Roman Sarrazin-Gendron
Parham Ghasemloo Gheidari
Eddie Cai
Gabriel Richard
Sébastien Caisse
Rob Knight
Attila Szantner
Jérôme Waldispühl
PhyloGFN: Phylogenetic inference with generative flow networks
Ming Yang Zhou
Zichao Yan
Elliot Layne
Nikolay Malkin
Dinghuai Zhang
Moksh J. Jain
Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods. Our code is available at https://github.com/zmy1116/phylogfn.
Reference panel-guided super-resolution inference of Hi-C data
Yanlin Zhang
Motivation: Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiment is crucial for understanding the role of 3D genome organization in gene regulation. This challenging task is due in part to the high sequencing depth of Hi-C libraries required to support high-resolution analyses. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor chromatin interaction frequency estimation. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available and (ii) the vast majority of local spatial organizations are conserved across multiple cell types. Results: Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate the enhancement of Hi-C data resolution of a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types, and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains. Availability and implementation: https://github.com/BlanchetteLab/RefHiC.
Playing the System: Can Puzzle Players Teach us How to Solve Hard Problems?
Renata Mutalova
Roman Sarrazin-Gendron
Eddie Cai
Gabriel Richard
Parham Ghasemloo Gheidari
Sébastien Caisse
Rob Knight
Attila Szantner
Jérôme Waldispühl
Detection and genomic analysis of BRAF fusions in Juvenile Pilocytic Astrocytoma through the combination and integration of multi-omic data
Melissa Zwaig
Audrey Baguette
Bo Hu
Michael Johnston
Hussein Lakkis
Emily M. Nakada
Damien Faury
Nikoleta Juretic
Benjamin Ellezam
Alexandre G. Weil
Jason Karamchandani
Jacek Majewski
Michael D. Taylor
Marco Gallo
Claudia Kleinman
Nada Jabado
Jiannis Ragoussis
Reference panel guided topological structure annotation of Hi-C data
Yanlin Zhang
Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations
Elliot Layne
Jason Hartford
Sébastien Lachapelle
Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors have not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.
Leishmania parasites exchange drug-resistance genes through extracellular vesicles
Noélie Douanne
George Dong
Atia Amin
Lorena Bernardo
David Langlais
Martin Olivier
Christopher Fernandez-Prada
PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information
Faizy Ahsan
Zichao Yan
Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utter importance in -omics study, which facilitates our understanding of the underlying mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include the binding prediction of transcription factors in DNA regulatory regions, and predicting RNA–protein interaction in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of available orthologous data across multitudes of extant and ancestral genomes, which readily present an opportunity to improve the accuracy of existing computational methods. Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor that is known as FactorNet. PhyloPGM is simple in principle, easy to implement and yet, yields impressive results. Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM. Supplementary information: Supplementary data are available at Bioinformatics online.
Reconstruction of full-length LINE-1 progenitors from ancestral genomes
Laura F. Campitelli
Isaac Yellan
Mihai Tudor Albu
Marjan Barazandeh
Zain M. Patel
T. Hughes
Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled 2 levels of sequence reconstruction (at the level of whole genomes and L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient-reconstructed progenitor L1s, mirrors binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and functional study of both L1 evolution and L1 contributions to host regulatory networks.
Reconstruction of full-length LINE-1 progenitors from ancestral genomes
Laura F Campitelli
Isaac Yellan
Mihai Albu
Marjan Barazandeh
Zain M Patel
Timothy R Hughes