Mathieu Blanchette

Associate Academic Member

Director and Associate Professor, McGill University, School of Computer Science

Biography

Mathieu Blanchette is an Associate Professor and the Director of the School of Computer Science at McGill University.

After completing a PhD (University of Washington, 2002) and a postdoctoral fellowship (University of California, Santa Cruz, 2003), he joined the School of Computer Science at McGill University and founded the Computational Genomics Laboratory. Research by his exceptional team has led to more than 70 publications. Recently elected a member of the Royal Society of Canada's College of New Scholars, Artists and Scientists, he was a Sloan Fellow (2009) and received the Outstanding Young Computer Scientist Researcher award from the Canadian Association of Computer Science (2012), as well as the Chris Overton Prize (2006). He loves teaching and supervising students, and received the Leo Yaffe Award for teaching (2008).

Current Students

Cesar Miguel Valdez Cordova

PhD - McGill University

cesar.valdez@mila.quebec

Elliot Layne

PhD - McGill University

elliot.layne@mila.quebec

Google Scholar

Lucas Nelson

Bachelor's

lucas.nelson@mila.quebec

Nicole Zhang

PhD - McGill University

Co-supervisor:

Research Master's - McGill University

noah.elrimawi-fine@mila.quebec

Github

Publications

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Elliot Layne

Jason Hartford

Sébastien Lachapelle

Mathieu Blanchette

Dhanya Sridhar

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous: biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit, because the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about the environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle the causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.

2022-07-09

auai.org/UAI/2022/Workshop/CRL (poster)

doi.org

openreview.net

Leishmania parasites exchange drug-resistance genes through extracellular vesicles

Noélie Douanne

George Dong

Atia Amin

Lorena Bernardo

Mathieu Blanchette

David Langlais

Martin Olivier

Christopher Fernandez-Prada

2022-07-01

Cell Reports (published)

doi.org

PhyloPGM: boosting regulatory function prediction accuracy using evolutionary information

Faizy Ahsan

Zichao Yan

Doina Precup

Mathieu Blanchette

Motivation: The computational prediction of regulatory function associated with a genomic sequence is of utmost importance in -omics studies, as it facilitates our understanding of the mechanisms underpinning the vast gene regulatory network. Prominent examples in this area include predicting transcription factor binding in DNA regulatory regions and predicting RNA–protein interactions in the context of post-transcriptional gene expression. However, existing computational methods have suffered from high false-positive rates and have seldom used any evolutionary information, despite the vast amount of orthologous data available across multitudes of extant and ancestral genomes, which presents an opportunity to improve the accuracy of existing computational methods.

Results: In this study, we present a novel probabilistic approach called PhyloPGM that leverages previously trained TFBS or RNA–RBP binding predictors by aggregating their predictions from various orthologous regions, in order to boost the overall prediction accuracy on human sequences. Throughout our experiments, PhyloPGM has shown significant improvement over baselines such as the sequence-based RNA–RBP binding predictor RNATracker and the sequence-based TFBS predictor FactorNet. PhyloPGM is simple in principle and easy to implement, yet yields impressive results.

Availability and implementation: The PhyloPGM package is available at https://github.com/BlanchetteLab/PhyloPGM

Supplementary information: Supplementary data are available at Bioinformatics online.

2022-06-27

Bioinformatics (published)

doi.org

Reconstruction of full-length LINE-1 progenitors from ancestral genomes

Laura F. Campitelli

Isaac Yellan

Mihai Tudor Albu

Marjan Barazandeh

Zain M. Patel

Mathieu Blanchette

T. Hughes

Sequences derived from the Long INterspersed Element-1 (L1) family of retrotransposons occupy at least 17% of the human genome, with 67 distinct subfamilies representing successive waves of expansion and extinction in mammalian lineages. L1s contribute extensively to gene regulation, but their molecular history is difficult to trace, because most are present only as truncated and highly mutated fossils. Consequently, L1 entries in current databases of repeat sequences are composed mainly of short diagnostic subsequences, rather than full functional progenitor sequences for each subfamily. Here, we have coupled two levels of sequence reconstruction (at the level of whole genomes and of L1 subfamilies) to reconstruct progenitor sequences for all human L1 subfamilies that are more functionally and phylogenetically plausible than existing models. Most of the reconstructed sequences are at or near the canonical length of L1s and encode uninterrupted ORFs with the expected protein domains. We also show that the presence or absence of binding sites for KRAB-C2H2 Zinc Finger Proteins, even in ancient reconstructed progenitor L1s, mirrors the binding observed in human ChIP-exo experiments, thus extending the arms race and domestication model. RepeatMasker searches of the modern human genome suggest that the new models may be able to assign subfamily-resolution identities to previously ambiguous L1 instances. The reconstructed L1 sequences will be useful for genome annotation and for functional study of both L1 evolution and L1 contributions to host regulatory networks.

2022-05-12

Genetics (published)

doi.org

Phylogenetic Manifold Regularization: A semi-supervised approach to predict transcription factor binding sites

Faizy Ahsan

Alexandre Drouin

François Laviolette

Doina Precup

Mathieu Blanchette

The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

2020-12-16

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (published)

doi.org

Algorithms in Bioinformatics

P. Agarwal

Tatsuya Akutsu

Amir Amihood

Alberto Apostolico

C. Benham

Gary Gustaf Benson

Mathieu Blanchette

Nadia El-Mabrouk

Olivier Gascuel

Raffaele Giancarlo

R. Guigó

Michael Hallet

D. Huson

G. Kucherov

Michelle R. Lacey

Jens Lagergren

Giuseppe Lancia

Gad M. Landau

Thierry Lecroq

B. Moret

S. Morishita

Elchanan Mossel

Vincent Moulton

Lior S. Pachter

Knut Reinert

I. Rigoutsos

David Sankoff

Sophie Schbath

Eran Segal

Charles Semple

J. Setubal

Roded Sharan

S. Skiena

Jens Stoye

Esko Ukkonen

Lisa Allen Vawter

Alfonso Valencia

Tandy J. Warnow

Lusheng Wang

Rita Casadio

Gene Myers

2012-10-01

Lecture Notes in Computer Science (published)

doi.org