Guy Wolf

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Mathematics and Statistics
Concordia University
CHUM - Montreal University Hospital Center
Research Topics
Medical Machine Learning
Representation Learning
Multimodal Learning
Deep Learning
Spectral Learning
Learning on Graphs
Data Mining
Molecular Modeling
Information Retrieval
Graph Neural Networks
Dynamical Systems
Machine Learning Theory

Biography

Guy Wolf is a full professor in the Department of Mathematics and Statistics (DMS) at Université de Montréal (UdeM), a Canada CIFAR AI Chair holder, and a core academic member of Mila (Quebec Artificial Intelligence Institute). He is also an associate researcher at the CRCHUM (the research centre of the Centre hospitalier de l'Université de Montréal) and a participating principal investigator at the Helmholtz International Laboratory for Causal Cell Dynamics.

In 2024, he received a Humboldt Research Fellowship for Experienced Researchers, under which he was a visiting professor at Heidelberg University (2024) and Helmholtz Munich (2024-2026) in Germany. Before joining UdeM and Mila, he was a Gibbs Assistant Professor (2015-2018) in the Applied Mathematics Program, then an associate research scientist in the Department of Genetics (2018) at Yale University (Connecticut, USA). Previously, he worked as a postdoctoral researcher (2013-2015) in the Department of Computer Science at École normale supérieure in Paris (France). He holds a PhD in computer science from Tel Aviv University (Israel) and has five years of prior experience designing and developing data analysis software in a military context.

His current research focuses on guided representation learning for data exploration, particularly through methods that leverage manifold learning and geometric deep learning for dimensionality reduction, visualization, denoising, data augmentation, and coarse graining. While these approaches apply to a wide range of fields, he is particularly interested in the intersection of AI and health, notably in tools that facilitate exploratory analysis of biomedical data, such as in single-cell multiomics, drug discovery, and neuroscience.

Current Students

PhD - UdeM
Research Collaborator - Yale University
Co-supervisor:
Research Master's - UdeM
Co-supervisor:
Research Master's - Concordia
Principal supervisor:
Alumni Collaborator - UdeM
PhD - Concordia
Principal supervisor:
PhD - UdeM
Independent Visiting Researcher - Helmholtz Munich
PhD - UdeM
Co-supervisor:
Research Master's - Concordia
Principal supervisor:
PhD - UdeM
Research Collaborator
PhD - UdeM
Co-supervisor:
Postdoctorate - Concordia
Principal supervisor:
PhD - UdeM
PhD - Concordia
Principal supervisor:
Research Collaborator - BYU
Research Master's - UdeM
PhD - UdeM
Principal supervisor:
PhD - UdeM
Research Master's - UdeM
Alumni Collaborator - UdeM
Co-supervisor:
Research Collaborator - McGill (assistant professor)

Publications

Recipe for a General, Powerful, Scalable Graph Transformer
Vijay Prakash Dwivedi
Anh Tuan Luu
We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being
Patient health records and whole viral genomes from an early SARS-CoV-2 outbreak in a Quebec hospital reveal features associated with favorable outcomes
Bastien Paré
Marieke Rozendaal
Raphaël Poujol
Shawn M. Simpson
Jean-Christophe Grenier
Henry Xing
Miguelle Sanchez
Ariane Yechouron
Ronald Racette
Julie G. Hussin
Ivan Pavlov
Martin A. Smith
The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess the viral diversity of the outbreak. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples, technical replicates and the first publicly disseminated SARS-CoV-2 genomes in Quebec. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades that evaded supervised lineage assignment methods, including Pangolin, and identified certain symptoms (headache, myalgia and sore throat) that are significantly associated with favorable patient outcomes. We also address certain limitations of standard SARS-CoV-2 bioinformatics procedures, notably when presented with multiple viral haplotypes.
Fixing Bias in Reconstruction-Based Anomaly Detection with Lipschitz Discriminators
Anomaly detection is of great interest in fields where abnormalities need to be identified and corrected (e.g., medicine and finance). Deep learning methods for this task often rely on autoencoder reconstruction error, sometimes in conjunction with other errors. We show that this approach exhibits intrinsic biases that lead to undesirable results. Reconstruction-based methods are sensitive to training-data outliers and simple-to-reconstruct points. Instead, we introduce a new unsupervised Lipschitz anomaly discriminator that does not suffer from these biases. Our anomaly discriminator is trained, similar to the ones used in GANs, to detect the difference between the training data and corruptions of the training data. We show that this procedure successfully detects unseen anomalies with guarantees on those that have a certain Wasserstein distance from the data or corrupted training set. These additions allow us to show improved performance on MNIST, CIFAR10, and health record data.
Data-driven approaches for genetic characterization of SARS-CoV-2 lineages
Isabel Gamache
Arnaud N’Guessan
Justin Pelletier
Carmen Lia Murall
Raphaël Poujol
Jean-Christophe Grenier
Martin Smith
Etienne Caron
Morgan Craig
Jesse Shapiro
Julie G. Hussin
The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, leading to a tremendous amount of viral genome sequencing data. To understand the evolution of this virus in humans, and to assist in tracing infection pathways and designing preventive strategies, we present a set of computational tools that span phylogenomics, population genetics and machine learning approaches. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic, using 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets, enabling real-time analyses. Furthermore, the time-series change in Tajima’s D provides a powerful metric of population expansion. Unsupervised learning techniques further highlight key steps in variant detection and facilitate the study of the role of this genomic variation in the context of SARS-CoV-2 infection, with Multiscale PHATE methodology identifying fine-scale structure in the SARS-CoV-2 genetic data that underlies the emergence of key lineages. The computational framework presented here is useful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of worldwide populations of humans and other organisms.
Embedding Signals on Graphs with Unbalanced Diffusion Earth Mover's Distance
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an
Extendable and invertible manifold learning with geometry regularized autoencoders
Andrés F. Duque
Kevin Moon
A fundamental task in data exploration is to extract simplified low dimensional representations that capture intrinsic geometry in data, especially for faithfully visualizing data in two or three dimensions. Common approaches to this task use kernel methods for manifold learning. However, these methods typically only provide an embedding of fixed input data and cannot extend to new data points. Autoencoders have also recently become popular for representation learning. But while they naturally compute feature extractors that are both extendable to new data and invertible (i.e., reconstructing original features from latent representation), they have limited capabilities to follow global intrinsic geometry compared to kernel-based manifold learning. We present a new method for integrating both approaches by incorporating a geometric regularization term in the bottleneck of the autoencoder. Our regularization, based on the diffusion potential distances from the recently-proposed PHATE visualization method, encourages the learned latent representation to follow intrinsic data geometry, similar to manifold learning algorithms, while still enabling faithful extension to new data and reconstruction of data in the original feature space from latent coordinates. We compare our approach with leading kernel methods and autoencoder models for manifold learning to provide qualitative and quantitative evidence of our advantages in preserving intrinsic structure, out of sample extension, and reconstruction. Our method is easily implemented for big-data applications, whereas other methods are limited in this regard.
Multiscale PHATE Exploration of SARS-CoV-2 Data Reveals Multimodal Signatures of Disease
Manik Kuchroo
Patrick Wong
Jean-Christophe Grenier
Dennis Shung
Carolina Lucas
Jon Klein
Daniel B. Burkhardt
Scott Gigante
Abhinav Godavarthi
Benjamin Israelow
Tianyang Mao
Ji Eun Oh
Julio Silva
Takehiro Takahashi
Camila D. Odio
Arnau Casanovas-Massana
John Fournier
Shelli Farhadian … (7 more)
Charles S. Dela Cruz
Albert I. Ko
F. Perry Wilson
Akiko Iwasaki
The biomedical community is producing increasingly high dimensional datasets, integrated from hundreds of patient samples, which current computational techniques struggle to explore. To uncover biological meaning from these complex datasets, we present an approach called Multiscale PHATE, which learns abstracted biological features from data that can be directly predictive of disease. Built on a coarse graining process called diffusion condensation, Multiscale PHATE learns a data topology that can be analyzed at coarse levels for high level summarizations of data, as well as at fine levels for detailed representations on subsets. We apply Multiscale PHATE to study the immune response to COVID-19 in 54 million cells from 168 hospitalized patients. Through our analysis of patient samples, we identify CD16-hi,CD66b-lo neutrophil and IFNγ+,GranzymeB+ Th17 cell responses enriched in patients who die. Furthermore, we show that population groupings Multiscale PHATE discovers can be directly fed into a classifier to predict disease outcome. We also use Multiscale PHATE-derived features to construct two different manifolds of patients, one from abstracted flow cytometry features and another directly on patient clinical features, both associating immune subsets and clinical markers with outcome.

Geometric Wavelet Scattering Networks on Compact Riemannian Manifolds
Michael Perlmutter
Feng Gao
Matthew Hirn
The Euclidean scattering transform was introduced nearly a decade ago to improve the mathematical understanding of convolutional neural networks. Inspired by recent interest in geometric deep learning, which aims to generalize convolutional neural networks to manifold and graph-structured domains, we define a geometric scattering transform on manifolds. Similar to the Euclidean scattering transform, the geometric scattering transform is based on a cascade of wavelet filters and pointwise nonlinearities. It is invariant to local isometries and stable to certain types of diffeomorphisms. Empirical results demonstrate its utility on several geometric learning tasks. Our results generalize the deformation stability and local translation invariance of Euclidean scattering, and demonstrate the importance of linking the used filter structures to the underlying geometry of the data.