Portrait of Guy Wolf

Guy Wolf

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Mathematics and Statistics
Concordia University
CHUM - Montreal University Hospital Center
Research Topics
Data Mining
Deep Learning
Dynamical Systems
Graph Neural Networks
Information Retrieval
Learning on Graphs
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Multimodal Learning
Representation Learning
Spectral Learning

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together machine learning, signal processing and applied math tools. His recent work has used a combination of diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big high dimensional- data (e.g., in single-cell genomics and proteomics).

Current Students

Independent visiting researcher - University of Lorraine
Master's Research - Université de Montréal
Co-supervisor :
Collaborating Alumni
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni
Collaborating researcher - Western Washington University (faculty; assistant prof))
Co-supervisor :
PhD - Université de Montréal
Master's Research - McGill University
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Principal supervisor :
Collaborating researcher - Yale
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Postdoctorate - Concordia University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Independent visiting researcher
Master's Research - Université de Montréal
Collaborating researcher - Concordia University
Principal supervisor :
Collaborating researcher - Université de Montréal
Co-supervisor :
Collaborating researcher - Yale
PhD - Université de Montréal
Research Intern - Western Washington University
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - McGill University (assistant professor)

Publications

Multi-view manifold learning of human brain state trajectories
Erica Lindsey Busch
Je-chun Huang
Andrew Benz
Tom Wallenstein
Smita Krishnaswamy
Nicholas Turk-Browne
Graph Fourier MMD for signals on data graphs
Samuel Leone
Alexander Tong
Guillaume Huguet
Smita Krishnaswamy
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (see more)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel a distance between distributions, or non-negative signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called {\em gene localization score} which helps select genes for cellular state space characterization.
Reliability of CKA as a Similarity Measure in Deep Learning
MohammadReza Davari
Stefan Horoi
Amine Natik
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different w… (see more)ays. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of claims about similarity and dissimilarity of these various representations have been made using CKA results. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation to CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
GEODESIC SINKHORN FOR FAST AND ACCURATE OPTIMAL TRANSPORT ON MANIFOLDS
Guillaume Huguet
Alexander Tong
María Ramos Zapatero
Christopher J. Tape
Smita Krishnaswamy
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods a… (see more)re currently the state-of-the-art for such computations, but require O(n2) computations. In addition, Sinkhorn-based methods commonly use an Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, it is often desirable to consider geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn—based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only O(n log n) computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Guillaume Huguet
Alexander Tong
Edward De Brouwer
Yanlei Zhang
Ian Adelstein
Smita Krishnaswamy
Inferring Dynamic Regulatory Interaction Graphs From Time Series Data With Perturbations
Dhananjay Bhaskar
Daniel Sumner Magruder
Edward De Brouwer
Matheo Morales
Aarthi Venkat
Frederik Wenkel
Smita Krishnaswamy
Learning Shared Neural Manifolds from Multi-Subject FMRI Data
Jessie Huang
Je-chun Huang
Erica Lindsey Busch
Tom Wallenstein
Michal Gerasimiuk
Andrew Benz
Nicholas Turk-Browne
Smita Krishnaswamy
Functional magnetic resonance imaging (fMRI) data is collected in millions of noisy, redundant dimensions. To understand how different brain… (see more)s process the same stimulus, we aim to denoise the fMRI signal via a meaningful embedding space that captures the data's intrinsic structure as shared across brains. We assume that stimulus-driven responses share latent features common across subjects that are jointly discoverable. Previous approaches to this problem have relied on linear methods like principal component analysis and shared response modeling. We propose a neural network called MRMD-AE (manifold-regularized multiple- decoder, autoencoder) that learns a common embedding from multi-subject fMRI data while retaining the ability to decode individual responses. Our latent common space represents an extensible manifold (where untrained data can be mapped) and improves classification accuracy of stimulus features of unseen timepoints, as well as cross-subject translation of fMRI signals.
Parametric Scattering Networks
Shanel Gauthier
Benjamin Thérien
Laurent Alséne-Racicot
Muawiz Chaudhary
Michael Eickenberg
The wavelet scattering transform creates geometric in-variants and deformation stability. In multiple signal do-mains, it has been shown to … (see more)yield more discriminative rep-resentations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering trans-form are typically selected to create a tight frame via a pa-rameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is op-timal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scat-tering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective rep-resentations.
Multiscale PHATE identifies multimodal signatures of COVID-19
Manik Kuchroo
Je-chun Huang
Patrick Wong
Jean-Christophe Grenier
Dennis Shung
Alexander Tong
Carolina Lucas
Jon Klein
Daniel B. Burkhardt
Scott Gigante
Abhinav Godavarthi
Bastian Rieck
Benjamin Israelow
Michael Simonov
Tianyang Mao
Ji Eun Oh
Julio Silva
Takehiro Takahashi
Camila D. Odio
Arnau Casanovas-Massana … (see 10 more)
John Fournier
Shelli Farhadian
Charles S. Dela Cruz
Albert I. Ko
Matthew Hirn
F. Perry Wilson
Akiko Iwasaki
Smita Krishnaswamy
Multiscale PHATE identifies multimodal signatures of COVID-19
Manik Kuchroo
Je-chun Huang
Patrick W. Wong
Jean-Christophe Grenier
Dennis L. Shung
Alexander Tong
C. Lucas
J. Klein
Daniel B. Burkhardt
Scott Gigante
Abhinav Godavarthi
Bastian Rieck
Benjamin Israelow
Michael Simonov
Tianyang Mao
Ji Eun Oh
Julio Silva
Takehiro Takahashi
C. Odio
Arnau Casanovas‐massana … (see 10 more)
John Byrne Fournier
Shelli F. Farhadian
C. D. Dela Cruz
A. Ko
Matthew Hirn
F. Wilson
Akiko Iwasaki
Smita Krishnaswamy
Population Genomics Approaches for Genetic Characterization of SARS-CoV-2 Lineages
Fatima Mostefai
I. Gamache
Arnaud N’Guessan
Justin Pelletier
Jessie Huang
Carmen Lia Murall
Ahmad Pesaranghader
Vanda Gaonac'h-Lovejoy
David J. Hamelin
Raphael Poujol
Jean-Christophe Grenier
Martin W. Smith
Étienne Caron
Morgan Craig
Smita Krishnaswamy
B. Jesse Shapiro
The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19)… (see more), has been sequenced at an unprecedented scale leading to a tremendous amount of viral genome sequencing data. To assist in tracing infection pathways and design preventive strategies, a deep understanding of the viral genetic diversity landscape is needed. We present here a set of genomic surveillance tools from population genetics which can be used to better understand the evolution of this virus in humans. To illustrate the utility of this toolbox, we detail an in depth analysis of the genetic diversity of SARS-CoV-2 in first year of the COVID-19 pandemic. We analyzed 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets. This approach enables real-time lineage identification, a clear description of the relationship between variants of concern, and efficient detection of recurrent mutations. Furthermore, time series change of Tajima's D by haplotype provides a powerful metric of lineage expansion. Finally, principal component analysis (PCA) highlights key steps in variant emergence and facilitates the visualization of genomic variation in the context of SARS-CoV-2 diversity. The computational framework presented here is simple to implement and insightful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of populations of humans and other organisms.