Portrait of Guy Wolf

Guy Wolf

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Mathematics and Statistics
Concordia University
CHUM - Montreal University Hospital Center
Research Topics
Data Mining
Deep Learning
Dynamical Systems
Graph Neural Networks
Information Retrieval
Learning on Graphs
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Multimodal Learning
Representation Learning
Spectral Learning

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together machine learning, signal processing and applied math tools. His recent work has used a combination of diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big high dimensional- data (e.g., in single-cell genomics and proteomics).

Current Students

Master's Research - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Collaborating Alumni
Collaborating Alumni - Université de Montréal
Collaborating researcher - Western Washington University (faculty; assistant prof))
Co-supervisor :
PhD - Université de Montréal
Collaborating Alumni - McGill University
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Postdoctorate - Concordia University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Co-supervisor :
PhD - Université de Montréal
Research Intern - Western Washington University
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - McGill University (assistant professor)

Publications

Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming
Mostafa ElAraby
Data Imputation with an Autoencoder and MAGIC
Devin Eddington
Andres Felipe Duque Correa
Kevin R. Moon
Missing data is a common problem in many applications. Imputing missing values is a challenging task, as the imputations need to be accurate… (see more) and robust to avoid introducing bias in downstream analysis. In this paper, we propose an ensemble method that combines the strengths of a manifold learning-based imputation method called MAGIC and an autoencoder deep learning model. We call our method Deep MAGIC. Deep MAGIC is trained on a linear combination of the mean squared error of the original data and the mean squared error of the MAGIC-imputed data. Experimental results on three benchmark datasets show that Deep MAGIC outperforms several state-of-the-art imputation methods, demonstrating its effectiveness and robustness in handling large amounts of missing data.
Graph Fourier MMD for Signals on Graphs
Samuel Leone
Aarthi Venkat
Guillaume Huguet
Alexander Tong
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (see more)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes the difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called gene localization score which helps select genes for cellular state space characterization.
Manifold Alignment with Label Information
Andres F. Duque Correa
Myriam Lizotte
Kevin R. Moon
Multi-domain data is becoming increasingly common and presents both challenges and opportunities in the data science community. The integrat… (see more)ion of distinct data-views can be used for exploratory data analysis, and benefit downstream analysis including machine learning related tasks. With this in mind, we present a novel manifold alignment method called MALI (Manifold alignment with label information) that learns a correspondence between two distinct domains. MALI belongs to a middle ground between the more commonly addressed semi-supervised manifold alignment, where some known correspondences between the two domains are assumed to be known beforehand, and the purely unsupervised case, where no information linking both domains is available. To do this, MALI learns the manifold structure in both domains via a diffusion process and then leverages discrete class labels to guide the alignment. MALI recovers a pairing and a common representation that reveals related samples in both domains. We show that MALI outperforms the current state-of-the-art manifold alignment methods across multiple datasets.
Single-cell analysis reveals inflammatory interactions driving macular degeneration
Manik Kuchroo
Marcello DiStasio
Eric Song
Eda Calapkulu
Le Zhang
Maryam Ige
Amar H. Sheth
Abdelilah Majdoubi
Madhvi Menon
Alexander Tong
Abhinav Godavarthi
Yu Xing
Scott Gigante
Holly Steach
Jessie Huang
Je-chun Huang
Guillaume Huguet
Janhavi Narain
Kisung You
George Mourgkos … (see 6 more)
Rahul M. Dhodapkar
Matthew Hirn
Bastian Rieck
Brian P. Hafler
Neural FIM for learning Fisher Information Metrics from point cloud data
Oluwadamilola Fasina
Guillaume Huguet
Alexander Tong
Yanlei Zhang
Maximilian Nickel
Ian Adelstein
Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the under… (see more)lying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).
Multi-view manifold learning of human brain state trajectories
Erica Lindsey Busch
Je-chun Huang
Andrew Benz
Tom Wallenstein
Nicholas Turk-Browne
Graph Fourier MMD for signals on data graphs
Samuel Leone
Alexander Tong
Guillaume Huguet
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (see more)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel a distance between distributions, or non-negative signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called {\em gene localization score} which helps select genes for cellular state space characterization.
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian FATRAS
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.