Portrait de Guy Wolf

Guy Wolf

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département de mathématiques et statistiques
Concordia University
CHUM - Montreal University Hospital Center
Sujets de recherche
Apprentissage automatique médical
Apprentissage de représentations
Apprentissage multimodal
Apprentissage profond
Apprentissage spectral
Apprentissage sur graphes
Exploration des données
Modélisation moléculaire
Recherche d'information
Réseaux de neurones en graphes
Systèmes dynamiques
Théorie de l'apprentissage automatique

Biographie

Guy Wolf est professeur agrégé au Département de mathématiques et de statistique de l'Université de Montréal. Ses intérêts de recherche se situent au carrefour de l'apprentissage automatique, de la science des données et des mathématiques appliquées. Il s'intéresse particulièrement aux méthodes d'exploration de données qui utilisent l'apprentissage multiple et l'apprentissage géométrique profond, ainsi qu'aux applications pour l'analyse exploratoire des données biomédicales.

Ses recherches portent sur l'analyse exploratoire des données, avec des applications en bio-informatique. Ses approches sont multidisciplinaires et combinent l'apprentissage automatique, le traitement du signal et les outils mathématiques appliqués. En particulier, ses travaux récents utilisent une combinaison de géométries de diffusion et d'apprentissage profond pour trouver des modèles émergents, des dynamiques et des structures dans les mégadonnées à grande dimension (par exemple, dans la génomique et la protéomique de la cellule unique).

Étudiants actuels

Maîtrise recherche - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Collaborateur·rice alumni
Collaborateur·rice alumni - UdeM
Collaborateur·rice alumni - McGill
Maîtrise recherche - Concordia
Superviseur⋅e principal⋅e :
Doctorat - Concordia
Superviseur⋅e principal⋅e :
Stagiaire de recherche - UdeM
Collaborateur·rice alumni - UdeM
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Visiteur de recherche indépendant
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - Concordia
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - Concordia
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Collaborateur·rice de recherche - UdeM
Co-superviseur⋅e :
Postdoctorat - UdeM
Collaborateur·rice de recherche - McGill (assistant professor)

Publications

Multi-view manifold learning of human brain state trajectories
Erica Lindsey Busch
Je-chun Huang
Andrew Benz
Tom Wallenstein
Nicholas Turk-Browne
Graph Fourier MMD for signals on data graphs
While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (voir plus)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel a distance between distributions, or non-negative signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called {\em gene localization score} which helps select genes for cellular state space characterization.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Improving and generalizing flow-based generative models with minibatch optimal transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.
Reliability of CKA as a Similarity Measure in Deep Learning
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different w… (voir plus)ays. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of claims about similarity and dissimilarity of these various representations have been made using CKA results. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation to CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have thus far been held back by limitations i… (voir plus)n their simulation-based maximum likelihood training. In this paper, we introduce a new technique called conditional flow matching (CFM), a simulation-free training objective for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, our CFM objec-tive does not require the source distribution to be Gaussian or require evaluation of its density. Based on this new objective, we also introduce optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks such as inferring single cell dynamics, unsupervised image translation, and Schr ¨ odinger bridge inference. Code is available at https://github.com/atong01/ conditional-flow-matching .
Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport
GEODESIC SINKHORN FOR FAST AND ACCURATE OPTIMAL TRANSPORT ON MANIFOLDS
María Ramos Zapatero
Christopher J. Tape
Efficient computation of optimal transport distance between distributions is of growing importance in data science. Sinkhorn-based methods a… (voir plus)re currently the state-of-the-art for such computations, but require O(n2) computations. In addition, Sinkhorn-based methods commonly use an Euclidean ground distance between datapoints. However, with the prevalence of manifold structured scientific data, it is often desirable to consider geodesic ground distance. Here, we tackle both issues by proposing Geodesic Sinkhorn—based on diffusing a heat kernel on a manifold graph. Notably, Geodesic Sinkhorn requires only O(n log n) computation, as we approximate the heat kernel with Chebyshev polynomials based on the sparse graph Laplacian. We apply our method to the computation of barycenters of several distributions of high dimensional single cell data from patient samples undergoing chemotherapy. In particular, we define the barycentric distance as the distance between two such barycenters. Using this definition, we identify an optimal transport distance and path associated with the effect of treatment on cellular data.
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction
Edward De Brouwer
Yanlei Zhang
Ian Adelstein
Inferring Dynamic Regulatory Interaction Graphs From Time Series Data With Perturbations
Dhananjay Bhaskar
Daniel Sumner Magruder
Edward De Brouwer
Matheo Morales
Aarthi Venkat