Guy Wolf

Biographie

Guy Wolf est professeur agrégé au Département de mathématiques et de statistique de l'Université de Montréal. Ses intérêts de recherche se situent au carrefour de l'apprentissage automatique, de la science des données et des mathématiques appliquées. Il s'intéresse particulièrement aux méthodes d'exploration de données qui utilisent l'apprentissage multiple et l'apprentissage géométrique profond, ainsi qu'aux applications pour l'analyse exploratoire des données biomédicales.

Ses recherches portent sur l'analyse exploratoire des données, avec des applications en bio-informatique. Ses approches sont multidisciplinaires et combinent l'apprentissage automatique, le traitement du signal et les outils mathématiques appliqués. En particulier, ses travaux récents utilisent une combinaison de géométries de diffusion et d'apprentissage profond pour trouver des modèles émergents, des dynamiques et des structures dans les mégadonnées à grande dimension (par exemple, dans la génomique et la protéomique de la cellule unique).

Étudiants actuels

Ria Arora

Maîtrise recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

semihcanturk00@gmail.com

Collaborateur·rice alumni

Enrique Fita Sanmartin

Collaborateur·rice alumni - UdeM

Kameron Harris

Collaborateur·rice de recherche - Western Washington University (faculty; assistant prof))

Co-superviseur⋅e :

Doctorat - UdeM

Will Hua

Collaborateur·rice alumni - McGill

Xiaolong Huang

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Paul Janson

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Charles-Etienne Joseph

Maîtrise recherche - UdeM

Superviseur⋅e principal⋅e :

M. Elyes Kanoun

Stagiaire de recherche - UdeM

Vincent Létourneau

Postdoctorat - UdeM

Myriam Lizotte

Doctorat - UdeM

Philippe Martin

Doctorat - UdeM

Co-superviseur⋅e :

Paul François

Paria Mehrbod

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Lydia Mezrag

Doctorat - UdeM

Sacha Morin

Doctorat - UdeM

Co-superviseur⋅e :

Postdoctorat - Concordia

Superviseur⋅e principal⋅e :

geraldin.nanfack@mila.quebec

Amine Natik

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Guillaume Lajoie

Shuang Ni

Doctorat - UdeM

stephanie.zandee@mcgill.ca

Albert Orozco Camacho

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Maîtrise recherche - UdeM

Matthew Scicluna

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Stagiaire de recherche - Western Washington University

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Collaborateur·rice de recherche - McGill (assistant professor)

Analyser le paradoxe des interférons inhérent à la COVID-19 au moyen de la réduction de la dimensionnalité et du regroupement

Billets de blogue

Graph and representation of working methodology, and graph of data on deaths 60 days after onset of symptoms.

19 février 2025

par

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Lire l'article

Publications

Graph Fourier MMD for Signals on Graphs

Samuel Leone

Aarthi Venkat

Guillaume Huguet

Alexander Tong

While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (voir plus)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel distance between distributions and signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes the difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called gene localization score which helps select genes for cellular state space characterization.

2023-05-21

SampTA/2023/Conference (publié)

Manifold Alignment with Label Information

Andres F. Duque Correa

Myriam Lizotte

Kevin R. Moon

Multi-domain data is becoming increasingly common and presents both challenges and opportunities in the data science community. The integrat… (voir plus)ion of distinct data-views can be used for exploratory data analysis, and benefit downstream analysis including machine learning related tasks. With this in mind, we present a novel manifold alignment method called MALI (Manifold alignment with label information) that learns a correspondence between two distinct domains. MALI belongs to a middle ground between the more commonly addressed semi-supervised manifold alignment, where some known correspondences between the two domains are assumed to be known beforehand, and the purely unsupervised case, where no information linking both domains is available. To do this, MALI learns the manifold structure in both domains via a diffusion process and then leverages discrete class labels to guide the alignment. MALI recovers a pairing and a common representation that reveals related samples in both domains. We show that MALI outperforms the current state-of-the-art manifold alignment methods across multiple datasets.

2023-05-21

SampTA/2023/Conference (publié)

Single-cell analysis reveals inflammatory interactions driving macular degeneration

Manik Kuchroo

Marcello DiStasio

Eric Song

Eda Calapkulu

Le Zhang

Maryam Ige

Amar H. Sheth

Abdelilah Majdoubi

Madhvi Menon

Alexander Tong

Abhinav Godavarthi

Yu Xing

Scott Gigante

Holly Steach

Jessie Huang

Je-chun Huang

Guillaume Huguet

Janhavi Narain

Kisung You

George Mourgkos … (voir 6 de plus)

Rahul M. Dhodapkar

Matthew Hirn

Bastian Rieck

Brian P. Hafler

2023-05-05

Nature Communications (publié)

Neural FIM for learning Fisher Information Metrics from point cloud data

Oluwadamilola Fasina

Guillaume Huguet

Alexander Tong

Yanlei Zhang

Maximilian Nickel

Ian Adelstein

Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the under… (voir plus)lying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data - allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate Neural FIM's utility in selecting parameters for the PHATE visualization method as well as its ability to obtain information pertaining to local volume illuminating branching points and cluster centers embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).

2023-04-24

ICML.cc/2023/Conference (poster)

Multi-view manifold learning of human brain state trajectories

Erica Lindsey Busch

Je-chun Huang

Andrew Benz

Tom Wallenstein

Guillaume Lajoie

Nicholas Turk-Browne

2023-03-27

Nature Computational Science (publié)

Graph Fourier MMD for signals on data graphs

Samuel Leone

Alexander Tong

Guillaume Huguet

While numerous methods have been proposed for computing distances between probability distributions in Euclidean space, relatively little at… (voir plus)tention has been given to computing such distances for distributions on graphs. However, there has been a marked increase in data that either lies on graph (such as protein interaction networks) or can be modeled as a graph (single cell data), particularly in the biomedical sciences. Thus, it becomes important to find ways to compare signals defined on such graphs. Here, we propose Graph Fourier MMD (GFMMD), a novel a distance between distributions, or non-negative signals on graphs. GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation between the pair of distributions on the graph. We find an analytical solution to this optimization problem as well as an embedding of distributions that results from this method. We also prove several properties of this method including scale invariance and applicability to disconnected graphs. We showcase it on graph benchmark datasets as well on single cell RNA-sequencing data analysis. In the latter, we use the GFMMD-based gene embeddings to find meaningful gene clusters. We also propose a novel type of score for gene selection called {\em gene localization score} which helps select genes for cellular state space characterization.

2023-02-01

ICLR.cc/2023/Conference (rejected)

Improving and generalizing flow-based generative models with minibatch optimal transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, we show that when the true OT plan is available, our OT-CFM method approximates dynamic OT. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schr\"odinger bridge inference.

2023-02-01

ArXiv (prépublication)

Improving and generalizing flow-based generative models with minibatch optimal transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

2023-02-01

ArXiv (prépublication)

Improving and generalizing flow-based generative models with minibatch optimal transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

2023-02-01

ArXiv (prépublication)

Improving and generalizing flow-based generative models with minibatch optimal transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

2023-02-01

ArXiv (prépublication)

Reliability of CKA as a Similarity Measure in Deep Learning

MohammadReza Davari

Stefan Horoi

Amine Natik

Guillaume Lajoie

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different w… (voir plus)ays. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of claims about similarity and dissimilarity of these various representations have been made using CKA results. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation to CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.

2023-02-01

ICLR.cc/2023/Conference (poster)

Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Continuous normalizing ﬂows (CNFs) are an attractive generative modeling technique, but they have thus far been held back by limitations i… (voir plus)n their simulation-based maximum likelihood training. In this paper, we introduce a new technique called conditional ﬂow matching (CFM), a simulation-free training objective for CNFs. CFM features a stable regression objective like that used to train the stochastic ﬂow in diffusion models but enjoys the efﬁcient inference of deterministic ﬂow models. In contrast to both diffusion models and prior CNF training algorithms, our CFM objec-tive does not require the source distribution to be Gaussian or require evaluation of its density. Based on this new objective, we also introduce optimal transport CFM (OT-CFM), which creates simpler ﬂows that are more stable to train and lead to faster inference, as evaluated in our experiments. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks such as inferring single cell dynamics, unsupervised image translation, and Schr ¨ odinger bridge inference. Code is available at https://github.com/atong01/ conditional-flow-matching .

2023-01-01

arXiv.org (prépublication)