Guy Wolf

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together machine learning, signal processing and applied math tools. His recent work has used a combination of diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big, high-dimensional data (e.g., in single-cell genomics and proteomics).

Current Students

Adrien Aumon

PhD - Université de Montréal

Semih Cantürk

PhD - Université de Montréal

Joao Felipe Carneiro Barbosa Rocha

Collaborating researcher - Yale University

Co-supervisor :

Collaborating Alumni

PhD - Université de Montréal

Xiao Huang

Master's Research - Concordia University

Principal supervisor :

PhD - Université de Montréal

Paul Janson

PhD - Concordia University

Principal supervisor :

PhD - Université de Montréal

Philippe Martin

PhD - Université de Montréal

Co-supervisor :

Paul François

Paria Mehrbod

Master's Research - Concordia University

Principal supervisor :

Lydia Mezrag

PhD - Université de Montréal

Kevin Moon

Collaborating researcher

Sacha Morin

PhD - Université de Montréal

Co-supervisor :

Postdoctorate - Concordia University

Principal supervisor :

Shuang Ni

PhD - Université de Montréal

Albert Orozco Camacho

PhD - Concordia University

Principal supervisor :

Master's Research - Université de Montréal

Matthew Scicluna

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Francisco Tellez

Master's Research - Université de Montréal

Pedro Vianna

Collaborating researcher - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Zhang Yanlei

Postdoctorate - Université de Montréal

Co-supervisor :

Collaborating researcher - McGill University (assistant professor)

Exploring the COVID-19 Interferon Paradox with Dimensionality Reduction and Clustering

Blog Posts

Graph and representation of working methodology, and graph of data on deaths 60 days after onset of symptoms.

February 19, 2025

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Publications

Data Visualization using Functional Data Analysis

Haozhe Chen

Andres Felipe Duque Correa

Kevin R. Moon

Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. Furthermore, existing methods that can theoretically eliminate all noise are difficult to implement in high dimensions. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that denoises the data by leveraging time information and mitigates the curse of dimensionality using approaches from functional data analysis. We experimentally demonstrate that FIG outperforms other methods in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.

2025-03-25

SampTA/2025/Conference (poster)

Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings

Billy Joe Franks

Moshe Eliasof

Semih Cantürk

Carola-Bibiane Schönlieb

Sophie Fellenz

Marius Kloft

Recent advances in integrating positional and structural encodings (PSEs) into graph neural networks (GNNs) have significantly enhanced their performance across various graph learning tasks. However, the general applicability of these encodings and their potential to serve as foundational representations for graphs remain uncertain. This paper investigates the fine-tuning efficiency, scalability with sample size, and generalization capability of learnable PSEs across diverse graph datasets. Specifically, we evaluate their potential as universal pre-trained models that can be easily adapted to new tasks with minimal fine-tuning and limited data. Furthermore, we assess the expressivity of the learned representations, particularly when used to augment downstream GNNs. We demonstrate through extensive benchmarking and empirical analysis that PSEs generally enhance downstream models. However, some datasets may require specific PSE-augmentations to achieve optimal performance. Nevertheless, our findings highlight their significant potential to become integral components of future graph foundation models. We provide new insights into the strengths and limitations of PSEs, contributing to the broader discourse on foundation models in graph learning.

2025-03-07

TMLR (accepted)

Random Forest Autoencoders for Guided Representation Learning

Kevin R. Moon

Jake S. Rhodes

Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization…

2025-02-18

ArXiv (preprint)

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation

Pedro Vianna

Muawiz Chaudhary

Paria Mehrbod

An Tang

Guy Cloutier

Michael Eickenberg

Deep neural networks have useful applications in many different tasks; however, their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation (TTA) has recently been studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test-time adaptation methods in deployment. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, including CIFAR10-C, Imagenet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method brings the benefits of TTA while significantly reducing the risk of failure common in other methods, and is robust to the choice of hyperparameters.

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)

Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang

Lydia Mezrag

Xingzhi Sun

Charles Xu

Kincaid MacDonald

Dhananjay Bhaskar

Bastian Rieck

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that datasets lie on a lower-dimensional manifold. This makes it possible to study the geometry of point clouds by extracting meaningful descriptors such as curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in cellular differentiation.

2025-02-06

ArXiv (preprint)

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien Martins Gomes

Yanlei Zhang

Mahdi S. Hosseini

First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing diagonal matrix preconditioning of the stochastic gradient during training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality in training DNNs is still limited due to increased per-iteration computations and suboptimal accuracy compared to first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency in the second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification and language modelling, and stands out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms the SOTA optimizers in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher

2025-01-22

ICLR.cc/2025/Conference (poster)

Geometry-Aware Generative Autoencoder for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Xingzhi Sun

Danqi Liao

Kincaid MacDonald

Yanlei Zhang

Guillaume Huguet

Ian Adelstein

Tim G. J. Rudner

Rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over SOTA in single-cell population-level trajectory inference.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Xingzhi Sun

Danqi Liao

Kincaid MacDonald

Yanlei Zhang

Chen Liu

Guillaume Huguet

Ian Adelstein

Tim G. J. Rudner

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Non-Uniform Parameter-Wise Model Merging

Albert Manuel Orozco Camacho

Stefan Horoi

Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.

2024-12-20

ArXiv (preprint)