
Guy Wolf

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Mathematics and Statistics
Concordia University
CHUM - Montreal University Hospital Center
Research Topics
Data Mining
Deep Learning
Dynamical Systems
Graph Neural Networks
Information Retrieval
Learning on Graphs
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Multimodal Learning
Representation Learning
Spectral Learning

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together machine learning, signal processing and applied math tools. His recent work has used a combination of diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big high-dimensional data (e.g., in single-cell genomics and proteomics).

Current Students

PhD - Université de Montréal
PhD - Université de Montréal
Collaborating researcher - Yale University
Collaborating Alumni
PhD - Université de Montréal
Master's Research - Concordia University
PhD - Université de Montréal
PhD - Concordia University
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Concordia University
PhD - Université de Montréal
Collaborating researcher
PhD - Université de Montréal
Postdoctorate - Concordia University
PhD - Université de Montréal
PhD - Concordia University
Master's Research - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Master's Research - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Collaborating researcher - McGill University (assistant professor)

Publications

Geometry-Aware Generative Autoencoders for Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun
Danqi Liao
Kincaid MacDonald
Yanlei Zhang
Ian Adelstein
Tim G. J. Rudner
Non-linear dimensionality reduction methods have proven successful at learning low-dimensional representations of high-dimensional point clouds on or near data manifolds. However, existing methods are not easily extensible; for large datasets, it is prohibitively expensive to add new points to these embeddings. As a result, it is very difficult to use existing embeddings generatively, to sample new points on and along these manifolds. In this paper, we propose GAGA (geometry-aware generative autoencoders), a framework which merges the power of generative deep learning with non-linear manifold learning by: 1) learning generalizable geometry-aware neural network embeddings based on non-linear dimensionality reduction methods like PHATE and diffusion maps, 2) deriving a non-Euclidean pullback metric on the embedded space to generate points faithfully along manifold geodesics, and 3) learning a flow on the manifold that allows us to transport populations. We provide illustrations on easily interpretable synthetic datasets and showcase results on simulated and real single-cell datasets. In particular, we show that geodesic-based generation can be especially important for scientific datasets where the manifold represents a state space and geodesics can represent the dynamics of entities over this space.
Simulating federated learning for steatosis detection using ultrasound images
Yue Qi
Alexandre Cadrin-Chênevert
Katleen Blanchet
Emmanuel Montagnon
Guy Cloutier
Michael Chassé
An Tang
We aimed to implement four data partitioning strategies evaluated with four federated learning (FL) algorithms and investigate the impact of data distribution on FL model performance in detecting steatosis using B-mode US images. A private dataset (153 patients; 1530 images) and a public dataset (55 patients; 550 images) were included in this retrospective study. The datasets contained patients with metabolic dysfunction-associated fatty liver disease (MAFLD) with biopsy-proven steatosis grades and control individuals without steatosis. We employed four data partitioning strategies to simulate FL scenarios and assessed four FL algorithms. We investigated the impact of class imbalance and the mismatch between the global and local data distributions on the learning outcome. Classification performance was assessed with the area under the receiver operating characteristic curve (AUC) on a separate test set. AUCs were 0.93 (95% CI 0.92, 0.94) for the source-based partitioning scenario with FedAvg, 0.90 (95% CI 0.89, 0.91) for a centralized model, and 0.83 (95% CI 0.81, 0.85) for a model trained in a single-center scenario. When data were perfectly balanced at the global level and each site had an identical data distribution, the model yielded an AUC of 0.90 (95% CI 0.88, 0.92). When each site contained data exclusively from one single class, irrespective of the global data distribution, the AUC fell in the range of 0.34–0.70. FL applied to B-mode US images provides performance comparable to a centralized model and higher than a single-center scenario. Global data imbalance and local data heterogeneity influenced the learning outcome.
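FedAvg, the aggregation algorithm evaluated above, combines client models by averaging their parameters weighted by local dataset size. A minimal NumPy sketch of that aggregation step (variable names are illustrative and not taken from the study's code):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg aggregation round: average client model parameters,
    weighting each client by its share of the total training data."""
    total = sum(client_sizes)
    # Each client's model is a list of parameter arrays; average layer by layer.
    return [
        sum(n / total * w[i] for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two simulated clients with single-layer "models" and unequal data sizes.
clients = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
global_model = fedavg(clients, client_sizes=[1, 3])
print(global_model[0])  # weighted toward the larger client
```

The global model is then broadcast back to the sites for the next round of local training.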
Noisy Data Visualization using Functional Data Analysis
Haozhe Chen
Andres Felipe Duque Correa
Kevin R. Moon
Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high-dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.
Supervised latent factor modeling isolates cell-type-specific transcriptomic modules that underlie Alzheimer’s disease progression
Yasser Iturria-Medina
Jo Anne Stratton
David A. Bennett
Late-onset Alzheimer’s disease (AD) is a progressive neurodegenerative disease, with brain changes beginning years before symptoms surface. AD is characterized by neuronal loss, the classic feature of the disease that underlies brain atrophy. However, GWAS reports and recent single-nucleus RNA sequencing (snRNA-seq) efforts have highlighted that glial cells, particularly microglia, claim a central role in AD pathophysiology. Here, we tailor pattern-learning algorithms to explore distinct gene programs by integrating the entire transcriptome, yielding distributed AD-predictive modules within the brain’s major cell types. We show that these learned modules are biologically meaningful through the identification of new and relevant enriched signaling cascades. The predictive nature of our modules, especially in microglia, allows us to infer each subject’s progression along a disease pseudo-trajectory, confirmed by post-mortem pathological brain tissue markers. Additionally, we quantify the interplay between pairs of cell-type modules in the AD brain, and localize known AD risk genes to enriched module gene programs. Our collective findings advocate for a transition from cell-type specificity to gene-module specificity to unlock the potential of unique gene programs, recasting the roles of recently reported genome-wide AD risk loci. Designing a supervised latent factor framework for human brain snRNA-seq data, the authors find distinct Alzheimer’s-predictive gene modules across cell types, suggesting sub-cell-type disease progression trajectories.
Sustained IFN signaling is associated with delayed development of SARS-CoV-2-specific immunity
Elsa Brunet-Ratnasingham
Haley E. Randolph
Marjorie Labrecque
Justin Bélair
Raphaël Lima-Barbosa
Amélie Pagliuzza
Lorie Marchitto
Michael Hultström
Julia Niessl
Rose Cloutier
Alina M. Sreng Flores
Nathalie Brassard
Mehdi Benlarbi
Jérémie Prévost
Shilei Ding
Sai Priya Anand
Gérémy Sannier
Anders Larsson
Dick Wågsäter
Eric Bareke
Hugo Zeberg
Miklos Lipcsey
Robert Frithiof
Anders Larsson
Sirui Zhou
Tomoko Nakanishi
David Morrison
Dani Vezina
Catherine Bourassa
Gabrielle Gendron-Lepage
Halima Medjahed
Floriane Point
Jonathan Richard
Catherine Larochelle
Alexandre Prat
Elsa Brunet-Ratnasingham
Nathalie Arbour
Madeleine Durand
J Brent Richards
Kevin Moon
Nicolas Chomont
Andrés Finzi
Martine Tétreault
Luis Barreiro
Daniel E. Kaufmann
Plasma RNAemia, delayed antibody responses and inflammation predict COVID-19 outcomes, but the mechanisms underlying these immunovirological patterns are poorly understood. We profile 782 longitudinal plasma samples from 318 hospitalized patients with COVID-19. Integrated analysis using k-means reveals four patient clusters in a discovery cohort: mechanically ventilated critically ill cases are subdivided into good-prognosis and high-fatality clusters (reproduced in a validation cohort), while non-critical survivors segregate into high and low early antibody responders. Only the high-fatality cluster is enriched for transcriptomic signatures associated with COVID-19 severity, and each cluster has distinct RBD-specific antibody elicitation kinetics. Both critical and non-critical clusters with delayed antibody responses exhibit sustained IFN signatures, which negatively correlate with contemporaneous RBD-specific IgG levels and absolute SARS-CoV-2-specific B and CD4+ T cell frequencies. These data suggest that the “interferon paradox” previously described in murine LCMV models is operative in COVID-19, with excessive IFN signaling delaying the development of adaptive virus-specific immunity.
Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single-cell dynamics, unsupervised image translation, and Schrödinger bridge inference.
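The simulation-free regression described above pairs a source sample with a target sample, draws a random time, and asks the network to predict the velocity of the straight-line conditional path between them. A minimal NumPy sketch of that pair construction (the network and optimizer are omitted, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_training_pairs(x0, x1, rng):
    """Build one CFM regression pair per sample: a point x_t interpolated
    between source x0 and target x1 at a random time t, with regression
    target u_t = x1 - x0 (the velocity of the straight-line path)."""
    t = rng.uniform(size=(x0.shape[0], 1))   # t ~ U(0, 1), one per sample
    x_t = (1 - t) * x0 + t * x1              # point on the conditional path
    u_t = x1 - x0                            # conditional target velocity
    return t, x_t, u_t

x0 = rng.normal(size=(128, 2))               # source samples (need not be Gaussian)
x1 = rng.normal(loc=3.0, size=(128, 2))      # target samples
t, x_t, u_t = cfm_training_pairs(x0, x1, rng)
# A network v(t, x) would then be regressed onto u_t with a squared-error loss;
# OT-CFM additionally couples x0 and x1 via a minibatch optimal transport plan
# before interpolating, which straightens the learned flow.
```

Because no ODE is simulated during training, each step costs a single forward pass, unlike likelihood-based CNF training.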
Learning and Aligning Structured Random Feature Networks
Muawiz Sajjad Chaudhary
Kameron Decker Harris
Artificial neural networks (ANNs) are considered “black boxes” due to the difficulty of interpreting their learned weights. While choosing the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparably with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.
Generalization of deep learning models for hepatic steatosis grading using B-mode ultrasound images
Yue Qi
Michael Chassé
An Tang
Guy Cloutier
Grayscale ultrasound remains a key modality for screening of hepatic steatosis due to its non-invasiveness and availability. While neural networks have shown promise in this field, their main drawback lies in their inability to generalize to diverse real-world settings. Variations in equipment, acquisition parameters, or population significantly affect model performance. Test-time adaptation, an unsupervised domain adaptation technique, overcomes these limitations by adjusting trained models during inference. Our retrospective study used two datasets collected in separate populations, with different scanners and protocols. We propose an adaptation method, using test-time batch normalization to selectively adjust BatchNorm layers based on test data for predicting steatosis grades. Comparing the non-adapted and adapted models, the mean absolute error (± standard deviation) in grading four severities of steatosis decreased from 0.92 ± 0.21 to 0.64 ± 0.22. Specifically, for detection of steatosis, the area under the curve increased from 0.76 ± 0.05 to 0.95 ± 0.02 when using the adapted model. Adapted models show promising results in improving performance compared to base models when testing data differ significantly from training data. Results suggest that the proposed method effectively addresses domain shift in diagnosing fatty liver using ultrasound images, reducing risks associated with deploying trained models.
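The core idea of test-time batch normalization is to replace a BatchNorm layer's training-domain running statistics with statistics computed from the test batch itself, re-centering features for the shifted domain. A toy NumPy illustration of the principle (not the paper's implementation; the distributions and numbers below are invented for demonstration):

```python
import numpy as np

def batchnorm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch normalization with externally supplied statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Statistics accumulated on the training domain (running mean/variance).
train_mean, train_var = 0.0, 1.0

# Test data from a shifted domain (e.g., a different scanner or protocol).
rng = np.random.default_rng(0)
x_test = rng.normal(loc=5.0, scale=2.0, size=1024)

# Stale normalization: the training statistics no longer match the data,
# so the normalized features stay far from zero mean and unit variance.
stale = batchnorm(x_test, train_mean, train_var)

# Test-time adaptation: recompute the statistics from the test batch,
# so features are re-centered and re-scaled for the new domain.
adapted = batchnorm(x_test, x_test.mean(), x_test.var())

print(stale.mean(), adapted.mean())  # adapted features are re-centered near 0
```

In a real network this swap is applied per BatchNorm layer, and can be done selectively (only for some layers) as in the method described above.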
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Joao Alex Cunha
Zhiyi Li
Samuel Maddrell-Mander
Callum McLean
Jama Hussein Mohamud
Michael Craig
Cristian Gabellini
Kerstin Klasers
Josef Dean
Maciej Sypetkowski
Ioannis Koutis
Hadrien Mary
Therence Bois
Andrew Fitzgibbon
Błażej Banaszewski
Chad Martin
Dominic Masters
Recently, pre-trained foundation models have shown significant advancements in multiple fields. However, the lack of datasets with labeled features and codebases has hindered the development of a supervised foundation model for molecular tasks. Here, we have carefully curated seven datasets specifically tailored for node- and graph-level prediction tasks to facilitate supervised learning on molecules. Moreover, to support the development of multi-task learning on our proposed datasets, we created the Graphium graph machine learning library. Our dataset collection encompasses two distinct categories. Firstly, the TOYMIX category modifies three small existing datasets with additional data for multi-task learning. Secondly, the LARGEMIX category includes four large-scale datasets with 344M graph-level data points and 409M node-level data points from ∼5M unique molecules. Finally, the ultra-large dataset contains 2,210M graph-level data points and 2,031M node-level data points coming from 86M molecules. Hence, our datasets represent an order of magnitude increase in data volume compared to other 2D-GNN datasets. In addition, recognizing that molecule-related tasks often span multiple levels, we have designed our library to explicitly support multi-tasking, offering a diverse range of multi-level representations, i.e., representations at the graph, node, edge, and node-pair level. We equipped the library with an extensive collection of models and features to cover different levels of molecule analysis. By combining our curated datasets with this versatile library, we aim to accelerate the development of molecule foundation models. Datasets and code are available at https://github.com/datamol-io/graphium.
Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy
Danqi Liao
Chen Liu
Benjamin W Christensen
Maximilian Nickel
Ian Adelstein
Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet.
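A diffusion spectral entropy of the kind described above can be sketched as the Shannon entropy of the (normalized) eigenvalue spectrum of a data-driven diffusion operator. The NumPy sketch below conveys the idea only; the bandwidth heuristic and normalization are our assumptions, not the authors' exact procedure:

```python
import numpy as np

def diffusion_spectral_entropy(X, t=1):
    """Shannon entropy of the powered eigenvalue spectrum of a diffusion
    operator built on X (a simplified sketch of the DSE idea)."""
    # Pairwise squared distances and a Gaussian affinity kernel,
    # with bandwidth set by the median pairwise distance (our heuristic).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sigma = np.median(np.sqrt(d2[d2 > 0]))
    K = np.exp(-d2 / (2 * sigma**2))
    P = K / K.sum(axis=1, keepdims=True)      # row-stochastic diffusion matrix
    lam = np.abs(np.linalg.eigvals(P)) ** t   # spectrum after t diffusion steps
    p = lam / lam.sum()                       # treat spectrum as a distribution
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())      # Shannon entropy of the spectrum

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))                 # 60 points in 10 dimensions
H = diffusion_spectral_entropy(X)
print(round(H, 3))
```

Data concentrated near a low-dimensional manifold concentrates the diffusion spectrum, which this entropy is designed to detect.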
Enhancing Supervised Visualization Through Autoencoder and Random Forest Proximities for Out-of-Sample Extension
Kevin R. Moon
Jake S. Rhodes
The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.
Learnable Filters for Geometric Scattering Modules
Dhananjay Bhaskar
Kincaid MacDonald
Jackson Grady
Michael Perlmutter