
Guy Wolf

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, Université de Montréal, Department of Mathematics and Statistics
Concordia University
CHUM - Montreal University Hospital Center
Research Topics
Data Mining
Deep Learning
Dynamical Systems
Graph Neural Networks
Information Retrieval
Learning on Graphs
Machine Learning Theory
Medical Machine Learning
Molecular Modeling
Multimodal Learning
Representation Learning
Spectral Learning

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together tools from machine learning, signal processing and applied mathematics. His recent work has combined diffusion geometries and deep learning to find emergent patterns, dynamics and structure in big, high-dimensional data (e.g., in single-cell genomics and proteomics).

Current Students

Independent visiting researcher - University of Lorraine
Master's Research - Université de Montréal
Co-supervisor :
Collaborating Alumni
Principal supervisor :
PhD - Université de Montréal
Collaborating Alumni
Collaborating researcher - Western Washington University (faculty; assistant professor)
Co-supervisor :
PhD - Université de Montréal
Master's Research - McGill University
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Master's Research - Université de Montréal
Principal supervisor :
Collaborating researcher - Yale
Postdoctorate - Université de Montréal
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Principal supervisor :
PhD - Université de Montréal
PhD - Université de Montréal
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Postdoctorate - Concordia University
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
PhD - Concordia University
Principal supervisor :
Independent visiting researcher
Master's Research - Université de Montréal
Collaborating researcher - Concordia University
Principal supervisor :
Collaborating researcher - Université de Montréal
Co-supervisor :
Collaborating researcher - Yale
PhD - Université de Montréal
Research Intern - Western Washington University
Principal supervisor :
Postdoctorate - Université de Montréal
Collaborating researcher - McGill University (assistant professor)

Publications

AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Mahdi S. Hosseini
First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing diagonal matrix preconditioning of the stochastic gradient during training. Despite the widespread use of first-order methods, second-order optimization algorithms exhibit superior convergence properties compared to first-order counterparts such as Adam and SGD. However, their practicality in training DNNs is still limited by increased per-iteration computation and suboptimal accuracy compared to first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency in the second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we show that AdaFisher can be reliably adopted for image classification and language modelling, and that it stands out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms state-of-the-art (SOTA) optimizers in terms of both accuracy and convergence speed. Code is available at https://github.com/AtlasAnalyticsLab/AdaFisher.
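The general idea of Fisher-based gradient preconditioning can be illustrated with a much simpler stand-in. AdaFisher itself uses a block-diagonal Fisher approximation; the sketch below instead assumes a purely diagonal empirical Fisher (an exponential moving average of squared gradients) plus a hypothetical damping term, so it is not the AdaFisher update. Unlike Adam, it divides each gradient coordinate by the damped Fisher estimate itself rather than by its square root, which is what makes it a (crude) second-order preconditioner. The names `diag_fisher_step`, `lr`, `beta` and `damping` are illustrative.

```python
def diag_fisher_step(params, grads, fisher, lr=0.5, beta=0.9, damping=1.0):
    """One step of diagonal-empirical-Fisher preconditioned descent (simplified stand-in, not AdaFisher).

    `fisher` holds an exponential moving average of squared gradients, used as a
    diagonal approximation of the Fisher information matrix.
    """
    new_fisher = [beta * f + (1.0 - beta) * g * g for f, g in zip(fisher, grads)]
    # Precondition each gradient coordinate by the inverse of its damped Fisher estimate.
    new_params = [p - lr * g / (f + damping) for p, g, f in zip(params, grads, new_fisher)]
    return new_params, new_fisher


def quadratic_grad(params, centers):
    """Gradient of the toy objective sum_i (p_i - c_i)^2."""
    return [2.0 * (p - c) for p, c in zip(params, centers)]


if __name__ == "__main__":
    centers = [1.0, 2.0]
    params, fisher = [5.0, -3.0], [0.0, 0.0]
    for _ in range(1000):
        params, fisher = diag_fisher_step(params, quadratic_grad(params, centers), fisher)
    print(params)  # approaches the minimizer [1.0, 2.0]
```

The damping term keeps the step bounded when the Fisher estimate is near zero; with these toy settings each step shrinks the error by a factor of 1 - 1/(F + 1), so the iterate never overshoots the minimum.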
Supervised latent factor modeling isolates cell-type-specific transcriptomic modules that underlie Alzheimer’s disease progression
Liam Hodgson
Yasser Iturria-Medina
Jo Anne Stratton
Smita Krishnaswamy
David A. Bennett
Sustained IFN signaling is associated with delayed development of SARS-CoV-2-specific immunity
Elsa Brunet-Ratnasingham
Sacha Morin
Haley E. Randolph
Marjorie Labrecque
Justin Bélair
Raphaël Lima-Barbosa
Amélie Pagliuzza
Lorie Marchitto
Michael Hultström
Julia Niessl
Rose Cloutier
Alina M. Sreng Flores
Nathalie Brassard
Mehdi Benlarbi
Jérémie Prévost
Shilei Ding
Sai Priya Anand
Gérémy Sannier
Amanda Marks
Dick Wågsäter
Eric Bareke
Hugo Zeberg
Miklos Lipcsey
Robert Frithiof
Anders Larsson
Sirui Zhou
Tomoko Nakanishi
David R. Morrison
Dani Vezina
Catherine Bourassa
Gabrielle Gendron-Lepage
Halima Medjahed
Floriane Point
Jonathan Richard
Catherine Larochelle
Alexandre Prat
Janet L. Cunningham
Nathalie Arbour
Madeleine Durand
J. Brent Richards
Kevin R. Moon
Nicolas Chomont
Andrés Finzi
Martine Tétreault
Luis Barreiro
Daniel E. Kaufmann
Plasma RNAemia, delayed antibody responses and inflammation predict COVID-19 outcomes, but the mechanisms underlying these immunovirological patterns are poorly understood. We profile 782 longitudinal plasma samples from 318 hospitalized COVID-19 patients. Integrated analysis using k-means reveals four patient clusters in a discovery cohort: mechanically ventilated critically ill cases are subdivided into good-prognosis and high-fatality clusters (reproduced in a validation cohort), while non-critical survivors are delineated by high and low antibody responses. Only the high-fatality cluster is enriched for transcriptomic signatures associated with COVID-19 severity, and each cluster has distinct RBD-specific antibody elicitation kinetics. Both critical and non-critical clusters with delayed antibody responses exhibit sustained IFN signatures, which negatively correlate with contemporaneous RBD-specific IgG levels and absolute SARS-CoV-2-specific B and CD4+ T cell frequencies. These data suggest that the interferon paradox previously described in murine LCMV models is operative in COVID-19, with excessive IFN signaling delaying the development of adaptive virus-specific immunity.
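The k-means step mentioned above partitions samples by alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points (Lloyd's algorithm). The sketch below is a generic, minimal version on toy vectors, assuming caller-supplied initial centroids; it is not the study's integrated analysis pipeline, and `kmeans` and `sq_dist` are illustrative names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_centroids, iters=20):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` assigns the first two points to one cluster and the last two to the other. In practice the result depends on the initialization (e.g., k-means++) and on the chosen number of clusters.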
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs significant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume that a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performance than past methods when averaging models trained on the same or differing data splits. We also extend this analysis to the harder setting where more than two models are merged, and find that CCA Merge works significantly better than past methods in this setting.
Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport
Alexander Tong
Nikolay Malkin
Guillaume Huguet
Yanlei Zhang
Jarrid Rector-Brooks
Kilian Fatras
Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized conditional flow matching (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single-cell dynamics, unsupervised image translation, and Schrödinger bridge inference.
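The OT-CFM training targets can be sketched in a few lines: pair the source and target minibatches with an exact optimal transport matching, sample a time t for each pair, place a point on the straight path between the paired samples (optionally perturbed by Gaussian noise of scale sigma), and regress the model's vector field onto the constant velocity x1 - x0. This is a minimal pure-Python sketch assuming equal-size minibatches, a squared Euclidean cost, and a brute-force matching that is only practical for tiny batches; `ot_pairing`, `ot_cfm_targets` and `cfm_loss` are hypothetical names, and a real implementation would use an OT solver and a neural network for the vector field.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ot_pairing(x0, x1):
    """Exact minibatch OT with uniform weights: the minimum-cost one-to-one matching.

    Brute force over all permutations, so only usable for very small batches.
    """
    n = len(x0)
    best = min(itertools.permutations(range(n)),
               key=lambda perm: sum(sq_dist(x0[i], x1[perm[i]]) for i in range(n)))
    return [(x0[i], x1[best[i]]) for i in range(n)]


def ot_cfm_targets(x0, x1, sigma=0.0, rng=random):
    """Build (t, x_t, u_t) regression triples for the conditional flow matching loss."""
    triples = []
    for a, b in ot_pairing(x0, x1):
        t = rng.random()
        # Point on the straight conditional path mu_t = (1 - t) * x0 + t * x1, plus optional noise.
        xt = [(1 - t) * ai + t * bi + sigma * rng.gauss(0.0, 1.0) for ai, bi in zip(a, b)]
        # Target velocity of that path: d mu_t / dt = x1 - x0.
        ut = [bi - ai for ai, bi in zip(a, b)]
        triples.append((t, xt, ut))
    return triples


def cfm_loss(vector_field, triples):
    """Mean squared error between the model's vector field and the conditional targets."""
    return sum(sq_dist(vector_field(t, xt), ut) for t, xt, ut in triples) / len(triples)
```

One intuition for the OT pairing is that it reduces crossings between the straight conditional paths within a minibatch, so the marginal vector field the model has to learn is simpler.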
Learning and Aligning Structured Random Feature Networks
Vivian White
Muawiz Sajjad Chaudhary
Kameron Decker Harris
Artificial neural networks (ANNs) are considered "black boxes" due to the difficulty of interpreting their learned weights. While choosing the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models in which we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparably to conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are the most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.
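One way to give convolutional weights separate spatial and channel covariances is a matrix-normal (separable) parameterization: draw a standard Gaussian matrix Z and set W = L_s Z L_c^T, where L_s and L_c are Cholesky factors of the spatial and channel covariances, so that the column-stacked vec(W) has covariance Σ_c ⊗ Σ_s. The sketch below assumes this separable form and a hypothetical squared-exponential spatial covariance over filter taps, which yields smooth, localized filters; the paper's exact wavelet-like parameterization may differ, and all function names are illustrative.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def structured_weights(spatial_chol, channel_chol, z):
    """W = L_s Z L_c^T: rows index filter taps, columns index channels."""
    return matmul(matmul(spatial_chol, z), transpose(channel_chol))


def sample_structured_weights(n_taps, channel_cov, length_scale=1.5, rng=random):
    """Draw one weight matrix with a smooth (squared-exponential) spatial covariance."""
    # Small diagonal jitter keeps the spatial covariance numerically positive definite.
    spatial_cov = [[math.exp(-((i - j) ** 2) / (2 * length_scale ** 2)) + (1e-6 if i == j else 0.0)
                    for j in range(n_taps)] for i in range(n_taps)]
    n_channels = len(channel_cov)
    z = [[rng.gauss(0.0, 1.0) for _ in range(n_channels)] for _ in range(n_taps)]
    return structured_weights(cholesky(spatial_cov), cholesky(channel_cov), z)
```

In a learnable version of this sketch, the covariance factors (or the parameters that generate them) would be the trained quantities, while Z stays random, which is how such a model can keep the number of trainable parameters small.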
Generalization of deep learning models for hepatic steatosis grading using B-mode ultrasound images
Pedro Vianna
Yue Qi
Michael Chassé
An Tang
Guy Cloutier
Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation
Pedro Vianna
Muawiz Chaudhary
Paria Mehrbod
An Tang
Guy Cloutier
Michael Eickenberg
Deep neural networks have useful applications in many different tasks; however, their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation (TTA) has recently been studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test-time adaptation methods in deployment. We propose to tackle this challenge by selectively adapting only some channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift, and (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, CIFAR-10-C, ImageNet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method brings the benefits of TTA while significantly reducing the risk of failure common in other methods, and that it is robust to the choice of hyperparameters.
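Test-time batch normalization, and the idea of adapting only some channels, can be sketched for a single normalization layer: channels flagged for adaptation are normalized with statistics recomputed on the test batch, while the others keep their training-time statistics. This minimal sketch assumes activations arrive as a list of per-sample channel vectors, omits the learned affine scale and shift, and takes the per-channel `adapt` mask as given; the paper's actual selection scheme (based on layer depth and class sensitivity) is not reproduced, and the function names are hypothetical.

```python
import math


def channel_stats(batch):
    """Per-channel mean and (biased) variance of a batch given as rows of channel values."""
    n, c = len(batch), len(batch[0])
    means = [sum(row[k] for row in batch) / n for k in range(c)]
    variances = [sum((row[k] - means[k]) ** 2 for row in batch) / n for k in range(c)]
    return means, variances


def selective_test_time_bn(batch, train_mean, train_var, adapt, eps=1e-5):
    """Normalize with test-batch statistics only on channels where adapt[k] is True."""
    test_mean, test_var = channel_stats(batch)
    out = []
    for row in batch:
        new_row = []
        for k, x in enumerate(row):
            # Adapted channels use the test batch; the rest keep training-time statistics.
            mu = test_mean[k] if adapt[k] else train_mean[k]
            var = test_var[k] if adapt[k] else train_var[k]
            new_row.append((x - mu) / math.sqrt(var + eps))
        out.append(new_row)
    return out
```

Setting every entry of `adapt` to True recovers plain test-time batch normalization, and setting every entry to False recovers the frozen training-time layer; channel selection sits between these two extremes.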
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor Coley
Shuangjia Zheng
Gaining Biological Insights through Supervised Data Visualization
Jake S. Rhodes
Adrien Aumon
Sacha Morin
Marc Girard
Catherine Larochelle
Boaz Lahav
Elsa Brunet-Ratnasingham
Amélie Pagliuzza
Lorie Marchitto
Wei Zhang
Adele Cutler
F. Grand'Maison
Anhong Zhou
Andrés Finzi
Nicolas Chomont
Daniel E. Kaufmann
Stephanie Zandee
Alexandre Prat
Kevin R. Moon
Dimensionality reduction-based data visualization is pivotal in comprehending complex biological data. The most common methods, such as PHATE, t-SNE, and UMAP, are unsupervised and therefore reflect the dominant structure in the data, which may be independent of expert-provided labels. Here we introduce a supervised data visualization method called RF-PHATE, which integrates expert knowledge for further exploration of the data. RF-PHATE leverages random forests to capture intricate feature-label relationships. Extracting information from the forest, RF-PHATE generates low-dimensional visualizations that highlight relevant data relationships while disregarding extraneous features. This approach scales to large datasets and applies to both classification and regression. We illustrate RF-PHATE's capabilities through three case studies. In a multiple sclerosis study using longitudinal clinical and imaging data, RF-PHATE unveils a subgroup of patients with non-benign relapsing-remitting multiple sclerosis, demonstrating its aptitude for time-series data. In the context of Raman spectral data, RF-PHATE effectively shows the impact of antioxidants on diesel exhaust-exposed lung cells, highlighting its proficiency in noisy environments. Furthermore, RF-PHATE aligns established geometric structures with COVID-19 patient outcomes, enriching interpretability in a hierarchical manner. RF-PHATE bridges expert insights and visualizations, promising knowledge generation. Its adaptability, scalability, and noise tolerance underscore its potential for widespread adoption.
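Forest-based proximities are one way to extract supervised similarity information from a random forest: two samples count as close when they land in the same leaves. The sketch below computes the classic Breiman proximity (the fraction of trees in which two samples share a leaf) from precomputed leaf indices, row-normalizes it into a Markov diffusion operator, and powers that operator, the kind of object PHATE-style methods diffuse on. It is an illustration under those assumptions, not RF-PHATE itself, which may use a different proximity definition and adds further steps (diffusion potentials and the final embedding); `leaf_ids` and the function names are hypothetical.

```python
def forest_proximity(leaf_ids):
    """Breiman proximity: leaf_ids[i][t] is the leaf reached by sample i in tree t."""
    n, n_trees = len(leaf_ids), len(leaf_ids[0])
    return [[sum(1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t]) / n_trees
             for j in range(n)] for i in range(n)]


def diffusion_operator(affinity):
    """Row-normalize an affinity matrix into a row-stochastic (Markov) transition matrix."""
    rows = []
    for row in affinity:
        total = sum(row)  # positive here, since each sample always shares its own leaves
        rows.append([a / total for a in row])
    return rows


def diffuse(p, steps):
    """Return P^steps by repeated matrix multiplication (steps >= 1)."""
    result = p
    for _ in range(steps - 1):
        result = [[sum(result[i][k] * p[k][j] for k in range(len(p))) for j in range(len(p))]
                  for i in range(len(p))]
    return result
```

Because the proximities come from trees trained on the labels, samples that the forest treats alike end up strongly connected in the diffusion operator even if they are far apart in the raw feature space.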