Guy Wolf

Biographie

Guy Wolf est professeur agrégé au Département de mathématiques et de statistique de l'Université de Montréal. Ses intérêts de recherche se situent au carrefour de l'apprentissage automatique, de la science des données et des mathématiques appliquées. Il s'intéresse particulièrement aux méthodes d'exploration de données qui utilisent l'apprentissage multiple et l'apprentissage géométrique profond, ainsi qu'aux applications pour l'analyse exploratoire des données biomédicales.

Ses recherches portent sur l'analyse exploratoire des données, avec des applications en bio-informatique. Ses approches sont multidisciplinaires et combinent l'apprentissage automatique, le traitement du signal et les outils mathématiques appliqués. En particulier, ses travaux récents utilisent une combinaison de géométries de diffusion et d'apprentissage profond pour trouver des modèles émergents, des dynamiques et des structures dans les mégadonnées à grande dimension (par exemple, dans la génomique et la protéomique de la cellule unique).

Étudiants actuels

Ria Arora

Maîtrise recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

semihcanturk00@gmail.com

Collaborateur·rice alumni

Enrique Fita Sanmartin

Collaborateur·rice alumni - UdeM

Kameron Harris

Collaborateur·rice de recherche - Western Washington University (faculty; assistant prof))

Co-superviseur⋅e :

Doctorat - UdeM

Will Hua

Collaborateur·rice alumni - McGill

Xiaolong Huang

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Paul Janson

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Charles-Etienne Joseph

Maîtrise recherche - UdeM

Superviseur⋅e principal⋅e :

M. Elyes Kanoun

Stagiaire de recherche - UdeM

Vincent Létourneau

Postdoctorat - UdeM

Myriam Lizotte

Doctorat - UdeM

Philippe Martin

Doctorat - UdeM

Co-superviseur⋅e :

Paul François

Paria Mehrbod

Maîtrise recherche - Concordia

Superviseur⋅e principal⋅e :

Lydia Mezrag

Doctorat - UdeM

Sacha Morin

Doctorat - UdeM

Co-superviseur⋅e :

Postdoctorat - Concordia

Superviseur⋅e principal⋅e :

geraldin.nanfack@mila.quebec

Amine Natik

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Guillaume Lajoie

Shuang Ni

Doctorat - UdeM

stephanie.zandee@mcgill.ca

Albert Orozco Camacho

Doctorat - Concordia

Superviseur⋅e principal⋅e :

Maîtrise recherche - UdeM

Matthew Scicluna

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Collaborateur·rice de recherche - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Stagiaire de recherche - Western Washington University

Superviseur⋅e principal⋅e :

Postdoctorat - UdeM

Collaborateur·rice de recherche - McGill (assistant professor)

Analyser le paradoxe des interférons inhérent à la COVID-19 au moyen de la réduction de la dimensionnalité et du regroupement

Billets de blogue

Graph and representation of working methodology, and graph of data on deaths 60 days after onset of symptoms.

19 février 2025

par

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Lire l'article

Publications

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the differ… (voir plus)ent learned features of the models, however it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex and the minima found through learning are typically separated by high loss barriers. Numerous recent works have been focused on finding permutations matching one network features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder setting where more than 2 models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge

2024-07-07

ArXiv (prépublication)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

2024-07-07

ArXiv (prépublication)

Geometry-Aware Generative Autoencoders for Metric Learning and Generative Modeling on Data Manifolds

Xingzhi Sun

Danqi Liao

Kincaid MacDonald

Yanlei Zhang

Guillaume Huguet

Ian Adelstein

Tim G. J. Rudner

Smita Krishnaswamy

Non-linear dimensionality reduction methods have proven successful at learning low-dimensional representations of high-dimensional point clo… (voir plus)uds on or near data manifolds. However, existing methods are not easily extensible—that is, for large datasets, it is prohibitively expensive to add new points to these embeddings. As a result, it is very difficult to use existing embeddings generatively, to sample new points on and along these manifolds. In this paper, we propose GAGA (geometry-aware generative autoencoders) a framework which merges the power of generative deep learning with non-linear manifold learning by: 1) learning generalizable geometry-aware neural network embeddings based on non-linear dimensionality reduction methods like PHATE and diffusion maps, 2) deriving a non-euclidean pullback metric on the embedded space to generate points faithfully along manifold geodesics, and 3) learning a flow on the manifold that allows us to transport populations. We provide illustration on easily-interpretable synthetic datasets and showcase results on simulated and real single cell datasets. In particular, we show that the geodesic-based generation can be especially important for scientific datasets where the manifold represents a state space and geodesics can represent dynamics of entities over this space.

2024-06-17

ICML.cc/2024/Workshop/GRaM (publié)

Simulating federated learning for steatosis detection using ultrasound images

Yijun Qi

Pedro Vianna

Alexandre Cadrin-Chênevert

Katleen Blanchet

Emmanuel Montagnon

Louis-Antoine Mullie

Guy Cloutier

Michael Chassé

An Tang

2024-06-10

Scientific Reports (publié)

Noisy Data Visualization using Functional Data Analysis

Haozhe Chen

Andres Felipe Duque Correa

Kevin R. Moon

Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many ex… (voir plus)isting methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.

2024-06-05

ArXiv (prépublication)

Towards a General GNN Framework for Combinatorial Optimization

Frederik Wenkel

Semih Cantürk

Michael Perlmutter

2024-05-31

ArXiv (prépublication)

Supervised latent factor modeling isolates cell-type-specific transcriptomic modules that underlie Alzheimer’s disease progression

Liam Hodgson

Yue Li

Yasser Iturria-Medina

Jo Anne Stratton

Smita Krishnaswamy

David A. Bennett

Danilo Bzdok

2024-05-17

Communications Biology (publié)

Sustained IFN signaling is associated with delayed development of SARS-CoV-2-specific immunity

Elsa Brunet-Ratnasingham

Sacha Morin

Haley E. Randolph

Marjorie Labrecque

Justin Bélair

Raphaël Lima-Barbosa

Amélie Pagliuzza

Lorie Marchitto

Michael Hultström

Julia Niessl

Rose Cloutier

Alina M. Sreng Flores

Nathalie Brassard

Mehdi Benlarbi

Jérémie Prévost

Shilei Ding

Sai Priya Anand

Gérémy Sannier

Amanda Marks

Dick Wågsäter … (voir 27 de plus)

Eric Bareke

Hugo Zeberg

Miklos Lipcsey

Robert Frithiof

Anders Larsson

Sirui Zhou

Tomoko Nakanishi

David R. Morrison

Dani Vezina

Catherine Bourassa

Gabrielle Gendron-Lepage

Halima Medjahed

Floriane Point

Jonathan Richard

Catherine Larochelle

Alexandre Prat

Janet L. Cunningham

Nathalie Arbour

Madeleine Durand

J. Brent Richards

Kevin R. Moon

Nicolas Chomont

Andrés Finzi

Martine Tétreault

Luis Barreiro

Daniel E. Kaufmann

Plasma RNAemia, delayed antibody responses and inflammation predict COVID-19 outcomes, but the mechanisms underlying these immunovirological… (voir plus) patterns are poorly understood. We profile 782 longitudinal plasma samples from 318 hospitalized COVID-19 patients. Integrated analysis using k-means reveal four patient clusters in a discovery cohort: mechanically ventilated critically-ill cases are subdivided into good prognosis and high-fatality clusters (reproduced in a validation cohort), while non-critical survivors are delineated by high and low antibody responses. Only the high-fatality cluster is enriched for transcriptomic signatures associated with COVID-19 severity, and each cluster has distinct RBD-specific antibody elicitation kinetics. Both critical and non-critical clusters with delayed antibody responses exhibit sustained IFN signatures, which negatively correlate with contemporaneous RBD-specific IgG levels and absolute SARS-CoV-2-specific B and CD4+ T cell frequencies. These data suggest that the Interferon paradox previously described in murine LCMV models is operative in COVID-19, with excessive IFN signaling delaying development of adaptive virus-specific immunity.

2024-05-16

Nature Communications (publié)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs signi… (voir plus)ficant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder many models setting where more than 2 models are merged, and we find that CCA Merge works significantly better in this setting than past methods.

2024-05-01

ICML.cc/2024/Conference (poster)

Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Yoshua Bengio

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (voir plus)mulation-based maximum likelihood training. We introduce the generalized \textit{conditional flow matching} (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.

2024-03-11

TMLR (accepté)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

2024-03-02

ICLR.cc/2024/Workshop/Re-Align (poster)

Learning and Aligning Structured Random Feature Networks

Vivian White

Muawiz Sajjad Chaudhary

Guillaume Lajoie

Kameron Decker Harris

Artificial neural networks (ANNs) are considered "black boxes'' due to the difficulty of interpreting their learned weights. While choosing… (voir plus) the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.

2024-03-02

ICLR.cc/2024/Workshop/Re-Align (poster)