Guy Wolf

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal. His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as in applications to the exploratory analysis of biomedical data.

His research focuses on exploratory data analysis, with applications in bioinformatics. His approaches are multidisciplinary and combine machine learning, signal processing and applied mathematical tools. In particular, his recent work uses a combination of diffusion geometries and deep learning to find emergent patterns, dynamics and structures in high-dimensional big data (for example, in single-cell genomics and proteomics).

Current Students

Sabeur Aridhi

Independent visiting researcher - University of Lorraine

Ria Arora

Research Master's - UdeM

Co-supervisor:

Liam Paull

Nader Asadi

Alumni collaborator

Principal supervisor:

PhD - UdeM

PhD - UdeM

semihcanturk00@gmail.com

Alumni collaborator

Kameron Harris

Research collaborator - Western Washington University (faculty; assistant professor)

Co-supervisor:

PhD - UdeM

Will Hua

Research Master's - McGill

Principal supervisor:

PhD - UdeM

Paul Janson

Research Master's - Concordia

Principal supervisor:

Charles-Etienne Joseph

Research Master's - UdeM

Principal supervisor:

Jake Kovalic

Research collaborator - Yale

Smita Krishnaswamy

Independent visiting researcher - Yale University

Postdoc - UdeM

PhD - UdeM

PhD - UdeM

Co-supervisor:

Paul François

Lydia Mezrag

PhD - UdeM

Sacha Morin

PhD - UdeM

Co-supervisor:

Liam Paull

Hasti Nafisi

Research Master's - UdeM

Co-supervisor:

Yashar Hezaveh

Geraldin Nanfack

Postdoc - Concordia

Principal supervisor:

Amine Natik

PhD - UdeM

Principal supervisor:

Guillaume Lajoie

Shuang Ni

PhD - UdeM

Albert Orozco Camacho

PhD - Concordia

Principal supervisor:

Independent visiting researcher

Paria Paria

Research Master's - Concordia

Principal supervisor:

stephanie.zandee@mcgill.ca

Thomas Sabourin

Research Master's - UdeM

Donald Shenaj

Research collaborator - Concordia

Principal supervisor:

Research collaborator - UdeM

Co-supervisor:

Research collaborator - Yale

PhD - UdeM

Research intern - Western Washington University

Principal supervisor:

Postdoc - UdeM

Research collaborator - McGill (assistant professor)

Publications

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua

Yong Liu

Dinghuai Zhang

Odin Zhang

Sitao Luan

Kevin K. Yang

Doina Precup

Shuangjia Zheng

2024-10-01

ArXiv (preprint)

Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks

Sitao Luan

Qincheng Lu

Chenqing Hua

Xinyu Wang

Jiaqi Zhu

Xiao-Wen Chang

Jian Tang

Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics. In this paper, we point out the three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on

2024-09-09

ArXiv (preprint)

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Chenqing Hua

Bozitao Zhong

Sitao Luan

Liang Hong

Doina Precup

Shuangjia Zheng

2024-08-24

ArXiv (preprint)

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Sitao Luan

Chenqing Hua

Qincheng Lu

Liheng Ma

Lirong Wu

Xinyu Wang

Minkai Xu

Xiao-Wen Chang

Doina Precup

Rex Ying

Stan Z. Li

Jian Tang

Stefanie Jegelka

The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2024-07-12

ArXiv (preprint)

Graph Positional and Structural Encoder

Renming Liu

Semih Cantürk

Olivier Lapointe-Gagné

Vincent Létourneau

Dominique Beaini

Ladislav Rampášek

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

openreview.net

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

openreview.net

Geometry-Aware Generative Autoencoders for Metric Learning and Generative Modeling on Data Manifolds

Xingzhi Sun

Danqi Liao

Kincaid MacDonald

Yanlei Zhang

Guillaume Huguet

Ian Adelstein

Tim G. J. Rudner

Smita Krishnaswamy

Non-linear dimensionality reduction methods have proven successful at learning low-dimensional representations of high-dimensional point clouds on or near data manifolds. However, existing methods are not easily extensible; that is, for large datasets, it is prohibitively expensive to add new points to these embeddings. As a result, it is very difficult to use existing embeddings generatively, to sample new points on and along these manifolds. In this paper, we propose GAGA (geometry-aware generative autoencoders), a framework which merges the power of generative deep learning with non-linear manifold learning by: 1) learning generalizable geometry-aware neural network embeddings based on non-linear dimensionality reduction methods like PHATE and diffusion maps, 2) deriving a non-Euclidean pullback metric on the embedded space to generate points faithfully along manifold geodesics, and 3) learning a flow on the manifold that allows us to transport populations. We provide illustration on easily-interpretable synthetic datasets and showcase results on simulated and real single cell datasets. In particular, we show that the geodesic-based generation can be especially important for scientific datasets where the manifold represents a state space and geodesics can represent dynamics of entities over this space.

2024-06-17

ICML.cc/2024/Workshop/GRaM (published)

openreview.net

Simulating federated learning for steatosis detection using ultrasound images

Yue Qi

Pedro Vianna

Alexandre Cadrin-Chênevert

Katleen Blanchet

Emmanuel Montagnon

Louis-Antoine Mullie

Guy Cloutier

Michael Chassé

An Tang

2024-06-10

Scientific Reports (published)

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

Shuang Ni

Adrien Aumon

Kevin R. Moon

Jake S. Rhodes

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.

2024-06-06

ArXiv (preprint)

Noisy Data Visualization using Functional Data Analysis

Haozhe Chen

Andres Felipe Duque Correa

Kevin R. Moon

Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.

2024-06-05

ArXiv (preprint)

Towards a General GNN Framework for Combinatorial Optimization

Frederik Wenkel

Semih Cantürk

Michael Perlmutter

2024-05-31

ArXiv (preprint)

AdaFisher: Adaptive Second Order Optimization via Fisher Information

Damien Martins Gomes

Yanlei Zhang

Mahdi S. Hosseini

First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limited curvature information by employing the diagonal matrix preconditioning of the stochastic gradient during the training. Despite their widespread use, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts, e.g., Adam and SGD. However, their practicality in training DNNs is still limited due to increased per-iteration computations and suboptimal accuracy compared to the first-order methods. We present AdaFisher, an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency in the second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification and language modelling, and stands out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms the SOTA optimizers in terms of both accuracy and convergence speed. Code available from https://github.com/AtlasAnalyticsLab/AdaFisher

2024-05-26

ArXiv (preprint)