Guy Wolf

Biography

Guy Wolf is an associate professor in the Department of Mathematics and Statistics at Université de Montréal.

His research interests lie at the intersection of machine learning, data science and applied mathematics. He is particularly interested in data mining methods that use manifold learning and deep geometric learning, as well as applications for the exploratory analysis of biomedical data.

Wolf’s research focuses on exploratory data analysis and its applications in bioinformatics. His approaches are multidisciplinary and bring together machine learning, signal processing and applied math tools. His recent work has used a combination of diffusion geometries and deep learning to find emergent patterns, dynamics, and structure in big, high-dimensional data (e.g., in single-cell genomics and proteomics).

Current Students

Ria Arora

Master's Research - Université de Montréal

Co-supervisor :

Liam Paull

Adrien Aumon

PhD - Université de Montréal

Semih Cantürk

PhD - Université de Montréal

semihcanturk00@gmail.com

Collaborating Alumni

Enrique Fita Sanmartin

Collaborating Alumni - Université de Montréal

Kameron Harris

Collaborating researcher - Western Washington University (faculty; assistant professor)

Co-supervisor :

PhD - Université de Montréal

Will Hua

Collaborating Alumni - McGill University

Xiaolong Huang

Master's Research - Concordia University

Principal supervisor :

Guillaume Huguet

PhD - Université de Montréal

Paul Janson

PhD - Concordia University

Principal supervisor :

Charles-Etienne Joseph

Master's Research - Université de Montréal

Principal supervisor :

M. Elyes Kanoun

Research Intern - Université de Montréal

Vincent Létourneau

Postdoctorate - Université de Montréal

Myriam Lizotte

PhD - Université de Montréal

Philippe Martin

PhD - Université de Montréal

Co-supervisor :

Paul François

Paria Mehrbod

Master's Research - Concordia University

Principal supervisor :

Lydia Mezrag

PhD - Université de Montréal

Sacha Morin

PhD - Université de Montréal

Co-supervisor :

Postdoctorate - Concordia University

Principal supervisor :

geraldin.nanfack@mila.quebec

Amine Natik

PhD - Université de Montréal

Principal supervisor :

Guillaume Lajoie

Shuang Ni

PhD - Université de Montréal

Albert Orozco Camacho

PhD - Concordia University

Principal supervisor :

Master's Research - Université de Montréal

Matthew Scicluna

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Research Intern - Western Washington University

Principal supervisor :

Postdoctorate - Université de Montréal

stephanie.zandee@mcgill.ca

Stephanie Zandee

Collaborating researcher - McGill University (assistant professor)

Blog Posts

Exploring the COVID-19 Interferon Paradox with Dimensionality Reduction and Clustering

Graph and representation of working methodology, and graph of data on deaths 60 days after onset of symptoms.

February 19, 2025

Sacha Morin

Elsa Brunet-Ratnasingham

Guy Wolf

Publications

Reaction-conditioned De Novo Enzyme Design with GENzyme

Chenqing Hua

Jiarui Lu

Yong Liu

Odin Zhang

Rex Ying

Wengong Jin

Shuangjia Zheng

The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interaction prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare, and induced fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme function is defined by the reaction it catalyzes. Here, we introduce GENzyme, a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-staged model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the catalytic reaction being targeted. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide GENzyme code at https://github.com/WillHua127/GENzyme.

2024-11-10

ArXiv (preprint)

Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua

Connor W. Coley

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua

Yong Liu

Dinghuai Zhang

Odin Zhang

Sitao Luan

Kevin K Yang

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

Neuro-GSTH: A Geometric Scattering and Persistent Homology Framework for Uncovering Spatiotemporal Signatures in Neural Activity

Dhananjay Bhaskar

Jessica Moore

Yanlei Zhang

Feng Gao

Bastian Rieck

Helen Pushkarskaya

Firas Khasawneh

Elizabeth Munch

Valentina Greco

Christopher Pittenger

Smita Krishnaswamy

2024-10-13

bioRxiv (preprint)

Learning Stochastic Rainbow Networks

Vivian White

Muawiz Sajjad Chaudhary

Guillaume Lajoie

Kameron Decker Harris

Random feature models are a popular approach for studying network learning that can capture important behaviors while remaining simpler than traditional training. Guth et al. [2024] introduced “rainbow” networks which model the distribution of trained weights as correlated random features conditioned on previous layer activity. Sampling new weights from distributions fit to learned networks led to similar performance in entirely untrained networks, and the observed weight covariances were found to be low rank. This provided evidence that random feature models could be extended to some networks away from initialization, but White et al. [2024] failed to replicate their results in the deeper ResNet18 architecture. Here we ask whether the rainbow formulation can succeed in deeper networks by directly training a stochastic ensemble of random features, which we call stochastic rainbow networks. At every gradient descent iteration, new weights are sampled for all intermediate layers and features aligned layer-wise. We find: (1) this approach scales to deeper models, which outperform shallow networks at large widths; (2) ensembling multiple samples from the stochastic model is better than retraining the classifier head; and (3) low-rank parameterization of the learnable weight covariances can approach the accuracy of full-rank networks. This offers more evidence for rainbow and other structured random feature networks as reduced models of deep learning.

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Chenqing Hua

Bozitao Zhong

Sitao Luan

Liang Hong

Shuangjia Zheng

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks

Sitao Luan

Qincheng Lu

Chenqing Hua

Xinyu Wang

Jiaqi Zhu

Xiao-Wen Chang

Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics. In this paper, we point out the three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on

2024-09-09

ArXiv (preprint)

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Sitao Luan

Chenqing Hua

Qincheng Lu

Liheng Ma

Lirong Wu

Xinyu Wang

Minkai Xu

Xiao-Wen Chang

Rex Ying

Stan Z. Li

Stefanie Jegelka

The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2024-07-12

ArXiv (preprint)

Graph Positional and Structural Encoder

Renming Liu

Semih Cantürk

Olivier Lapointe-Gagné

Vincent Létourneau

Dominique Beaini

Ladislav Rampášek

Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)