Portrait de Guy Wolf

Guy Wolf

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur agrégé, Université de Montréal, Département de mathématiques et statistiques
Concordia University
CHUM - Montreal University Hospital Center
Sujets de recherche
Apprentissage automatique médical
Apprentissage de représentations
Apprentissage multimodal
Apprentissage profond
Apprentissage spectral
Apprentissage sur graphes
Exploration des données
Modélisation moléculaire
Recherche d'information
Réseaux de neurones en graphes
Systèmes dynamiques
Théorie de l'apprentissage automatique

Biographie

Guy Wolf est professeur agrégé au Département de mathématiques et de statistique de l'Université de Montréal. Ses intérêts de recherche se situent au carrefour de l'apprentissage automatique, de la science des données et des mathématiques appliquées. Il s'intéresse particulièrement aux méthodes d'exploration de données qui utilisent l'apprentissage multiple et l'apprentissage géométrique profond, ainsi qu'aux applications pour l'analyse exploratoire des données biomédicales.

Ses recherches portent sur l'analyse exploratoire des données, avec des applications en bio-informatique. Ses approches sont multidisciplinaires et combinent l'apprentissage automatique, le traitement du signal et les outils mathématiques appliqués. En particulier, ses travaux récents utilisent une combinaison de géométries de diffusion et d'apprentissage profond pour trouver des modèles émergents, des dynamiques et des structures dans les mégadonnées à grande dimension (par exemple, dans la génomique et la protéomique de la cellule unique).

Étudiants actuels

Maîtrise recherche - UdeM
Co-superviseur⋅e :
Doctorat - UdeM
Collaborateur·rice alumni
Collaborateur·rice de recherche - Western Washington University (faculty; assistant prof))
Co-superviseur⋅e :
Maîtrise recherche - McGill
Superviseur⋅e principal⋅e :
Doctorat - Concordia
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - Yale
Postdoctorat - UdeM
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - UdeM
Co-superviseur⋅e :
Postdoctorat - Concordia
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Doctorat - Concordia
Superviseur⋅e principal⋅e :
Maîtrise recherche - UdeM
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Co-superviseur⋅e :
Collaborateur·rice de recherche - Yale
Stagiaire de recherche - Western Washington University
Superviseur⋅e principal⋅e :
Postdoctorat - UdeM
Collaborateur·rice de recherche - McGill (assistant professor)

Publications

Towards a General Recipe for Combinatorial Optimization with Multi-Filter GNNs
Frederik Wenkel
Semih Cantürk
Stefan Horoi
Michael Perlmutter
Reaction-conditioned De Novo Enzyme Design with GENzyme
Chenqing Hua
Jiarui Lu
Yong Liu
Odin Zhang
Rex Ying
Wengong Jin
Shuangjia Zheng
The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interact… (voir plus)ion prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare, and induced fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme function is defined by the reaction it catalyzes. Here, we introduce \textsc{GENzyme}, a \textit{de novo} enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. \textsc{GENzyme} is an end-to-end, three-staged model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the catalytic reaction being targeted. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide \textsc{GENzyme} code at https://github.com/WillHua127/GENzyme.
Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun
Danqi Liao
Kincaid MacDonald
Yanlei Zhang
Chen Liu
Guillaume Huguet
Ian Adelstein
Tim G. J. Rudner
Smita Krishnaswamy
Effective Protein-Protein Interaction Exploration with PPIretrieval
Chenqing Hua
Connor W. Coley
Shuangjia Zheng
EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics
Chenqing Hua
Yong Liu
Dinghuai Zhang
Odin Zhang
Sitao Luan
Kevin K Yang
Shuangjia Zheng
Learning Stochastic Rainbow Networks
Vivian White
Muawiz Sajjad Chaudhary
Kameron Decker Harris
Random feature models are a popular approach for studying network learning that can capture important behaviors while remaining simpler than… (voir plus) traditional training. Guth et al. [2024] introduced “rainbow” networks which model the distribution of trained weights as correlated random features conditioned on previous layer activity. Sampling new weights from distributions fit to learned networks led to similar performance in entirely untrained networks, and the observed weight covariance were found to be low rank. This provided evidence that random feature models could be extended to some networks away from initialization, but White et al. [2024] failed to replicate their results in the deeper ResNet18 architecture. Here we ask whether the rainbow formulation can succeed in deeper networks by directly training a stochastic ensemble of random features, which we call stochastic rainbow networks. At every gradient descent iteration, new weights are sampled for all intermediate layers and features aligned layer-wise. We find: (1) this approach scales to deeper models, which outperform shallow networks at large widths; (2) ensembling multiple samples from the stochastic model is better than retraining the classifier head; and (3) low-rank parameterization of the learnable weight covariances can approach the accuracy of full-rank networks. This offers more evidence for rainbow and other structured random feature networks as reduced models of deep learning.
Reactzyme: A Benchmark for Enzyme-Reaction Prediction
Chenqing Hua
Bozitao Zhong
Sitao Luan
Liang Hong
Shuangjia Zheng
Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks
Sitao Luan
Qincheng Lu
Chenqing Hua
Xinyu Wang
Jiaqi Zhu
Xiao-Wen Chang
Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, rece… (voir plus)nt studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics. In this paper, we point out three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on
The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges
Sitao Luan
Chenqing Hua
Qincheng Lu
Liheng Ma
Lirong Wu
Xinyu Wang
Minkai Xu
Xiao-Wen Chang
Rex Ying
Stan Z. Li
Stefanie Jegelka
Homophily principle, \ie{} nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to b… (voir plus)e the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e. low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.
Graph Positional and Structural Encoder
Renming Liu
Semih Cantürk
Olivier Lapointe-Gagné
Vincent Létourneau
Ladislav Rampášek
Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node … (voir plus)ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Geometry-Aware Generative Autoencoders for Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun
Danqi Liao
Kincaid MacDonald
Yanlei Zhang
Guillaume Huguet
Ian Adelstein
Tim G. J. Rudner
Smita Krishnaswamy
Non-linear dimensionality reduction methods have proven successful at learning low-dimensional representations of high-dimensional point clo… (voir plus)uds on or near data manifolds. However, existing methods are not easily extensible—that is, for large datasets, it is prohibitively expensive to add new points to these embeddings. As a result, it is very difficult to use existing embeddings generatively, to sample new points on and along these manifolds. In this paper, we propose GAGA (geometry-aware generative autoencoders) a framework which merges the power of generative deep learning with non-linear manifold learning by: 1) learning generalizable geometry-aware neural network embeddings based on non-linear dimensionality reduction methods like PHATE and diffusion maps, 2) deriving a non-euclidean pullback metric on the embedded space to generate points faithfully along manifold geodesics, and 3) learning a flow on the manifold that allows us to transport populations. We provide illustration on easily-interpretable synthetic datasets and showcase results on simulated and real single cell datasets. In particular, we show that the geodesic-based generation can be especially important for scientific datasets where the manifold represents a state space and geodesics can represent dynamics of entities over this space.