Chenqing Hua

Graph Neural Networks Meet Probabilistic Graphical Models: A Survey

Qian Zhang

2025-04-05

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

doi.org

Reaction-conditioned De Novo Enzyme Design with GENzyme

Chenqing Hua

Jiarui Lu

Yang Liu

Odin Zhang

Jian Tang

Rex Ying

Wengong Jin

Guy Wolf

Doina Precup

Shuangjia Zheng

The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interaction prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare, and induced fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme function is defined by the reaction it catalyzes. Here, we introduce GENzyme, a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-staged model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the catalytic reaction being targeted. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide GENzyme code at https://github.com/WillHua127/GENzyme.

2024-11-09

ArXiv (preprint)

doi.org

arxiv.org

Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua

Connor W. Coley

Guy Wolf

Doina Precup

Shuangjia Zheng

2024-10-12

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

doi.org

openreview.net

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Yang Liu

Odin Zhang

Kevin K Yang

Shuangjia Zheng

2024-10-12

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

doi.org

openreview.net

ReactZyme: A Benchmark for Enzyme-Reaction Prediction

Bozitao Zhong

Liang Hong

Shuangjia Zheng

Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. Addressing the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view on the functionality of enzymes. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation (https://github.com/WillHua127/ReactZyme).

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net
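
The retrieval framing in the ReactZyme abstract above (rank enzymes by how well they match a given reaction) can be illustrated with a minimal sketch. The abstract does not say how scores are computed, so everything here is an assumption made for illustration: the precomputed embedding vectors, the use of cosine similarity as the score, and the names `cosine` and `rank_enzymes`. It is not the ReactZyme model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score a zero vector")
    return dot / (norm_a * norm_b)


def rank_enzymes(reaction_vec, enzyme_vecs):
    """Return enzyme names sorted from best to worst match for one reaction.

    reaction_vec: hypothetical embedding of the query reaction.
    enzyme_vecs:  dict mapping enzyme name -> hypothetical embedding.
    """
    scores = {name: cosine(reaction_vec, vec) for name, vec in enzyme_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): three candidate enzymes and one query reaction.
reaction = [1.0, 0.0]
enzymes = {
    "enzyme_A": [0.9, 0.1],
    "enzyme_B": [0.0, 1.0],
    "enzyme_C": [0.5, 0.5],
}
ranking = rank_enzymes(reaction, enzymes)
print(ranking)
```

Swapping the roles (one enzyme as the query, reactions as candidates) gives the reverse direction mentioned in the abstract, predicting reactions for a novel protein.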

Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks

Sitao Luan

Qincheng Lu

Chenqing Hua

Xinyu Wang

Jiaqi Zhu

Xiao-Wen Chang

Guy Wolf

Jian Tang

Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs, and various homophily metrics have been designed to help people recognize these malignant datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics. In this paper, we point out the three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the truly challenging heterophilic datasets; 3) a missing quantitative evaluation benchmark for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on

2024-09-08

ArXiv (preprint)

doi.org

arxiv.org

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Sitao Luan

Chenqing Hua

Qincheng Lu

Liheng Ma

Lirong Wu

Xinyu Wang

Minkai Xu

Xiao-Wen Chang

Doina Precup

Rex Ying

Stan Z. Li

Jian Tang

Guy Wolf

Stefanie Jegelka

The homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where the performance of GNNs compared to NNs is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformers and their variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, a meticulous classification of the most recent supervised and unsupervised learning methods, a thorough digest of the theoretical analysis on homophily/heterophily, and a broad exploration of heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the truly challenging datasets for testing the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2023-12-31

arXiv (preprint)

doi.org

arxiv.org
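
As a concrete reading of the homophily principle stated in the handbook abstract above, the sketch below computes the commonly used edge homophily ratio: the fraction of edges whose two endpoints carry the same label. This is one standard metric among the several the survey evaluates, chosen here only for illustration; the function name `edge_homophily` and the toy graph are assumptions.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label.

    edges:  list of (u, v) node-index pairs, each undirected edge listed once.
    labels: list where labels[i] is the class of node i.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-cycle (made up): two edges join same-label nodes, two do not.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = [0, 0, 1, 1]
print(edge_homophily(edges, labels))  # 0.5
```

Values near 1 indicate a homophilic graph and values near 0 a heterophilic one.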

MUDiff: Unified Diffusion for Complete Molecule Generation

Chenqing Hua

Sitao Luan

Minkai Xu

Zhitao Ying

Rex Ying

Jie Fu

Stefano Ermon

Doina Precup

2023-11-17

logconference.io/LOG/2023/Conference (poster)

doi.org

proceedings.mlr.press

When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability

Sitao Luan

Chenqing Hua

Minkai Xu

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

Jie Fu

Jure Leskovec

Doina Precup

The homophily principle, i.e., nodes with the same labels are more likely to be connected, has been believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighborhood patterns. However, this argument only considers intra-class Node Distinguishability (ND) but neglects inter-class ND, which provides an incomplete understanding of the effect of homophily on GNNs. In this paper, we first demonstrate this deficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and study ND in depth, we propose the Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and negative generalized Jeffreys divergence, to quantify ND. With these metrics, we visualize and analyze how graph filters, node degree distributions and class variances influence ND, and investigate the combined effect of intra- and inter-class ND. In addition, we discover the mid-homophily pitfall, which occurs widely in graph datasets. Furthermore, we verify that, in real-world tasks, the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels. Grounded in this observation, we propose a new hypothesis-testing based performance metric beyond homophily, which is non-linear, feature-based and can provide a statistical threshold value for the superiority of GNNs. Experiments indicate that it is significantly more effective than the existing homophily metrics at revealing the advantages and disadvantages of graph-aware models on both synthetic and benchmark real-world datasets.

2023-04-24

ArXiv (preprint)

openreview.net

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

Mingde Zhao

Xiao-Wen Chang

The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper we show that these aggregation operations are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations to be similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e., diversification operators that make the nodes more distinct and preserve their identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can be easily patched onto existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe the desired characteristics of the models and a significant performance boost over the baselines on 9 node classification tasks.

2022-11-21

NeurIPS.cc/2022/Workshop/GLFrontiers (published)

doi.org

openreview.net
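
The two-channel idea in the abstract above (an aggregation channel plus a dual diversification channel) can be sketched as follows. The abstract does not give the exact operators, so this sketch assumes a simple pair: a neighbourhood mean with self-loops as the aggregation (low-pass) filter, and "identity minus that mean" as the diversification (high-pass) filter. The function names are hypothetical, and how the paper combines or learns over the two channels is not shown.

```python
def aggregate(adj, feats):
    """Aggregation channel: mean of each node's own and neighbours' features."""
    out = []
    for i, row in enumerate(adj):
        # Neighbour indices, with a self-loop added for node i.
        nbrs = [j for j, a in enumerate(row) if a or j == i]
        dim = len(feats[i])
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


def diversify(adj, feats):
    """Diversification channel: each node's features minus its neighbourhood mean."""
    agg = aggregate(adj, feats)
    return [[x - a for x, a in zip(fx, fa)] for fx, fa in zip(feats, agg)]


# Toy path graph 0 - 1 - 2 with one scalar feature per node (made up).
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
feats = [[1.0], [0.0], [1.0]]

low = aggregate(adj, feats)    # smooths node features toward their neighbours
high = diversify(adj, feats)   # keeps what distinguishes a node from its neighbours
print(low, high)
```

Under these assumptions the two channels sum back to the input features, which is one way to see them as complementary halves of the same signal.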

When Do We Need Graph Neural Networks for Node Classification?

Sitao Luan

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

Doina Precup

2022-10-29

arXiv.org (preprint)

doi.org

arxiv.org

High-Order Pooling for Graph Neural Networks with Tensor Decomposition

Chenqing Hua

Guillaume Rabusseau

Jian Tang

Graph Neural Networks (GNNs) are attracting growing attention due to their effectiveness and flexibility in modeling a variety of graph-structured data. Existing GNN architectures usually adopt simple pooling operations (e.g., sum, average, max) when aggregating messages from a local neighborhood for updating node representations or pooling node representations from the entire graph to compute the graph representation. Though simple and effective, these linear operations do not model high-order non-linear interactions among nodes. We propose the Tensorized Graph Neural Network (tGNN), a highly expressive GNN architecture relying on tensor decomposition to model high-order non-linear node interactions. tGNN leverages the symmetric CP decomposition to efficiently parameterize permutation-invariant multilinear maps for modeling node interactions. Theoretical and empirical analyses on both node and graph classification tasks show the superiority of tGNN over competitive baselines. In particular, tGNN achieves the most solid results on two OGB node classification datasets and one OGB graph classification dataset.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net
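
To make the phrase "symmetric CP decomposition to parameterize permutation-invariant multilinear maps" from the tGNN abstract above more concrete, here is a minimal sketch of a rank-R symmetric CP map. Each neighbour vector is projected by the same factor matrix, the projections are multiplied elementwise across neighbours, and the result is mapped to the output. Because every input shares the factor matrix and multiplication commutes, the output does not depend on neighbour order. The name `cp_pool`, the argument layout, and the toy weights are assumptions; the actual tGNN layer may differ in details not stated in the abstract (for example bias terms or non-linearities).

```python
def cp_pool(neighbors, factor, out_weights):
    """Symmetric rank-R CP pooling: out = W @ prod_i (A^T x_i).

    neighbors:   list of input vectors x_i, each of length D.
    factor:      D x R matrix A shared by every input (the symmetric factor).
    out_weights: O x R matrix W mapping the rank dimension to the output.
    """
    rank = len(factor[0])
    prod = [1.0] * rank
    for x in neighbors:
        for r in range(rank):
            # r-th entry of A^T x, multiplied into the running product.
            prod[r] *= sum(x[d] * factor[d][r] for d in range(len(x)))
    return [sum(w_row[r] * prod[r] for r in range(rank)) for w_row in out_weights]


# Toy weights (made up): D = 2, R = 2, O = 1.
factor = [[1.0, 0.0],
          [0.0, 1.0]]
out_weights = [[1.0, 1.0]]
neighbors = [[1.0, 2.0], [3.0, 4.0]]

pooled = cp_pool(neighbors, factor, out_weights)
print(pooled)  # [11.0]
```

With these weights the map computes 1*3 + 2*4, a product of features across the two neighbours, which sum, mean, or max pooling alone would not produce.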


Chenqing Hua

Publications
