Sitao Luan

Qian Zhang

Jie Fu

2025-04-06

IEEE International Conference on Acoustics, Speech, and Signal Processing (published)

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua

Yong Liu

Odin Zhang

Kevin K Yang

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Chenqing Hua

Bozitao Zhong

Liang Hong

Shuangjia Zheng

2024-09-26

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

Are Heterophily-Specific GNNs and Homophily Metrics Really Effective? Evaluation Pitfalls and New Benchmarks

Qincheng Lu

Chenqing Hua

Xinyu Wang

Jiaqi Zhu

Xiao-Wen Chang

Jian Tang

Over the past decade, Graph Neural Networks (GNNs) have achieved great success on machine learning tasks with relational data. However, recent studies have found that heterophily can cause significant performance degradation of GNNs, especially on node-level tasks. Numerous heterophilic benchmark datasets have been put forward to validate the efficacy of heterophily-specific GNNs and various homophily metrics have been designed to help people recognize these malignant datasets. Nevertheless, there still exist multiple pitfalls that severely hinder the proper evaluation of new models and metrics. In this paper, we point out three most serious pitfalls: 1) a lack of hyperparameter tuning; 2) insufficient model evaluation on the real challenging heterophilic datasets; 3) missing quantitative evaluation benchmark for homophily metrics on synthetic graphs. To overcome these challenges, we first train and fine-tune baseline models on

2024-09-09

ArXiv (preprint)

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Chenqing Hua

Qincheng Lu

Liheng Ma

Lirong Wu

Xinyu Wang

Minkai Xu

Xiao-Wen Chang

Rex Ying

Stan Z. Li

Jian Tang

Stefanie Jegelka

Homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e. low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2024-07-12

ArXiv (preprint)

Training Matters: Unlocking Potentials of Deeper Graph Convolutional Neural Networks

Mingde Zhao

Xiao-Wen Chang

2024-02-20

Complex Networks & Their Applications XII (published)

When Do We Need Graph Neural Networks for Node Classification?

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

2024-02-20

Complex Networks & Their Applications XII (published)

MUDiff: Unified Diffusion for Complete Molecule Generation

Chenqing Hua

Minkai Xu

Zhitao Ying

Rex Ying

Jie Fu

Stefano Ermon

2023-11-18

logconference.io/LOG/2023/Conference (poster)

When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability

Chenqing Hua

Minkai Xu

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

Jie Fu

Jure Leskovec

2023-04-25

ArXiv (preprint)

When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability

Chenqing Hua

Minkai Xu

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

Jie Fu

Jure Leskovec

Homophily principle, i.e., nodes with the same labels are more likely to be connected, was believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on Node Classification (NC) tasks. Recently, people have developed theoretical results arguing that, even though the homophily principle is broken, the advantage of GNNs can still hold as long as nodes from the same class share similar neighborhood patterns [29], which questions the validity of homophily. However, this argument only considers intra-class Node Distinguishability (ND) and ignores inter-class ND, which is insufficient to study the effect of homophily. In this paper, we first demonstrate the aforementioned insufficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and have a better understanding of homophily, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and Expected Negative KL-divergence (ENKL), to quantify ND, through which we can also find how intra- and inter-class ND influence ND together. We visualize the results and give detailed analysis. Through experiments, we verified that the superiority of GNNs is

Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks

Harry Zhao

Mingde Zhao

Chenqing Hua

Xiao-Wen Chang

The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations to be similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e. diversification operators that make the nodes more distinct and preserve their identity. Such augmentation replaces the aggregation with a two-channel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed two-channel filters can be easily patched on existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe desired characteristics of the models and significant performance boost upon the baselines on 9 node classification tasks.

2022-11-22

NeurIPS.cc/2022/Workshop/GLFrontiers (published)