Minkai Xu

Xiao-Wen Chang

Doina Precup

Rex Ying

Stan Z. Li

Guy Wolf

Stefanie Jegelka

Homophily principle, \ie{} nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to b… (voir plus)e the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance compared to the NN's is not satisfactory. Heterophily, i.e. low homophily, has been considered the main cause of this empirical observation. People have begun to revisit and re-evaluate most existing graph models, including graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g. heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most updated supervised and unsupervised learning methods, thorough digestion of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the real challenging datasets to test the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.

2023-12-31

arXiv (prépublication)

arxiv.org

MUDiff: Unified Diffusion for Complete Molecule Generation

Chenqing Hua

Sitao Luan

Zhitao Ying

Rex Ying

Jie Fu

Stefano Ermon

Doina Precup

2023-11-17

logconference.io/LOG/2023/Conference (poster)

proceedings.mlr.press

When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability

Sitao Luan

Chenqing Hua

Qincheng Lu

Jiaqi Zhu

Xiao-Wen Chang

Jie Fu

Jure Leskovec

Doina Precup

Homophily principle, i.e., nodes with the same labels are more likely to be connected, has been believed to be the main reason for the perfo… (voir plus)rmance superiority of Graph Neural Networks (GNNs) over Neural Networks on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighborhood patterns. However, this argument only considers intra-class Node Distinguishability (ND) but neglects inter-class ND, which provides incomplete understanding of homophily on GNNs. In this paper, we first demonstrate such deficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and study ND deeply, we propose Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and negative generalized Jeffreys divergence, to quantify ND. With the metrics, we visualize and analyze how graph filters, node degree distributions and class variances influence ND, and investigate the combined effect of intra- and inter-class ND. Besides, we discovered the mid-homophily pitfall, which occurs widely in graph datasets. Furthermore, we verified that, in real-work tasks, the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels. Grounded in this observation, we propose a new hypothesis-testing based performance metric beyond homophily, which is non-linear, feature-based and can provide statistical threshold value for GNNs' the superiority. Experiments indicate that it is significantly more effective than the existing homophily metrics on revealing the advantage and disadvantage of graph-aware modes on both synthetic and benchmark real-world datasets.

2023-04-24

ArXiv (prépublication)

openreview.net

FusionRetro: Molecule Representation Fusion via Reaction Graph for Retrosynthetic Planning

Songtao Liu

Zhengkai Tu

Zuobai Zhang

Peilin Zhao

Rex Ying

Lu Lin

Dinghao Wu

Retrosynthetic planning is a fundamental problem in drug discovery and organic chemistry, which aims to ﬁnd a complete multi-step syntheti… (voir plus)c route from a set of starting materials to the target molecule, determining crucial process ﬂow in chemical production. Existing approaches combine single-step retrosynthesis models and search algorithms to ﬁnd synthetic routes. However, these approaches generally consider the two pieces in a decoupled manner, taking only the product as the input to predict the reactants per planning step and largely ignoring the important context information from other intermediates along the synthetic route. In this work, we perform a series of experiments to identify the limitations of this decoupled view and propose a novel retrosynthesis framework that also exploits context information for retrosynthetic planning. We view synthetic routes as reaction graphs, and propose to incorporate the context by three principled steps: encode molecules into embeddings, aggregate information over routes, and readout to predict reactants. The whole framework can be efﬁciently optimized in an end-to-end fashion. Comprehensive experiments show that by fusing in context information over routes, our model sig-niﬁcantly improves the performance of retrosyn-thetic planning over baselines that are not context-aware, especially for long synthetic routes.

2022-12-31

(publié)

www.semanticscholar.org

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

Wujie Wang

Shitong Luo

Chence Shi

Yoshua Bengio

Rafael Gomez-Bombarelli

Predicting molecular conformations (or 3D structures) from molecular graphs is a fundamental problem in many applications. Most existing app… (voir plus)roaches are usually divided into two steps by first predicting the distances between atoms and then generating a 3D structure through optimizing a distance geometry problem. However, the distances predicted with such two-stage approaches may not be able to consistently preserve the geometry of local atomic neighborhoods, making the generated structures unsatisfying. In this paper, we propose an end-to-end solution for molecular conformation prediction called ConfVAE based on the conditional variational autoencoder framework. Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program. Extensive experiments on several benchmark data sets prove the effectiveness of our proposed approach over existing state-of-the-art approaches. Code is available at https://github.com/MinkaiXu/ConfVAE-ICML21

2020-12-31

ICML (publié)

proceedings.mlr.press

Learning Neural Generative Dynamics for Molecular Conformation Generation

Shitong Luo

Yoshua Bengio

Jian Peng

We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph. Traditional methods, such as molecular dynamic… (voir plus)s, sample conformations via computationally expensive simulations. Recently, machine learning methods have shown great potential by training on a large collection of conformation data. Challenges arise from the limited model capacity for capturing complex distributions of conformations and the difficulty in modeling long-range dependencies between atoms. Inspired by the recent progress in deep generative models, in this paper, we propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph. We propose a method combining the advantages of both flow-based and energy-based models, enjoying: (1) a high model capacity to estimate the multimodal conformation distribution; (2) explicitly capturing the complex long-range dependencies between atoms in the observation space. Extensive experiments demonstrate the superior performance of the proposed method on several benchmarks, including conformation generation and distance modeling tasks, with a significant improvement over existing generative models for molecular conformation sampling.

2020-12-31

ICLR (publié)