Portrait of Shenyang Huang

Shenyang Huang

Collaborating Alumni - McGill University
Supervisor
Co-supervisor
Research Topics
Data Mining
Deep Learning
Graph Neural Networks
Molecular Modeling
Spectral Learning

Publications

Temporal Graph Analysis with TGX
Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely d… (see more)esigned for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Trafficc (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas like studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available on: https://github.com/ComplexData-MILA/TGX.
UTG: Towards a Unified View of Snapshot and Event Based Models for Temporal Graphs
Many real world graphs are inherently dynamic, constantly evolving with node and edge additions. These graphs can be represented by temporal… (see more) graphs, either through a stream of edge events or a sequence of graph snapshots. Until now, the development of machine learning methods for both types has occurred largely in isolation, resulting in limited experimental comparison and theoretical crosspollination between the two. In this paper, we introduce Unified Temporal Graph (UTG), a framework that unifies snapshot-based and event-based machine learning models under a single umbrella, enabling models developed for one representation to be applied effectively to datasets of the other. We also propose a novel UTG training procedure to boost the performance of snapshot-based models in the streaming setting. We comprehensively evaluate both snapshot and event-based models across both types of temporal graphs on the temporal link prediction task. Our main findings are threefold: first, when combined with UTG training, snapshot-based models can perform competitively with event-based models such as TGN and GraphMixer even on event datasets. Second, snapshot-based models are at least an order of magnitude faster than most event-based models during inference. Third, while event-based methods such as NAT and DyGFormer outperforms snapshot-based methods on both types of temporal graphs, this is because they leverage joint neighborhood structural features thus emphasizing the potential to incorporate these features into snapshotbased models as well. These findings highlight the importance of comparing model architectures independent of the data format and suggest the potential of combining the efficiency of snapshot-based models with the performance of event-based models in the future.
Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs
Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly dete… (see more)ction in temporal graphs is crucial for many real world applications such as intrusion identification in network systems, detection of ecosystem disturbances and detection of epidemic outbreaks. In this paper, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: i). how to compare graph snapshots across time, ii). how to capture temporal dependencies, and iii). how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD) which uses the spectrum of graph Laplacian as the low dimensional embedding of the graph structure at each snapshot. LAD explicitly models short term and long term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that i). LAD and MultiLAD are accurate and outperforms state-of-the-art baselines and their multi-view extensions by a large margin, ii). MultiLAD's advantage over contenders significantly increases when additional views are available, and iii). MultiLAD is highly robust to noise from individual views. In five real world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.
Temporal Graph Benchmark for Machine Learning on Temporal Graphs
Matthias Fey
Weihua Hu
Emanuele Rossi
Jure Leskovec
Michael M. Bronstein
We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and r… (see more)obust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
GPS++: Reviving the Art of Message Passing for Molecular Property Prediction
Dominic Masters
Josef Dean
Kerstin Klaser
Zhiyi Li
Samuel Maddrell-Mander
Adam Sanders
Hatem Helal
Deniz Beker
Andrew William Fitzgibbon
Fast and Attributed Change Detection on Dynamic Graphs with Density of States
Towards Better Evaluation for Dynamic Link Prediction
Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In th… (see more)is work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
Understanding Capacity Saturation in Incremental Learning
Incorporating dynamic flight network in SEIR to model mobility between populations
Xiaoye Ding
Abby Leung
Current efforts of modelling COVID-19 are often based on the standard compartmental models such as SEIR and their variations. As pre-symptom… (see more)atic and asymptomatic cases can spread the disease between populations through travel, it is important to incorporate mobility between populations into the epidemiological modelling. In this work, we propose to modify the commonly-used SEIR model to account for the dynamic flight network, by estimating the imported cases based on the air traffic volume as well as the test positive rate at the source. This modification, called Flight-SEIR, can potentially enable 1). early detection of outbreaks due to imported pre-symptomatic and asymptomatic cases, 2). more accurate estimation of the reproduction number and 3). evaluation of the impact of travel restrictions and the implications of lifting these measures. The proposed Flight-SEIR is essential in navigating through this pandemic and the next ones, given how interconnected our world has become.
Scalable Change Point Detection for Dynamic Graphs
Real world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as … (see more)traffic accident detection, intrusion identification and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs. The goal is to identify time steps where the graph structure deviates significantly from the norm. Despite empirical success of recent methods, building a change point detection method for real world dynamic graphs, which often scale to millions of nodes, remains an open question. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: an accurate change point detection method for graphs called LAD [10] which detects the changes in the full Laplacian spectrum of the graph in each timestamp, and the general framework of network density of states (DOS) [5] which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models –the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model – we show that LADdos has equal performance to LAD, which is the current state-of-the-art, while being orders of magnitude faster. For instance, on a dynamic graph with total 21 million edges over 150 timestamps, LADdos achieves 100x speedup when compared to LAD.
Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies
Abby Leung
Xiaoye Ding
The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that t… (see more)he structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study a structure aware model of COVID-19 CGEM. This model becomes similar to the classical compartment-based models in epidemiology if we assume the contact network is a Erdos-Renyi (ER) graph, i.e. everyone comes into contact with everyone else with the same probability. In contrast, CGEM is more expressive and allows for plugging in the actual contact networks, or more realistic proxies for it. Moreover, CGEM enables more precise modelling of enforcing and releasing different non-pharmaceutical intervention (NPI) strategies. Through a set of extensive experiments, we demonstrate significant differences between the epidemic curves when assuming different underlying structures. More specifically we demonstrate that the compartment-based models are overestimating the spread of the infection by a factor of 3, and under some realistic assumptions on the compliance factor, underestimating the effectiveness of some of NPIs, mischaracterizing others (e.g. predicting a later peak), and underestimating the scale of the second peak after reopening.
Laplacian Change Point Detection for Dynamic Graphs