Portrait de Shenyang Huang n'est pas disponible

Shenyang Huang

Doctorat - McGill University
Superviseur⋅e principal⋅e
Co-superviseur⋅e

Publications

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Shenyang Huang
Joao Alex Cunha
Zhiyi Li
Gabriela Moisescu-Pareja
Oleksandr Dymov
Samuel Maddrell-Mander
Callum McLean
Frederik Wenkel
Luis Müller
Jama Hussein Mohamud
Ali Parviz
Michael Craig
Michał Koziarski
Jiarui Lu
Zhaocheng Zhu
Cristian Gabellini
Kerstin Klaser
Josef Dean
Cas Wognum … (voir 15 de plus)
Maciej Sypetkowski
Christopher Morris
Ioannis Koutis
Prudencio Tossou
Hadrien Mary
Therence Bois
Andrew William Fitzgibbon
Blazej Banaszewski
Chad Martin
Dominic Masters
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, wh… (voir plus)ere datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets show improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on Github and the dataset links are available in Part 1 and Part 2.
Laplacian Change Point Detection for Single and Multi-view Dynamic Graphs
Shenyang Huang
Samy Coulombe
Yasmeen Hitti
Dynamic graphs are rich data structures that are used to model complex relationships between entities over time. In particular, anomaly dete… (voir plus)ction in temporal graphs is crucial for many real-world applications such as intrusion identification in network systems, detection of ecosystem disturbances, and detection of epidemic outbreaks. In this article, we focus on change point detection in dynamic graphs and address three main challenges associated with this problem: (i) how to compare graph snapshots across time, (ii) how to capture temporal dependencies, and (iii) how to combine different views of a temporal graph. To solve the above challenges, we first propose Laplacian Anomaly Detection (LAD) which uses the spectrum of graph Laplacian as the low dimensional embedding of the graph structure at each snapshot. LAD explicitly models short-term and long-term dependencies by applying two sliding windows. Next, we propose MultiLAD, a simple and effective generalization of LAD to multi-view graphs. MultiLAD provides the first change point detection method for multi-view dynamic graphs. It aggregates the singular values of the normalized graph Laplacian from different views through the scalar power mean operation. Through extensive synthetic experiments, we show that (i) LAD and MultiLAD are accurate and outperforms state-of-the-art baselines and their multi-view extensions by a large margin, (ii) MultiLAD’s advantage over contenders significantly increases when additional views are available, and (iii) MultiLAD is highly robust to noise from individual views. In five real-world dynamic graphs, we demonstrate that LAD and MultiLAD identify significant events as top anomalies such as the implementation of government COVID-19 interventions which impacted the population mobility in multi-view traffic networks.
Temporal Graph Analysis with TGX
Razieh Shirzadkhani
Shenyang Huang
Elahe Kooshafar
Farimah Poursafaei
Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely d… (voir plus)esigned for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Trafficc (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas like studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available on: https://github.com/ComplexData-MILA/TGX.
Temporal Graph Benchmark for Machine Learning on Temporal Graphs
Shenyang Huang
Farimah Poursafaei
Jacob Danovitch
Matthias Fey
Weihua Hu
Emanuele Rossi
Jure Leskovec
Michael M. Bronstein
We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and r… (voir plus)obust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
GPS++: Reviving the Art of Message Passing for Molecular Property Prediction
Dominic Masters
Josef Dean
Kerstin Klaeser
Zhiyi Li
Samuel Maddrell-Mander
Adam Sanders
Hatem Helal
Deniz Beker
Andrew William Fitzgibbon
Shenyang Huang
Ladislav Rampášek
Fast and Attributed Change Detection on Dynamic Graphs with Density of States
Shenyang Huang
Jacob Danovitch
Towards Better Evaluation for Dynamic Link Prediction
Farimah Poursafaei
Andy Huang
Shenyang Huang
Kellin Pelrine
Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In th… (voir plus)is work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization-based baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings which highlights that the negative edges used in the current evaluation are easy. To sample more challenging negative edges, we introduce two novel negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
Incorporating dynamic flight network in SEIR to model mobility between populations
Xiaoye Ding
Shenyang Huang
Abby Leung
Understanding Capacity Saturation in Incremental Learning
Shenyang Huang
Vincent Francois-Lavet
Scalable Change Point Detection for Dynamic Graphs
Real world networks often evolve in complex ways over time. Understanding anomalies in dynamic networks is crucial for applications such as … (voir plus)traffic accident detection, intrusion identification and detection of ecosystem disturbances. In this work, we focus on the problem of change point detection in dynamic graphs. The goal is to identify time steps where the graph structure deviates significantly from the norm. Despite empirical success of recent methods, building a change point detection method for real world dynamic graphs, which often scale to millions of nodes, remains an open question. To fill this gap, we propose LADdos, a scalable method for change point detection in dynamic graphs. LADdos brings together ideas from two recent works: an accurate change point detection method for graphs called LAD [10] which detects the changes in the full Laplacian spectrum of the graph in each timestamp, and the general framework of network density of states (DOS) [5] which models the distribution of the singular values through efficient approximation methods. In experiments with two common graph models –the Stochastic Block Model (SBM) and the Barabási-Albert (BA) model – we show that LADdos has equal performance to LAD, which is the current state-of-the-art, while being orders of magnitude faster. For instance, on a dynamic graph with total 21 million edges over 150 timestamps, LADdos achieves 100x speedup when compared to LAD.
Contact Graph Epidemic Modelling of COVID-19 for Transmission and Intervention Strategies
Abby Leung
Xiaoye Ding
Shenyang Huang
The coronavirus disease 2019 (COVID-19) pandemic has quickly become a global public health crisis unseen in recent years. It is known that t… (voir plus)he structure of the human contact network plays an important role in the spread of transmissible diseases. In this work, we study a structure aware model of COVID-19 CGEM. This model becomes similar to the classical compartment-based models in epidemiology if we assume the contact network is a Erdos-Renyi (ER) graph, i.e. everyone comes into contact with everyone else with the same probability. In contrast, CGEM is more expressive and allows for plugging in the actual contact networks, or more realistic proxies for it. Moreover, CGEM enables more precise modelling of enforcing and releasing different non-pharmaceutical intervention (NPI) strategies. Through a set of extensive experiments, we demonstrate significant differences between the epidemic curves when assuming different underlying structures. More specifically we demonstrate that the compartment-based models are overestimating the spread of the infection by a factor of 3, and under some realistic assumptions on the compliance factor, underestimating the effectiveness of some of NPIs, mischaracterizing others (e.g. predicting a later peak), and underestimating the scale of the second peak after reopening.
Laplacian Change Point Detection for Dynamic Graphs
Shenyang Huang
Yasmeen Hitti