Mila Releases a General and High-performance Graph Embedding System

Graphs are general and flexible data structures for encoding the relationships between different objects and are ubiquitous in the real world. Examples of real-world graphs include social networks, citation graphs, protein-protein interaction graphs, and knowledge graphs, covering a variety of applications and domains. Recently, there has been growing interest in learning effective representations of graphs, because such representations are useful in a variety of tasks. However, the problem is challenging because real-world graphs can be very large and heterogeneous. There is therefore a strong need for scalable, general graph representation systems that serve different tasks and applications in both academia and industry.

At Mila, Professor Jian Tang’s group has developed a general, high-performance graph embedding system called GraphVite. Unlike existing machine learning systems such as TensorFlow and PyTorch, which are mainly designed for data with regular structures (e.g., images, speech, and natural language), GraphVite is designed specifically for large-scale graphs. It runs on hybrid CPU-GPU architectures and scales linearly with the number of GPUs. The system is one to two orders of magnitude faster than existing implementations. For example, for a graph with one million nodes, it takes only around one minute to learn the node representations with 4 GPUs. Besides its efficiency, GraphVite supports a variety of applications, including:

  • Node Embedding, which aims at learning node embeddings of large-scale graphs. GraphVite now includes some state-of-the-art node embedding methods such as DeepWalk, LINE, and node2vec. We plan to add more methods in the future.

  • Knowledge Graph Embedding. The goal is to learn representations of both entities and relations. GraphVite currently supports a variety of representative methods, including TransE, DistMult, ComplEx, SimplE, and RotatE. We will also add more approaches in the future.

  • Graph and High-dimensional Data Visualization. GraphVite also supports learning 2D or 3D coordinates of nodes to visualize graphs, and this generalizes to visualizing any high-dimensional data. It can be particularly useful for visualizing the representations learned by deep neural networks. GraphVite now implements one of the state-of-the-art visualization algorithms, LargeVis. In the future, we plan to include other visualization algorithms such as t-SNE and UMAP.
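
To make the knowledge graph embedding task above concrete, here is a minimal sketch of the scoring idea behind TransE, written in plain Python. It does not use GraphVite and does not reflect its API. The function names (transe_score, rank_tails) and the toy two-dimensional vectors are hypothetical, hand-picked for illustration rather than learned.

```python
import math


def transe_score(head, relation, tail):
    """TransE treats a relation as a translation: head + relation should land near tail.

    Returns the negative Euclidean distance, so a higher score means a more
    plausible (head, relation, tail) triple.
    """
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    return -math.sqrt(squared)


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy embeddings (assumed values, not the output of any trained model).
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
candidates = {
    "France": [1.0, 1.0],
    "Japan": [-1.0, 1.0],
}

ranking = rank_tails(paris, capital_of, candidates)
print(ranking)
```

In a real system the entity and relation vectors are learned from the observed triples so that true triples score higher than corrupted ones. The other listed methods keep the same overall setup and differ mainly in the scoring function.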

Besides its speed, GraphVite also provides complete, user-friendly application pipelines for research and development. With modules such as datasets and evaluation tasks, the system is a self-contained environment for embedding models and experiments. It ships with more than 30 baseline benchmarks of well-known models on standard datasets. This makes it easy to reproduce these models, deploy them on large real-world datasets, and develop new models for graph representation learning. We hope GraphVite will accelerate the research and development of graph representation learning.

The development of GraphVite is led by first-year Ph.D. student Zhaocheng Zhu, with contributions from Shizhen Xu and Meng Qu. The GraphVite repository is available at https://github.com/DeepGraphLearning/graphvite, and the original paper is available at https://arxiv.org/abs/1903.00757. For more information about GraphVite, please visit the website: https://graphvite.io/