GraphScope
GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba
See our ongoing GraphScope Flex: a LEGO-inspired, modular, and user-friendly GraphScope evolution.
GraphScope is a unified distributed graph computing platform that provides a one-stop environment for performing diverse graph operations on a cluster of computers through a user-friendly Python interface. GraphScope makes multi-staged processing of large-scale graph data on compute clusters simple by combining several important pieces of Alibaba technology: GRAPE, MaxGraph, and Graph-Learn (GL) for analytics, interactive queries, and graph neural network (GNN) computation, respectively, and the Vineyard store, which offers efficient in-memory data transfers.
Visit our website at graphscope.io to learn more.
Latest News
- [21/04/2025] GraphScope achieved record-breaking results on the LDBC Social Network Benchmark Interactive workload using the declarative query language Cypher, with a 2.0× higher throughput on SF300 than the previous record holder!
- [31/07/2024] We've launched a webpage visualizing GraphScope's journey in graph computing. Check it out!
- [30/05/2024] GraphScope Flex set new record-breaking SNB Interactive audit results, as announced by LDBC on X (Twitter)!
- [25/03/2024] We donated the graph file format GraphAr to the Apache Software Foundation as an Incubating Project.
- [05/02/2024] The GraphScope Flex paper was accepted by the SIGMOD 2024 Industry Track. See you in 🇨🇱!
- [19/12/2023] A paper introducing GraphScope Flex was released on arXiv.org.
- [20/07/2023] GraphScope achieved record-breaking results on the LDBC Social Network Benchmark Interactive workload, with a 2.45× higher throughput on SF300 than the previous record holder!
- [04/07/2023] GraphScope Flex tech preview released with v0.23.0.
Table of Contents
- Getting Started
- Demo: Node Classification on Citation Network
- Graph Processing on Kubernetes
- Development
- Documentation
- License
- Publications
- Joining our Community!
Getting Started
We provide a Playground with a managed JupyterLab. Try GraphScope straight away in your browser!
GraphScope can run in standalone mode or in containers on clusters managed by Kubernetes. To get started quickly, let's begin with the standalone mode.
Installation for Standalone Mode
The pre-compiled GraphScope package is distributed as a Python package and can be installed with pip.
pip3 install graphscope
Note that graphscope requires Python >= 3.8 and pip >= 19.3. The package is built for and tested on the most popular Linux distributions (Ubuntu 20.04+ / CentOS 7+) and on macOS 12+ (Intel/Apple silicon). Windows users may want to install Ubuntu on WSL2 to use this package.
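Before installing, it can help to confirm that the environment meets the version requirements stated above. The following is a minimal sketch, not part of GraphScope: the helper names `parse_version` and `meets_requirements` are hypothetical, and only the two thresholds (Python 3.8, pip 19.3) come from this README.

```python
import sys

# Minimum versions taken from the installation note in this README.
MIN_PYTHON = (3, 8)
MIN_PIP = (19, 3)


def parse_version(text):
    """Turn '23.0.1rc1' into (23, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_requirements(python_version, pip_version):
    """Check a Python version tuple and a pip version string against the minimums."""
    python_ok = tuple(python_version[:2]) >= MIN_PYTHON
    pip_ok = parse_version(pip_version) >= MIN_PIP
    return python_ok and pip_ok


if __name__ == "__main__":
    if sys.version_info[:2] < MIN_PYTHON:
        print("Python is too old for graphscope; upgrade Python first")
    else:
        # importlib.metadata is in the standard library from Python 3.8 on.
        from importlib.metadata import version

        ok = meets_requirements(sys.version_info, version("pip"))
        print("ready to install graphscope" if ok else "upgrade pip first")
```

Run as a script, it reports whether the current interpreter and pip are recent enough; the check does not cover the operating system requirements.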
Next, we will walk you through a concrete example to illustrate how GraphScope can be used by data scientists to effectively analyze large graphs.
Demo: Node Classification on Citation Network
ogbn-mag is a heterogeneous network composed of a subset of the Microsoft Academic Graph. It contains four types of entities (papers, authors, institutions, and fields of study), as well as four types of directed relations, each connecting two entities.
Given the heterogeneous ogbn-mag data, the task is to predict the class of each paper. Node classification can identify papers in multiple venues, which represent different groups of scientific work on different topics. We apply both the attribute and structural information to classify papers. In the graph, each paper node contains a 128-dimensional word2vec vector representing its content, which is obtained by averaging the embeddings of words in its title and abstract. The embeddings of individual words are pre-trained. The structural information is computed on-the-fly.
Loading a graph
GraphScope models graph data as a property graph, in which vertices and edges are labeled and can carry many properties.
Taking ogbn-mag as an example, the figure below shows the model of the property graph.
This graph has four kinds of vertices, labeled paper, author, institution, and field_of_study. Four kinds of edges connect them; each kind of edge has a label and specifies the vertex labels of its two endpoints. For example, cites edges connect two vertices labeled paper. Another example is writes, which requires the source vertex to be labeled author and the destination vertex to be labeled paper. All vertices and edges may have properties; for example, paper vertices have properties such as features, publish year, and subject label.
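To make the property graph model concrete, here is a small sketch in plain Python. It is only an illustration and not how GraphScope stores graphs: the `Vertex` and `Edge` classes, the `EDGE_SCHEMA` table, the `is_valid_edge` helper, and the `year` property name are all hypothetical. The schema lists only the two edge kinds whose endpoints are spelled out above.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int
    label: str                      # e.g. "paper" or "author"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: Vertex
    dst: Vertex
    label: str                      # e.g. "cites" or "writes"
    properties: dict = field(default_factory=dict)


# Edge label -> (required source vertex label, required destination vertex label).
# Only the two edge kinds described in the text are included here.
EDGE_SCHEMA = {
    "cites": ("paper", "paper"),
    "writes": ("author", "paper"),
}


def is_valid_edge(edge):
    """Return True if the edge's endpoints carry the labels its edge label requires."""
    expected = EDGE_SCHEMA.get(edge.label)
    return expected == (edge.src.label, edge.dst.label)


author = Vertex(2, "author")
paper = Vertex(10, "paper", {"year": 2015})  # "year" is an illustrative property name

print(is_valid_edge(Edge(author, paper, "writes")))  # True
print(is_valid_edge(Edge(paper, author, "writes")))  # False: direction is reversed
```

The point of the schema table is that an edge label constrains both endpoints, so a writes edge from a paper to an author is rejected even though both vertices exist.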
To load this graph into GraphScope with our retrieval module, use the following code:
import graphscope
from graphscope.dataset import load_ogbn_mag
g = load_ogbn_mag()
For convenience, we provide a set of functions to load graph datasets from ogb and snap. Please find all the available graphs here. If you want to use your own graph data, please refer to this doc to load vertices and edges by labels.
Interactive query
Interactive queries allow users to explore, examine, and present graph data directly, so that they can locate specific or in-depth information in a timely manner. GraphScope adopts a high-level language called Gremlin for graph traversal and provides efficient execution at scale.
In this example, we use graph traversal to count the number of papers two given authors have co-authored. To simplify the query, we assume the authors can be uniquely identified by IDs 2 and 4307, respectively.
# get the endpoint for submitting Gremlin queries on graph g.
interactive = graphscope.gremlin(g)
# count the number of papers two authors (with id 2 and 4307) have co-authored
papers = interactive.execute("g.V().has('author', 'id', 2).out('writes').where(__.in('writes').has('id', 4307)).count()").one()
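To see what this traversal computes, the same count can be sketched in plain Python over a toy edge list. This is an illustration only and does not use GraphScope: the function `count_coauthored` is hypothetical, the sample edges are made up, and the sketch assumes each (author, paper) pair appears at most once among the writes edges.

```python
def count_coauthored(writes_edges, author_a, author_b):
    """Count papers that both authors wrote.

    writes_edges is an iterable of (author_id, paper_id) pairs, one per
    'writes' edge. The Gremlin query follows out('writes') from the first
    author and keeps the papers that also have an incoming 'writes' edge
    from the second author; a set intersection yields the same papers.
    """
    papers_a = {paper for author, paper in writes_edges if author == author_a}
    papers_b = {paper for author, paper in writes_edges if author == author_b}
    return len(papers_a & papers_b)


# Toy data, not taken from ogbn-mag: authors 2 and 4307 share only paper 101.
writes = [(2, 100), (2, 101), (4307, 101), (4307, 102), (7, 100)]
print(count_coauthored(writes, 2, 4307))  # 1
```

On a real graph the traversal is executed by the interactive engine rather than by materializing these sets on the client.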
Graph analytics
Graph analytics is widely used in the real world. Many algorithms, such as community detection, paths and connectivity, and centrality, have proven very useful in various businesses. GraphScope ships with a set of [built-in algorithms](https://graphscope.io/
