GraphGym

GraphGym is a platform for designing and evaluating Graph Neural Networks (GNNs). GraphGym was proposed in Design Space for Graph Neural Networks (Jiaxuan You, Rex Ying, Jure Leskovec; NeurIPS 2020 Spotlight).

Please also refer to PyG for a tightly integrated version of GraphGym and PyG.

Highlights

1. Highly modularized pipeline for GNN

  • Data: Data loading, data splitting
  • Model: Modularized GNN implementation
  • Tasks: Node / edge / graph level GNN tasks
  • Evaluation: Accuracy, ROC AUC, ...

2. Reproducible experiment configuration

  • Each experiment is fully described by a configuration file

3. Scalable experiment management

  • Easily launch thousands of GNN experiments in parallel
  • Auto-generate experiment analyses and figures across random seeds and experiments.

4. Flexible user customization

  • Easily register your own modules in graphgym/contrib/, such as data loaders, GNN layers, loss functions, etc.
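The registration pattern behind graphgym/contrib/ can be sketched in plain Python. The decorator name and registry dict below are illustrative assumptions, not GraphGym's actual API:

```python
# Illustrative sketch of a module registry, similar in spirit to the
# graphgym/contrib/ registration mechanism; names here are assumptions,
# not GraphGym's actual API.
layer_registry = {}

def register_layer(name):
    """Decorator that records a layer class under a string key."""
    def wrap(cls):
        layer_registry[name] = cls
        return cls
    return wrap

@register_layer("exampleconv")
class ExampleConv:
    """A stand-in custom GNN layer."""
    def __init__(self, dim_in, dim_out):
        self.dim_in, self.dim_out = dim_in, dim_out

# A config value such as gnn.layer_type: exampleconv can then be
# resolved through the registry at model-building time:
layer_cls = layer_registry["exampleconv"]
```

This lookup-by-string design is what lets a plain .yaml configuration select user-defined modules without touching the core pipeline.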

News

  • GraphGym 0.3.0 has been released. You may now install the stable version of GraphGym via pip install graphgym.
  • GraphGym 0.2.0 has been released. GraphGym now supports a PyTorch Geometric backend, in addition to the default DeepSNAP backend. You may try it out via run_single_pyg.sh:
cd run
bash run_single_pyg.sh 

Example use cases

Why GraphGym?

TL;DR: GraphGym is great for GNN beginners, domain experts and GNN researchers.

Scenario 1: You are a beginner to GNN, who wants to understand how GNN works.

You have probably read many exciting papers on GNNs and tried to write your own GNN implementation. Even with existing GNN packages, you still have to code up the essential pipeline yourself. GraphGym is a perfect place for you to start learning standardized GNN implementation and evaluation.

<div align="center"> <img align="center" src="https://github.com/snap-stanford/GraphGym/raw/master/docs/design_space.png" width="400px" /> <b><br>Figure 1: Modularized GNN implementation.</b> </div> <br>

Scenario 2: You want to apply GNN to your exciting applications.

You probably know that there are hundreds of possible GNN models, and selecting the best model is notoriously hard. Even worse, we have shown in our paper that the best GNN designs for different tasks differ drastically. GraphGym provides a simple interface to try out thousands of GNNs in parallel and understand the best designs for your specific task. GraphGym also recommends a "go-to" GNN design space, after investigating 10 million GNN model-task combinations.
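The size of such a design space comes from the Cartesian product of individual design choices, which is why the combinations quickly reach the thousands. A quick sketch with made-up dimensions (not GraphGym's actual grid):

```python
from itertools import product

# Hypothetical design dimensions; the real space studied in the paper
# is far larger (10M model-task combinations).
design_space = {
    "layers_mp": [2, 4, 6, 8],
    "aggregation": ["mean", "max", "sum"],
    "batchnorm": [True, False],
    "activation": ["relu", "prelu", "swish"],
}

# Every combination of choices is one candidate GNN design.
candidates = list(product(*design_space.values()))
print(len(candidates))  # 4 * 3 * 2 * 3 = 72 designs
```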

<div align="center"> <img align="center" src="https://github.com/snap-stanford/GraphGym/raw/master/docs/rank.png" width="1000px" /> <b><br>Figure 2: A guideline for desirable GNN design choices.</b> <br>(Sampling from 10 million GNN model-task combinations.) </div> <br>

Scenario 3: You are a GNN researcher, who wants to innovate GNN models / propose new GNN tasks.

Say you have proposed a new GNN layer ExampleConv. GraphGym can help you convincingly argue that ExampleConv is better than, say, GCNConv: when randomly sampling from 10 million possible model-task combinations, how often will ExampleConv outperform GCNConv when everything else is fixed (including the computational cost)? Moreover, GraphGym makes it easy to run hyper-parameter searches and visualize which design choices are better. In sum, GraphGym can greatly facilitate your GNN research.

<div align="center"> <img align="center" src="https://github.com/snap-stanford/GraphGym/raw/master/docs/evaluation.png" width="1000px" /> <b><br>Figure 3: Evaluation of a given GNN design dimension</b> (BatchNorm here). </div> <br>

Installation

Requirements

  • CPU or NVIDIA GPU, Linux, Python 3
  • PyTorch and various Python packages; instructions for installing these dependencies are given below

1. Python environment (optional): We recommend using the Conda package manager:

conda create -n graphgym python=3.7
source activate graphgym

2. PyTorch: Install PyTorch. We have verified GraphGym with PyTorch 1.8.0, and GraphGym should work with PyTorch 1.4.0+. For example:

# CUDA versions: cpu, cu92, cu101, cu102, cu110, cu111
pip install torch==1.8.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

3. PyTorch Geometric: Install PyTorch Geometric following their instructions. For example:

# CUDA versions: cpu, cu92, cu101, cu102, cu110, cu111
# TORCH versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0
CUDA=cu101
TORCH=1.8.0
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric

4. GraphGym and other dependencies:

git clone https://github.com/snap-stanford/GraphGym
cd GraphGym
pip install -r requirements.txt
pip install -e .  # from the latest version
pip install graphgym  # (optional) stable version from PyPI

5. Test the installation

Run a single experiment. Run a test GNN experiment using run_single.sh. Configurations are specified in example.yaml. The experiment performs node classification on the Cora dataset (random 80/20 train/val split).

cd run
bash run_single.sh # run a single experiment

Run a batch of experiments. Run a batch of GNN experiments using run_batch.sh. Configurations are specified in example.yaml (controls the basic architecture) and example.txt (controls how to do grid search). The experiment examines 96 models in the recommended GNN design space, on 2 graph classification datasets. Each experiment is repeated 3 times, and 8 jobs run concurrently. Depending on your infrastructure, finishing all the experiments may take a long time; you can quit with Ctrl-C (GraphGym will properly kill all the processes).

cd run
bash run_batch.sh # run a batch of experiments 
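The grid file driving the batch run typically lists one design dimension per line. The fields below (config key, short alias, candidate values) are a sketch of the format, not the exact contents of example.txt:

```text
# sketch of a grid-search file: <config key> <alias> <list of values>
gnn.layers_mp l_mp [2,4,6,8]
gnn.batchnorm bn [True,False]
gnn.agg agg [add,mean,max]
```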

(Optional) Run GraphGym with the CPU backend. GraphGym supports a CPU backend as well -- you only need to add the line device: cpu to the .yaml file. Here we provide an example.

cd run
bash run_single_cpu.sh # run a single experiment using CPU backend

(Optional) Run GraphGym with the PyG backend. Run GraphGym with the PyTorch Geometric (PyG) backend via run_single_pyg.sh and run_batch_pyg.sh, instead of the default DeepSNAP backend. The PyG backend follows the native PyG implementation and is slightly more efficient than the DeepSNAP backend. Currently the PyG backend only supports user-provided dataset splits, such as PyG native datasets or OGB datasets.

cd run
bash run_single_pyg.sh # run a single experiment using PyG backend
bash run_batch_pyg.sh # run a batch of experiments using PyG backend 

GraphGym In-depth Usage

1 Run a single GNN experiment

A full example is specified in run/run_single.sh.

1.1 Specify a configuration file. In GraphGym, an experiment is fully specified by a .yaml file. Configurations left unspecified in the .yaml file are populated with the default values in graphgym/config.py. For example, run/configs/example.yaml contains configurations for dataset, training, model, GNN, etc. Each configuration option is documented in graphgym/config.py.
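A minimal configuration might look like the fragment below. The authoritative keys and defaults live in graphgym/config.py, so treat these values as an illustrative sketch rather than a copy of run/configs/example.yaml:

```yaml
# Illustrative fragment; see graphgym/config.py for the authoritative keys.
dataset:
  name: Cora
  task: node
train:
  batch_size: 32
gnn:
  layers_mp: 2
  dim_inner: 64
optim:
  max_epoch: 100
```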

1.2 Launch an experiment. For example, in run/run_single.sh:

python main.py --cfg configs/example.yaml --repeat 3

You can specify the number of different random seeds to repeat via --repeat.

1.3 Understand the results. Experimental results are automatically saved in the directory run/results/${CONFIG_NAME}/; in the example above, run/results/example/. Results for different random seeds are saved in separate subdirectories, such as run/results/example/2. Aggregated results over all random seeds are automatically generated in run/results/example/agg, including the mean and standard deviation (_std) for each metric. Train/val/test results are further saved into subdirectories such as run/results/example/agg/val; there, stats.json stores the per-epoch results aggregated across random seeds, and best.json stores the results at the epoch with the highest validation accuracy.
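The per-seed aggregation described above (a mean and a _std entry per metric) can be sketched as follows; the concrete metric values are made up, and the field names mirror the description rather than GraphGym's exact output:

```python
import statistics

# Hypothetical per-seed results, as if read from the val/best.json of
# three seed subdirectories under run/results/example/.
per_seed = [
    {"accuracy": 0.80, "loss": 0.52},
    {"accuracy": 0.82, "loss": 0.49},
    {"accuracy": 0.78, "loss": 0.55},
]

# Aggregate each metric across seeds into mean and _std entries,
# matching the agg/ convention described above.
agg = {}
for metric in per_seed[0]:
    values = [r[metric] for r in per_seed]
    agg[metric] = statistics.mean(values)
    agg[metric + "_std"] = statistics.stdev(values)

print(round(agg["accuracy"], 4))  # 0.8
```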

2 Run a batch of GNN experiments

A full example is specified in run/run_batch.sh.

2.1 Specify a base file. GraphGym supports running a batch of experiments. To start, a user needs to select a base architecture --config. The batch of experiments will be cr
