CogDL
CogDL: A Comprehensive Library for Graph Deep Learning (WWW 2023)
Homepage | Paper | Documentation | Discussion Forum | Dataset | 中文
CogDL is a graph deep learning toolkit that allows researchers and developers to easily train and compare baseline or customized models for node classification, graph classification, and other important tasks in the graph domain.
We summarize the contributions of CogDL as follows:
- Efficiency: CogDL uses well-optimized operators to speed up training and reduce the GPU memory usage of GNN models.
- Ease of Use: CogDL provides easy-to-use APIs for running experiments with the given models and datasets using hyper-parameter search.
- Extensibility: The design of CogDL makes it easy to apply GNN models to new scenarios based on our framework.
❗ News
- The CogDL paper was accepted by WWW 2023. Find us at WWW 2023! We have also released v0.6, which adds more examples of graph self-supervised learning, including GraphMAE, GraphMAE2, and BGRL.
- A free GNN course provided by the CogDL team is available at this link. We also provide a discussion forum for Chinese users.
- The new v0.5.3 release supports mixed-precision training by setting fp16=True and provides a basic example written in Jittor. It also updates the tutorial in the documentation, fixes the download links of some datasets, and fixes potential bugs in operators.
- The new v0.5.2 release adds a GNN example for ogbn-products and updates the geom datasets. It also fixes some potential bugs, including setting devices and using the CPU for inference.
- The new v0.5.1 release adds fast operators, including SpMM (CPU version) and scatter_max (CUDA version). It also adds many datasets for node classification, which can be found at this link. 🎉
- The new v0.5.0 release designs and implements a unified training loop for GNNs. It introduces DataWrapper to help prepare the training/validation/test data and ModelWrapper to define the training/validation/test steps. 🎉
- The new v0.4.1 release adds the implementation of Deep GNNs and the recommendation task. It also supports new pipelines for generating embeddings and recommendation. You are welcome to join our tutorial at KDD 2021 at 10:30 am - 12:00 am, Aug. 14th (Singapore Time). More details can be found at https://kdd2021graph.github.io/. 🎉
- The new v0.4.0 release refactors the data storage (from Data to Graph) and provides more fast operators to speed up GNN training. It also includes many self-supervised learning methods on graphs. We are also glad to announce that we will give a tutorial at KDD 2021 in August. Please see this link for more details. 🎉
- CogDL supports GNN models with Mixture of Experts (MoE). You can install FastMoE and try MoE GCN in CogDL now!
- The new v0.3.0 release provides a fast spmm operator to speed up GNN training. We have also released the first version of the CogDL paper on arXiv. You can join our Slack for discussion. 🎉🎉🎉
- The new v0.2.0 release includes easy-to-use experiment and pipeline APIs for all experiments and applications. The experiment API supports AutoML features for searching hyper-parameters. This release also provides the OAGBert API for model inference (OAGBert is trained on a large-scale academic corpus by our lab). Some features and models were added by the open source community (thanks to all the contributors 🎉).
- The new v0.1.2 release includes a pre-training task, many examples, OGB datasets, some knowledge graph embedding methods, and some graph neural network models. The coverage of CogDL is increased to 80%. Some new APIs, such as Trainer and Sampler, have been developed and are being tested.
- The new v0.1.1 release includes the knowledge link prediction task, many state-of-the-art models, and optuna support. We also have a Chinese WeChat post about the CogDL release.
Getting Started
Requirements and Installation
- Python version >= 3.7
- PyTorch version >= 1.7.1
Please follow the instructions here to install PyTorch (https://github.com/pytorch/pytorch#installation).
When PyTorch has been installed, cogdl can be installed using pip as follows:
pip install cogdl
Install from source via:
pip install git+https://github.com/thudm/cogdl.git
Or clone the repository and install with the following commands:
git clone git@github.com:THUDM/cogdl.git
cd cogdl
pip install -e .
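Before installing, it can help to confirm that the environment meets the two version requirements listed above. The following is a minimal sketch, not part of CogDL: the helper names version_tuple and meets_minimum are hypothetical, and the check simply compares the numeric release part of a version string, ignoring build suffixes such as "+cu110".

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as '1.7.1+cu110' into (1, 7, 1).

    Hypothetical helper: only the leading numeric release part is kept.
    """
    release = re.match(r"\d+(\.\d+)*", str(version))
    if release is None:
        raise ValueError("not a version string: {!r}".format(version))
    return tuple(int(part) for part in release.group(0).split("."))


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


# Requirement from this README: Python >= 3.7
python_ok = sys.version_info >= (3, 7)

# Requirement from this README: PyTorch >= 1.7.1
try:
    import torch

    torch_ok = meets_minimum(torch.__version__, "1.7.1")
except ImportError:
    torch_ok = False  # PyTorch is not installed yet

print("Python OK:", python_ok)
print("PyTorch OK:", torch_ok)
```

If either line prints False, install or upgrade that dependency before running pip install cogdl.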
Usage
API Usage
You can run all kinds of experiments through CogDL APIs, especially experiment. You can also use your own datasets and models for experiments.
A quickstart example can be found in quick_start.py. More examples are provided in examples/.
from cogdl import experiment
# basic usage
experiment(dataset="cora", model="gcn")
# set other hyper-parameters
experiment(dataset="cora", model="gcn", hidden_size=32, epochs=200)
# run over multiple models on different seeds
experiment(dataset="cora", model=["gcn", "gat"], seed=[1, 2])
# automl usage
def search_space(trial):
    return {
        "lr": trial.suggest_categorical("lr", [1e-3, 5e-3, 1e-2]),
        "hidden_size": trial.suggest_categorical("hidden_size", [32, 64, 128]),
        "dropout": trial.suggest_uniform("dropout", 0.5, 0.8),
    }
experiment(dataset="cora", model="gcn", seed=[1, 2], search_space=search_space)
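When experiment receives lists for model or seed, each combination becomes its own run. The sketch below illustrates that fan-out with plain Python. It is an assumption about the behavior described in the comments above, not CogDL's internal code, and expand_variants is a hypothetical helper.

```python
from itertools import product


def expand_variants(dataset, model, seed):
    """Enumerate (dataset, model, seed) runs for scalar or list arguments.

    Hypothetical helper for illustration; CogDL's own expansion may differ.
    """

    def as_list(value):
        # A single name or number is treated as a one-element list.
        return list(value) if isinstance(value, (list, tuple)) else [value]

    return list(product(as_list(dataset), as_list(model), as_list(seed)))


# Mirrors experiment(dataset="cora", model=["gcn", "gat"], seed=[1, 2])
runs = expand_variants("cora", ["gcn", "gat"], [1, 2])
for run in runs:
    print(run)
```

Under this reading, the call with two models and two seeds corresponds to four training runs, and results for the same (dataset, model) pair are then aggregated across seeds.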
Command-Line Usage
You can also use python scripts/train.py --dataset example_dataset --model example_model to run example_model on example_dataset.
- --dataset: the dataset name to run; can be a space-separated list of datasets such as cora citeseer. Supported datasets include 'cora', 'citeseer', 'pubmed', 'ppi', 'wikipedia', 'blogcatalog', 'flickr'. More datasets can be found in cogdl/datasets.
- --model: the model name to run; can be a space-separated list of models such as gcn gat. Supported models include 'gcn', 'gat', 'graphsage', 'deepwalk', 'node2vec', 'hope', 'grarep', 'netmf', 'netsmf', 'prone'. More models can be found in cogdl/models.
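Flags that accept several space-separated values, as --dataset and --model do here, are commonly built with argparse and nargs="+". The following sketch shows that pattern. It is not the actual parser in scripts/train.py, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build an illustrative parser; NOT CogDL's real argument parser."""
    parser = argparse.ArgumentParser(description="illustrative train script")
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--dataset", nargs="+", required=True)
    parser.add_argument("--model", nargs="+", required=True)
    parser.add_argument("--seed", nargs="+", type=int, default=[1])
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--dataset", "cora", "--model", "gcn", "gat", "--seed", "0", "1", "2", "3", "4"]
)
print(args.dataset, args.model, args.seed)
```

Each flag arrives as a Python list, so a single name and several names are handled the same way downstream.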
For example, if you want to run GCN and GAT on the Cora dataset, with 5 different seeds:
python scripts/train.py --dataset cora --model gcn gat --seed 0 1 2 3 4
Expected output:
| Variant          | test_acc       | val_acc        |
|------------------|----------------|----------------|
| ('cora', 'gcn')  | 0.8050±0.0047  | 0.7940±0.0063  |
| ('cora', 'gat')  | 0.8234±0.0042  | 0.8088±0.0016  |
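Each cell in the table reports a mean and a spread over the seeds. A minimal sketch of that aggregation follows, assuming the spread is the population standard deviation (this README does not say which estimator CogDL uses). The accuracies are made-up illustrative numbers, not real results, and summarize is a hypothetical helper.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Format per-seed scores as 'mean±std' with four decimals.

    Assumption: population standard deviation (divides by N).
    """
    return "{:.4f}±{:.4f}".format(mean(scores), pstdev(scores))


# Made-up per-seed test accuracies, for illustration only.
example_scores = [0.80, 0.81, 0.82]
print(summarize(example_scores))  # 0.8100±0.0082
```

Swapping pstdev for stdev would give the sample standard deviation instead, which is slightly larger for a small number of seeds.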
If you have ANY difficulty getting things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.
❗ FAQ
<details> <summary> How to contribute to CogDL? </summary> <br/>If you have a well-performing algorithm and are willing to implement it in our toolkit to help more people, you can first open an issue and then create a pull request. Detailed information can be found here.
Before committing your modification, please first run pre-commit install to set up the git hook for checking code format and style using black and flake8. The pre-commit checks will then run automatically on git commit! Detailed information about pre-commit can be found here.
If you want to run parallel experiments on your server with multiple GPUs on multiple models, GCN and GAT, on the Cora dataset:
$ python scripts/train.py --dataset cora --model gcn gat --hid

