DeepGraphGenerator

Awesome Deep Graph Generator

Source code implementations of the following papers:

  • BTGAE: Divide and Conquer: A Topological Heterogeneity-based Framework for Scalable and Realistic Graph Generation. (Official PyTorch Implementation)
  • VRDAG: Efficient Dynamic Attributed Graph Generation, in ICDE 2025.
  • TGAE: Efficient Learning-based Graph Simulation for Temporal Graphs, in ICDE 2025.
  • CPGAE: Efficient Learning-based Community-Preserving Graph Generation, in ICDE 2022. (GAE version of CPGAN)

Implementation of baselines:

  • GraphRNN: GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models, in ICML 2018.

Usage

Data processing

  1. Run python experiment/preprocess.py to process a graph into a sparse matrix and save the matrix as an .npz file or .pkl file.
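The internals of experiment/preprocess.py are not reproduced here, so the following is only a minimal sketch of the general idea: read an edge list, map node labels to contiguous indices, and store the symmetric adjacency structure in coordinate (COO) form as a .pkl file. The function name `edges_to_coo`, the dictionary layout, and the toy input are assumptions made for illustration; the sketch uses only the standard library, whereas the actual script may rely on a sparse-matrix library and a different file layout.

```python
import pickle
import tempfile
from pathlib import Path


def edges_to_coo(edge_lines):
    """Map node labels to 0..n-1 and return a COO-style adjacency description.

    Hypothetical helper: lines are whitespace-separated "u v" pairs,
    and lines starting with '#' are treated as comments.
    """
    index = {}
    entries = set()
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = line.split()[:2]
        for node in (u, v):
            index.setdefault(node, len(index))
        i, j = index[u], index[v]
        # Store both directions so the adjacency matrix is symmetric;
        # the set removes duplicate edges.
        entries.add((i, j))
        entries.add((j, i))
    ordered = sorted(entries)
    n = len(index)
    return {
        "shape": (n, n),
        "row": [i for i, _ in ordered],
        "col": [j for _, j in ordered],
    }


lines = ["# toy edge list", "a b", "b c", "a b"]
matrix = edges_to_coo(lines)

# Round-trip through a .pkl file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_graph.pkl"
    with path.open("wb") as fh:
        pickle.dump(matrix, fh)
    with path.open("rb") as fh:
        restored = pickle.load(fh)

print(restored["shape"], len(restored["row"]))
```

Keeping only the nonzero coordinates is what makes the representation sparse: memory grows with the number of edges rather than with the square of the number of nodes.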

Training

To test the implementation of a method, run the corresponding command:

python main.py --method BTGAE
python main.py --method VRDAG
python main.py --method TGAE
python main.py --method CPGAE
python main.py --method GraphRNN
python main.py --method GraphRNN-S

All predefined configurations are in config/. Dynamically override parameters using:

python main.py --method <name> --update <key1>=<value1> <key2>=<value2> ...

Keys must exactly match (case-sensitive) the parameter names defined in the configuration files.

Examples:

python main.py --method BTGAE --update data=cora epochs=150 learning_rate=3e-3

python main.py --method TGAE --update epochs=100 lr=1e-3
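How main.py handles `--update` internally is not shown in this README. As a rough sketch of the behaviour described above (space-separated `key=value` pairs, exact case-sensitive key matching), the override step could look like the following. The helper names `parse_updates` and `apply_updates` and the `defaults` dictionary are hypothetical stand-ins for a file under config/, not the project's actual code.

```python
import argparse


def parse_updates(pairs):
    """Turn ['epochs=150', 'lr=1e-3'] into {'epochs': '150', 'lr': '1e-3'}."""
    updates = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        updates[key] = value
    return updates


def apply_updates(config, updates):
    """Return a copy of config with overrides applied; keys must match exactly."""
    merged = dict(config)
    for key, raw in updates.items():
        if key not in merged:  # case-sensitive: 'Epochs' does not match 'epochs'
            raise KeyError(f"unknown config key: {key!r}")
        # Cast the string to the type of the existing default (int, float, str).
        merged[key] = type(merged[key])(raw)
    return merged


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--method", required=True)
    parser.add_argument("--update", nargs="*", default=[])
    return parser


# Hypothetical defaults standing in for a configuration file under config/.
defaults = {"data": "cora", "epochs": 200, "learning_rate": 1e-3}

args = build_parser().parse_args(
    ["--method", "BTGAE", "--update", "data=citeseer", "epochs=150", "learning_rate=3e-3"]
)
config = apply_updates(defaults, parse_updates(args.update))
print(args.method, config)
```

Rejecting unknown keys instead of silently adding them makes a typo such as `Epochs=150` fail immediately rather than train with the default value. Casting through the type of the default works for the numeric and string values shown here; boolean options would need explicit handling.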

Evaluation

The evaluation tools are located in the experiment/graph_metrics directory. For usage instructions, see README.md.

Data Description

The following datasets are mainly from LINQS and SNAP.

| Data | #Nodes | #Edges | $d_{mean}$ | GINI | PWE |
| -------- | ------- | --------- | ---------- | ----- | ----- |
| citeseer | 3,327 | 4,732 | 2.774 | 0.435 | 2.420 |
| cora | 2,708 | 5,429 | 3.898 | 0.405 | 1.932 |
| pubmed | 19,717 | 44,338 | 4.496 | 0.604 | 2.176 |
| Epinions | 75,879 | 508,837 | 10.694 | 0.805 | 2.026 |
| google | 875,713 | 5,105,039 | 9.871 | 0.587 | 1.617 |
| YelpChi | 45,954 | 3,846,979 | 167.427 | 0.322 | 1.205 |

$d_{mean}$: mean degree.

GINI: GINI index, which is a common measure for inequality in a degree distribution.

PWE: power-law exponent.
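To make the first two statistics concrete, here is a minimal sketch that computes the mean degree and the Gini index of a degree sequence. It uses the standard rank-based Gini formula on sorted values; whether the project's own tooling uses exactly this estimator is an assumption, and the function names are illustrative only.

```python
def mean_degree(degrees):
    """Average degree over all nodes in the sequence."""
    return sum(degrees) / len(degrees)


def gini(degrees):
    """Gini index of a degree sequence: 0 means all degrees are equal."""
    values = sorted(degrees)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula on ascending values x_1..x_n with 1-based rank i:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# A star with one hub and four leaves versus a 4-cycle.
star = [4, 1, 1, 1, 1]
cycle = [2, 2, 2, 2]
print(mean_degree(star), gini(star))
print(mean_degree(cycle), gini(cycle))
```

The star has a single high-degree hub, so its Gini index is positive, while the cycle, where every node has the same degree, has a Gini index of zero.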

The following temporal network datasets are from SNAP and Network Repository. For more information, please refer to the VRDAG paper.

| Data | #Nodes | #Edges | T |
| ------------- | ------ | ------- | ---- |
| Emails-DNC | 1,891 | 39,264 | 14 |
| Bitcoin-Alpha | 3,783 | 24,186 | 37 |
| Wiki-Vote | 7,115 | 103,689 | 43 |

Test Result

The performance of the models on static graph datasets is listed below:

| Dataset | Method | Deg. dist. | Clus. dist. | Wedge count | Triangle count | LCC | PLE | Gini | Clus. coef. |
| :------: | :----: | :--------: | :---------: | :---------: | :------------: | :----: | :----: | :----: | :---------: |
| Citeseer | CPGAE | 0.0035 | 0.0124 | 0.0812 | 0.0274 | 0.066 | 0.003 | 0.0026 | 0.1182 |
| | BTGAE | 0.0029 | 0.0119 | 0.0402 | 0.335 | 0.0085 | 0.0711 | 0.0208 | 0.2473 |
| Cora | CPGAE | 0.004 | 0.0088 | 0.1333 | 0.056 | 0.0193 | 0.0026 | 0.0252 | 0.2161 |
| | BTGAE | 0.0061 | 0.0078 | 0.0026 | 0.054 | 0.0129 | 0.0028 | 0.0113 | 0.0376 |
| Pubmed | CPGAE | 0.0156 | 0.0144 | 0.423 | 0.2835 | 0 | 0.0807 | 0.0994 | 1.2245 |
| | BTGAE | 0.0164 | 0.0093 | 0.227 | 0.1541 | 0.0512 | 0.1233 | 0.0737 | 0.0307 |
| Epinions | CPGAE | 0.0175 | 0.0362 | 0.6739 | 0.3723 | 0 | 0.1768 | 0.1058 | 0.9246 |
| | BTGAE | 0.0189 | 0.0265 | 3.3051 | 3.3247 | 0.0538 | 0.1394 | 0.0597 | 0.0104 |
| YelpChi | CPGAE | 0.0236 | 0.02 | 0.1487 | 0.133 | 0.0012 | 0.0416 | 0.3062 | 0.0186 |
| | BTGAE | 0.0286 | 0.0396 | 0.0361 | 0.6578 | 0.0057 | 0.0037 | 0.1709 | 0.6429 |

The second table shows the test results of dynamic graph generation methods.

| Dataset | Method | Deg. dist. | Clus. dist. | Wedge count | Triangle count | LCC | PLE | Gini | Clus. coef. |
| :-----------: | :----: | :--------: | :---------: | :---------: | :------------: | :-----: | :----: | :----: | :---------: |
| Emails-DNC | VRDAG | 0.0084 | 0.0473 | 0.8118 | 0.8705 | 1.5973 | 0.0735 | 0.0837 | 0.2465 |
| | TGAE | 0.0027 | 0.0144 | 0.4768 | 0.4273 | 0.2105 | 0.1353 | 0.0199 | 0.2641 |
| Bitcoin-Alpha | VRDAG | 0.0052 | 0.0121 | 0.8572 | 0.7422 | 0.8909 | 0.1263 | 0.0652 | 0.9037 |
| | TGAE | 0.0019 | 0.0035 | 0.3014 | 0.3636 | 0.2712 | 0.2009 | 0.0096 | 0.5274 |
| Wiki-Vote | VRDAG | 0.0404 | 0.0656 | 4.2868 | 0.8584 | 12.2631 | 0.1463 | 0.3611 | 0.9466 |
| | TGAE | 0.0037 | 0.0075 | 0.2768 | 0.2843 | 0.1786 | 0.4262 | 0.0024 | 0.1824 |

The evaluation metrics used in these tests are as follows:

  • Deg. dist.: degree distribution similarity between the generated graph and the real graph, measured by Maximum Mean Discrepancy (MMD). A smaller MMD value indicates a closer degree distribution.

  • Clus. dist.: clustering coefficient distribution similarity, also measured by MMD.

  • Wedge count: the number of wedges (paths of length two) in the graph.

  • Triangle count: the number of triangles in the graph.

  • LCC: the size of the largest connected component in the graph.

  • PLE: the power-law exponent of the degree distribution of the graph.

  • Gini: the Gini index, a common measure of inequality in a degree distribution.

  • Clus. coef.: the global clustering coefficient of the graph.
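The structural statistics in the list above can be illustrated with a small standard-library sketch. It assumes an undirected simple graph given as an edge list and uses the usual textbook definitions; the function names are hypothetical, and the tools in experiment/graph_metrics may compute these quantities differently.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected simple graph as {node: set(neighbors)}; self-loops are dropped."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wedge_count(adj):
    # A node of degree d is the center of d * (d - 1) / 2 wedges.
    return sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())


def triangle_count(adj):
    # Summing common neighbors over ordered adjacent pairs sees each triangle 6 times.
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            total += len(nbrs & adj[v])
    return total // 6


def global_clustering(adj):
    # Global clustering coefficient: 3 * triangles / wedges.
    wedges = wedge_count(adj)
    return 3 * triangle_count(adj) / wedges if wedges else 0.0


def largest_component_size(adj):
    # Breadth-first search from every unvisited node; keep the largest component.
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


# A triangle 0-1-2 with a pendant node 3, plus a separate edge 4-5.
adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3), (4, 5)])
print(wedge_count(adj), triangle_count(adj), global_clustering(adj), largest_component_size(adj))
```

Nodes that never appear in an edge are not represented in this adjacency structure, which does not affect any of the four statistics except for graphs consisting only of isolated nodes.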

For temporal graphs, given a metric $f_m(\cdot)$, the real graph $\widetilde{G}$, and the synthetic one $\widetilde{G^{\prime}}$, we construct a sequence of snapshots $\widetilde{S}^t$ ($\widetilde{S^{\prime}}^t$), $t = 1, ...,T$, of $\widetilde{G}$ ($\widetilde{G^{\prime}}$) by aggregating edges from the initial timestamp to the current timestamp $t$. Then, we measure the average difference (in percentage) of the given metric $f_m(\cdot)$ between two graphs as follows:

$$ f_{avg}(\widetilde{G},\widetilde{G^{\prime}},f_m)=\mathrm{Mean}_{t=1:T}\left(\left|\frac{f_m(\widetilde{S}^t)-f_m(\widetilde{S^{\prime}}^t)}{f_m(\widetilde{S}^t)}\right|\right) $$
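The averaging procedure above can be sketched in a few lines of Python. The sketch assumes temporal edges are given as `(u, v, t)` triples with integer timestamps in `1..T` and that the metric is any function of an edge set; the names `cumulative_snapshots` and `f_avg` are illustrative and not taken from the repository.

```python
def cumulative_snapshots(temporal_edges, num_steps):
    """Snapshot t holds every edge with timestamp <= t, for t = 1..num_steps."""
    snapshots, current = [], set()
    by_time = sorted(temporal_edges, key=lambda edge: edge[2])
    idx = 0
    for t in range(1, num_steps + 1):
        # Add all edges that have arrived up to and including time t.
        while idx < len(by_time) and by_time[idx][2] <= t:
            u, v, _ = by_time[idx]
            current.add((u, v))
            idx += 1
        snapshots.append(frozenset(current))
    return snapshots


def f_avg(real_edges, synth_edges, metric, num_steps):
    """Mean over t of |f_m(S^t) - f_m(S'^t)| / |f_m(S^t)|."""
    real = cumulative_snapshots(real_edges, num_steps)
    synth = cumulative_snapshots(synth_edges, num_steps)
    diffs = []
    for s_real, s_synth in zip(real, synth):
        reference = metric(s_real)
        diffs.append(abs((reference - metric(s_synth)) / reference))
    return sum(diffs) / len(diffs)


# Toy example with the edge count as the metric f_m.
real = [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
synth = [(0, 1, 1), (1, 2, 3), (0, 2, 3), (2, 3, 3)]
score = f_avg(real, synth, metric=len, num_steps=3)
print(score)
```

In the toy example the cumulative edge counts are 1, 2, 3 for the real graph and 1, 1, 4 for the synthetic one, so the per-step relative differences are 0, 1/2, and 1/3. The formula is undefined when the metric is zero on a real snapshot; this sketch lets the resulting `ZeroDivisionError` propagate rather than guessing a convention.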

Repo Structure

The repository is organized as follows:

  • main.py: entry point that organizes all models.
  • experiment/: preprocessing and evaluation.
  • methods/: implementations of models.
  • config/: configuration files for different models.
  • models/: the checkpoints or the trained models for each method.
  • data/: dataset files.
  • requirements.txt: package dependencies.

Requirements

torch           2.3.1+cu121
networkx        2.8
scipy           1.14.1
scikit-learn    1.5.2
numpy           1.26.4
community       1.0.0b1
python-louvain  0.16
dgl             2.4.0+cu121

Contributors

<a href="https://github.com/AI4Risk/GraphGenerator/graphs/contributors"> <img src="https://contrib.rocks/image?repo=AI4Risk/GraphGenerator" /></a>

Citing

If you find GraphGenerator useful for your research, please consider citing the following papers:

@inproceedings{li2025efficient,
  title={Efficient dynamic attributed graph generation},
  author={Li, Fan and Wang, Xiaoyang and Cheng, Dawei and Chen, Cong and Zhang, Ying and Lin, Xuemin},
  booktitle={2025 IEEE 41st International Conference on Data Engineering (ICDE)},
  pages={1415--1428},
  year={2025},
  organization={IEEE}
}

@inproceedings{xiang2025efficient,
  title={Efficient learning-based graph simulation for temporal graphs},
  author={Xiang, Sheng and Xu, Chenhao and Cheng, Dawei and Wang, Xiaoyang and Zhang, Ying},
  booktitle={2025 IEEE 41st International Conference on Data Engineering (ICDE)},
  pages={251--264},
  year={2025},
  organization={IEEE}
}

@inproceedings{xiang2022efficient,
  title={Efficient learning-based community-preserving graph generation},
  author={Xiang, Sheng and Cheng, Dawei and Zhang, Jianfu and Ma, Zhenwei and Wang, Xiaoyang and Zhang, Ying},
  booktitle={2022 IEEE 38th International Conference on Data Engineering (ICDE)},
  pages={1982--1994},
  year={2022},
  organization={IEEE}
}

@article{xiang2022general,
  title={General graph generators: experiments, analyses, and improvements},
  author={Xiang, Sheng and Wen, Dong and Cheng, Dawei and Zhang, Ying and Qin, Lu and Qian, Zhengping and Lin, Xuemin},
  journal={The VLDB Journal},
  pages={1--29},
  year={2022},
  publisher={Springer}
}
