DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting

This repository is the official implementation of the paper DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting. Please consider starring the repository if you find it interesting.

The paper was accepted at WSDM'24. You can view our project page.

[Figure: DeSCo workflow]

Code Structure

main.py is the implementation of DeSCo.

subgraph_counting contains all the modules needed by python scripts.

baseline.py is the implementation of the two neural baselines (DIAMNet and LRP) that are compared with DeSCo in the paper.

ablation_gnns.py is used for the ablation study of the expressive power of SHMP. It implements other expressive GNNs.

ablation_wo_canonical.py is used for the ablation study of canonical partition. It implements DeSCo's neighborhood counting stage without canonical partition.

Requirements

Python >= 3.9

To install requirements:

pip install -r requirements.txt

Pre-trained Models

The neighborhood counting and gossip propagation models in our paper are trained on our synthetic dataset. Users can download our pre-trained models from here.

Evaluation

To evaluate the trained models on real-world datasets, please run the following command:

python main.py --test_dataset COX2 --neigh_checkpoint ckpt/{checkpoint_path}/neigh/{model_name}.ckpt --gossip_checkpoint ckpt/{checkpoint_path}/gossip/{model_name}.ckpt --test_gossip

The command above evaluates the trained models on COX2. Replace the checkpoint paths with the actual paths of your trained model checkpoints.
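To evaluate several datasets in a row, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository: build_eval_command is a hypothetical helper, the checkpoint paths are the same placeholders as above, and only flags that appear in the command above are used.

```python
import shlex


def build_eval_command(test_dataset, neigh_ckpt, gossip_ckpt):
    """Return the argument list for evaluating one dataset with main.py.

    Hypothetical helper: it only uses the flags shown in the README command.
    """
    return [
        "python", "main.py",
        "--test_dataset", test_dataset,
        "--neigh_checkpoint", neigh_ckpt,
        "--gossip_checkpoint", gossip_ckpt,
        "--test_gossip",
    ]


# Placeholder paths: replace them with your real checkpoint files.
neigh = "ckpt/{checkpoint_path}/neigh/{model_name}.ckpt"
gossip = "ckpt/{checkpoint_path}/gossip/{model_name}.ckpt"

for dataset in ("COX2", "MUTAG"):
    cmd = build_eval_command(dataset, neigh, gossip)
    # Print the shell command; to launch it instead, pass cmd to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if a checkpoint path contains spaces.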

The code comes with analysis methods in subgraph_counting/workload.py, which output the counts inferred by the model. Users can compute any desired metric from these counts.
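As one example of such a metric, the sketch below computes the mean absolute error, the mean squared error, and a normalized mean squared error (MSE divided by the variance of the ground-truth counts). It assumes the predicted and ground-truth counts have already been collected as two equal-length sequences; extracting them from subgraph_counting/workload.py is not shown, and count_errors is a hypothetical name.

```python
from statistics import fmean, pvariance


def count_errors(pred_counts, true_counts):
    """Compute simple error metrics between predicted and true subgraph counts."""
    if len(pred_counts) != len(true_counts) or len(true_counts) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    true = [float(t) for t in true_counts]
    errors = [float(p) - t for p, t in zip(pred_counts, true)]

    mae = fmean([abs(e) for e in errors])
    mse = fmean([e * e for e in errors])

    # Population variance of the ground truth; zero if all counts are equal.
    variance = pvariance(true)
    normalized_mse = mse / variance if variance > 0 else float("nan")

    return {"mae": mae, "mse": mse, "normalized_mse": normalized_mse}
```

The normalized variant makes errors comparable across query patterns whose counts differ by orders of magnitude. It is undefined when every ground-truth count is identical, in which case the sketch returns NaN.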

Train from Scratch

Alternatively, to train your own model instead of using our pre-trained version, follow the instructions below.

Dataset

To benefit future research, we release the large synthetic dataset, with ground-truth subgraph counts, that we used to train our pre-trained model. Users can download the dataset zip file from here and move the unzipped folder under DeSCo/data/ to train from scratch.
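The unzip-and-move step can also be scripted. The sketch below uses Python's standard zipfile module; extract_dataset is a hypothetical helper that is not part of the repository, and the archive name in the usage comment is an assumption, so substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, data_dir="DeSCo/data"):
    """Extract a downloaded dataset archive into data_dir.

    Returns the sorted top-level entries of the archive so you can check
    that the expected dataset folder was created.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


# Example (assumed file name; replace it with the archive you downloaded):
# extract_dataset("Syn_1827.zip")
```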

Code and configurations

To train with the official configuration of DeSCo, run this command:

python main.py --train_dataset Syn_1827 --valid_dataset Syn_1827 --test_dataset MUTAG --train_neigh --train_gossip --test_gossip

To train the model(s) in the paper with other configurations, please specify the parameters in the command.

The boolean parameters train_neigh, train_gossip, and test_gossip determine whether to train the neighborhood counting model, train the gossip propagation model, and test the gossip propagation model, respectively.

Please refer to the Appendix for the detailed training parameters.

Citation

If you find our work useful, please consider citing:

@inproceedings{fu2024desco,
  title={DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting},
  author={Fu, Tianyu and Wei, Chiyue and Wang, Yu and Ying, Rex},
  booktitle={Proceedings of the 17th ACM International Conference on Web Search and Data Mining},
  pages={218--227},
  year={2024}
}

Contributing

You are welcome to use the code and contribute to the project!
