SkillAgentSearch skills...

CBGBench

Official code repository of < CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph >

Install / Use

/learn @EDAPINENUT/CBGBench
About this skill

Quality Score

0/100

Category

Design

Supported Platforms

Universal

README

CBGBench

<p align="center"> <img src="docs/cbg_logo.png" width="400" class="center" alt="CBGBench Logo"/> <br/> </p>

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a> Paper DeepModeling <a href="https://img.shields.io/github/stars/EDAPINENUT/CBGBench" alt="arXiv"> <img src="https://img.shields.io/github/stars/EDAPINENUT/CBGBench" /></a>

</p>

CBGBench: Complex Binding Graph Benchmark is a benchmark for generative target-aware molecule design.

Paper [DiffBP] | Paper [D3FG] | Benchmark

[!NOTE]

  • Feb. 2025: Accepted by ICLR2025 as spotlight presentation. Congrats! Some of pretrained checkpoints of the edge-cutting methods have been released, pls download from Google Drive for further usage.
  • Dec. 2024: We have uploaded the molecules generated by the SOTA Methods for de novo design (LiGAN, Pocket2Mol, TargetDiff, D3FG, DecomptDiff, VoxBind, and MolCraft) for evaluation. For most of them, the generated number is 100/pockets, while for some 200 (2 times) in order to check if it is necessary to report multi-seeds' results. If it is useful for your research, pls download from the Google Drive and cite the paper.
  • Oct. 2024: CBGBench paper v3 has just integrated the state-of-the-art MolCraft and VoxBind into evaluation, and the integration of their codes is in process.
  • Sep. 2024: CBGBench has joined the DeepModeling community, a community devoted to AI for science, as a sandbox-level project. Learn more about DeepModeling

Introduction

This is the official code repository of the paper 'CBGBench: Fill in the Blank of Protein-Molecule Binding Graph', which aims to unify target-aware molecule design with single code implementation. Until now, we have included 7 methods as shown below: | Model | Paper link | Github | |------------|---------------------------------------------|-------------------------------------------| | Pocket2Mol | https://arxiv.org/abs/2205.07249 | https://github.com/pengxingang/Pocket2Mol | | GraphBP | https://arxiv.org/abs/2204.09410 | https://github.com/divelab/GraphBP | | DiffSBDD | https://arxiv.org/abs/2210.13695 | https://github.com/arneschneuing/DiffSBDD | | DiffBP | https://arxiv.org/abs/2211.11214 | Here is the official implementation. | | TargetDiff | https://arxiv.org/abs/2303.03543 | https://github.com/guanjq/targetdiff | | FLAG | https://openreview.net/forum?id=Rq13idF0F73 | https://github.com/zaixizhang/FLAG | | D3FG | https://arxiv.org/abs/2306.13769 | Here is the official implementation. |

These models are initially established for de novo molecule generation, and we extend more tasks including linker design, fragment growing, scaffold hopping, and side chain decoration.

<p align="center"> <img src="scripts/tasks.jpg" width="600" class="center" alt="Extend Tasks"/> <br/> </p>

Installation

Create environment with basic packages.

conda env create -f environment.yml
conda activate cbgbench

Install pytorch and torch_geometric

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg pytorch-scatter pytorch-cluster -c pyg

Install tools for chemistry

# install rdkit, efgs, obabel, etc.
pip install --use-pep517 EFGs
pip install biopython
pip install lxml
conda install rdkit openbabel tensorboard tqdm pyyaml easydict python-lmdb -c conda-forge

# install plip
mkdir tools
cd tools
git clone https://github.com/pharmai/plip.git
cd plip
python setup.py install
alias plip='python plip/plip/plipcmd.py'
cd ..

### Note that if there is an error in setup.py, it can be ignored as long as openbabel is installed.

# install docking tools
conda install -c conda-forge numpy swig boost-cpp sphinx sphinx_rtd_theme
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
pip install meeko==0.1.dev3 scipy pdb2pqr vina

# If you are unable to install vina, you can try: conda install vina
# If you encounter the following error:
# ImportError: libtiff.so.5: cannot open shared object file: No such file or directory
# Please try the following steps to resolve it:
pip uninstall pillow
pip install pillow

Prepare Dataset

CrossDocked2020

1. Download processed datasets

(i) Download the processed dataset from Google Drive. For each dataset, it corresponds to different methods and tasks. Please refer to the Table below to find the processed data that you need to use. Note that to train D3FG, it is a two-stage model that firstly generates functional groups and then generates linker atoms, so we named the two-stage model 'D3FG_fg' and 'D3FG_linker', and the training data is different for them.

Table: The data file name corresponding to different methods and tasks. | model | task | file name | |-------------|------------|---------------------------------------------------------------------------| | Pocket2Mol | de novo | data/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb | | GraphBP | de novo | data/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb | | DiffSBDD | de novo | data/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb | | DiffBP | de novo | data/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb | | TargetDiff | de novo | data/pl/crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb | | FLAG | de novo | data/pl_arfg/crossdocked_v1.1_rmsd1.0_pocket10_processed_arfuncgroup.lmdb | | D3FG | de novo | data/pl_fg/crossdocked_v1.1_rmsd1.0_pocket10_processed_funcgroup.lmdb | | Pocket2Mol | fragment | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb | | GraphBP | fragment | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb | | DiffSBDD | fragment | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb | | DiffBP | fragment | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb | | TargetDiff | fragment | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb | | Pocket2Mol | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb | | GraphBP | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb | | DiffSBDD | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb | | DiffBP | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb | | TargetDiff | linker | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb | | Pocket2Mol | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb | | GraphBP | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb | | DiffSBDD | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb | | DiffBP | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb | | TargetDiff | scaffold | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb | | Pocket2Mol | side chain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb | | GraphBP | side chain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb | | DiffSBDD | side chain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb | | DiffBP | side chain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb | | TargetDiff | side chain | data/pl_decomp/crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb |

(ii) Copy the datasets to ./data dir, in which the complete data file directory will look like

- CGBBench
    - data
        - pl
            - crossdocked_name2id.pt
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_fullatom.lmdb
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb
        - pl_arfg
            - crossdocked_name2id_arfuncgroup.pt
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_arfuncgroup.lmdb
        - pl_decomp
            - crossdocked_name2id_frag.pt
            - crossdocked_name2id_linker.pt
            - crossdocked_name2id_scaffold.pt
            - crossdocked_name2id_sidechain.pt
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_frag.lmdb
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_linker.lmdb
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_scaffold.lmdb
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_sidechain.lmdb
        - pl_fg
            - crossdocked_name2id_funcgroup.pt
            - crossdocked_v1.1_rmsd1.0_pocket10_processed_funcgroup.lmdb

2. Download raw data and prepare from scratch

(i) Download crossdocked_v1.1_rmsd1.0.tar.gz from TargetDiff Drive, and copy it to ./raw_data/ with

mkdir raw_data
tar -xzvf crossdocked_v1.1_rmsd1.0.tar.gz ./raw_data

(ii) Run the following:

python ./scripts/extract_pockets.py --source raw_data/crossdocked_v1.1_rmsd1.0 --dest raw_data/crossdocked_v1.1_rmsd1.0_pocket10

(iii) In training, the Dataset will be prepared for each task. Pleas

Related Skills

View on GitHub
GitHub Stars326
CategoryDesign
Updated3d ago
Forks38

Languages

Python

Security Score

100/100

Audited on Mar 30, 2026

No findings