DDGScan
DDGScan: an integrated parallel workflow for the in silico point mutation scan of protein
I am testing this repo with a range of input structures. If you encounter any failure, please open an issue.
The GUI plugin for FoldX
The GUI only works with FoldX.
Installation
DDGScan needs the FoldX executable to be available in your environment. Rosetta is additionally required for cartesian_ddg calculations in slow mode and for the ddg_monomer row1 protocol in fast mode (note: an MPI build is necessary, otherwise the relax step will be skipped). ABACUS, a protein design program with a statistical energy function, is also supported.
Structures downloaded from RCSB may contain errors that directly affect energy calculations; a common one is breaks in chains. To address this, we have implemented a loop closure module that uses modeller as a backend.
Due to their licenses, we cannot redistribute these programs. openmm, by contrast, is open source. The ABACUS2 database is now available at https://zenodo.org/record/4533424. The necessary module is not available in the Zenodo version, so you may use the online server at https://biocomp.ustc.edu.cn/servers/abacus-design.php to run ABACUS2.
Install DDGScan:
To ensure that there are no possible conflicts, it is recommended that you create a new conda environment. Additionally, using the mamba package manager will result in faster installation times. To create a new conda environment for DDGScan, you can use the following commands:
conda create -n ddgscan python=3.9
conda activate ddgscan
Once the new environment is activated, you can install mamba and other required packages using the following commands:
conda install -c conda-forge mamba
mamba install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
mamba install -c conda-forge openmm pdbfixer
Next, you can clone the DDGScan repository and install the required Python packages using the following commands:
git clone https://github.com/JinyuanSun/DDGScan.git
cd DDGScan
pip install pandas numpy joblib seaborn matplotlib venn logomaker mdtraj bio scikit-learn
python setup.py install
Finally, you can create a cache directory for DDGScan and copy some necessary data files using the following command:
mkdir -p ~/.cache/ddgscan && cp utils/data/nn/* ~/.cache/ddgscan/
To ensure that DDGScan is installed properly and working correctly, you can run the following command and confirm that the help message is displayed:
DDGScan -h
FoldX:
Register and download the executable.
Rosetta:
Follow the Rosetta documentation.
It is recommended to export ROSETTADB before running grape-fast.py by appending the following line to ~/.bashrc:
export ROSETTADB="/path/to/rosetta/database"
ABACUS1/2:
Email the authors to request the source code.
Modeller:
Get the Modeller license key at https://salilab.org/modeller/registration.html
export KEY_MODELLER=<your_key>
conda config --add channels salilab
conda install modeller
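Before moving on to usage, it can help to confirm that the external pieces described above are actually visible from the shell. The following is a minimal sketch, not part of DDGScan: the function name preflight is hypothetical, and "foldx" is only an assumed name for the FoldX binary, which may be called something else on your system.

```python
import os
import shutil


def preflight(executables, env_dirs, env=None):
    """Return a list of problems found; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in executables:
        # shutil.which returns None when the name cannot be found on PATH.
        if shutil.which(name) is None:
            problems.append(f"executable not found on PATH: {name}")
    for var in env_dirs:
        value = env.get(var)
        if not value:
            problems.append(f"environment variable not set: {var}")
        elif not os.path.isdir(value):
            problems.append(f"{var} does not point to a directory: {value}")
    return problems


if __name__ == "__main__":
    # "foldx" is an assumed executable name; adjust it to match your install.
    for problem in preflight(["foldx", "DDGScan"], ["ROSETTADB"]):
        print(problem)
```

If nothing is printed, the executables were found on PATH and ROSETTADB points to an existing directory. This says nothing about whether the Rosetta build has MPI support or whether the Modeller key is valid.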
Usage
Grape phase I
There are many options available for DDGScan users, particularly for those who know what they want. Here is a quick walk-through of some important options:
- pdb and chain are positional arguments that must be set, depending on the input PDB file you want to analyze.
- The -E flag must be set according to the software you have installed on your operating system.
- It is strongly recommended that users set the -seq flag to provide sequence information for the input PDB file.
- For best performance, it is highly recommended to add the -MD flag and use -P CUDA if a powerful GPU is available (e.g., better than an RTX2060). This will be much faster than using a 48-core CPU.
- If the -fill flag is used, the input structure will be automatically fixed using information from the SEQRES record in the native PDB file downloaded from RCSB using modeller. The model with the lowest molpdf energy will be used for further analysis.
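The options above can be combined into a single call. The sketch below assembles such a command as an argument list using only flags that appear in the help text; the helper name grape_phase1_command and the file name input.pdb are hypothetical. It places the positional arguments before -E on the assumption that the CLI is parsed argparse-style, where a multi-value option could otherwise absorb them.

```python
import shlex


def grape_phase1_command(pdb, chain, engines, sequence=None, threads=None,
                         md=False, platform=None, fill=False):
    """Assemble an argv list for `DDGScan grape_phaseI` from documented flags."""
    if not engines:
        raise ValueError("at least one engine is needed for -E")
    cmd = ["DDGScan", "grape_phaseI"]
    if fill:
        cmd.append("-fill")
    if sequence is not None:
        cmd += ["-seq", sequence]
    if threads is not None:
        cmd += ["-T", str(threads)]
    if md:
        cmd.append("-MD")
    if platform is not None:
        cmd += ["-P", platform]
    # Positionals go before -E: -E takes one or more values, so anything
    # placed after it might be read as an engine name (assumed behaviour).
    cmd += [pdb, chain]
    cmd += ["-E", *engines]
    return cmd


if __name__ == "__main__":
    # input.pdb and chain A are placeholders for your own structure.
    cmd = grape_phase1_command("input.pdb", "A", ["foldx", "rosetta"],
                               threads=8, md=True, platform="CUDA")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the required engines are installed.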
usage: DDGScan grape_phaseI [-h] [-fill] [-seq SEQUENCE] [-T THREADS] [-fc FOLDX_CUTOFF] [-rc ROSETTA_CUTOFF] [-ac ABACUS_CUTOFF] [-a2c ABACUS2_CUTOFF] [-nstruct RELAX_NUMBER] [-nruns NUMOFRUNS]
[-E {abacus,foldx,rosetta,abacus2,abacus2_nn} [{abacus,foldx,rosetta,abacus2,abacus2_nn} ...]] [-M {run,rerun,analysis,test}] [-S {fast,slow}] [-MD] [-P {CUDA,CPU}] [-fix_mm]
pdb chain
positional arguments:
pdb Input PDB
chain Input PDB Chain to do in silico DMS
optional arguments:
-h, --help show this help message and exit
-fill, --fill_break_in_pdb
Use modeller to fill missing residues in your pdb file. Use this option with caution!
-seq SEQUENCE, --sequence SEQUENCE
The exact sequence of protein you want to design. All mutation will be named according to this sequence.
-T THREADS, --threads THREADS
Number of threads to run FoldX, Rosetta
-fc FOLDX_CUTOFF, --foldx_cutoff FOLDX_CUTOFF
Cutoff of FoldX ddg(kcal/mol)
-rc ROSETTA_CUTOFF, --rosetta_cutoff ROSETTA_CUTOFF
Cutoff of Rosetta ddg(R.E.U.)
-ac ABACUS_CUTOFF, --abacus_cutoff ABACUS_CUTOFF
Cutoff of ABACUS SEF(A.E.U.)
-a2c ABACUS2_CUTOFF, --abacus2_cutoff ABACUS2_CUTOFF
Cutoff of ABACUS2 SEF(A.E.U.)
-nstruct RELAX_NUMBER, --relax_number RELAX_NUMBER
Number of how many relaxed structure
-nruns NUMOFRUNS, --numofruns NUMOFRUNS
Number of runs in FoldX BuildModel
-E {abacus,foldx,rosetta,abacus2,abacus2_nn} [{abacus,foldx,rosetta,abacus2,abacus2_nn} ...], --engine {abacus,foldx,rosetta,abacus2,abacus2_nn} [{abacus,foldx,rosetta,abacus2,abacus2_nn} ...]
-M {run,rerun,analysis,test}, --mode {run,rerun,analysis,test}
Run, Rerun or analysis
-S {fast,slow}, --preset {fast,slow}
Fast or Slow
-MD, --molecular_dynamics
Run 1ns molecular dynamics simulations for each mutation using openmm.
-P {CUDA,CPU}, --platform {CUDA,CPU}
CUDA or CPU
-fix_mm, --fix_mainchain_missing
fixing missing backbone bone using pdbfixer
List distribute
usage: DDGScan list_distribute [-h] [-msaddg] [-fill] [-fix_mm] [-T THREADS] [-nstruct RELAX_NUMBER] [-nruns NUMOFRUNS]
[-E {foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} [{foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} ...]] [-repair] [-relax] [-MD] [-P {CUDA,CPU}]
pdb mutation_list_file
positional arguments:
pdb Input PDB
mutation_list_file Mutation list file, see README for details
optional arguments:
-h, --help show this help message and exit
-msaddg, --output_of_MSAddg
The format of MSAddg *.scan.txt, and there may be mismatch between your pdb and sequence
-fill, --fill_break_in_pdb
Use modeller to fill missing residues in your pdb file. Use this option with caution!
-fix_mm, --fix_mainchain_missing
fixing missing backbone bone using pdbfixer
-T THREADS, --threads THREADS
Number of threads to run FoldX, Rosetta or ABACUS2
-nstruct RELAX_NUMBER, --relax_number RELAX_NUMBER
Number of how many relaxed structure
-nruns NUMOFRUNS, --numofruns NUMOFRUNS
Number of runs in FoldX BuildModel
-E {foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} [{foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} ...], --engine {foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} [{foldx,rosetta,abacus2,rosetta_fast,abacus2_nn} ...]
-repair, --foldx_repair
Run Repair before ddG calculation
-relax, --rosetta_relax
Run relax before ddG calculation
-MD, --molecular_dynamics
Run 1ns molecular dynamics simulations for each mutation using openmm.
-P {CUDA,CPU}, --platform {CUDA,CPU}
CUDA or CPU
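The two help messages list slightly different choices for -E: abacus is offered only by grape_phaseI and rosetta_fast only by list_distribute. A small lookup, sketched here with the hypothetical names ENGINES and unsupported_engines, can catch a mismatch before a long run is launched. The engine sets are copied from the usage text above and may differ in other versions.

```python
# Engine choices copied from the -E entries in the usage text above.
ENGINES = {
    "grape_phaseI": {"abacus", "foldx", "rosetta", "abacus2", "abacus2_nn"},
    "list_distribute": {"foldx", "rosetta", "abacus2", "rosetta_fast", "abacus2_nn"},
}


def unsupported_engines(subcommand, engines):
    """Return the requested engines that the given subcommand does not list."""
    allowed = ENGINES[subcommand]
    return [engine for engine in engines if engine not in allowed]


if __name__ == "__main__":
    print(unsupported_engines("list_distribute", ["abacus", "foldx"]))
```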
Analysis and plot
usage: DDGScan analysis_and_plot [-h] [--residue_position RESIDUE_POSITION]
[--plot_type {all,venn,residue_bar,heatmap,position_avg_boxplot,variance_lineplot,kde_plot,residue_logo} [{all,venn,residue_bar,heatmap,position_avg_boxplot,variance_lineplot,kde_plot,residue_logo} ...]]
