TamGent
Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structured-based Drug Design
Introduction
Code base: fairseq-v0.8.0
Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
Installation
git clone https://github.com/HankerWu/TamGent.git
cd TamGent
git checkout main
conda create -n TamGent python=3.7 -y
conda activate TamGent
conda install rdkit -c conda-forge -y
python -m pip install -e .[chem]
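After the steps above, a quick import check can confirm that the environment is usable. The following is a minimal sketch, not part of the repository. It assumes the editable install exposes a top-level module named `fairseq` (the code base is fairseq-v0.8.0) and that RDKit is importable as `rdkit`. The helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(modules=("rdkit", "fairseq")):
    """Return {module_name: True/False} depending on whether each module can be located.

    The module names are assumptions: "rdkit" from the conda install and
    "fairseq" from the editable install of this fairseq-based code base.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_environment()
print("Python", sys.version.split()[0])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated `TamGent` conda environment. A `MISSING` entry suggests that the corresponding install step did not complete.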
Dataset
The dataset is available at data.
Build customized dataset
You can build your customized dataset through the following methods:
- Build a customized dataset from PDB IDs. The script automatically finds the binding sites according to the ligands in the structure file.
  python scripts/build_data/prepare_pdb_ids.py ${PDB_ID_LIST} ${DATASET_NAME} -o ${OUTPUT_PATH} -t ${threshold}
  PDB_ID_LIST format: CSV with columns ([] means optional): pdb_id,[ligand_inchi,uniprot_id]
- Build a customized dataset from PDB IDs, using the center coordinates of the binding site of each PDB entry.
  python scripts/build_data/prepare_pdb_ids_center.py ${PDB_ID_LIST} ${DATASET_NAME} -o ${OUTPUT_PATH} -t ${threshold}
  PDB_ID_LIST format: CSV with columns ([] means optional): pdb_id, center_x, center_y, center_z, [uniprot_id]
- Build a customized dataset from PDB IDs, using the residue IDs (indexes) of the binding site of each PDB entry.
  python scripts/build_data/prepare_pdb_ids_res_ids.py ${PDB_ID_LIST} ${DATASET_NAME} -o ${OUTPUT_PATH} --res-ids-fn ${RES_IDS_FN}
  PDB_ID_LIST format: CSV with columns ([] means optional): pdb_id,[uniprot_id]
  RES_IDS_FN format: the residue IDs filename, a pickle file storing a dict like the following, in the same order as PDB_ID_LIST:
  {
    0: {
      chain_id_A: Array[res_id_A1, res_id_A2, ...],
      chain_id_B: Array[res_id_B1, res_id_B2, ...],
      ...
    },
    1: { ... },
    ...
  }
For customized PDB structure files, put the files in the --pdb-path folder and put their filenames in the pdb_id column of the PDB_ID_LIST CSV file.
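The PDB_ID_LIST and RES_IDS_FN inputs for the residue-ID method can be prepared with a few lines of Python. This is a minimal sketch under stated assumptions: the CSV is written with a header row, residue IDs are stored as plain Python lists (the script may expect NumPy arrays instead), and the PDB IDs, chain IDs, and residue numbers are placeholders, not real binding sites. The helper names `write_pdb_id_list` and `write_res_ids` are hypothetical.

```python
import csv
import os
import pickle
import tempfile


def write_pdb_id_list(path, pdb_ids, uniprot_ids=None):
    """Write a PDB_ID_LIST CSV with a pdb_id column and an optional uniprot_id column."""
    fieldnames = ["pdb_id"] + (["uniprot_id"] if uniprot_ids else [])
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()  # assumption: columns are identified by a header row
        for i, pdb_id in enumerate(pdb_ids):
            row = {"pdb_id": pdb_id}
            if uniprot_ids:
                row["uniprot_id"] = uniprot_ids[i]
            writer.writerow(row)


def write_res_ids(path, chains_per_entry):
    """Pickle {row_index: {chain_id: [res_id, ...]}}, ordered like the CSV rows."""
    res_ids = {i: chains for i, chains in enumerate(chains_per_entry)}
    with open(path, "wb") as f:
        pickle.dump(res_ids, f)
    return res_ids


# Placeholder values for illustration only.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "pdb_id_list.csv")
pkl_path = os.path.join(out_dir, "res_ids.pkl")

write_pdb_id_list(csv_path, ["1abc", "2xyz"])
res_ids = write_res_ids(
    pkl_path,
    [
        {"A": [45, 46, 89], "B": [12, 13]},  # binding-site residues for row 0
        {"A": [101, 102, 150]},              # binding-site residues for row 1
    ],
)
print(csv_path, pkl_path)
```

The two paths can then be passed as `${PDB_ID_LIST}` and `${RES_IDS_FN}` to `prepare_pdb_ids_res_ids.py`. Keeping the dict keys as consecutive row indexes preserves the required correspondence with the CSV order.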
Model
The pretrained model is available at model.
Run scripts
# train a new model
bash scripts/train.sh -D ${DATA_PATH} --savedir ${SAVED_MODEL_PATH}
# generate molecules
bash scripts/generate.sh -b ${BEAM_SIZE} -s ${SEED} -D ${DATA_PATH} --dataset ${TESTSET_NAME} --ckpt ${MODEL_PATH} --savedir ${OUTPUT_PATH}
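To generate with several seeds, the command above can be assembled programmatically. The sketch below only builds and prints the command lines, using the flags shown in this section. The beam size, paths, dataset name, and the helper name `build_generate_cmd` are hypothetical placeholders.

```python
import shlex


def build_generate_cmd(beam_size, seed, data_path, testset_name, ckpt, savedir):
    """Assemble the argument list for scripts/generate.sh using the flags from the README."""
    return [
        "bash", "scripts/generate.sh",
        "-b", str(beam_size),
        "-s", str(seed),
        "-D", data_path,
        "--dataset", testset_name,
        "--ckpt", ckpt,
        "--savedir", savedir,
    ]


# Hypothetical values; replace them with your own paths and settings.
cmds = [
    build_generate_cmd(
        beam_size=20,
        seed=seed,
        data_path="data/my_dataset",
        testset_name="my_testset",
        ckpt="checkpoints/model.pt",
        savedir=f"output/seed{seed}",
    )
    for seed in (1, 2, 3)
]

for cmd in cmds:
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To execute the commands instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.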
Citation
Please cite as:
@inproceedings{TamGent,
  title = {Tailoring Molecules for Protein Pockets: A Transformer-based Generative Solution for Structured-based Drug Design},
  author = {Kehan Wu and Yingce Xia and Yang Fan and Pan Deng and Lijun Wu and Shufang Xie and Tong Wang and Haiguang Liu and Tao Qin and Tie-Yan Liu},
  year = {2022},
}
