Rag2Mol
Structure-based drug design based on Retrieval Augmented Generation
Retrieval Augmented Structure-based Drug Design (Rag2Mol)
About
This directory contains the code and resources of the following paper:
<i>"Structure-based drug design based on Retrieval Augmented Generation". Oral at RECOMB 25. Published in Briefings in Bioinformatics.</i>
- Rag2Mol is a structure-based drug design (SBDD) model using retrieval augmented generation (RAG). It uses a two-level retriever and an augmented autoregressive SBDD model to generate molecules within the target protein pocket.
- The default network is a well-recognized GVP-based autoregressive network [1]. The software also supports user-defined networks.
- Our experimental results are based on CrossDock data. The pre-processed CrossDock dataset can be obtained by following the instructions in /data/README.
- Please contact Peidong Zhang (zpd24@mails.tsinghua.edu.cn) if you have issues using this software.
Overview of the Model
We introduced the Rag2Mol algorithm to address the prevalent issue of non-synthesizable compounds in SBDD models. Our solution involves retrieving reference molecules to guide the AI model for more accurate navigation within a broad search space. Additionally, by using the generated molecules as templates for similarity search, we overcome the limitations of traditional virtual screening methods.
<p align="center"> <img src="figure/drug_new.png" width="600" height="800" > </p>
Step 1. Construct a pocket-specific database
Briefly, given a target pocket, we use a pre-trained global retriever to search for small molecules with potential affinity and dock them into the pocket (Figure a). The advantage of this workflow is that the retrieved molecules have both potential interaction affinity and synthetic accessibility, which implicitly helps the AI model learn structural knowledge and topological rules.
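The retrieval half of this step follows a generic "score the library, keep the top k" pattern, which can be sketched as below. This is only an illustration under stated assumptions: the actual global retriever is a pre-trained network described in the paper, whereas here the pocket and the molecules are assumed to be already embedded as plain vectors and compared by cosine similarity. The names `cosine`, `retrieve_top_k`, and the toy `library` are hypothetical and do not come from the repository. Docking of the retrieved molecules is not shown.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_top_k(pocket_vec, library, k=2):
    """Return the names of the k library molecules most similar to the pocket.

    Assumption: `library` maps a molecule name to a precomputed embedding.
    """
    scored = ((cosine(pocket_vec, vec), name) for name, vec in library.items())
    return [name for _, name in heapq.nlargest(k, scored)]

# Toy embeddings, invented purely for illustration.
library = {
    "mol_a": [1.0, 0.0, 0.0],
    "mol_b": [0.9, 0.1, 0.0],
    "mol_c": [0.0, 1.0, 0.0],
}
top = retrieve_top_k([1.0, 0.0, 0.0], library, k=2)
```

In the workflow described above, the molecules returned by such a search would then be docked into the pocket to form the pocket-specific database.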
Step 2. Guide network generation based on molecular retriever
During generation, the molecular retriever ranks and selects a reference molecule from the pocket-specific database as context information (Figure b). The message-passing module then aggregates information from the reference molecule to the generated molecular fragment through a cross-KNN graph (Figure e).
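One way to picture the cross-KNN graph is as a set of directed edges from reference atoms to fragment atoms, where each fragment atom is linked to its k nearest reference atoms. The sketch below builds such edges from 3D coordinates. It is an assumption about the general construction, not the repository's code: the function name `cross_knn_edges`, the choice of Euclidean distance, and the value of `k` are all hypothetical.

```python
import math

def cross_knn_edges(fragment_xyz, reference_xyz, k=2):
    """Connect each fragment atom to its k nearest reference atoms.

    Returns (reference_index, fragment_index) pairs, i.e. the edges along
    which messages would flow from the reference molecule to the fragment.
    """
    edges = []
    for i, frag_pos in enumerate(fragment_xyz):
        # Sort reference atom indices by distance to this fragment atom.
        by_distance = sorted(
            range(len(reference_xyz)),
            key=lambda j: math.dist(frag_pos, reference_xyz[j]),
        )
        edges.extend((j, i) for j in by_distance[:k])
    return edges

# Toy coordinates, invented purely for illustration.
fragment = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
reference = [(0.1, 0.0, 0.0), (1.4, 0.0, 0.0), (5.0, 5.0, 5.0)]
edges = cross_knn_edges(fragment, reference, k=1)
```

A message-passing layer would then aggregate reference-atom features over these edges into each fragment atom.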
Step 3. Screen according to preset biochemical indicators
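We filter the generated molecules by preset biochemical indicators, then subject the filtered drug candidates to precise binding affinity calculations and subsequent wet-lab experiments. A reference set of criteria: $\mathrm{Vina}\in[-20, -5]$, $\mathrm{QED}\in[0.5, 2]$, $\mathrm{SA}\in[0.5, 2]$, $\mathrm{Lipinski}\in[4, 5]$, $\mathrm{LogP}\in[0, 4]$.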
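The reference criteria can be applied as a simple range filter, sketched below. The interval bounds are copied from the criteria listed above and treated as closed intervals. Everything else is an assumption: the dictionary keys, the function name `passes_screen`, and the candidate property values are hypothetical, and how each property is computed is outside the scope of this sketch.

```python
# Closed intervals taken from the reference criteria in this README.
CRITERIA = {
    "vina": (-20.0, -5.0),
    "qed": (0.5, 2.0),
    "sa": (0.5, 2.0),
    "lipinski": (4, 5),
    "logp": (0.0, 4.0),
}

def passes_screen(props, criteria=CRITERIA):
    """True if every property lies inside its [low, high] interval."""
    return all(low <= props[name] <= high for name, (low, high) in criteria.items())

# Invented property values, for illustration only.
candidates = [
    {"id": "mol_1", "vina": -7.2, "qed": 0.61, "sa": 0.74, "lipinski": 5, "logp": 2.3},
    {"id": "mol_2", "vina": -4.1, "qed": 0.80, "sa": 0.70, "lipinski": 5, "logp": 1.9},
]
kept = [c["id"] for c in candidates if passes_screen(c)]
```

Here `mol_2` is dropped because its Vina score lies outside the reference interval.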
Step 4. Similarity search based on AI-generated molecules (Rag2Mol-R)
We then randomly select a representative molecule from each molecular cluster as a scaffold template. Based on these templates, we search for similar molecules among existing synthesizable compounds.
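A minimal sketch of this step is given below, assuming molecules are represented by fingerprints stored as sets of "on" bit indices and compared with the Tanimoto coefficient. The similarity measure, the threshold, and the names `pick_templates`, `tanimoto`, and `similar_compounds` are assumptions made for illustration; the README does not state which measure `rag2mol_r.py` uses.

```python
import random

def pick_templates(clusters, seed=0):
    """Randomly pick one representative molecule per cluster."""
    rng = random.Random(seed)
    return {cid: rng.choice(members) for cid, members in clusters.items()}

def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similar_compounds(template_fp, library_fps, threshold=0.5):
    """Return (score, name) pairs at or above the threshold, best first."""
    hits = [(tanimoto(template_fp, fp), name) for name, fp in library_fps.items()]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)

# Toy clusters and fingerprints, invented purely for illustration.
clusters = {"c1": ["gen_1", "gen_2"], "c2": ["gen_3"]}
templates = pick_templates(clusters)

template_fp = {1, 4, 7, 9}
library = {
    "cmpd_x": {1, 4, 7, 9, 12},  # 4 shared of 5 total bits: 0.8
    "cmpd_y": {1, 4, 20, 21},    # 2 shared of 6 total bits: below threshold
    "cmpd_z": {1, 4, 7},         # 3 shared of 4 total bits: 0.75
}
hits = similar_compounds(template_fp, library, threshold=0.5)
```

In practice the library would be a catalogue of existing synthesizable compounds, and the fingerprints would come from a cheminformatics toolkit.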
For further details, see Online Methods of our paper.
Sub-directories
- [src] contains the implementation of Rag2Mol used for the CrossDock dataset. `train.py` and `sample4pdb.py` handle the training and inference of Rag2Mol; `rag2mol_r.py` runs Rag2Mol-R.
- [data] contains the pre-processed CrossDock data and retrieval databases, which can be used to reproduce our results.
Data & Parameters
- Data: We provide the pre-processed CrossDock dataset, which is widely used in the SBDD field. This should be sufficient for reproducing our results. Please refer to `data` for a detailed explanation and download instructions.
- Parameters: We provide the parameters of the pre-trained two-level retriever. Once the Rag2Mol checkpoint has been downloaded correctly, you can run prediction directly. Please refer to `parameters` for a detailed explanation and download instructions.
Important Note: All data is for research purposes only.
<br>
Code Usage
Follow these steps to run the code:
- Clone the Rag2Mol repo.
- Install the required packages. Please refer to `rag2mol.yml`; we recommend `conda env create -f rag2mol.yml`.
- Download the corresponding `data` and model `parameters` and place them in the correct paths.
- Run `python train.py` to train Rag2Mol, or `taskset -c 1 python sample4pdb.py` for inference. For Rag2Mol-R, please ensure that the `results` folder has been automatically created before running `python rag2mol_r.py`.
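The last step depends on a `results` folder already existing before Rag2Mol-R is run. A small pre-flight check along the following lines could catch a missing folder early. This is a sketch under an assumption: the README does not say where `results` is created, so the helper (hypothetically named `ready_for_rag2mol_r`) simply looks for it under a directory you pass in, defaulting to the current working directory.

```python
from pathlib import Path

def ready_for_rag2mol_r(workdir="."):
    """True if a `results` directory exists under `workdir`.

    Assumption: the folder location is not documented here, so the caller
    supplies the directory in which inference was run.
    """
    return (Path(workdir) / "results").is_dir()

if __name__ == "__main__":
    if ready_for_rag2mol_r():
        print("results/ found; rag2mol_r.py can be run.")
    else:
        print("results/ not found; run inference with sample4pdb.py first.")
```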
License
Rag2Mol is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0
Reference
@article{zhang2025rag2mol,
title={Rag2Mol: structure-based drug design based on retrieval augmented generation},
author={Zhang, Peidong and Peng, Xingang and Han, Rong and Chen, Ting and Ma, Jianzhu},
journal={Briefings in Bioinformatics},
volume={26},
number={3},
year={2025},
publisher={Oxford Academic}
}
[1]. Peng, Xingang, et al. "Pocket2mol: Efficient molecular sampling based on 3d protein pockets." International Conference on Machine Learning. PMLR, 2022.
