# RxnFlow: Generative Flows on Synthetic Pathway for Drug Design

Official implementation of *Generative Flows on Synthetic Pathway for Drug Design* (ICLR 2025) by Seonghwan Seo, Minsu Kim, Tony Shen, Martin Ester, Jinkyoo Park, Sungsoo Ahn, and Woo Youn Kim. [paper]

<img src="image/overview.png" width=600>

RxnFlow is a synthesis-oriented generative framework that aims to discover diverse drug candidates through the GFlowNet objective and a large action space comprising 1M building blocks and 100 reaction templates, without computational overhead.

This project is based on Recursion's GFlowNet repository; `src/gflownet/` is a clone of `recursionpharma/gflownet@v0.2.0`.
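For intuition on the scale involved, a back-of-envelope sketch (illustrative only, not part of the codebase): if every (reaction template, building block) pair were enumerated and scored independently at each generation step, the policy would face on the order of 10^8 candidate actions.

```python
# Back-of-envelope illustration: size of a naive per-step action space
# if every (template, building block) pair is enumerated independently.
n_building_blocks = 1_000_000
n_templates = 100
naive_actions = n_templates * n_building_blocks
print(f"{naive_actions:,} candidate actions per step")  # 100,000,000
```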
## Notice

In collaboration with eMolecules and HITS, we developed Hyper Screening X (HyperLab), which identifies candidate compounds from eMolecules' make-on-demand eXplore library.
We will soon release the in-house model architecture used in Hyper Screening X.
## Setup

### Installation

```bash
# python>=3.12,<3.13
pip install -e . --find-links https://data.pyg.org/whl/torch-2.5.1+cu121.html

# For GPU-accelerated UniDock (Vina) scoring
conda install unidock==1.1.2
pip install -e '.[unidock]' --find-links https://data.pyg.org/whl/torch-2.5.1+cu121.html

# For pocket-conditional generation
pip install -e '.[pmnet]' --find-links https://data.pyg.org/whl/torch-2.5.1+cu121.html

# Install all dependencies
pip install -e '.[unidock,pmnet,dev]' --find-links https://data.pyg.org/whl/torch-2.5.1+cu121.html
```
### Data Preparation

To construct the datasets, follow the instructions in `data/README.md`.
#### Reaction Template

We provide two reaction template sets:

- **Real**: a set of 109 reaction templates (`templates/real.txt`) derived from the Enamine REAL synthesis protocol (Gao et al.).
- **HB**: the template set used in the paper, constructed by Cretu et al., containing 13 uni-molecular and 58 bi-molecular reaction templates. It is available at `templates/hb_edited.txt`.
#### Building Block Library

We support two building block libraries:

- **ZINCFrag**: for reproducible benchmark studies, we propose a new public building block library, a subset of the ZINC22 fragment set. All fragments are also included in AiZynthFinder's built-in ZINC stock.
- **Enamine**: we support the Enamine building block library, which is available upon request at https://enamine.net/building-blocks/building-blocks-catalog.
### Download pre-trained models

We provide pre-trained GFlowNet models trained on QED and a pocket-conditional proxy (see `./weights/README.md`). Each model weight can also be downloaded automatically by its name.
## Experiments
<details>
<summary><h3 style="display:inline-block">Custom optimization</h3></summary>

If you want to train RxnFlow with your custom reward function, you can use the base classes from `rxnflow.base`. The reward must be non-negative.
Example code is provided in `src/rxnflow/tasks/` and `scripts/examples/`.

- **Single-objective optimization**

  You can find example code in `seh.py` and `unidock_vina.py`.

  ```python
  import torch
  from rdkit import Chem
  from rdkit.Chem import QED

  from gflownet import ObjectProperties
  from rxnflow.base import RxnFlowTrainer, RxnFlowSampler, BaseTask


  class QEDTask(BaseTask):
      def compute_obj_properties(self, mols: list[Chem.Mol]) -> tuple[ObjectProperties, torch.Tensor]:
          is_valid = [filter_fn(mol) for mol in mols]  # filter_fn: user-defined; True for valid objects
          is_valid_t = torch.tensor(is_valid, dtype=torch.bool)
          valid_mols = [mol for mol, valid in zip(mols, is_valid) if valid]
          fr = torch.tensor([QED.qed(mol) for mol in valid_mols], dtype=torch.float)
          fr = fr.reshape(-1, 1)  # reward dimension should be [Nvalid, Nprop]
          return ObjectProperties(fr), is_valid_t


  class QEDTrainer(RxnFlowTrainer):  # for online training
      def setup_task(self):
          self.task = QEDTask(self.cfg)


  class QEDSampler(RxnFlowSampler):  # sampling with a trained GFlowNet
      def setup_task(self):
          self.task = QEDTask(self.cfg)
  ```
- **Multi-objective optimization (multiplication-based)**

  You can perform multi-objective optimization by designing the reward function as follows:

  $$R(x) = \prod R_{prop}(x)$$

  You can find example code in `unidock_vina_moo.py` and `multi_pocket.py`.
- **Multi-objective optimization (Multi-objective GFlowNets, MOGFN)**

  You can find example code in `seh_moo.py` and `unidock_vina_mogfn.py`.

  ```python
  import torch
  from rdkit.Chem import Mol as RDMol

  from gflownet import ObjectProperties
  from rxnflow.base import RxnFlowTrainer, RxnFlowSampler, BaseTask
  from rxnflow.config import Config


  class MOGFNTask(BaseTask):
      is_moo = True

      def compute_obj_properties(self, mols: list[RDMol]) -> tuple[ObjectProperties, torch.Tensor]:
          is_valid = [filter_fn(mol) for mol in mols]  # filter_fn: user-defined validity filter
          is_valid_t = torch.tensor(is_valid, dtype=torch.bool)
          valid_mols = [mol for mol, valid in zip(mols, is_valid) if valid]
          fr1 = torch.tensor([reward1(mol) for mol in valid_mols], dtype=torch.float)
          fr2 = torch.tensor([reward2(mol) for mol in valid_mols], dtype=torch.float)
          fr = torch.stack([fr1, fr2], dim=-1)
          assert fr.shape == (len(valid_mols), self.num_objectives)
          return ObjectProperties(fr), is_valid_t


  class MOOTrainer(RxnFlowTrainer):  # for online training
      def set_default_hps(self, base: Config):
          super().set_default_hps(base)
          base.task.moo.objectives = ["obj1", "obj2"]  # set the objective names

      def setup_task(self):
          self.task = MOGFNTask(self.cfg)


  class MOOSampler(RxnFlowSampler):  # sampling with a trained GFlowNet
      def setup_task(self):
          self.task = MOGFNTask(self.cfg)
  ```
- **Finetuning a pre-trained model (non-MOGFN)**

  We observed that pre-training can be helpful for initial model training. It can be enabled by setting `config.pretrained_model_path`:

  ```python
  from rxnflow.utils.download import download_pretrained_weight

  # download a GFlowNet (temperature=U(0,64)) trained on the QED reward
  qed_model_path = download_pretrained_weight("qed-unif-0-64")
  config.pretrained_model_path = qed_model_path
  ```

</details>
### Single-objective optimization

To train a GFlowNet against the Vina score using GPU-accelerated UniDock, run:

```bash
python scripts/opt_unidock.py -h
```

```bash
python scripts/opt_unidock.py \
  --env_dir <Environment directory> \
  --out_dir <Output directory> \
  -n <Num iterations (64 molecules per iteration; default: 1000)> \
  -p <Protein PDB path> \
  -c <Center X> <Center Y> <Center Z> \
  -l <Reference ligand; required if the center is not given> \
  -s <Size X> <Size Y> <Size Z> \
  --search_mode <UniDock mode; choices: (fast, balance, detail); default: fast> \
  --filter <Drug filter; choices: (lipinski, veber, null); default: lipinski> \
  --subsampling_ratio <Subsampling ratio; memory-variance trade-off; default: 0.02> \
  --pretrained_model <Pretrained model path; optional>
```
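The `--subsampling_ratio` option controls action-space subsampling over the building-block library. As a conceptual sketch (plain Python, not the repository's implementation; `subsample_blocks` is a hypothetical helper), each step scores only a random fraction of the library:

```python
import random


def subsample_blocks(num_blocks: int, ratio: float, rng: random.Random) -> list[int]:
    """Pick a random subset of building-block indices to score at this step.

    A smaller ratio lowers memory use but increases the variance of the
    policy's estimates (the memory-variance trade-off noted in the CLI help).
    """
    k = max(1, int(num_blocks * ratio))
    return rng.sample(range(num_blocks), k)


rng = random.Random(0)
subset = subsample_blocks(1_000_000, 0.02, rng)  # default ratio 0.02 -> 20,000 blocks
```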
### Multi-objective optimization

To perform multi-objective optimization over the Vina score and QED, we provide two reward designs:
- **Multiplication-based reward**:

  $$R(x) = \text{QED}(x) \times \widehat{\text{Vina}}(x)$$

  ```bash
  python scripts/opt_unidock_moo.py -h
  ```
- **Multi-objective GFlowNet (MOGFN)**:

  $$R(x;\alpha) = \alpha \, \text{QED}(x) + (1-\alpha) \, \widehat{\text{Vina}}(x)$$

  ```bash
  python scripts/opt_unidock_mogfn.py -h
  ```
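The two reward designs can be illustrated in plain Python (an illustration only; it assumes QED and $\widehat{\text{Vina}}$, the Vina score rescaled to a non-negative reward, are already computed):

```python
def reward_multiplicative(qed: float, vina_hat: float) -> float:
    # R(x) = QED(x) * Vina_hat(x); both factors are non-negative,
    # so the combined GFlowNet reward stays non-negative.
    return qed * vina_hat


def reward_mogfn(qed: float, vina_hat: float, alpha: float) -> float:
    # R(x; alpha) = alpha * QED(x) + (1 - alpha) * Vina_hat(x),
    # where alpha in [0, 1] is the sampled preference weight.
    assert 0.0 <= alpha <= 1.0
    return alpha * qed + (1.0 - alpha) * vina_hat
```

The multiplicative form rewards molecules that are good on *all* objectives at once, while the MOGFN form lets a preference weight sampled per trajectory trace out the Pareto front.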
### Example (KRAS G12C mutation)

- **Use center coordinates**

  ```bash
  python scripts/opt_unidock.py -o ./log/kras --filter veber \
    -p ./data/examples/6oim_protein.pdb -c 1.872 -8.260 -1.361
  ```

- **Use the center of the reference ligand**

  ```bash
  python scripts/opt_unidock_mogfn.py -o ./log/kras_moo \
    -p ./data/examples/6oim_protein.pdb -l ./data/examples/6oim_ligand.pdb
  ```

- **Use a pretrained model** (see `weights/README.md`)

  We provide a pretrained model trained on QED for non-MOGFN tasks:

  ```bash
  # fine-tune the pretrained model
  python scripts/opt_unidock.py ... --pretrained_model 'qed-unif-0-64'
  python scripts/opt_unidock_moo.py ... --pretrained_model 'qed-unif-0-64'
  ```
### Sampling

Sample high-affinity molecules in a zero-shot manner (no training iterations):

```bash
python scripts/sampling_zeroshot.py \
  --model_path <Checkpoint path; default: qvina-unif-0-64> \
  --env_dir <Environment directory> \
  -p <Protein
```
