SynGFN
This repository contains the implementation code for our research paper, "SynGFN: learning across chemical space with generative flow-based molecular discovery".
Introduction
SynGFN features two key ingredients: (1) a hierarchically pre-trained policy network that significantly accelerates learning across diverse distributions of desirable molecules in chemical spaces, and (2) a multi-fidelity active learning framework to alleviate the cost of reward evaluations.
Installation
Follow these steps to set up the environment and install all dependencies for this project.
conda create --name syngfn python==3.9.18
conda activate syngfn
bash install_dependencies.sh
Data
SynGFN requires the reaction template set and the building block library.
The reaction template set is drawn mainly from the 58 robust reactions published by Hartenfeller et al. and the virtual reaction database published by Button et al., which includes 64 reactions. Solvents, catalysts, and constant reagents are omitted from the reaction SMARTS, and none of the templates defines the stereochemistry of the products. The template set is available in data/template.py.
The Enamine building block library is available upon request at https://enamine.net/building-blocks/building-blocks-catalog. We used the "Global Stock" version released on 2023-07-17. In our work, we further divided the building block library into four scales, denoted S (Small), M (Medium), L (Large), and XL (Extreme Large). The data processing and construction steps are described in dataprocess/README.md.
Main Components
The SynGFN framework consists of four core components: Environment, Proxy, Policy Models, and the GFlowNet Agent.
1. Environment: The environment defines the state space and the hierarchical action space. In SynGFN, we treat the selection of a reaction and the selection of a reactant as two separate actions: Action 1 and Action 2.
2. Proxy: The proxy is the model that provides rewards for the states of the environment. In this work, we use a QSAR model as the scoring function (proxy) for generated molecules.
3. Policy Models: The policy models are neural networks that model state transitions. For the current task, simple multi-layer perceptrons (MLPs) with a few layers are sufficient.
4. GFlowNet Agent: The GFlowNet agent orchestrates the interactions between the environment, proxy, and policy models. In this implementation, we adopt the trajectory balance loss as the training objective.
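To make the training objective concrete, the following is a minimal, framework-free sketch of the standard trajectory balance loss for a single complete trajectory. It is not taken from the SynGFN code base: the function name `trajectory_balance_loss` and its arguments are hypothetical, and the assumption that the forward log-probability of one synthesis step is the sum of the log-probabilities of Action 1 (reaction) and Action 2 (reactant) is ours, made only for illustration.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory balance residual for one complete trajectory.

    log_z         -- learned estimate of the log partition function
    log_pf_steps  -- forward-policy log-probabilities, one per action taken
    log_pb_steps  -- backward-policy log-probabilities, one per action taken
    reward        -- strictly positive reward of the terminal state (from the proxy)
    """
    if reward <= 0:
        raise ValueError("reward must be strictly positive to take its logarithm")
    forward = log_z + sum(log_pf_steps)
    backward = math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2


if __name__ == "__main__":
    # One synthesis step made of two actions (assumption: their log-probs add up).
    log_p_reaction = math.log(0.5)   # Action 1: choose a reaction template
    log_p_reactant = math.log(0.5)   # Action 2: choose a reactant
    loss = trajectory_balance_loss(
        log_z=0.0,
        log_pf_steps=[log_p_reaction, log_p_reactant],
        log_pb_steps=[0.0, 0.0],     # deterministic backward policy in this toy case
        reward=0.25,
    )
    print(loss)
```

In this toy case the forward flow (Z times the product of forward probabilities) equals the reward times the backward probabilities, so the residual is zero up to floating-point error. In practice the loss is averaged over a batch of sampled trajectories and minimized with respect to the policy parameters and log Z.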
Usage
To train a SynGFN model with the default configuration, simply run
python main.py user.logdir.root=<path/to/log/files/>
Alternatively, you can create a user configuration file in config/user/<username>.yaml specifying a logdir.root and run
python main.py user=<username>
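As an illustration of the user configuration file mentioned above, the sketch below writes a minimal `config/user/<username>.yaml` containing only a `logdir.root` entry. The helper `write_user_config` is hypothetical and not part of the repository, and the nested `logdir: root:` layout is an assumption inferred from the `user.logdir.root=...` override; check the existing files under config/user for the actual schema before relying on it.

```python
import tempfile
from pathlib import Path


def write_user_config(config_dir, username, logdir_root):
    """Write <config_dir>/user/<username>.yaml with a single logdir.root entry.

    Assumption: the user config group stores the log directory as
        logdir:
          root: <path>
    """
    user_dir = Path(config_dir) / "user"
    user_dir.mkdir(parents=True, exist_ok=True)
    path = user_dir / f"{username}.yaml"
    path.write_text(f"logdir:\n  root: {logdir_root}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demonstrate in a temporary directory so no real config is touched.
    with tempfile.TemporaryDirectory() as tmp:
        created = write_user_config(tmp, "alice", "/tmp/syngfn_logs")
        print(created.read_text(encoding="utf-8"))
```

With such a file in place, `python main.py user=alice` would pick up the log directory without a command-line override (here `alice` and `/tmp/syngfn_logs` are placeholder values).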
SynGFN uses Hydra to manage configuration files. The main.yaml file exposes a set of basic adjustable parameters. Parameters for individual modules, such as env and policy, can also be added to main.yaml so that they are adjusted in one place.
If you have already trained a model, sample.py lets you sample from it without further training. Replace the two model weight paths in sample.py (e.g., model_weights = torch.load('path/to/SynGFN/logs/xx/ckpts/ck_f_x_0_iterxx.ckpt')), then run
python sample.py user=<username>
For more detailed explanations, please refer to each folder's README.md file.
Explanation of Results
To illustrate the generation results of SynGFN, we have included an example case in logs/example. It explains the three main output files generated by SynGFN in detail.
Citation
If you find the models useful in your research, please cite our paper. Our code builds on gflownet and mf-al-gfn, and we are grateful to the authors of both projects.
Contact
If you have any questions, please feel free to email us (yuchenzhu@zju.edu.cn).
SpaceGFN: The Next Step in Molecular Design
We are excited to announce that SpaceGFN, an upgraded version of SynGFN, is currently in development.
Building on the foundation of SynGFN, SpaceGFN offers significant improvements and new features for de novo molecular design and molecular optimization.
The SpaceGFN framework introduces new operational modes, supports customizable reaction steps, and integrates advanced docking-based scoring systems. We believe these updates will provide researchers and developers with a more powerful and flexible tool for exploring chemical spaces.
Stay tuned! The code and related research articles will be released soon. We encourage you to follow the development and explore the exciting possibilities SpaceGFN offers.
