SynGFN

This repository contains the implementation code for our research paper, "SynGFN: learning across chemical space with generative flow-based molecular discovery".

Introduction

SynGFN features two key ingredients: (1) a hierarchically pre-trained policy network that significantly accelerates learning across diverse distributions of desirable molecules in chemical spaces, and (2) a multi-fidelity active learning framework to alleviate the cost of reward evaluations.

Installation

Follow these steps to set up the environment and install all dependencies for this project.

conda create --name syngfn python==3.9.18
conda activate syngfn
bash install_dependencies.sh

Data

SynGFN requires the reaction template set and the building block library.

The reaction template set is drawn mainly from the 58 robust reactions published by Hartenfeller et al. and the virtual reaction database published by Button et al., which includes 64 reactions. Solvents, catalysts, and constant reagents are omitted from the reaction SMARTS. None of the reactions define the stereochemistry of their products. The template set is available in data/template.py.

The Enamine building block library is available upon request at https://enamine.net/building-blocks/building-blocks-catalog. We used the "Global Stock" release dated 2023.07.17. We further divided the building block library into four scales, denoted S (Small), M (Medium), L (Large), and XL (Extreme Large). The data processing and construction steps are described in dataprocess/README.md.

Main Components

The SynGFN framework consists of four core components: Environment, Proxy, Policy Models, and the GFlowNet Agent.

1. Environment: The environment defines the state space and the hierarchical action space. In SynGFN, we treat the selection of a reaction and the selection of a reactant as two separate actions: Action 1 and Action 2.
2. Proxy: The proxy is the model that provides rewards for the states of the environment. In this work, we use a QSAR model as the scoring function (proxy) for generated molecules.
3. Policy Models: The policy models are neural networks that model state transitions. For the current task, simple multi-layer perceptrons (MLPs) with a few layers are sufficient.
4. GFlowNet Agent: The GFlowNet agent orchestrates the interactions between the environment, proxy, and policy models. In this implementation, we adopt the trajectory balance loss as the training objective.
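The trajectory balance objective mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the general formula, not the code used in this repository: the function name, the example probabilities, and the assumption that each step's forward log-probability is the sum of a reaction term and a reactant term are all hypothetical.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared trajectory balance residual for one complete trajectory.

    log_z        : learned estimate of the log partition function
    log_pf_steps : forward log-probabilities, one per step
    log_pb_steps : backward log-probabilities, one per step
    reward       : positive reward of the terminal state (from the proxy)
    """
    if reward <= 0:
        raise ValueError("reward must be positive to take its logarithm")
    residual = (
        log_z + sum(log_pf_steps) - math.log(reward) - sum(log_pb_steps)
    )
    return residual ** 2


# Hypothetical two-step trajectory. Assumption: each step's forward
# log-probability is log P(reaction) + log P(reactant | reaction).
log_pf = [
    math.log(0.5) + math.log(0.25),
    math.log(0.2) + math.log(0.5),
]
# Assumption for this toy case: every state has one parent, so P_B = 1.
log_pb = [0.0, 0.0]

loss = trajectory_balance_loss(
    log_z=0.0, log_pf_steps=log_pf, log_pb_steps=log_pb, reward=0.0125
)
```

In this toy case the product of the forward probabilities equals the reward and log Z is 0, so the residual vanishes up to floating-point error. During training, the residual is driven toward zero across sampled trajectories.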

Usage

To train a SynGFN model with the default configuration, run:

python main.py user.logdir.root=<path/to/log/files/>

Alternatively, you can create a user configuration file at config/user/<username>.yaml that specifies logdir.root, and run:

python main.py user=<username>

SynGFN uses Hydra to manage configuration files. The main.yaml file exposes a set of basic adjustable parameters. Parameters for individual modules, such as env and policy, can also be added to main.yaml so that they are adjusted in one place.
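The command-line override `user.logdir.root=<path>` and a user file with a nested `logdir.root` entry both address the same nested key. The sketch below shows that correspondence in plain Python. It is not Hydra's parser: `apply_override` and the example path are hypothetical and serve only as illustration.

```python
def apply_override(config, override):
    """Set a nested key from a 'dotted.key=value' string (illustration only)."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate dictionaries as needed.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical log directory, used only for this example.
config = apply_override({}, "user.logdir.root=/tmp/syngfn_logs")
```

The resulting dictionary nests `root` under `logdir` under `user`, which is the same layout a user configuration file would describe.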

If you already have a trained model, we provide a script that performs sampling only. Replace the two model weight paths in sample.py (e.g., model_weights = torch.load('path/to/SynGFN/logs/xx/ckpts/ck_f_x_0_iterxx.ckpt')), then run

python sample.py user=<username>

For more detailed explanations, please refer to each folder's README.md file.

Explanation of Results

To illustrate the output of SynGFN, we include an example case in logs/example, with detailed explanations of the three main output files that SynGFN generates.

Citation

If you find the models useful in your research, please cite our paper. Our code builds on gflownet and mf-al-gfn, and we greatly appreciate these excellent works.

Contact

If you have any questions, please feel free to email us (yuchenzhu@zju.edu.cn).

SpaceGFN: The Next Step in Molecular Design

We are excited to announce that SpaceGFN, an upgraded version of SynGFN, is currently in development.
Building on the foundation of SynGFN, SpaceGFN offers significant improvements and new features for de novo molecular design and molecular optimization.

The SpaceGFN framework introduces new operational modes, supports customizable reaction steps, and integrates advanced docking-based scoring systems. We believe these updates will provide researchers and developers with a more powerful and flexible tool for exploring chemical spaces.

Stay tuned! The code and related research articles will be released soon. We encourage you to follow the development and explore the exciting possibilities SpaceGFN offers.
