Trio
Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search
This is the official code repository for the paper: Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search.
In our paper, we introduce:
- Fragment-based Generative Pre-trained Transformer (FragGPT): A molecular language model designed for context-aware fragment assembly, enabling the construction of novel molecular structures from a learned vocabulary of chemical fragments.
  <img src="image/image-20251013211318322.png" alt="image-20251013211318322" style="zoom: 50%;" />
- Chemical Property Alignment with Direct Preference Optimization (DPO): A reinforcement learning technique to align the generative process with desirable pharmacological properties, enforcing physicochemical and synthetic feasibility to produce more drug-like candidates.
  <img src="image/image-20251013211340520.png" alt="image-20251013211340520" style="zoom:50%;" />
- Target-aware Molecular Generation via Monte Carlo Tree Search (MCTS): A guided search strategy that balances the exploration of novel chemotypes and the exploitation of promising intermediates, optimizing ligand generation directly within the context of a specific protein binding pocket.
  <img src="image/image-20251013211354767.png" alt="image-20251013211354767" style="zoom:50%;" />
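The exploration/exploitation balance described for MCTS is commonly implemented with a UCT-style selection rule. The block below is a minimal sketch under that assumption and is not taken from this repository: the `Node` class, the exploration constant `c`, and the reward values are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node holding a partial fragment sequence."""
    fragments: tuple
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Mean reward (exploitation) plus c * sqrt(ln N_parent / n_child) (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_reward = child.total_reward / child.visits
    exploration = c * math.sqrt(math.log(parent.visits) / child.visits)
    return mean_reward + exploration


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch, c))


def backpropagate(path: list, reward: float) -> None:
    """Add one rollout reward (for example, a docking-derived score) to each node on the path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward
```

A larger `c` favors rarely visited branches (novel chemotypes), while `c = 0` reduces selection to picking the child with the best mean reward so far.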
Installation
The dependencies for this project are listed in the environment.yml file. Create and activate the environment with Conda, replacing your_env_name with the environment name defined in environment.yml:
conda env create -f environment.yml
conda activate your_env_name
Hardware Requirements
A single run of the code requires less than 2000 MB of VRAM. An NVIDIA RTX 3090 or a GPU with equivalent performance is sufficient.
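To confirm that a GPU has enough free memory before a run, a check along the following lines may help. It is only a sketch: it assumes the `nvidia-smi` tool is on the PATH, and the function names are illustrative rather than part of this repository. `nvidia-smi` reports values in MiB, which are compared directly against the 2000 MB figure quoted above.

```python
import shutil
import subprocess

REQUIRED_MIB = 2000  # upper bound quoted in this README


def parse_free_mib(nvidia_smi_output: str) -> list:
    """Parse one free-memory value (MiB) per GPU, skipping blank or non-numeric lines."""
    values = []
    for line in nvidia_smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def gpus_with_enough_memory(required_mib: int = REQUIRED_MIB) -> list:
    """Return positions (in nvidia-smi output order) of GPUs with enough free memory."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools found on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    free = parse_free_mib(result.stdout)
    return [i for i, mib in enumerate(free) if mib >= required_mib]


if __name__ == "__main__":
    print("GPUs with enough free memory:", gpus_with_enough_memory())
```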
Pre-trained Weights
The pre-trained weight files required for the project can be downloaded from the following link:
Click here to download the weight files
After downloading, place the weight files in the ./weights directory.
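A quick way to verify that the weights are in place before running any script is sketched below. The helper name is hypothetical, and because the weight file names are not listed here, it only checks that the directory exists and is not empty.

```python
from pathlib import Path


def list_weight_files(weights_dir: str = "./weights") -> list:
    """Return sorted names of regular files in the weights directory.

    Raises FileNotFoundError if the directory is missing or contains no files.
    """
    root = Path(weights_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; create it and add the weight files")
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"{root} is empty; download the weight files first")
    return files
```

Calling `list_weight_files()` from the repository root returns the file names found, or raises with a message saying what is missing.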
Usage
1. De Novo Generation
For unconstrained de novo molecular generation, run the generate.py script:
python generate.py
2. Constrained Generation
For constrained (conditional) generation tasks, go to the constrained_generation directory, which contains the relevant Python scripts and Jupyter notebooks.
3. Target-based Generation
To generate molecules for specific protein targets, run the run_mcts.py script:
python run_mcts.py
You can specify different protein targets by modifying the ligand name in the run_mcts.py file. The project currently supports the following 5 proteins, which have been validated in the paper:
- parp1
- jak2
- fa7
- 5ht1b
- braf
Important Note:
./utils/docking/qvina02 is the executable used for molecular docking. Before running, make sure it has execute permission:
chmod +x ./utils/docking/qvina02
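The same permission fix can be done from Python, for example in a setup script. This is a sketch for POSIX systems; `ensure_executable` is a hypothetical helper, not a function provided by this repository.

```python
import os
import stat
from pathlib import Path


def ensure_executable(path: str) -> bool:
    """Add execute bits to `path` if needed; return True if the mode was changed."""
    target = Path(path)
    if not target.is_file():
        raise FileNotFoundError(f"{target} not found")
    if os.access(target, os.X_OK):
        return False  # already executable, nothing to do
    mode = target.stat().st_mode
    # Equivalent to `chmod +x` with a permissive umask: set user, group, and other execute bits.
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return True
```

For example, `ensure_executable("./utils/docking/qvina02")` could be called once before launching run_mcts.py.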
4. Custom Target Generation
If you wish to generate molecules for a custom target, please follow these steps:
- Prepare your protein file in pdbqt format.
- Open the utils/docking/docking_utils.py file.
- In this file, add the name of your custom protein, its pocket's central position, and the pocket size.
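If the pocket center and size for a custom target are not already known, one common approach is to derive them from a reference ligand bound in the pocket. The sketch below assumes such a ligand is available as a PDBQT file and reads coordinates from the standard fixed-width ATOM/HETATM columns. The function names, the 5 Å padding, and the `center`/`size` keys are illustrative assumptions; the entry format that docking_utils.py expects is not shown here, so adapt the values to whatever that file uses.

```python
from pathlib import Path


def read_pdbqt_coords(path: str) -> list:
    """Read (x, y, z) tuples from ATOM/HETATM records using the fixed PDB columns."""
    coords = []
    for line in Path(path).read_text().splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # x, y, z occupy columns 31-38, 39-46, and 47-54 (1-based).
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def docking_box(coords: list, padding: float = 5.0) -> dict:
    """Center and edge lengths (in angstroms) of a box enclosing the atoms plus padding per side."""
    if not coords:
        raise ValueError("no atom coordinates found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple(round((lo + hi) / 2, 3) for lo, hi in zip(mins, maxs))
    size = tuple(round(hi - lo + 2 * padding, 3) for lo, hi in zip(mins, maxs))
    return {"center": center, "size": size}
```

Running `docking_box(read_pdbqt_coords("reference_ligand.pdbqt"))` (a hypothetical file name) gives values that can be copied into the custom target entry.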
