CombiMOTS

[ICML2025] Official Implementation of "CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation"


<h1 align="center">CombiMOTS: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation</h1> <p align="center"> <a href="https://openreview.net/forum?id=FSlTEObdLl"><img src="https://img.shields.io/badge/OpenReview-ICML'25.16227-b31b1b.svg" alt="Paper"></a> <a href="https://icml.cc/media/PosterPDFs/ICML%202025/45885.png?t=1752232241.6172879"> <img src="https://img.shields.io/badge/Poster-grey?logo=airplayvideo&logoColor=white" alt="Poster"></a> <a href="./assets/ICML2025-CombiMOTS_Slides.pdf"> <img src="https://img.shields.io/badge/Slides-grey?&logo=MicrosoftPowerPoint&logoColor=white" alt="Slides"></a> </p> Official implementation of CombiMOTS: Fragment-based Monte Carlo Tree Search for Dual-Inhibitor Molecular Graph Generation.

Refer to the Poster or Slides for a more in-depth overview of our work!

<p align="center"><img src="./assets/overview.png" width=80%></p> <p align="center">Project overview.</p>

Broader Applications

We release a pretrained ensemble Chemprop ClinTox model in models/clintox that can be used for toxicity optimization, as described in our original manuscript.
We created another repository (<a href="https://github.com/Tibogoss/KinSel"> <img src="https://img.shields.io/badge/KinSel-grey?&logo=MicrosoftPowerPoint&logoColor=white" alt="KinSel"> </a>) that applies CombiMOTS to selective molecule generation, with CDK7 as the target. Our main manuscript also discusses the motivation, background and implementation details.

Baseline papers

Activity-aware fragments are obtained with a Fragment-based Graph Information Bottleneck (FGIB), adapted from https://arxiv.org/abs/2310.00841

Our Pareto MCTS pipeline is adapted from SyntheMol: https://www.nature.com/articles/s42256-024-00809-7

The Enamine (https://enamine.net/) REAL Space building blocks and the 13 corresponding reactions are also provided by the work above.

To accelerate molecular docking simulations, we use QuickVina-GPU-2.1 from https://pubmed.ncbi.nlm.nih.gov/39320991/

Install Environment

The implementation was originally developed with Python 3.10 and CUDA 11.7, and runs on a single NVIDIA RTX A6000 GPU or on CPU.

conda create -n combimots
conda activate combimots

conda install -c bioconda mgltools -y
conda install python=3.10 -y
conda install -c nvidia/label/cuda-11.7.0 cuda-nvcc -y
conda install -c nvidia cuda-opencl -y
conda install -c conda-forge ocl-icd-system -y

conda install -c conda-forge boost=1.77.0 boost-cpp=1.77.0 pdbfixer openbabel openmm -y

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

####### Installation of QuickVina-GPU-2.1
# In combimots/pmcts/docking, install and compile QuickVina-GPU-2.1 following https://github.com/DeltaGroupNJUPT/Vina-GPU-2.1

# Once installed, modify the DOCKING_PATH_PREFIX in 
# [combimots/pmcts/docking/docking_utils.py @l.16]
# [6-precompute_docking_scores.py @l.16]

pip install torch==2.0.1+cu117 -f https://download.pytorch.org/whl/torch_stable.html
pip install torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
pip install torch-geometric==2.0.4

pip install -r requirements.txt
pip install -e combimots/. # install combimots in editable mode, including its command-line tools

Note to the user

The next sections describe all pre-processing steps (running scripts 0 to 8).

If you only want to run generation and evaluation, we provide processed data and model checkpoints. You may skip these steps and directly go to the generation section.

Pipeline

In data/, place a .csv file with the following columns:

smiles
{target1}_activity
{target2}_activity
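A minimal sketch for checking that an input file follows this layout before training, assuming a comma-separated file with a header row and binary (0/1) activity labels, where an empty cell marks a missing measurement. The helper name `check_activity_csv` is hypothetical and not part of the repository.

```python
import csv
import io


def check_activity_csv(handle, target1, target2):
    """Validate the column layout and label values of a dual-target activity CSV.

    Hypothetical helper: assumes binary labels (0/1), with empty cells allowed
    for missing measurements. Returns the number of data rows.
    """
    reader = csv.DictReader(handle)
    expected = ["smiles", f"{target1}_activity", f"{target2}_activity"]
    missing = [col for col in expected if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    n_rows = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["smiles"] or "").strip():
            raise ValueError(f"empty SMILES on line {line_no}")
        for col in expected[1:]:
            value = (row[col] or "").strip()
            if value not in {"", "0", "1"}:
                raise ValueError(f"non-binary label {value!r} in {col} on line {line_no}")
        n_rows += 1
    return n_rows


example = io.StringIO(
    "smiles,gsk3b_activity,jnk3_activity\n"
    "CCO,0,1\n"
    "c1ccccc1O,1,\n"
)
print(check_activity_csv(example, "gsk3b", "jnk3"))  # 2
```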

For demonstration, we provide data for the GSK3B-JNK3, EGFR-MET and PIK3CA-mTOR target pairs.

This data is curated from ExCAPE-DB v2 (https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0203-5)

Train Chemprop (D-MPNN) Property Predictor

chemprop_train --data_path ${data_path} \
--dataset_type classification \
--split_type cv \
--num_folds 10 \
--seed 42 \
--gpu 0 \
--save_dir models/${model_name}

# $ chemprop_train --data_path data/GSK3B_JNK3.csv --dataset_type classification --split_type cv --num_folds 10 --seed 42 --gpu 0 --save_dir models/gsk3b_jnk3

Data Preparation

Search Space Reduction

Fragment-based Graph Information Bottleneck

Process the .csv to a .pt:

python utils_fgib/data.py --csv_path data/${YOUR_DATA}.csv --target ${activity}

# $ python utils_fgib/data.py --csv_path data/GSK3B_JNK3.csv --target gsk3b_activity
# $ python utils_fgib/data.py --csv_path data/GSK3B_JNK3.csv --target jnk3_activity

Train the modules:

python 1-train_fgib.py -g ${gpu_id} --target ${target_activity}

# $ python 1-train_fgib.py -g 0 --target gsk3b_activity
# $ python 1-train_fgib.py -g 0 --target jnk3_activity

Extract and save the building blocks for both targets:

python 2-get_frags.py -g ${gpu_id} -t ${target_activity} -m ${target_pt_path} -v ${frags_path}

# $ python 2-get_frags.py -g 0 -t gsk3b_activity -m ckpt/gsk3b_activity_10.pt -v data/gsk3b.txt
# $ python 2-get_frags.py -g 0 -t jnk3_activity -m ckpt/jnk3_activity_10.pt -v data/jnk3.txt

Clean and merge both targets' building blocks:

python 3-frags_to_blocks.py ${frags_path1} ${frags_path2} ${fragments_path}

# $ python 3-frags_to_blocks.py data/gsk3b.txt data/jnk3.txt data/fgib_frags.csv
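The cleaning rules applied by `3-frags_to_blocks.py` are not described here. As a rough illustration of the merge step only, the sketch below assumes each input `.txt` file holds one fragment SMILES per line, and produces a single-column `smiles` CSV with blank lines and exact duplicate strings removed. The helper names are hypothetical, and no chemistry-aware canonicalization is performed.

```python
import csv
import io


def merge_fragment_lists(*handles):
    """Merge fragment lists, dropping blank lines and exact duplicate strings.

    Order of first appearance is preserved. Two different SMILES spellings of
    the same molecule are NOT recognized as duplicates.
    """
    seen = set()
    merged = []
    for handle in handles:
        for line in handle:
            smiles = line.strip()
            if smiles and smiles not in seen:
                seen.add(smiles)
                merged.append(smiles)
    return merged


def write_smiles_csv(smiles_list, out):
    """Write a CSV with a single 'smiles' column."""
    writer = csv.writer(out)
    writer.writerow(["smiles"])
    writer.writerows([s] for s in smiles_list)


frags = merge_fragment_lists(io.StringIO("CCO\nc1ccccc1\n\n"), io.StringIO("c1ccccc1\nCCN\n"))
print(frags)  # ['CCO', 'c1ccccc1', 'CCN']
```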

Map FGIB fragments to Enamine's REAL Space building blocks:

python 4-get_similar_blocks.py --custom_path ${fragments_path} \
--real_path combimots/pmcts/resources/real/building_blocks.csv \
--output_path models/${model_name}/${blocks_path} \
--threshold ${tanim_thresh} --batch_size ${bs}

# $ python 4-get_similar_blocks.py --custom_path data/fgib_frags.csv --real_path combimots/pmcts/resources/real/building_blocks.csv --output_path models/gsk3b_jnk3/similar.csv --threshold 0.4 --batch_size 2500
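`--threshold` is a Tanimoto similarity cutoff. For two binary fingerprints represented as sets of on-bit indices A and B, Tanimoto similarity is |A ∩ B| / |A ∪ B|. The sketch below assumes a block is kept when its similarity to at least one FGIB fragment reaches the cutoff; this selection rule, the helper names and the toy fingerprint sets are illustrative assumptions, whereas the script itself works on real molecular fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_blocks(fragment_fps, block_fps, threshold=0.4):
    """Return names of blocks whose best similarity to any fragment is >= threshold."""
    kept = []
    for name, fp in block_fps.items():
        best = max((tanimoto(fp, frag) for frag in fragment_fps), default=0.0)
        if best >= threshold:
            kept.append(name)
    return kept


fragments = [{1, 2, 3, 4}]
blocks = {"bb_a": {1, 2, 3, 5}, "bb_b": {7, 8, 9}}
print(similar_blocks(fragments, blocks))  # ['bb_a']
```

With a cutoff of 0.4, `bb_a` (3 shared bits out of 5 distinct bits, similarity 0.6) is kept and `bb_b` (no shared bits) is discarded.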

Remove Salts:

chemfunc canonicalize_smiles --data_path models/${model_name}/${blocks_path} \
--save_path models/${model_name}/${blocks_path} \
--remove_salts --delete_disconnected_mols

# $ chemfunc canonicalize_smiles --data_path models/gsk3b_jnk3/similar.csv --save_path models/gsk3b_jnk3/similar.csv --remove_salts --delete_disconnected_mols

Remove building blocks containing B (boron), Si or Li atoms for QuickVina-GPU-2.1 compatibility:

python 5-remove_B_Si_Li_blocks.py models/${model_name}/${blocks_path} models/${model_name}/${blocks_path}

# $ python 5-remove_B_Si_Li_blocks.py models/gsk3b_jnk3/similar.csv models/gsk3b_jnk3/similar.csv
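As a rough, stdlib-only illustration of this filter (not the code of `5-remove_B_Si_Li_blocks.py`), the sketch below assumes building blocks are given as SMILES strings. Its tokenizer reads bracket atoms and the two-letter organic-subset atoms `Br` and `Cl` before single characters, so a bromine is never mistaken for a boron. The helper names are hypothetical; a parser such as RDKit would be more robust.

```python
import re

# Bracket atoms first, then the two-letter organic-subset atoms, then any single character.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
# Element symbol inside a bracket atom, after an optional isotope number.
_BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")
_ORGANIC = {"B", "C", "N", "O", "P", "S", "F", "I", "Br", "Cl"}
_AROMATIC = {"b", "c", "n", "o", "p", "s"}
UNSUPPORTED = {"B", "Si", "Li"}


def elements(smiles: str):
    """Yield the element symbol of every explicit atom in a SMILES string."""
    for tok in _TOKEN.findall(smiles):
        if tok.startswith("["):
            match = _BRACKET_ELEMENT.match(tok)
            if match:
                yield match.group(1).capitalize()  # e.g. "se" -> "Se"
        elif tok in _ORGANIC:
            yield tok
        elif tok in _AROMATIC:
            yield tok.upper()


def has_unsupported_atom(smiles: str) -> bool:
    """True if the SMILES contains a boron, silicon or lithium atom."""
    return any(el in UNSUPPORTED for el in elements(smiles))


blocks = ["OB(O)c1ccccc1", "Brc1ccccc1", "C[Si](C)(C)C", "CCO"]
print([s for s in blocks if not has_unsupported_atom(s)])  # ['Brc1ccccc1', 'CCO']
```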

Precompute activities and docking scores

Bioactivities using Chemprop (D-MPNN):

chemprop_predict --test_path models/${model_name}/${blocks_path} \
--preds_path models/${model_name}/precompute.csv \
--checkpoint_dir models/${model_name}

# $ chemprop_predict --test_path models/gsk3b_jnk3/similar.csv --preds_path models/gsk3b_jnk3/precompute.csv --checkpoint_dir models/gsk3b_jnk3
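Because the predictor was trained with `--split_type cv --num_folds 10`, `models/${model_name}` holds one checkpoint per fold, and `chemprop_predict` ensembles every checkpoint it finds under `--checkpoint_dir` by averaging their predictions. The sketch below only illustrates that averaging on made-up per-fold probabilities for one molecule; it does not call Chemprop, and `ensemble_mean` is a hypothetical name.

```python
from statistics import fmean


def ensemble_mean(fold_predictions):
    """Average per-fold predictions task by task.

    fold_predictions: one list per model, each holding one probability per task
    (e.g. [gsk3b_activity, jnk3_activity]).
    """
    n_tasks = len(fold_predictions[0])
    return [fmean(model[t] for model in fold_predictions) for t in range(n_tasks)]


# Made-up outputs of three fold models for a single molecule.
print(ensemble_mean([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]]))
```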

Docking scores using QuickVina-GPU-2.1: this step is important because docking oracles are the most expensive components during generation, so their results are computed once here.

python 6-precompute_docking_scores.py models/${model_name}/precompute.csv models/${model_name}/final_blocks.csv --target_pair ${target_pair}

# $ python 6-precompute_docking_scores.py models/gsk3b_jnk3/precompute.csv models/gsk3b_jnk3/final_blocks.csv --target_pair gsk3b_jnk3
# supported target_pair values: "gsk3b_jnk3", "egfr_met", "pik3ca_mtor", "dhodh_rorgt"
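Precomputing per-building-block scores means they can be looked up instead of recomputed each time a block is reused during the search. A minimal sketch of such a lookup table, assuming the precomputed file is a CSV with a `smiles` column and one docking-score column per target; the column names `gsk3b_docking` and `jnk3_docking`, the helper `load_score_table` and the scores themselves are hypothetical, not the repository's actual schema or results.

```python
import csv
import io


def load_score_table(handle, score_columns):
    """Map each building-block SMILES to a tuple of precomputed scores."""
    table = {}
    for row in csv.DictReader(handle):
        table[row["smiles"]] = tuple(float(row[col]) for col in score_columns)
    return table


# Hypothetical file content; Vina-style docking scores are in kcal/mol (lower is better).
precomputed = io.StringIO(
    "smiles,gsk3b_docking,jnk3_docking\n"
    "CCO,-3.1,-2.9\n"
    "c1ccccc1O,-4.5,-4.2\n"
)
scores = load_score_table(precomputed, ["gsk3b_docking", "jnk3_docking"])
print(scores["c1ccccc1O"])  # (-4.5, -4.2)
```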

Map building blocks to Enamine's reactions

# Map building blocks -> .pkl
python 7-map_bbs_to_search_space.py --input models/${model_name}/final_blocks.csv \
--real_path combimots/pmcts/resources/real/reaction_to_building_blocks.pkl \
--save_path combimots/pmcts/resources/real/${target_pair}.pkl \
--smiles_column smiles

# $ python 7-map_bbs_to_search_space.py --input models/gsk3b_jnk3/final_blocks.csv --real_path combimots/pmcts/resources/real/reaction_to_building_blocks.pkl --save_path combimots/pmcts/resources/real/gsk3b_jnk3.pkl --smiles_column smiles

# Filter out building blocks that do not match any of the provided reactions
python 8-filter_reactions_to_blocks.py \
--reaction_to_building_blocks_path combimots/pmcts/resources/real/${target_pair}.pkl \
--save_path combimots/pmcts/resources/real/${target_pair}.pkl

# $ python 8-filter_reactions_to_blocks.py --reaction_to_building_blocks_path combimots/pmcts/resources/real/gsk3b_jnk3.pkl --save_path combimots/pmcts/resources/real/gsk3b_jnk3.pkl
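The exact layout of the `.pkl` files is not documented here. Assuming a nested mapping of reaction ID to reactant index to a set of building-block identifiers (an assumption loosely modeled on SyntheMol), the two steps above amount to restricting every reactant slot to our own building blocks and dropping reactions left with an empty slot, since those can no longer be instantiated. `restrict_reactions` is a hypothetical helper, not the code of scripts 7 or 8.

```python
def restrict_reactions(reaction_to_blocks, allowed_blocks):
    """Keep only allowed building blocks in each reactant slot.

    reaction_to_blocks: {reaction_id: {reactant_index: set_of_block_ids}} (assumed layout).
    allowed_blocks: set of block ids selected by the previous steps.
    Reactions with any reactant slot left empty are dropped.
    """
    restricted = {}
    for reaction_id, slots in reaction_to_blocks.items():
        new_slots = {idx: blocks & allowed_blocks for idx, blocks in slots.items()}
        if new_slots and all(new_slots.values()):
            restricted[reaction_id] = new_slots
    return restricted


mapping = {
    1: {0: {"bb1", "bb2"}, 1: {"bb3"}},
    2: {0: {"bb4"}, 1: {"bb2"}},
}
print(restrict_reactions(mapping, {"bb1", "bb3", "bb4"}))  # {1: {0: {'bb1'}, 1: {'bb3'}}}
```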

Generation: Pareto Monte-Carlo Tree Search

pmcts --model_path models/${model_name} --save_dir generations/${model_name}/ \
--target_activities ${activity1} ${activity2} \
--target_pair ${target_pair} \
--building_blocks_path models/${model_name}/final_blocks.csv \
--n_rollout 5000

# $ pmcts --model_path models/gsk3b_jnk3 --save_dir generations/gsk3b_jnk3/ --target_activities gsk3b_activity jnk3_activity --target_pair gsk3b_jnk3 --building_blocks_path models/gsk3b_jnk3/final_blocks.csv --n_rollout 5000
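Pareto MCTS compares candidates through Pareto dominance: a candidate dominates another if it is at least as good on every objective and strictly better on at least one. The sketch below extracts the non-dominated set from objective vectors where larger is better (for example predicted activities for both targets, with docking scores negated so that larger is better). It illustrates the concept only and is not the `pmcts` implementation.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points not dominated by any other point, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.2, 0.9)]
print(pareto_front(candidates))  # [(0.9, 0.2), (0.6, 0.7), (0.2, 0.9)]
```

Here (0.5, 0.5) is dropped because (0.6, 0.7) is better on both objectives, while the other three trade one objective against the other.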

Evaluation

Filter out the molecules predicted as dual actives

python 9-filter_dual_actives.py generations/${model_name}/pareto_molecules.csv generations/${model_name}/pareto_dual_actives.csv

# $ python 9-filter_dual_actives.py generations/gsk3b_jnk3/pareto_molecules.csv generations/gsk3b_jnk3/pareto_dual_actives.csv
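A molecule counts as a dual active only if it is predicted active against both targets. A minimal sketch, assuming `pareto_molecules.csv` holds one predicted-probability column per target (named as in `--target_activities`) and using a 0.5 cutoff; both the cutoff and the helper name `filter_dual_actives` are assumptions, not taken from `9-filter_dual_actives.py`.

```python
import csv
import io


def filter_dual_actives(handle, activity_columns, threshold=0.5):
    """Return rows whose predicted probability is >= threshold for every target."""
    return [
        row for row in csv.DictReader(handle)
        if all(float(row[col]) >= threshold for col in activity_columns)
    ]


generated = io.StringIO(
    "smiles,gsk3b_activity,jnk3_activity\n"
    "CCO,0.81,0.64\n"
    "CCN,0.92,0.31\n"
)
hits = filter_dual_actives(generated, ["gsk3b_activity", "jnk3_activity"])
print([row["smiles"] for row in hits])  # ['CCO']
```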

Re-docking simulations are optional and must be run separately.

For all other metrics (validity, uniqueness, novelty, diversity, average QED and average SA), we provide 10-evaluate.py:

python 10-evaluate.py --model models/${model_name} \
--generation generations/${model_name}/pareto_dual_actives.csv \
--training {path_to_dual_positives_of_training_set_csv}

# $ python 10-evaluate.py --model models/gsk3b_jnk3 --generation generations/gsk3b_jnk3/pareto_dual_actives.csv --training data/GSK3B_dual_actives.csv
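Definitions of these metrics vary slightly across papers, and the sketch below does not reproduce `10-evaluate.py`. It shows one common convention for three of them, assuming SMILES that are already canonical and a validity check passed in as a function: validity is the fraction of generated molecules that are valid, uniqueness is the fraction of distinct molecules among the valid ones, and novelty is the fraction of distinct valid molecules absent from the training set. Diversity, QED and SA need fingerprints or descriptors (for example from RDKit) and are omitted; `basic_metrics` is a hypothetical name.

```python
def basic_metrics(generated, training, is_valid):
    """Compute validity, uniqueness and novelty (one common set of definitions).

    generated: list of SMILES (assumed canonical); training: iterable of SMILES;
    is_valid: callable returning True for a parseable molecule.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


# Toy validity check for illustration only: treats empty strings as invalid.
generated = ["CCO", "CCO", "c1ccccc1", "", "CCN"]
print(basic_metrics(generated, training=["CCN"], is_valid=bool))
```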

If you find our paper or repository useful for your research or projects, please cite our original paper:

@inproceedings{
southiratn2025combimots,
title={Combi{MOTS}: Combinatorial Multi-Objective Tree Search for Dual-Target Molecule Generation},
author={
