RxnIM 
This is the official code for the paper "Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model".
:sparkles: Highlights
<p align="justify"> In this paper, we present RxnIM, a multimodal large language model for reaction image data extraction tasks, including reaction extraction, condition OCR, and role identification. We first formulate these tasks as task instructions. The model then aligns the task instructions with features extracted from reaction images, and an LLM-based decoder makes predictions based on these instructions. On the reaction extraction task, our model achieves soft match F1 scores of 84%-92% on multiple test sets, significantly outperforming previous works. The experiments also demonstrate strong condition OCR and role identification abilities.</p>
:sparkles::sparkles: Please check out our newest work on versatile and multimodal information extraction from the chemical literature using a multi-agent system named ChemEAGLE: paper, code!
:rocket: Using the code
Please clone the following repository:
git clone https://github.com/CYF2000127/RxnIM
:fire: Experiments
Requirement
1. First, create and activate a conda environment with the following commands in a Linux, Windows, or macOS environment (Linux is recommended):
conda create -n rxnim python=3.10
conda activate rxnim
2. Then install the requirements:
pip install -r requirements.txt
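After installing, a quick sanity check can confirm that the key dependencies resolved correctly. The package names below (`torch`, `transformers`) are assumptions for illustration; `requirements.txt` is the authoritative list.

```python
# Minimal environment sanity check (package names are illustrative;
# see requirements.txt for the actual dependency list).
import importlib.util
import sys


def check_environment(packages=("torch", "transformers")):
    """Report the Python version and whether each package is importable."""
    print(f"Python {sys.version.split()[0]}")  # the setup above uses 3.10
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}


print(check_environment())
```

`importlib.util.find_spec` returns `None` for packages that are not installed, so this reports missing dependencies without triggering import errors.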
Data preparation
For training and inference, please download the following datasets to your own path.
Datasets
Data generation
Alternatively, use the code in data_generation to generate any number of synthetic reaction images.
Note that you should download the original Pistachio dataset first and put it in the same directory as the code.
Model
- Download the model checkpoint from our Hugging Face Repo and put it in your own path.
Training
- Change the dataset path and jsonl file path in DEFAULT_TRAIN_DATASET.py for different training stages.
- Change the parameters in shikra_fsdp.py for different training stages according to the paper.
- Run the following command:
sh train.sh
Inference
Run the following command:
sh eval.sh
🤗 Reaction image parsing using RxnIM.Web
Go to our RxnIM web demo to use our tool directly!
The input is a chemical reaction image:

The output includes the SMILES of reactants and products, and the detailed condition roles:
Reaction: 1
Reactants: CC(C)(C)OC(=O)N[C@H]1C=C[C@H](C(=O)O)C1
Conditions: Br2, Pyridine[reagent], DME/H2O[solvent], 0-5°C[temperature], 68%[yield]
Products: CC(C)(C)OC(=O)N[C@@H]1C[C@H]2C(=O)O[C@H]2[C@@H]1Br
Full Reaction: CC(C)(C)OC(=O)N[C@H]1C=C[C@H](C(=O)O)C1>>CC(C)(C)OC(=O)N[C@@H]1C[C@H]2C(=O)O[C@H]2[C@@H]1Br | Br2, Pyridine[reagent], DME/H2O[solvent], 0-5°C[temperature], 68%[yield]
Reaction: 2
Reactants: CC(C)(C)OC(=O)N[C@@H]1C[C@H]2C(=O)O[C@H]2[C@@H]1Br
Conditions: LiBH4[reagent], THF/H2O[solvent], -5°C[temperature], 90%[yield]
Products: CC(C)(C)OC(=O)N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br
Full Reaction: CC(C)(C)OC(=O)N[C@@H]1C[C@H]2C(=O)O[C@H]2[C@@H]1Br>>CC(C)(C)OC(=O)N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br | LiBH4[reagent], THF/H2O[solvent], -5°C[temperature], 90%[yield]
Reaction: 3
Reactants: CC(C)(C)OC(=O)N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br
Conditions: 48% aq. HBr[reagent], IPA[solvent], 55°C[temperature]
Products: Br.N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br
Full Reaction: CC(C)(C)OC(=O)N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br>>Br.N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br | 48% aq. HBr[reagent], IPA[solvent], 55°C[temperature]
Reaction: 4
Reactants: Br.N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br
Conditions: DIPEA[reagent], Pd/C, H2[reagent], IPA/MeOH[solvent], 80% over two steps[yield]
Products: Br.N[C@@H]1C[C@@H](CO)[C@@H](O)C1
Full Reaction: Br.N[C@@H]1C[C@@H](CO)[C@@H](O)[C@@H]1Br>>Br.N[C@@H]1C[C@@H](CO)[C@@H](O)C1 | DIPEA[reagent], Pd/C, H2[reagent], IPA/MeOH[solvent], 80% over two steps[yield]
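The text output above follows a fixed line format (`Reaction: N` followed by `Reactants:`, `Conditions:`, `Products:`, and `Full Reaction:` lines, with condition roles in square brackets). A minimal sketch of parsing it into structured records, assuming only that format (the function name is hypothetical, not part of the RxnIM API):

```python
import re


def parse_rxnim_output(text: str) -> list[dict]:
    """Parse RxnIM's text output into one dict per reaction.

    Assumes the line format shown above; condition entries such as
    'DME/H2O[solvent]' are split into text and role.
    """
    reactions, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Reaction:"):
            current = {"index": int(line.split(":", 1)[1])}
            reactions.append(current)
        elif current is not None and ":" in line:
            # 'Full Reaction' -> key 'full_reaction', etc.
            key, value = line.split(":", 1)
            current[key.strip().lower().replace(" ", "_")] = value.strip()
    for rxn in reactions:
        if "conditions" not in rxn:
            continue
        # Naive comma split: entries like 'Pd/C, H2[reagent]' become two items.
        rxn["conditions"] = [
            {"text": m.group(1).strip(), "role": m.group(2)}
            if (m := re.fullmatch(r"(.+?)\[(\w+)\]", c.strip()))
            else {"text": c.strip(), "role": None}
            for c in rxn["conditions"].split(",")
        ]
    return reactions
```

The `full_reaction` value keeps the `A>>B | conditions` string intact, so the reaction SMILES can be recovered by splitting on `|` and `>>`.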
We also provide the output as a source JSON file and use RDKit to visualize the reaction diagram for easier inspection and downstream usage:
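As a sketch of the RDKit visualization step (assuming `rdkit` is installed; the reaction SMILES is Reaction 1 from the example output above, and the output filename is arbitrary):

```python
# Render a parsed "Full Reaction" SMILES as a reaction diagram with RDKit.
from rdkit.Chem import AllChem, Draw

full_reaction = (
    "CC(C)(C)OC(=O)N[C@H]1C=C[C@H](C(=O)O)C1"
    ">>CC(C)(C)OC(=O)N[C@@H]1C[C@H]2C(=O)O[C@H]2[C@@H]1Br"
)

# useSmiles=True parses reaction SMILES rather than SMARTS patterns.
rxn = AllChem.ReactionFromSmarts(full_reaction, useSmiles=True)
img = Draw.ReactionToImage(rxn)  # PIL image of the reaction diagram
img.save("reaction_1.png")
```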

:mag: Visualization
We also show some qualitative results of our method below.


:warning: Acknowledgement
Our code is based on Shikra and VisionLLM. Thanks for their great work!
✅ Citation
@Article{D5SC04173B,
author ="Chen, Yufan and Leung, Ching Ting and Sun, Jianwei and Huang, Yong and Li, Linyan and Chen, Hao and Gao, Hanyu",
title ="Towards large-scale chemical reaction image parsing via a multimodal large language model",
journal ="Chem. Sci.",
year ="2025",
volume ="16",
issue ="45",
pages ="21464-21474",
publisher ="The Royal Society of Chemistry",
doi ="10.1039/D5SC04173B",
url ="http://dx.doi.org/10.1039/D5SC04173B",
abstract ="Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing state-of-the-art methods by an average of 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses."}