U2Tokenizer

A multi-scale multimodal large language model for radiology report generation (RRG) tasks

<p> <h1> <img src="./assets/logo.svg" height=150px align="right"/> <var>&micro;<sup>2</sup></var>Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation </h1> </p>

🎉🎉🎉 Our paper has been accepted at the 28th conference of The Medical Image Computing and Computer Assisted Intervention Society (MICCAI). See you in Daejeon, Korea, September 23-27, 2025.

<p align="center"> <img src="./assets/cover.svg"> </p>

This is the official repository for μ² Tokenizer, an approach to automated radiology report generation (RRG) introduced in the paper "μ² Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation".

Our proposed model, μ²LLM, leverages a multi-scale, multi-modal architecture to generate accurate and clinically salient radiology reports from CT scans.

👋 Introduction

<img src="./assets/ullm.svg">

We introduce μ²LLM, a multi-scale multimodal large language model. At its core is the μ² Tokenizer, an intermediate layer that fuses visual features from CT scans with textual information. The model is further refined with Direct Preference Optimization (DPO), guided by GREEN, a specialized metric for evaluating medical reports, so that the generated reports align with expert standards.

<img src="./assets/dpo.svg">
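As a rough illustration of metric-guided DPO, the sketch below ranks candidate reports with a scoring function, picks a chosen/rejected pair, and evaluates the standard DPO loss for that pair. It is a minimal sketch under stated assumptions, not the training code of this repository: `toy_score` is a hypothetical stand-in for GREEN, and `build_preference_pair`, `dpo_loss`, and the value of `beta` are illustrative names and defaults.

```python
import math
from typing import Callable, List, Tuple


def build_preference_pair(
    candidates: List[str], score_fn: Callable[[str], float]
) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates."""
    ranked = sorted(candidates, key=score_fn)
    return ranked[-1], ranked[0]


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) = log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def toy_score(report: str) -> float:
    """Hypothetical stand-in for GREEN: counts expected findings that are mentioned."""
    expected_findings = ("nodule", "effusion")
    return float(sum(term in report.lower() for term in expected_findings))


if __name__ == "__main__":
    reports = [
        "No abnormality.",
        "Right lower lobe nodule with small pleural effusion.",
        "Small pleural effusion.",
    ]
    chosen, rejected = build_preference_pair(reports, toy_score)
    print("chosen:  ", chosen)
    print("rejected:", rejected)
    # Log-probabilities below are made-up numbers for illustration only.
    print("loss:", dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss shrinks as the policy assigns relatively more probability to the chosen report than the frozen reference model does, which is how the metric's ranking is transferred to the model.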

Our experimental results on four large-scale CT datasets show that μ²LLM outperforms existing methods, highlighting its potential for generating high-quality radiology reports even with limited training data.

🚀 Quickstart

You will be able to load our model directly from Hugging Face.

coming soon...

🤖 Model

| Model | Download Link |
|-------|---------------|
| μ²Qwen3-8B | HuggingFace |
| μ²Qwen3-1.7B | HuggingFace |

⚙️ Installation

git clone https://github.com/Siyou-Li/u2Tokenizer.git
cd u2Tokenizer
pip install -r requirements.txt

Ensure that NVIDIA CUDA 11.8 or above is installed so that it is compatible with PyTorch 2.2.2.
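One way to confirm the CUDA requirement after installation is a small check like the one below. It is a sketch, not part of this repository: `parse_version`, `cuda_ok`, and `report_environment` are hypothetical helpers, and the only PyTorch attributes used are `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.2.2+cu118' or '11.8' into a tuple of ints, dropping local suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def cuda_ok(cuda_version: Optional[str], minimum: Tuple[int, int] = (11, 8)) -> bool:
    """True when the CUDA version string meets the stated minimum of 11.8."""
    if cuda_version is None:  # CPU-only PyTorch builds report no CUDA version
        return False
    return parse_version(cuda_version) >= minimum


def report_environment() -> str:
    """Summarize the installed PyTorch build, or say that PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    return (
        f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}, "
        f"CUDA >= 11.8: {cuda_ok(torch.version.cuda)}"
    )


if __name__ == "__main__":
    print(report_environment())
```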

💿 Data

Coming soon...

🚄 Training

Coming soon...

🧰 System Hardware requirements

For training, stages 1 and 2 use 4 × 80GB A100 GPUs. For inference, a single 40GB A40 GPU is used. Loading the model checkpoint requires approximately 39GB of CPU memory.
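The inference figures above can be turned into a quick preflight check. The sketch below is illustrative only: the thresholds are copied from this section, the helper names are hypothetical, and RAM detection relies on POSIX `os.sysconf`, so it returns `None` on platforms without it (for example Windows).

```python
import os
from typing import List, Optional

INFERENCE_GPU_GB = 40   # single-GPU memory quoted above for inference
CHECKPOINT_CPU_GB = 39  # approximate CPU memory quoted above for loading the checkpoint


def total_ram_gb() -> Optional[float]:
    """Total physical RAM in GB via POSIX sysconf, or None where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


def inference_shortfalls(gpu_gb: float, ram_gb: float) -> List[str]:
    """List the stated inference requirements that the given machine does not meet."""
    problems = []
    if gpu_gb < INFERENCE_GPU_GB:
        problems.append(f"GPU memory {gpu_gb:.0f}GB < {INFERENCE_GPU_GB}GB")
    if ram_gb < CHECKPOINT_CPU_GB:
        problems.append(f"CPU memory {ram_gb:.0f}GB < {CHECKPOINT_CPU_GB}GB")
    return problems


if __name__ == "__main__":
    ram = total_ram_gb()
    print("Detected RAM (GB):", ram)
    # The GPU size is passed in by hand here; 24GB is just an example value.
    print(inference_shortfalls(gpu_gb=24, ram_gb=ram if ram is not None else 0.0))
```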

🫡 Acknowledgements

✨ Cite our work

If you find this repo useful, please consider citing:

@misc{li2025mu2tokenizerdifferentiablemultiscalemultimodal,
      title={${\mu}^2$Tokenizer: Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation}, 
      author={Siyou Li and Pengyao Qin and Huanan Wu and Dong Nie and Arun J. Thirunavukarasu and Juntao Yu and Le Zhang},
      year={2025},
      eprint={2507.00316},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.00316}, 
}