TumorChain
[ICLR 2026] Official repository for the paper “TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis”.
<sup>1</sup>Zhejiang University <sup>2</sup>DAMO Academy, Alibaba Group <sup>3</sup>Hupan Lab <sup>4</sup>Shanghai Institute of Pancreatic Diseases <sup>5</sup>Shengjing Hospital of China Medical University <sup>6</sup>Sun Yat-sen University Cancer Center <br>
<a href='https://arxiv.org/abs/2603.05867'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://github.com/alibaba-damo-academy/Tumorchain'><img src='https://img.shields.io/badge/DAMO-GitHub-green'></a>
</div>

## 🌟 Overview
Welcome to TumorChain!
Our goal is to advance clinical tumor analysis through reliable multimodal reasoning at scale. This project presents a cohesive three-part framework—Dataset, Benchmark, and Model—to enable safe, explainable, and reproducible tumor assessment in high-stakes settings.
<p align="center"> <img src="image/teaser.png" style="width:90%;vertical-align:middle;" /> </p>

## :clap: Core Vision
- Establish a closed-loop multimodal reasoning pipeline that standardizes the path from findings to impressions to pathology.
- Create high-quality benchmarks and reproducible evaluation protocols to enable cross-institution comparison and robust generalization.
- Deliver an interpretable, calibrated, and traceable multimodal framework that reduces hallucinations and supports real-world clinical decision-making.
## :mailbox: Data Collection and Statistics
We introduce TumorCoT-1.5M — a large-scale dataset comprising 1.5 million Chain-of-Thought (CoT) labeled VQA prompts, paired with 3D CT scans, featuring stepwise reasoning and cross-modal alignments along the findings–impression–pathology trajectory.
<img src="image/agent.jpg" style="width:70%;vertical-align:middle;" /><img src="image/data.jpg" style="width:30%;vertical-align:middle;" />
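For illustration only, the sketch below shows what a single CoT-annotated VQA record could look like. Every field name here (`ct_path`, `question`, `cot_steps`, `findings`, `impression`, `pathology`) is a hypothetical placeholder, not the released schema.

```python
# Hypothetical structure of one TumorCoT-1.5M record (all field names are
# illustrative assumptions; the released data format may differ).
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoTStep:
    modality: str          # e.g. "image" (CT evidence) or "text" (reasoning step)
    content: str           # reasoning sentence or description of the referenced region
    evidence_slices: List[int] = field(default_factory=list)  # axial slice indices

@dataclass
class TumorCoTRecord:
    ct_path: str               # path to the paired 3D CT volume
    question: str              # VQA prompt about the tumor
    cot_steps: List[CoTStep]   # stepwise findings -> impression -> pathology chain
    findings: str
    impression: str
    pathology: str             # pathology-level label grounding the chain

record = TumorCoTRecord(
    ct_path="cases/0001/ct.nii.gz",
    question="What is the most likely diagnosis for the pancreatic lesion?",
    cot_steps=[
        CoTStep("image", "Hypodense mass in the pancreatic head", [42, 43, 44]),
        CoTStep("text", "The mass obstructs the main pancreatic duct with upstream dilation"),
    ],
    findings="2.8 cm hypodense pancreatic-head mass with ductal dilation.",
    impression="Findings are most consistent with pancreatic ductal adenocarcinoma.",
    pathology="PDAC",
)
```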
## :ferris_wheel: Model Architecture
TumorChain is a multimodal, iterative, interleaved reasoning framework for 3D CT tumor analysis. It fuses a 3D vision encoder, an organ segmentation model, an auxiliary classification model, an MLP projector, and a large language model (LLM) to perform stepwise, evidence-grounded reasoning from findings to impressions to pathology, with traceable evidence and calibrated uncertainty.
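As a rough, unofficial sketch of how such components could be wired together (module choices, dimensions, and fusion order are assumptions for illustration, not the released implementation):

```python
# Minimal PyTorch-style sketch of the described pipeline; every module and
# dimension here is an assumption, not the official TumorChain code.
import torch
import torch.nn as nn

class TumorChainSketch(nn.Module):
    def __init__(self, vis_dim=768, llm_dim=4096, n_cls=8):
        super().__init__()
        # 3D vision encoder: CT volume -> patch tokens (stand-in: a 3D conv stem)
        self.vision_encoder = nn.Conv3d(1, vis_dim, kernel_size=16, stride=16)
        # Organ/tumor segmentation head: localizes evidence regions
        self.seg_head = nn.Conv3d(vis_dim, 2, kernel_size=1)
        # Auxiliary classification head: coarse tumor-type prior
        self.cls_head = nn.Linear(vis_dim, n_cls)
        # MLP projector: maps visual tokens into the LLM embedding space
        self.projector = nn.Sequential(
            nn.Linear(vis_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, ct_volume):
        # ct_volume: (B, 1, D, H, W)
        feat = self.vision_encoder(ct_volume)            # (B, C, d, h, w)
        seg_logits = self.seg_head(feat)                 # organ/tumor mask logits
        tokens = feat.flatten(2).transpose(1, 2)         # (B, N, C) visual tokens
        cls_logits = self.cls_head(tokens.mean(dim=1))   # auxiliary tumor-type logits
        llm_tokens = self.projector(tokens)              # visual tokens for the LLM,
        # to be interleaved with text for findings -> impression -> pathology reasoning
        return llm_tokens, seg_logits, cls_logits

model = TumorChainSketch()
x = torch.randn(1, 1, 64, 128, 128)                      # toy CT volume
llm_tokens, seg_logits, cls_logits = model(x)
print(llm_tokens.shape, seg_logits.shape, cls_logits.shape)
```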
<p align="center"> <img src="image/model.png" style="width:80%;vertical-align:middle;" /> </p>

## 🛠️ Getting Started
😊 We will release our task definitions, benchmarks, and evaluation protocols in the near future to advance safe, explainable, and reproducible multimodal reasoning for high-stakes tumor analysis. 🚀
