IntelliScope
Frontiers in Intelligent Colonoscopy [ColonSurvey | ColonINST | ColonGPT]
<img align="right" src="./assets/teaser-figure.png" width="285px" />
Colonoscopy is currently one of the most sensitive screening methods for colorectal cancer (🔗 Wikipedia). Have you ever wondered how to make colonoscopy smarter? Well, buckle up, let's enter the exciting world of intelligent colonoscopy!
- Our vision. To explore the frontiers of intelligent colonoscopy techniques and their potential impact on multimodal medical applications.
- Why use IntelliScope? It combines "Intelligent" and "colonoScope", where "Intelli" reflects the intelligent processing and decision-making capabilities of the system, and "Scope" refers to the colonoscope device used in medical endoscopy. Together, they imply a cutting-edge multimodal system designed to improve colonoscopy with advanced AI technologies.
- Project members. Ge-Peng Ji (🇦🇺 ANU), Jingyi Liu (🇯🇵 Keio), Peng Xu (🇨🇳 THU), Nick Barnes (🇦🇺 ANU), Fahad Shahbaz Khan (🇦🇪 MBZUAI), Salman Khan (🇦🇪 MBZUAI), Deng-Ping Fan (🇨🇳 NKU)
- Let's join our IntelliScope community. We are building a discussion forum for the convenience of researchers to 💬 ask any questions, 💬 showcase/promote your work, 💬 access data resources, and 💬 share research ideas.
- Quick view. Next, we present some features of our colonoscopy-specific AI chatbot, ColonGPT. This is a domain-pioneering multimodal language model that can help endoscopists perform various user-driven tasks through interactive dialogues.
Updates
- [Mar/09/2026] We provide a Chinese translation (CN) of the paper for the convenience of Chinese readers.
- [Jan/07/2026] Our paper is now officially available at Springer Nature; please read it via this link and cite it using this bibtex.
- [Dec/09/2025] 🔥🔥🔥 Thrilled to announce the largest multimodal colonoscopy dataset, ColonVQA, with 1.1M+ VQA entries. We also propose the first reasoning-centric solutions: ColonReason dataset and the first reasoning-based model, ColonR1. Project is here: https://github.com/ai4colonoscopy/Colon-X. Your 🌟star is our biggest motivation to move forward.
- [Apr/05/2025] Our project now supports the Chinese AI platform wisemodel.
- [Feb/07/2025] We announce a new two-stage training strategy for enhancing ColonGPT's performance, achieving SOTA results on all downstream tasks. Further details are available in our technical report (arXiv-v2).
- [Oct/30/2024] We've set up an online benchmark on the Papers with Code website.
- [Oct/16/2024] Open-sourced the whole codebase of the project.
🔥 Research Highlights
<p align="center"> <img src="./assets/overview_for_github.png" width="800px" /> <br/> <em> Figure 1: Introductory diagram. </em> </p>

- <u>Survey on colonoscopic scene perception (CSP)</u> ➡️ "We assess the current landscape to sort out domain challenges and under-researched areas in the AI era."
- 📖 ColonSurvey. We investigate the latest research progress in four colonoscopic scene perception tasks from both data-centric and model-centric perspectives. Our investigation summarises key features of 63 datasets and 137 representative deep techniques published since 2015. In addition, we highlight emerging trends and opportunities for future study. (🔗 Hyperlink)
- 💥 <u>Multimodal AI Initiatives</u> ➡️ "We advocate three initiatives to embrace the coming multimodal era in colonoscopy."
- 🏥 ColonINST. We introduce a pioneering instruction tuning dataset for multimodal colonoscopy research, aimed at instructing models to execute user-driven tasks interactively. This dataset comprises 62 categories, 300K+ colonoscopic images, 128K+ medical captions (GPT-4V generated), and 450K+ human-machine dialogues. (🔗 Hyperlink)
- 🤖 ColonGPT. We develop a domain-specific multimodal language model to assist endoscopists through interactive dialogues. To ensure reproducibility for average community users, we implement ColonGPT in a resource-friendly way, with three core designs: a 0.4B-parameter visual encoder 🤗 SigLIP-SO from Google, a 1.3B-parameter lightweight language model 🤗 Phi-1.5 from Microsoft, and a multigranularity adapter that reduces visual tokens from 100% to only 33.74% without compromising performance. (🔗 Hyperlink)
- 💯 Multimodal Benchmark. We contribute a multimodal benchmark, including six general-purpose models and two designed for medical purposes, across three colonoscopy tasks to enable fair and rapid comparisons going forward. (🔗 Hyperlink)
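The 33.74% token-reduction figure above is consistent with a simple multi-scale pooling design: a SigLIP-style encoder at 384×384 resolution emits a 27×27 grid of 729 visual tokens, and pooling that grid into 14×14, 7×7, and 1×1 grids yields 196 + 49 + 1 = 246 tokens (246 / 729 ≈ 33.74%). The sketch below illustrates this idea only; it is a hypothetical reconstruction, not the released ColonGPT code, and the embedding sizes (`vis_dim=1152`, `lm_dim=2048`) and pyramid levels `(14, 7, 1)` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultigranularityAdapter(nn.Module):
    """Hypothetical sketch of a multigranularity adapter: pool the 27x27
    visual-token grid into a multi-scale pyramid (14x14 + 7x7 + 1x1 = 246
    tokens, ~33.74% of the original 729) and project the pooled tokens
    into the language model's embedding space."""

    def __init__(self, vis_dim=1152, lm_dim=2048, grids=(14, 7, 1)):
        super().__init__()
        self.grids = grids
        # Two-layer MLP projector from visual to language embedding space.
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim)
        )

    def forward(self, tokens):                       # tokens: (B, 729, vis_dim)
        b, n, d = tokens.shape
        side = int(n ** 0.5)                         # 27 for a 27x27 grid
        fmap = tokens.transpose(1, 2).reshape(b, d, side, side)
        pooled = [                                   # pool at each granularity
            F.adaptive_avg_pool2d(fmap, g).flatten(2).transpose(1, 2)
            for g in self.grids                      # (B,196,d),(B,49,d),(B,1,d)
        ]
        return self.proj(torch.cat(pooled, dim=1))   # (B, 246, lm_dim)

adapter = MultigranularityAdapter()
out = adapter(torch.randn(2, 729, 1152))
print(out.shape)         # torch.Size([2, 246, 2048])
print(f"{246 / 729:.2%}")  # 33.74%
```

The reduced token sequence would then be concatenated with text embeddings and fed to the language model, keeping both fine-grained (14×14) and global (1×1) visual context at a third of the original sequence cost.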
📖 ColonSurvey
Our "ColonSurvey" project contributes various useful resources for the community. We investigate 63 colonoscopy datasets and 137 deep learning models focused on colonoscopic scene perception, all sourced from leading conferences or journals since 2015. This is a quick overview of our investigation; for a more detailed discussion, please refer to our paper in PDF format.
<p align="center"> <img src="./assets/colonsurvey.png"/> <br /> <em> Figure 2: The investigation of colonoscopy datasets and models. </em> </p>

To better understand developments in this rapidly changing field and accelerate researchers’ progress, we are building a 📖 paper reading list, which includes a number of AI-based scientific studies on colonoscopy imaging from the past 12 years. [UPDATE ON OCT-14-2024] In detail, our online list contains:
- Colonoscopy datasets 🔗 Google sheet
- Colonoscopy models
- Classification tasks 🔗 Google sheet
- Detection tasks 🔗 Google sheet
- Segmentation tasks 🔗 Google sheet
- Vision language tasks 🔗 [Google sheet](https://docs.google.com/spreadsheets/d/1V_s99Jv9syzM6FPQAJVQqO
