# Endora: Video Generation Models as Endoscopy Simulators (MICCAI 2024)
<p align="center"> <img src="./assets/avatar.png" alt="" width="120" height="120"> </p>

Project Page | ArXiv Paper | Video Demo
Accepted by the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)
Chenxin Li<sup>1*</sup> Hengyu Liu<sup>1*</sup> Yifan Liu<sup>1*</sup> Brandon Y. Feng<sup>2</sup> Wuyang Li<sup>1</sup> Xinyu Liu<sup>1</sup> Zhen Chen<sup>3</sup> Jing Shao<sup>4</sup> Yixuan Yuan<sup>1✉</sup>
<sup>1</sup>CUHK <sup>2</sup>MIT CSAIL <sup>3</sup>CAS CAIR <sup>4</sup>Shanghai AI Lab
<sup>*</sup> Equal Contributions. <sup>✉</sup> Corresponding Author.

## 💡 Key Features
- A high-fidelity medical video generation framework, tested on endoscopy scenes, laying the groundwork for further advancements in the field.
- The first public benchmark for endoscopy video generation, featuring a comprehensive collection of clinical videos and adapting existing general-purpose generative video models for this purpose.
- A novel technique to infuse generative models with features distilled from a 2D visual foundation model, ensuring consistency and quality across different scales.
- Versatile applicability, demonstrated through successful applications in video-based disease diagnosis and 3D surgical scene reconstruction, highlighting its potential for downstream medical tasks.
## 🛠 Setup
```bash
git clone https://github.com/XGGNet/Endora.git
cd Endora
conda create -n Endora python=3.10
conda activate Endora
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
Tip A: We tested the framework with PyTorch 2.1.2 compiled against CUDA 11.8. Other versions should also work, but are not fully verified.

Tip B: A GPU with 24 GB of memory (or more) is recommended for video sampling with <i>Endora</i> inference, and 48 GB (or more) for <i>Endora</i> training.
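As a quick sanity check after installation, the snippet below (a hypothetical helper, not part of the repository) reports the detected PyTorch/CUDA setup and degrades gracefully if `torch` is missing:

```python
import importlib.util

def check_env():
    """Report the installed PyTorch/CUDA setup, if any."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch
    cuda = torch.version.cuda if torch.cuda.is_available() else "unavailable"
    return f"torch {torch.__version__}, CUDA {cuda}"

print(check_env())
```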
## 📚 Data Preparation
- Colonoscopic: The dataset introduced by the paper can be found here. You can directly use the video data processed by Endo-FM without further processing.
- Kvasir-Capsule: The dataset introduced by the paper can be found here. You can directly use the video data processed by Endo-FM without further processing.
- CholecTriplet: The dataset introduced by the paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

First, run `process_data.py` and `process_list.py` to obtain the split frames and the corresponding file lists:
```bash
CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s /path/to/datasets -t /path/to/save/video/frames
CUDA_VISIBLE_DEVICES=gpu_id python process_list.py -f /path/to/video/frames -t /path/to/save/text
```
The resulting file structure is as follows:
```
├── data
│   ├── CholecT45
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Colonoscopic
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Kvasir-Capsule
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── CholecT45_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Colonoscopic_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Kvasir-Capsule_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
```
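The list-building step above essentially enumerates frame folders into a text file such as `train_128_list.txt`. A minimal stdlib-only sketch of that idea (the function name and the `"clip_name num_frames"` line format are assumptions, not the repository's exact implementation):

```python
from pathlib import Path

def build_frame_list(frames_root, out_file):
    """Write one line per clip folder, e.g. '00001 128'.

    Hypothetical sketch: scans immediate subfolders of frames_root,
    counts their .jpg frames, and writes the list to out_file.
    """
    lines = []
    for clip_dir in sorted(Path(frames_root).iterdir()):
        if not clip_dir.is_dir():
            continue
        n_frames = len(list(clip_dir.glob("*.jpg")))
        lines.append(f"{clip_dir.name} {n_frames}")
    Path(out_file).write_text("\n".join(lines))
    return lines
```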
## 🎇 Sampling Endoscopy Videos
You can directly sample endoscopy videos from a trained checkpoint. Here is a quick-start example using our pre-trained models:

- Download the pre-trained weights from here and put them at the path specified in the configs.
- Run `sample.py` via the scripts below; various arguments, such as the number of sampling steps, can be customized.
Simple sampling to generate a video:

```bash
bash sample/col.sh
bash sample/kva.sh
bash sample/cho.sh
```
Sampling with PyTorch DDP:

```bash
bash sample/col_ddp.sh
bash sample/kva_ddp.sh
bash sample/cho_ddp.sh
```
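The DDP scripts launch one sampling process per GPU via `torchrun`, which exports a `RANK` environment variable to each process. A common pattern in distributed sampling (a sketch of the general idea, not necessarily the repository's exact logic) is to offset the random seed by rank so each GPU generates distinct videos:

```python
import os

def rank_offset_seed(base_seed=0):
    """torchrun sets RANK per process; offsetting the seed by rank
    makes each GPU sample a different batch of videos."""
    rank = int(os.environ.get("RANK", "0"))
    return base_seed + rank
```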
## ⏳ Training Endora
The weights of the pretrained DINO model can be found here; our implementation uses ViT-B/8 when training Endora. The saved path needs to be set in `./configs`.
Train Endora at a resolution of 128x128 with `N` GPUs on the Colonoscopic dataset:

```bash
torchrun --nnodes=1 --nproc_per_node=N train.py \
  --config ./configs/col/col_train.yaml \
  --port PORT \
  --mode type_cnn \
  --prr_weight 0.5 \
  --pretrained_weights /path/to/pretrained/DINO
```
Alternatively, train Endora with the provided scripts in `./train_scripts`:

```bash
bash train_scripts/col/train_col.sh
bash train_scripts/kva/train_kva.sh
bash train_scripts/cho/train_cho.sh
```
## 📏 Metric Evaluation
We first split the generated videos into frames and use the code fr
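For video metrics such as FVD, the extracted frames are typically regrouped into fixed-length clips before feature extraction. A stdlib-only sketch of that grouping step (an illustration; the clip length of 16 is an assumption, not the repository's setting):

```python
def chunk_frames(frame_paths, clip_len=16):
    """Group an ordered list of frame paths into non-overlapping
    clips of clip_len frames, dropping any incomplete tail."""
    n_clips = len(frame_paths) // clip_len
    return [frame_paths[i * clip_len:(i + 1) * clip_len]
            for i in range(n_clips)]
```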
