Endora: Video Generation Models as Endoscopy Simulators (MICCAI 2024)

<p align="center"> <img src="./assets/avatar.png" alt="" width="120" height="120"> </p> <!-- <i>The avatar is generated by stable diffusion.</i> -->

Project Page | ArXiv Paper | Video Demo

Accepted by International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2024)

Chenxin Li<sup>1*</sup> Hengyu Liu<sup>1*</sup> Yifan Liu<sup>1*</sup> Brandon Y. Feng<sup>2</sup> Wuyang Li<sup>1</sup> Xinyu Liu<sup>1</sup> Zhen Chen<sup>3</sup> Jing Shao<sup>4</sup> Yixuan Yuan<sup>1✉</sup>


<sup>1</sup>CUHK   <sup>2</sup>MIT CSAIL   <sup>3</sup>CAS CAIR   <sup>4</sup>Shanghai AI Lab  

<sup>*</sup> Equal Contributions. <sup>✉</sup> Corresponding Author.



💡Key Features

  • A high-fidelity medical video generation framework, tested on endoscopy scenes, laying the groundwork for further advancements in the field.
  • The first public benchmark for endoscopy video generation, featuring a comprehensive collection of clinical videos and adapting existing general-purpose generative video models for this purpose.
  • A novel technique to infuse generative models with features distilled from a 2D visual foundation model, ensuring consistency and quality across different scales.
  • Versatile applicability, demonstrated through successful applications in video-based disease diagnosis and 3D surgical scene reconstruction, highlighting its potential for downstream medical tasks.

🛠Setup

git clone https://github.com/XGGNet/Endora.git
cd Endora
conda create -n Endora python=3.10
conda activate Endora

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt

Tip A: We tested the framework with PyTorch 2.1.2 compiled for CUDA 11.8. Other versions should also work, but have not been fully verified.

Tip B: A GPU with 24GB (or more) of memory is recommended for video sampling with <i>Endora</i> inference, and 48GB (or more) for <i>Endora</i> training.

📚Data Preparation

Colonoscopic: The dataset from the original paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

Kvasir-Capsule: The dataset from the original paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

CholecTriplet: The dataset from the original paper can be found here. You can directly use the video data processed by Endo-FM without further processing.

First, run process_data.py and process_list.py to split the videos into frames and generate the corresponding list files:

CUDA_VISIBLE_DEVICES=gpu_id python process_data.py -s /path/to/datasets -t /path/to/save/video/frames

CUDA_VISIBLE_DEVICES=gpu_id python process_list.py -f /path/to/video/frames -t /path/to/save/text
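The exact behavior of process_list.py is defined in the repository; as a rough illustration, a script of this kind typically walks the frame directories and records one entry per clip in a list file. A minimal sketch (the function name and the one-clip-per-line format are assumptions, not the repository's actual code):

```python
from pathlib import Path

def build_frame_list(frames_root: str, out_file: str) -> int:
    """Write one line per clip directory under `frames_root` into `out_file`.

    Returns the number of clips listed. This is an illustrative sketch;
    the real process_list.py may use a different list format.
    """
    root = Path(frames_root)
    # Clip directories are named like 00001, 00002, ... and contain .jpg frames.
    clips = sorted(d.name for d in root.iterdir()
                   if d.is_dir() and any(d.glob("*.jpg")))
    Path(out_file).write_text("\n".join(clips) + "\n")
    return len(clips)
```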

The resulting file structure is as follows.

├── data
│   ├── CholecT45
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Colonoscopic
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── Kvasir-Capsule
│   │   ├── 00001.mp4
│   │   ├── ...
│   ├── CholecT45_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Colonoscopic_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
│   ├── Kvasir-Capsule_frames
│   │   ├── train_128_list.txt
│   │   ├── 00001
│   │   │   ├── 00000.jpg
│   │   │   ├── ...
│   │   ├── ...
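Before moving on, it can help to verify the layout above programmatically to catch path mistakes early. A small sanity check, assuming the directory names shown (dataset names and the `_frames` suffix) are used verbatim:

```python
from pathlib import Path

DATASETS = ["CholecT45", "Colonoscopic", "Kvasir-Capsule"]

def check_data_layout(data_root: str) -> list:
    """Return a list of paths expected by the layout above that are missing."""
    root = Path(data_root)
    missing = []
    for name in DATASETS:
        if not (root / name).is_dir():
            missing.append(f"{name}/")
        frames = root / f"{name}_frames"
        if not frames.is_dir():
            missing.append(f"{name}_frames/")
        elif not (frames / "train_128_list.txt").is_file():
            missing.append(f"{name}_frames/train_128_list.txt")
    return missing
```

An empty return value means the tree matches the expected structure.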

🎇Sampling Endoscopy Videos

You can sample endoscopy videos directly from a checkpoint. Here is a quick-start example using our pre-trained models:

  1. Download the pre-trained weights from here and put them at the path specified in the configs.
  2. Run sample.py with the following scripts; various arguments, such as the number of sampling steps, can be customized.
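The effect of a sampling-steps argument can be illustrated with the standard trick used by DDPM/DDIM-style samplers: the full diffusion schedule (e.g. 1000 training timesteps) is respaced to a much shorter, evenly spaced subsequence at inference time. A generic sketch of that respacing, not the repository's actual implementation:

```python
def respace_timesteps(num_train_steps: int, num_sample_steps: int) -> list:
    """Pick `num_sample_steps` evenly spaced timesteps out of the full
    `num_train_steps` diffusion schedule, in the descending order a
    sampler would visit them (noisiest first, ending at step 0)."""
    if num_sample_steps > num_train_steps:
        raise ValueError("cannot sample more steps than the schedule has")
    stride = num_train_steps / num_sample_steps
    steps = [round(i * stride) for i in range(num_sample_steps)]
    return steps[::-1]  # denoise from the noisiest timestep down to 0
```

Fewer sampling steps trade generation quality for speed; the training schedule itself is unchanged.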

Simple sampling, generating a single video:

bash sample/col.sh
bash sample/kva.sh
bash sample/cho.sh

Sampling with PyTorch DDP (useful when generating many videos):

bash sample/col_ddp.sh
bash sample/kva_ddp.sh
bash sample/cho_ddp.sh
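The DDP scripts above ultimately launch one process per GPU via torchrun. If you need to launch manually, the command line can be assembled as below; the script name and config path in the usage example are assumptions that mirror the training command later in this README:

```python
def torchrun_cmd(script: str, config: str, nproc: int, port: int = 29500) -> list:
    """Assemble a single-node torchrun command line with `nproc` processes."""
    return [
        "torchrun",
        "--nnodes=1",
        f"--nproc_per_node={nproc}",
        script,
        "--config", config,
        "--port", str(port),
    ]
```

For example, `torchrun_cmd("sample.py", "./configs/col/col_train.yaml", 4)` yields the argument list for a 4-GPU launch.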

⏳Training Endora

The weights of the pretrained DINO model can be found here; in our implementation we use ViT-B/8 when training Endora. The path where the weights are saved needs to be set in ./configs.

To train Endora at 128x128 resolution with N GPUs on the Colonoscopic dataset:

torchrun --nnodes=1 --nproc_per_node=N train.py \
  --config ./configs/col/col_train.yaml \
  --port PORT \
  --mode type_cnn \
  --prr_weight 0.5 \
  --pretrained_weights /path/to/pretrained/DINO
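The --prr_weight flag above suggests that the DINO feature distillation enters training as a weighted auxiliary term on top of the main diffusion objective. A pure-Python sketch of such a weighted combination (the expansion of "prr", the loss names, and the use of cosine distance are assumptions for illustration; see the paper for the actual formulation):

```python
def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def total_loss(diffusion_loss, gen_feats, dino_feats, prr_weight=0.5):
    """Diffusion objective plus a feature-regularization term that pulls
    the generator's intermediate features toward frozen DINO features,
    weighted by --prr_weight."""
    prr_loss = cosine_distance(gen_feats, dino_feats)
    return diffusion_loss + prr_weight * prr_loss
```

With prr_weight=0, training reduces to the plain diffusion objective; larger values enforce stronger agreement with the DINO features.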

Alternatively, run Endora training with the scripts in ./train_scripts:

bash train_scripts/col/train_col.sh
bash train_scripts/kva/train_kva.sh
bash train_scripts/cho/train_cho.sh

📏Metric Evaluation

We first split the generated videos into frames and use the code fr
