EchoDiffusion

MICCAI 2023 code for the paper Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis. EchoDiffusion is a collection of video diffusion models trained from scratch on the EchoNet-Dynamic dataset using the imagen-pytorch library.


⭐️ NEW ⭐️: Check out our new latent video diffusion repository! It is faster and requires far fewer resources, while achieving better temporal consistency!

This repository contains the code for the paper Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis. Hadrien Reynaud, Mengyun Qiao, Mischa Dombrowski, Thomas Day, Reza Razavi, Alberto Gomez, Paul Leeson and Bernhard Kainz. MICCAI 2023.

🤗 Check out our online demo: https://huggingface.co/spaces/HReynaud/EchoDiffusionDemo
🌐 Check out our website: https://hreynaud.github.io/EchoDiffusion/
📕 MICCAI proceedings: https://link.springer.com/chapter/10.1007/978-3-031-43999-5_14


Usage

The code is divided into two parts: the ejection fraction regression models and the diffusion models. The order of execution should be:

  1. Set up this repository
  2. Train the reference ejection fraction regression model
  3. Train diffusion models
  4. Evaluate diffusion models
  5. Train ejection fraction regression models on ablated and generated data

1. Set up this repository

  • To set up this repository, first clone it and cd into it: git clone https://github.com/HReynaud/EchoDiffusion.git; cd EchoDiffusion
  • (Optional) Set up a new conda env: conda create -n echodiff python=3.10 -y; conda activate echodiff
  • Install the requirements and the current repo: pip install -r requirements.txt; pip install -e .
  • Then, download the EchoNet-Dynamic dataset from https://echonet.github.io/dynamic/index.html#access and unzip it into the data folder. The only item in the data folder should be the folder named EchoNet-Dynamic.
  • (Optional) Download the trained weights with git clone https://huggingface.co/HReynaud/EchoDiffusionWeights
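Before training anything, the dataset step above can be sanity-checked. A minimal sketch, assuming the standard EchoNet-Dynamic download contains a FileList.csv index and a Videos folder (adjust the names if your copy differs):

```python
from pathlib import Path

def check_echonet_layout(data_dir: str = "data") -> list[str]:
    """Return a list of problems with the expected dataset layout (empty = OK)."""
    problems = []
    root = Path(data_dir) / "EchoNet-Dynamic"
    if not root.is_dir():
        problems.append(f"missing folder: {root}")
        return problems
    # EchoNet-Dynamic typically ships a FileList.csv index and a Videos/
    # folder of .avi clips -- adjust these names if your download differs.
    for item in ("FileList.csv", "Videos"):
        if not (root / item).exists():
            problems.append(f"missing item: {root / item}")
    return problems

if __name__ == "__main__":
    for problem in check_echonet_layout():
        print(problem)
```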

To download the weights from Hugging Face 🤗, you may need to install git-lfs; otherwise you will download pointers to the weights instead of the actual weights. One way is sudo apt install git-lfs, followed by git lfs install. Follow this guide if you are having trouble.

The weights are organized in 3 folders, corresponding to the 3 cascaded diffusion models (CDMs) trained in the paper. Each folder contains a config.yaml file and a merged.pt file containing the weights.

2. Train the reference ejection fraction regression model

The reference ejection fraction regression model is trained on the EchoNet-Dynamic dataset. To train it, run the following command:

python ef_regression/train_reference.py --config ef_regression/config_reference

3. Train diffusion models

Training a diffusion model requires substantial computational resources; use the provided pre-trained weights to skip this part.

The diffusion models are trained on the EchoNet-Dynamic dataset. We provide configuration files for the 1SCM, 2SCM and 4SCM cascaded diffusion models. To train one of them, run, for example:

python diffusion/train.py --config diffusion/configs/1SCM.yaml --stage 1 --bs 4 --ignore_time 0.25

where --stage is the stage of the cascaded diffusion model, --bs is the batch size, and --ignore_time is the probability of ignoring the time dimension in the input. This command runs the training on a single GPU. To run the training on multiple GPUs, use:

accelerate launch --multi_gpu --num_processes=8 diffusion/train.py --config diffusion/configs/1SCM.yaml --stage 1 --bs 4 --ignore_time 0.25

where --num_processes is the number of GPUs to use.
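One plausible reading of --ignore_time is a conditioning-dropout trick, as used for classifier-free guidance: with the given probability, the temporal conditioning is dropped during a training step so the model also learns to generate without it. A minimal sketch under that assumption, with a hypothetical maybe_ignore_time helper and a plain list standing in for an embedding tensor (zeroing is our assumed "ignore" behavior, not necessarily the repo's):

```python
import random

def maybe_ignore_time(time_embedding, ignore_time: float, rng: random.Random):
    """With probability `ignore_time`, replace the temporal conditioning
    with zeros so the model also learns an unconditional path (a common
    conditioning-dropout trick, as in classifier-free guidance)."""
    if rng.random() < ignore_time:
        return [0.0] * len(time_embedding)
    return time_embedding
```

With --ignore_time 0.25, roughly one training step in four would see a zeroed temporal conditioning.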

We also provide SLURM scripts to launch the training of all the models described in our paper on a similar cluster. The scripts are located in diffusion/slurms and can be launched with commands such as:

sbatch diffusion/train_1SCM_stage1.sh

We used nodes of 8x NVIDIA A100 GPUs with 80GB of VRAM to train the models. Each stage was trained for approximately 48 hours.

4. Evaluate diffusion models

We evaluate the diffusion models on two sets of metrics to get quantitative estimates of:

  • The accuracy in the ejection fraction of the generated video compared to the ejection fraction requested as a conditioning (MAE, RMSE, $R^2$)
  • The image quality of the generated videos (SSIM, LPIPS, FID, FVD)

4.1. Compute MAE, RMSE, $R^2$, SSIM and LPIPS

All the code necessary to compute these metrics is located in the evaluate folder. The easiest way to compute these metrics is to run:

python diffusion/evaluate/generate_score_file_chunk.py --model path/to/model --reg path/to/regression.pt --bs 4 --num_noise 3 --save_videos --rand_ef

where:

  • --model is the path to the model to evaluate (e.g. 1SCM_v2)
  • --bs is the batch size
  • --num_noise is the number of times we resample the same video, using the ejection fraction feedback loop to keep the best score
  • --save_videos is a flag to save the generated videos (necessary for the FID/FVD scores)
  • --rand_ef is a flag to generate videos with random ejection fractions instead of the ejection fractions corresponding to the anatomy of the patient used as conditioning
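The --num_noise feedback loop amounts to best-of-N sampling: generate several candidates from different noise and keep the one whose regressed ejection fraction lands closest to the target. A sketch with hypothetical generate and ef_of stand-ins for the diffusion sampler and the regression model (not the repo's actual API):

```python
def best_of_n(generate, ef_of, target_ef: float, num_noise: int):
    """Sample `num_noise` candidates and keep the one whose measured
    ejection fraction is closest to the requested target."""
    best, best_err = None, float("inf")
    for _ in range(num_noise):
        video = generate()                  # one diffusion sample (stand-in)
        err = abs(ef_of(video) - target_ef) # regression model score (stand-in)
        if err < best_err:
            best, best_err = video, err
    return best, best_err
```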

As generating videos can take a long time, we provide a script to launch the generation of videos on multiple gpus. To launch the generation of videos on 8 gpus, edit diffusion/evaluate/slurms/eval_{counter}factual.sh to set the path to a model and run:

sbatch diffusion/evaluate/slurms/eval_{counter}factual.sh

The script will generate one csv file per chunk (defaults to 1). If you used multiple GPUs, you will need to merge the csv files with diffusion/evaluate/merge_score_files.py.
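If you ever need to merge the chunks by hand, the operation amounts to concatenating the CSVs while keeping a single header. A minimal stand-in (the real merge_score_files.py may differ; column names are whatever the chunk files contain):

```python
import csv
from pathlib import Path

def merge_score_files(chunk_dir: str, out_path: str, pattern: str = "*.csv") -> int:
    """Concatenate per-chunk score CSVs into one file with a single header.
    Returns the number of data rows written."""
    header, rows = None, []
    for path in sorted(Path(chunk_dir).glob(pattern)):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            chunk_header = next(reader)
            if header is None:
                header = chunk_header  # keep the first header only
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```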

To compute the actual metrics, run:

python diffusion/evaluate/compute_metrics.py --file path/to/file.csv

This will compute MAE, RMSE, $R^2$, SSIM and LPIPS, and display the results in the terminal.
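For reference, the three EF-accuracy metrics follow the standard definitions; a self-contained sketch of those formulas (a generic illustration, not the repo's implementation; SSIM and LPIPS are omitted since they require image libraries):

```python
import math

def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 between requested and measured ejection fractions."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # 1.0 = perfect, 0.0 = no better than the mean
    return mae, rmse, r2
```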

4.2. Compute FID and FVD

To compute FID and FVD, we use the StyleGAN-V repo (original repo here). To get the FID and FVD scores:

  1. Clone the StyleGAN-V repository, and install the requirements (compatible with the requirements of this repo).
  2. We provide a script to prepare the videos that have been generated by running generate_score_file_chunk.py with the --save_videos flag. That script expects the following file tree:
MODEL (ex. 1SCM)
├───factual
│   ├───images
│   │   ├───real
│   │   │   ├───video001
│   │   │   │   image001.jpg
│   │   │   │   image002.jpg
│   │   │   ...
│   │   └───fake
│   │       ├───video001
│   │       │   image001.jpg
│   │       │   image002.jpg
│   │       ...
│   └───videos
│       video001.gif
│       video002.gif
│       ...
└───counterfactual
    ├───images
    │   ├───real
    │   │   ├───video001
    │   │   │   image001.jpg
    │   │   │   image002.jpg
    │   │   ...
    │   └───fake
    │       ├───video001
    │       │   image001.jpg
    │   │   │   image002.jpg
    │       ...
    └───videos
        video001.gif
        video002.gif
        ...
  3. Copy all the generated videos of that model into the corresponding folder, i.e. counterfactual/videos if you used the --rand_ef flag, and factual/videos otherwise. Then set root_dir to the counterfactual or factual folder path in diffusion/evaluate/scripts/split_videos_into_real_fake.sh and run:
sh diffusion/evaluate/scripts/split_videos_into_real_fake.sh

This will populate the images/real and images/fake folders with the frames of the videos. Now you can run the FID and FVD metric computation with:

cd stylegan-v

python src/scripts/calc_metrics_for_dataset.py --real_data_path path/to/images/real --fake_data_path path/to/images/fake --mirror 0 --gpus 1 --resolution 128 --metrics fvd2048_16f,fid50k_full

This will take a few minutes to run depending on the number of videos you generated. Results are printed in the terminal.

For reference, we obtained the following metrics for our models, using ~1200 videos each time:

| Model | Task           | Resolution | Frames | Sampling time | R2   | MAE  | RMSE | SSIM | LPIPS | FID  | FVD  |
|-------|----------------|------------|--------|---------------|------|------|------|------|-------|------|------|
| 1SCM  | Generation     | 112 x 112  | 16     | 62s           | 0.64 | 9.65 | 12.2 | 0.53 | 0.21  | 12.3 | 60.5 |
| 2SCM  | Generation     | 112 x 112  | 32     | 146s          | 0.89 | 4.81 | 6.69 | 0.53 | 0.24  | 31.7 | 141  |
| 4SCM  | Generation     | 112 x 112  | 32     | 279s          | 0.93 | 3.77 | 5.26 | 0.48 | 0.25  | 24.6 | 230  |
| 1SCM  | Reconstruction | 112 x 112  | 16     | 62s           | 0.76 | 4.51 | 6.07 | 0.53 | 0.21  | 13.6 | 89.7 |
| 2SCM  | Reconstruction | …          | …      | …             | …    | …    | …    | …    | …     | …    | …    |
