AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" [TMLR 2024]
<img src="https://tiger-ai-lab.github.io/AnyV2V/static/images/icon.png" width="30"/> AnyV2V
<a href='https://huggingface.co/papers/2403.14468'><img src='https://img.shields.io/static/v1?label=Paper&message=Huggingface&color=orange'></a>
<a href='https://huggingface.co/spaces/TIGER-Lab/AnyV2V'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a>
🌐 Homepage | 📖 arXiv | 🤗 HuggingFace Demo | 🎬 Replicate Demo
This repo contains the codebase for our TMLR 2024 paper "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks"
Introduction
AnyV2V is a framework to achieve high appearance and temporal consistency in video editing.
- Performs video editing with only a single image
  - Turns video editing into an image editing problem
  - Builds seamlessly on top of image editing methods to perform diverse types of edits
- Training-free
  - Does not require any training or fine-tuning
📰 News
- 2024 Oct 29: Paper accepted to TMLR 2024.
- 2024 Apr 16: Local Gradio demo now supports edits up to 16 seconds (128 frames).
- 2024 Apr 11: Added local Gradio demo for AnyV2V(i2vgen-xl)+InstantStyle.
- 2024 Apr 7: Added the showcases section. Share your AnyV2V edits with us!
- 2024 Apr 7: We recommend using InstantStyle with AnyV2V for Video Stylization! Check out the demo!!
- 2024 Apr 3: HuggingFace Demo is available!
- 2024 Apr 2: Added local Gradio demo for AnyV2V(i2vgen-xl).
- 2024 Mar 24: Added Replicate demo for AnyV2V(i2vgen-xl). Thanks @chenxwh for the effort!!
- 2024 Mar 22: Code released.
- 2024 Mar 21: Our paper is featured on Hugging Face Daily Papers!
- 2024 Mar 21: Paper available on arXiv. AnyV2V is the first work to leverage I2V models in video editing!
▶️ Quick Start for AnyV2V(i2vgen-xl)
Environment
Prepare the codebase of the AnyV2V project and Conda environment using the following commands:
git clone https://github.com/TIGER-AI-Lab/AnyV2V
cd AnyV2V
cd i2vgen-xl
conda env create -f environment.yml
🤗 Local Gradio Demo
AnyV2V+InstructPix2Pix (Prompt-based Editing)
python gradio_demo.py
AnyV2V+InstantStyle Demo (Style Transfer)
# Download InstantStyle dependencies
git lfs install
git clone https://huggingface.co/h94/IP-Adapter
mv IP-Adapter/models models
mv IP-Adapter/sdxl_models sdxl_models
rm -rf IP-Adapter
# Run script
python gradio_demo_style.py
📜 Notebook Demo
We provide a notebook demo i2vgen-xl/demo.ipynb for AnyV2V(i2vgen-xl).
You can run the notebook to perform Prompt-Based Editing on a single video.
Make sure the environment is set up correctly before running the notebook.
To edit multiple demo videos, please refer to the Video Editing section.
Video Editing
We provide demo source videos and edited images in the demo folder.
Below are the instructions for performing video editing on the provided source videos.
Navigate to i2vgen-xl/configs/group_ddim_inversion and i2vgen-xl/configs/group_pnp_edit:
- Modify the template.yaml files to specify the device.
- Modify the group_config.json files according to the provided examples. The configurations in group_config.json will override the configurations in template.yaml. To enable an example, set active: true; to disable it, set active: false.
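The override behaviour described above can be pictured with a small sketch. This is not the repository's loader: it assumes that group_config.json holds a list of per-example entries and that overrides are applied as a shallow key-by-key merge. The keys `seed` and `video_name` are hypothetical placeholders; only `device` and `active` come from the text above.

```python
import copy
import json


def resolve_examples(template: dict, group_config: list) -> list:
    """Return one merged config per active example (illustrative only)."""
    resolved = []
    for entry in group_config:
        # Entries without active: true are skipped entirely.
        if not entry.get("active", False):
            continue
        merged = copy.deepcopy(template)  # never mutate the shared template
        merged.update(entry)              # assumed shallow override: entry wins
        resolved.append(merged)
    return resolved


# Hypothetical values standing in for template.yaml and group_config.json.
template = {"device": "cuda:0", "seed": 8888}
group_config = [
    {"active": True, "video_name": "example-a", "seed": 1},
    {"active": False, "video_name": "example-b"},
]

for cfg in resolve_examples(template, group_config):
    print(json.dumps(cfg, sort_keys=True))
```

Only the first entry is resolved, and it keeps the template's `device` while taking its own `seed`. Whether the real scripts merge nested keys the same way is not stated here, so check the provided examples before relying on it.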
Then you can run the following command to perform inference:
cd i2vgen-xl/scripts
bash run_group_ddim_inversion.sh
bash run_group_pnp_edit.sh
or run the following command using Python:
cd i2vgen-xl/scripts
# First invert the latent of source video
python run_group_ddim_inversion.py \
--template_config "configs/group_ddim_inversion/template.yaml" \
--configs_json "configs/group_ddim_inversion/group_config.json"
# Then run the AnyV2V pipeline with the source video latent
python run_group_pnp_edit.py \
--template_config "configs/group_pnp_edit/template.yaml" \
--configs_json "configs/group_pnp_edit/group_config.json"
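If you prefer to drive both stages from one Python entry point, the two commands above can be wrapped as follows. This is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are made up for illustration, and it assumes it is launched from the same working directory as the commands above. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Script name and config directory for each stage, in the required order:
# DDIM inversion first, then the plug-and-play edit.
STAGES = [
    ("run_group_ddim_inversion.py", "configs/group_ddim_inversion"),
    ("run_group_pnp_edit.py", "configs/group_pnp_edit"),
]


def build_commands() -> list:
    """Build the argument list for each stage, mirroring the commands above."""
    commands = []
    for script, config_dir in STAGES:
        commands.append([
            "python", script,
            "--template_config", f"{config_dir}/template.yaml",
            "--configs_json", f"{config_dir}/group_config.json",
        ])
    return commands


def run_pipeline(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises if inversion fails, so the edit stage never
            # runs against missing latents.
            subprocess.run(cmd, check=True)


run_pipeline(dry_run=True)
```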
To edit your own source videos, follow the steps outlined below:
- Prepare the source video Your-Video.mp4 in the demo folder.
- Create two new folders demo/Your-Video-Name and demo/Your-Video-Name/edited_first_frame.
- Run the following command to perform first frame image editing:
python edit_image.py --video_path "./demo/Your-Video.mp4" --input_dir "./demo" --output_dir "./demo/Your-Video-Name/edited_first_frame" --prompt "Your prompt"
You can also use any other image editing method, such as InstantID, AnyDoor, or WISE, to edit the first frame.
Please put the edited first frame images in the demo/Your-Video-Name/edited_first_frame folder.
- Add an entry to the group_config.json files located in the i2vgen-xl/configs/group_ddim_inversion and i2vgen-xl/configs/group_pnp_edit directories for your video, following the provided examples.
- Run the inference command:
cd i2vgen-xl/scripts
bash run_group_ddim_inversion.sh
bash run_group_pnp_edit.sh
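The folder-creation step in the list above can also be scripted. The sketch below is an illustration only: `prepare_video_dirs` is a hypothetical helper, not something the repository ships, and the demo call runs inside a temporary directory so that nothing is written to your checkout.

```python
import tempfile
from pathlib import Path


def prepare_video_dirs(demo_root: Path, video_name: str) -> Path:
    """Create <demo_root>/<video_name>/edited_first_frame and return it."""
    edited_dir = demo_root / video_name / "edited_first_frame"
    # parents=True also creates <demo_root>/<video_name>;
    # exist_ok=True makes the call safe to repeat.
    edited_dir.mkdir(parents=True, exist_ok=True)
    return edited_dir


# Demonstration in a throwaway directory; point demo_root at ./demo for real use.
with tempfile.TemporaryDirectory() as tmp:
    created = prepare_video_dirs(Path(tmp) / "demo", "Your-Video-Name")
    print(created.relative_to(tmp))
```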
▶️ Quick Start for AnyV2V(consisti2v)
Please refer to ./consisti2v/README.md
▶️ Quick Start for AnyV2V(seine)
Please refer to ./seine/README.md
▶️ Misc
First Frame Image Edit
We provide the instructpix2pix port for image editing with an instruction prompt.
usage: edit_image.py [-h] [--model {magicbrush,instructpix2pix}]
                     [--video_path VIDEO_PATH] [--input_dir INPUT_DIR]
                     [--output_dir OUTPUT_DIR] [--prompt PROMPT] [--force_512]
                     [--dict_file DICT_FILE] [--seed SEED]
                     [--negative_prompt NEGATIVE_PROMPT]

Process some images.

optional arguments:
  -h, --help            show this help message and exit
  --model {magicbrush,instructpix2pix}
                        Name of the image editing model
  --video_path VIDEO_PATH
                        Name of the video
  --input_dir INPUT_DIR
                        Directory containing the video
  --output_dir OUTPUT_DIR
                        Directory to save the processed images
  --prompt PROMPT       Instruction prompt for editing
  --force_512           Force resize to 512x512 when feeding into image model
  --dict_file DICT_FILE
                        JSON file containing files, instructions etc.
  --seed SEED           Seed for random number generator
  --negative_prompt NEGATIVE_PROMPT
                        Negative prompt for editing
Usage Example:
python edit_image.py --video_path "./demo/Man Walking.mp4" --input_dir "./demo" --output_dir "./demo/Man Walking/edited_first_frame" --prompt "turn the man into darth vader"
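Because video names such as "Man Walking.mp4" contain spaces, it can be convenient to assemble the command as an argument list instead of a hand-quoted string. The sketch below uses only the flags shown in the usage text above. The helper `edit_image_command` is hypothetical, and deriving the output folder from the video file's stem is an assumption taken from the usage example, not a documented rule.

```python
import shlex
from pathlib import Path


def edit_image_command(video_path: str, prompt: str, demo_dir: str = "./demo") -> list:
    """Build the edit_image.py argument list for one video (illustrative)."""
    video_name = Path(video_path).stem  # "Man Walking.mp4" -> "Man Walking"
    output_dir = f"{demo_dir}/{video_name}/edited_first_frame"
    return [
        "python", "edit_image.py",
        "--video_path", video_path,
        "--input_dir", demo_dir,
        "--output_dir", output_dir,
        "--prompt", prompt,
    ]


cmd = edit_image_command("./demo/Man Walking.mp4", "turn the man into darth vader")
# shlex.join quotes any argument that contains spaces, for copy-pasting into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting altogether.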
You can also use other image models for editing. Here are some models with online demos that you can use:
- Identity Manipulation model: InstantID
- Subject Driven Image editing model: AnyDoor
- Style Transfer model: WISE
- Style Transfer model: InstantStyle
Video Preprocess Script
Videos of up to 16 seconds (128 frames) can be edited on an A6000 GPU. We provide a script to trim and crop a video to any dimension and length.
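The 16-second, 128-frame figure above implies a rate of 8 frames per second. The sketch below only works through that arithmetic; `frames_for` is a hypothetical helper, and whether the preprocessing script samples at exactly this rate is an assumption to verify against the script itself.

```python
def frames_for(duration_s: float, fps: float) -> int:
    """Number of frames in a clip of the given duration at the given rate."""
    return round(duration_s * fps)


# Rate implied by the figures quoted above: 128 frames over 16 seconds.
implied_fps = 128 / 16

print(implied_fps)                  # 8.0
print(frames_for(16, implied_fps))  # 128
print(frames_for(2, implied_fps))   # 16
```

At that implied rate, a 2-second clip corresponds to 16 frames.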
usage: prepare_video.py [-h] [--input_folder INPUT_FOLDER] [--video_path VIDEO_PATH] [--output_folder OUTPUT_FOLDER]
                        [--clip_duration CLIP_DURATION] [--width WIDTH] [--height HEIGHT] [--start_time START_TIME] [--end_time END_TIME]
                        [--n_frames N_FRAMES] [--center_crop] [--x_offset X_OFFSET] [--y_offset Y_OFFSET] [--longest_to_width]

Crop and resize video segments.

optional arguments:
  -h, --help            show this help message and exit
  --input_folder INPUT_FOLDER
                        Path to the input folder containing video files
  --video_path VIDEO_PATH
                        Path to the input video file
  --output_folder OUTPUT_FOLDER
                        Path to the folder for the output videos
  --clip_duration CLIP_DURATION
                        Duration of the video clips in seconds default=2
  --width WIDTH         Width of the output video (optional) default=512
  --height HEIGHT       Height of the output video (optional) default
