WISA
World Simulator Assistant for Physics-Aware Text-to-Video Generation
This is the official reproduction of WISA, designed to enhance Text-to-Video models by improving their ability to simulate the real world.
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
Jing Wang*, Ao Ma*, Ke Cao*, Jun Zheng, Zhanjie Zhang, Jiasong Feng, Shanyuan Liu, Yuhang Ma, Bo Cheng, Dawei Leng‡, Yuhui Yin, Xiaodan Liang‡(*Equal Contribution, ‡Corresponding Authors)
📰 News
- [2025.05.15] 🔥 We are excited to announce the official release of WISA's codebase and model weights on GitHub! This implementation is built upon the powerful finetrainers framework.
- [2025.03.28] We have uploaded the WISA-80K dataset to Hugging Face, including processed video clips and annotations.
- [2025.03.12] We have released our paper WISA and created a dedicated project homepage.
🚀 Quick Start
1. Environment Set Up
Clone this repository and install packages.
git clone https://github.com/360CVGroup/WISA.git
cd WISA
conda create -n wisa python=3.10
conda activate wisa
pip install -r requirements.txt
2. Download Pretrained Weights
1. Download Text-to-Video Pretrained Models
Please download the CogVideoX and Wan2.1 checkpoints from ModelScope and put them in ./pretrain_models/.
mkdir ./pretrain_models
cd ./pretrain_models
pip install modelscope
modelscope download Wan-AI/Wan2.1-T2V-14B-Diffusers --local_dir ./Wan2.1-T2V-14B-Diffusers
modelscope download ZhipuAI/CogVideoX-5b --local_dir ./CogVideoX-5b-Diffusers
2. Download WISA Pretrained Lora and Physical-block Weight
Please download the weights from Hugging Face and put them in ./pretrain_models/WISA/.
git lfs install
git clone https://huggingface.co/qihoo360/WISA
cd ..
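After both download steps, a quick shell check can confirm the expected layout (a sketch; the directory names assume the commands above were run exactly as written):

```shell
# Sanity check (illustrative): verify the three expected model directories
# exist under ./pretrain_models/ after the download steps above.
for d in Wan2.1-T2V-14B-Diffusers CogVideoX-5b-Diffusers WISA; do
  if [ -d "./pretrain_models/$d" ]; then
    echo "found:   $d"
  else
    echo "missing: $d"
  fi
done
```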
3. Generate Video
You can edit MODEL_TYPE, GEN_TYPE, PROMPT_PATH, OUTPUT_FILE, and LORA_PATH in inference.sh for different inference settings.
Then run
sh inference.sh
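For reference, the variables at the top of inference.sh might be set roughly as follows (the values below are illustrative assumptions, not the script's actual defaults; check inference.sh for the options it accepts):

```shell
# Illustrative settings only -- the accepted values are defined in inference.sh.
MODEL_TYPE="wan"                       # base model to use (assumed value)
GEN_TYPE="wisa"                        # generation mode (assumed value)
PROMPT_PATH="./prompts/example.txt"    # text prompts to render (hypothetical path)
OUTPUT_FILE="./outputs"                # where generated videos are saved (hypothetical path)
LORA_PATH="./pretrain_models/WISA"     # WISA LoRA + physical-block weights
echo "model=${MODEL_TYPE} mode=${GEN_TYPE} output=${OUTPUT_FILE}"
```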
✨ Training
1. Download WISA-80K
Download the WISA-80K dataset from Hugging Face.
2. Precomputing Latents and Text Embeddings (Optional)
This project supports precomputing and saving the latent codes of videos and text embeddings to avoid loading the VAE and Text Encoder onto the GPU during training, thereby reducing GPU memory usage. This operation is essential when training Wan2.1-14B; otherwise, it will result in an out-of-memory (OOM) error.
Step 1: Add the following parameters to the dataset_cmd in your training script (e.g., examples/training/sft/wan/crush_smol_lora/train_wisa.sh), and make sure you have sufficient storage space available.
dataset_cmd=(
--dataset_config $TRAINING_DATASET_CONFIG
--dataset_shuffle_buffer_size 10
--precomputation_items 2000 # Number of samples to precompute
--enable_precomputation # Flag to activate precomputation
--precomputation_once
--precomputation_dir ./cache/path # Directory for cached outputs
--hash_save # Enable hash-based filename storage
--first_samples
)
Step 2: Configure the dataset paths in examples/training/sft/wan/crush_smol_lora/training_wisa.json and run
sh examples/training/sft/wan/crush_smol_lora/train_wisa.sh
Note: Process the data in batches to prevent CPU cache overload (recommended maximum: 12,000 samples per batch).
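The batching advice above can be sketched as a loop that splits the dataset into chunks of at most 12,000 samples (the total count and the per-batch launch step are illustrative placeholders):

```shell
# Split precomputation into batches of at most 12000 samples (illustrative).
TOTAL=80000    # e.g. the full WISA-80K dataset
BATCH=12000    # recommended per-batch maximum
START=0
while [ "$START" -lt "$TOTAL" ]; do
  REMAIN=$(( TOTAL - START ))
  if [ "$REMAIN" -lt "$BATCH" ]; then COUNT=$REMAIN; else COUNT=$BATCH; fi
  echo "precompute items ${START}..$(( START + COUNT - 1 ))"
  # here you would launch the training script, setting --precomputation_items
  # to $COUNT and pointing it at this batch's slice of the dataset
  START=$(( START + COUNT ))
done
```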
Step 3: Disable the --enable_precomputation flag
dataset_cmd=(
--dataset_config $TRAINING_DATASET_CONFIG
--dataset_shuffle_buffer_size 10
--precomputation_items 2000 # Number of samples to precompute
# --enable_precomputation # Flag to activate precomputation
--precomputation_once
--precomputation_dir ./cache/path # Directory for cached outputs
--hash_save # Enable hash-based filename storage
--first_samples
)
3. Start Training
sh examples/training/sft/wan/crush_smol_lora/train_wisa.sh
Validation is currently disabled: a bug in the validation phase produces video generation artifacts, so validation results deviate significantly from those obtained at test time.
👍 Acknowledgement
This work stands on the shoulders of groundbreaking research and open-source contributions. We extend our deepest gratitude to the authors and contributors of the following projects:
- CogVideoX - For their pioneering work in video generation
- Wan2.1 - For their foundational contributions to large-scale video models
Special thanks to the finetrainers framework for enabling efficient model training - your excellent work has been invaluable to this project.
BibTeX
@misc{wang2025wisa,
      title={WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation},
      author={Jing Wang and Ao Ma and Ke Cao and Jun Zheng and Zhanjie Zhang and Jiasong Feng and Shanyuan Liu and Yuhang Ma and Bo Cheng and Dawei Leng and Yuhui Yin and Xiaodan Liang},
      year={2025},
      eprint={2502.08153},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.08153},
}