UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
📣 News
- [2025/11/25]: 🤗 We release Uniworld-OSP2.0, a VLM-enhanced unified framework for image-to-video generation. The architecture scales FlashI2V to 14B parameters and introduces a novel conditioning mechanism based on a 7B VLM to losslessly inherit the VLM's semantic understanding. Uniworld-OSP2.0 surpasses the video generation model Wan2.1 across six key evaluation metrics on VBench-I2V.
- [2025/10/19]: We release UniWorld-V2, which employs DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing. UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.
- [2025/06/03]: 🤗 We release UniWorld-V1, a unified framework for understanding, generation, and editing. All data, models, training code, and evaluation code are open-sourced. Check our report for more details, and watch 👀 this repository for the latest updates.
</p ></details>
💡 Hub
😍 Gallery
UniWorld-OSP2.0
| Model | I2V Paradigm | Subject Consistency ↑ | Background Consistency ↑ | Motion Smoothness ↑ | Dynamic Degree ↑ | Aesthetic Quality ↑ | Imaging Quality ↑ | I2V Subject Consistency ↑ | I2V Background Consistency ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SVD-XT-1.0 (1.5B) | Repeating Concat and Adding Noise | 95.52 | 96.61 | 98.09 | 52.36 | 60.15 | 69.80 | 97.52 | 97.63 |
| SVD-XT-1.1 (1.5B) | Repeating Concat and Adding Noise | 95.42 | 96.77 | 98.12 | 43.17 | 60.23 | 70.23 | 97.51 | 97.62 |
| SEINE-512x512 (1.8B) | Inpainting | 95.28 | 97.12 | 97.12 | 27.07 | 64.55 | 71.39 | 97.15 | 96.94 |
| CogVideoX-5B-I2V | Zero-padding Concat and Adding Noise | 94.34 | 96.42 | 98.40 | 33.17 | 61.87 | 70.01 | 97.19 | 96.74 |
| Wan2.1-I2V-14B-720P | Inpainting | 94.86 | 97.07 | 97.90 | 51.38 | 64.75 | 70.44 | 96.95 | 96.44 |
| CogVideoX1.5-5B-I2V | Zero-padding Concat and Adding Noise | 95.04 | 96.52 | 98.47 | 37.48 | 62.68 | 70.99 | 97.78 | 98.73 |
| Wan2.1-I2V-14B-480P | Inpainting | 95.68 | 97.44 | 98.46 | 45.20 | 61.44 | 70.37 | 97.83 | 99.08 |
| Uniworld-OSP2.0 | FlashI2V | 96.21 | 97.71 | 98.47 | 46.10 | 66.55 | 70.57 | 97.99 | 98.94 |
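
The table can be checked against the "six key evaluation metrics" statement with a few lines of Python. This is only a sketch of one possible reading: it assumes the claim means scoring strictly higher than both Wan2.1 variants listed here (720P and 480P), which this README does not state explicitly. The helper name `metrics_won` is made up for illustration; the scores are copied from the table.

```python
# Scores copied from the VBench-I2V table, in column order.
METRICS = [
    "Subject Consistency", "Background Consistency", "Motion Smoothness",
    "Dynamic Degree", "Aesthetic Quality", "Imaging Quality",
    "I2V Subject Consistency", "I2V Background Consistency",
]
SCORES = {
    "Wan2.1-I2V-14B-720P": [94.86, 97.07, 97.90, 51.38, 64.75, 70.44, 96.95, 96.44],
    "Wan2.1-I2V-14B-480P": [95.68, 97.44, 98.46, 45.20, 61.44, 70.37, 97.83, 99.08],
    "Uniworld-OSP2.0":     [96.21, 97.71, 98.47, 46.10, 66.55, 70.57, 97.99, 98.94],
}


def metrics_won(model, baselines):
    """Return the metrics on which `model` scores strictly higher than every baseline.

    Hypothetical helper: the comparison rule is an assumption, not the authors' method.
    """
    return [
        name
        for i, name in enumerate(METRICS)
        if all(SCORES[model][i] > SCORES[b][i] for b in baselines)
    ]


wins = metrics_won("Uniworld-OSP2.0", ["Wan2.1-I2V-14B-720P", "Wan2.1-I2V-14B-480P"])
print(len(wins), wins)
```

Under this reading, the two metrics where a Wan2.1 variant stays ahead are Dynamic Degree (720P) and I2V Background Consistency (480P).
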
UniWorld-V2
| Original | Prompt | Nano-banana | GPT-4o | Qwen-Image-Edit | UniWorld-V2 (Ours) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| <img src="UniWorld-V2/imgs/0-0.jpg" width="400"> | Case 1: Move the bird into the red box, delete the current bird, and finally remove the red box | <img src="UniWorld-V2/imgs/0-1.webp" width="400"> | <img src="UniWorld-V2/imgs/0-2.webp" width="400"> | <img src="UniWorld-V2/imgs/0-3.webp" width="400"> | <img src="UniWorld-V2/imgs/0-4.webp" width="400"> (✅ Instruction executed correctly) |
| <img src="UniWorld-V2/imgs/1-0.jpg" width="400"> | Case 2: Change the hand gesture of the girl in the middle, who is wearing white clothes and a mask, to OK | <img src="UniWorld-V2/imgs/1-1.webp" width="400"> | <img src="UniWorld-V2/imgs/1-2.webp" width="400"> | <img src="UniWorld-V2/imgs/1-3.webp" width="400"> | <img src="UniWorld-V2/imgs/1-4.webp" width="400"> (✅ OK gesture) |
| <img src="UniWorld-V2/imgs/2-0.jpg" width="400
