UniWorld

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

<p align="center"> <img src="https://s21.ax1x.com/2025/06/03/pVCBdw8.png" width="200"/> </p>
<h2 align="center"> <a href="https://arxiv.org/abs/2510.16888"> UniWorld-Family </a> </h2>

📣 News

  • [2025/11/25]: 🤗 We release Uniworld-OSP2.0, a VLM-enhanced unified framework for image-to-video generation. The architecture scales FlashI2V to 14B parameters and introduces a novel conditioning mechanism, based on a 7B VLM, that losslessly inherits the VLM's semantic understanding. Uniworld-OSP2.0 surpasses the video generation model Wan2.1 on six key evaluation metrics of VBench-I2V.

  • [2025/10/19]: We release UniWorld-V2, which employs DiffusionNFT and a training-free reward model derived from pretrained MLLMs to fine-tune diffusion models for image editing. UniWorld-Qwen-Image-Edit-2509 and UniWorld-FLUX.1-Kontext-Dev are open-sourced.

  • [2025/06/03]: 🤗 We release UniWorld-V1, a unified framework for understanding, generation, and editing. All data, models, training code, and evaluation code are open-sourced. Check our report for more details. You are welcome to watch 👀 this repository for the latest updates.

<p align="center"> <img src="https://github.com/user-attachments/assets/e187584a-f096-44df-b26b-f85aae838a18" width="200"/> </p>

💡 Hub

😍 Gallery

UniWorld-OSP2.0

| Model | I2V Paradigm | Subject Consistency ↑ | Background Consistency ↑ | Motion Smoothness ↑ | Dynamic Degree ↑ | Aesthetic Quality ↑ | Imaging Quality ↑ | I2V Subject Consistency ↑ | I2V Background Consistency ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SVD-XT-1.0 (1.5B) | Repeating Concat and Adding Noise | 95.52 | 96.61 | 98.09 | 52.36 | 60.15 | 69.80 | 97.52 | 97.63 |
| SVD-XT-1.1 (1.5B) | Repeating Concat and Adding Noise | 95.42 | 96.77 | 98.12 | 43.17 | 60.23 | 70.23 | 97.51 | 97.62 |
| SEINE-512x512 (1.8B) | Inpainting | 95.28 | 97.12 | 97.12 | 27.07 | 64.55 | 71.39 | 97.15 | 96.94 |
| CogVideoX-5B-I2V | Zero-padding Concat and Adding Noise | 94.34 | 96.42 | 98.40 | 33.17 | 61.87 | 70.01 | 97.19 | 96.74 |
| Wan2.1-I2V-14B-720P | Inpainting | 94.86 | 97.07 | 97.90 | 51.38 | 64.75 | 70.44 | 96.95 | 96.44 |
| CogVideoX1.5-5B-I2V | Zero-padding Concat and Adding Noise | 95.04 | 96.52 | 98.47 | 37.48 | 62.68 | 70.99 | 97.78 | 98.73 |
| Wan2.1-I2V-14B-480P | Inpainting | 95.68 | 97.44 | 98.46 | 45.20 | 61.44 | 70.37 | 97.83 | 99.08 |
| Uniworld-OSP2.0 | FlashI2V | 96.21 | 97.71 | 98.47 | 46.10 | 66.55 | 70.57 | 97.99 | 98.94 |
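
The README does not say which six metrics the comparison with Wan2.1 refers to. As a rough check, the sketch below copies the three relevant rows from the table and counts the metrics on which Uniworld-OSP2.0 scores strictly higher than both Wan2.1 variants. The helper `metrics_won` and the "beats both variants" reading are assumptions made for illustration, not the authors' evaluation code.

```python
# Scores copied from the VBench-I2V table above (higher is better for every column).
METRICS = [
    "Subject Consistency", "Background Consistency", "Motion Smoothness",
    "Dynamic Degree", "Aesthetic Quality", "Imaging Quality",
    "I2V Subject Consistency", "I2V Background Consistency",
]

SCORES = {
    "Wan2.1-I2V-14B-720P": [94.86, 97.07, 97.90, 51.38, 64.75, 70.44, 96.95, 96.44],
    "Wan2.1-I2V-14B-480P": [95.68, 97.44, 98.46, 45.20, 61.44, 70.37, 97.83, 99.08],
    "Uniworld-OSP2.0":     [96.21, 97.71, 98.47, 46.10, 66.55, 70.57, 97.99, 98.94],
}


def metrics_won(candidate, baselines):
    """Hypothetical helper: metrics where `candidate` strictly beats every baseline."""
    won = []
    for i, name in enumerate(METRICS):
        if all(SCORES[candidate][i] > SCORES[b][i] for b in baselines):
            won.append(name)
    return won


wins = metrics_won(
    "Uniworld-OSP2.0",
    ["Wan2.1-I2V-14B-720P", "Wan2.1-I2V-14B-480P"],
)
print(len(wins), "of", len(METRICS), "metrics:", wins)
```

Under this reading the two metrics not won are Dynamic Degree (Wan2.1-I2V-14B-720P is higher) and I2V Background Consistency (Wan2.1-I2V-14B-480P is higher).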

UniWorld-V2

| Original | Prompt | Nano-banana | GPT-4o | Qwen-Image-Edit | UniWorld-V2 (Ours) |
| :---: | :---: | :---: | :---: | :---: | :---: |
| <img src="UniWorld-V2/imgs/0-0.jpg" width="400"> | Case 1: Move the bird into the red box, delete the bird at its current position, and finally remove the red box | <img src="UniWorld-V2/imgs/0-1.webp" width="400"> | <img src="UniWorld-V2/imgs/0-2.webp" width="400"> | <img src="UniWorld-V2/imgs/0-3.webp" width="400"> | <img src="UniWorld-V2/imgs/0-4.webp" width="400"> (✅ Instruction executed correctly) |
| <img src="UniWorld-V2/imgs/1-0.jpg" width="400"> | Case 2: Change the hand gesture of the girl in the middle, who is wearing white clothes and a mask, to an OK sign | <img src="UniWorld-V2/imgs/1-1.webp" width="400"> | <img src="UniWorld-V2/imgs/1-2.webp" width="400"> | <img src="UniWorld-V2/imgs/1-3.webp" width="400"> | <img src="UniWorld-V2/imgs/1-4.webp" width="400"> (✅ OK gesture) |
| <img src="UniWorld-V2/imgs/2-0.jpg" width="400
