
OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871


<p align="center"> <img src="assets/brand.png" width="65%"> </p> <p align="center"> <a href="https://vectorspacelab.github.io/OmniGen2"><img src="https://img.shields.io/badge/Project%20Page-OmniGen2-yellow" alt="project page"></a> <a href="https://arxiv.org/abs/2506.18871"><img src="https://img.shields.io/badge/arXiv%20paper-2506.18871-b31b1b.svg" alt="arxiv"></a> <a href="https://github.com/VectorSpaceLab/OmniGen2?tab=readme-ov-file#-gradio-demo"><img src="https://img.shields.io/badge/Online%20Demo-🤗-blue" alt="demo"></a> <a href="https://huggingface.co/spaces/OmniGen2/OmniGen2"><img src="https://img.shields.io/badge/HF%20Spaces-🤗-lightblue" alt="demo"></a> <a href="https://huggingface.co/OmniGen2/OmniGen2"><img src="https://img.shields.io/badge/Model-🤗-yellow" alt="model"></a> <a href="https://huggingface.co/datasets/OmniGen2/OmniContext"><img src="https://img.shields.io/badge/Benchmark-🤗-yellow" alt="model"></a> <a href="https://huggingface.co/datasets/OmniGen2/X2I2"><img src="https://img.shields.io/badge/Dataset-🤗-yellow" alt="model"></a> </p> <h4 align="center"> <p> <a href=#-news>News</a> | <a href=#-quick-start>Quick Start</a> | <a href=#-usage-tips>Usage Tips</a> | <a href=#-limitations-and-suggestions>Limitations</a> | <a href=#-gradio-demo>Online Demos</a> | <a href=#%EF%B8%8F-citing-us>Citation</a> <p> </h4>

🔥 News

  • 2025-09-30: Introducing EditScore — a family of state-of-the-art open-source reward models (7B–72B) for instruction-guided image editing.
    • Model Release: As part of this, we release OmniGen2-EditScore7B, unlocking online RL for image editing via the high-fidelity EditScore reward model. LoRA weights are now available on Hugging Face and ModelScope.
    • Benchmark: We are also launching EditReward-Bench to provide a systematic way to evaluate and compare reward models.
    • Check out the project repository to get started!
  • 2025-07-23: Users can access OmniGen2 through the web app.
  • 2025-07-05: The X2I2 training datasets are available.
  • 2025-07-03: OmniGen2 now supports TeaCache and TaylorSeer for faster inference; see Usage Tips for details. Thanks to @legitnull for the great TeaCache-PR and TaylorSeer-PR.
  • 2025-07-01: OmniGen2 is now officially supported by ComfyUI. Thanks!
  • 2025-06-30: Training code is available, see fine-tuning for details.
  • 2025-06-28: We release the OmniContext benchmark. The evaluation code is in omnicontext.
  • 2025-06-24: Technical Report is available.
  • 2025-06-23: We've updated our code and HF model: OmniGen2 now runs without flash-attn. Users can still install it for optimal performance.
  • 2025-06-20: Updated resource requirements, adding CPU offload support for devices with limited VRAM.
  • 2025-06-16: Gradio and Jupyter demos are available. Online Gradio demos: Demo1; Chat-Demo1; see more demo links in the gradio section.
  • 2025-06-16: We release OmniGen2, a multimodal generation model; model weights are available on Hugging Face and ModelScope.

Introduction

OmniGen2 is a powerful and efficient generative model. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for the text and image modalities, with unshared parameters and a decoupled image tokenizer. OmniGen2 delivers competitive performance across four primary capabilities:

  • Visual Understanding: Inherits the robust ability to interpret and analyze image content from its Qwen-VL-2.5 foundation.
  • Text-to-Image Generation: Creates high-fidelity and aesthetically pleasing images from textual prompts.
  • Instruction-guided Image Editing: Executes complex, instruction-based image modifications with high precision, achieving state-of-the-art performance among open-source models.
  • In-context Generation: A versatile capability to process and flexibly combine diverse inputs—including humans, reference objects, and scenes—to produce novel and coherent visual outputs.

The training code and the X2I2 training datasets have been released; see the News section above.

Some example results from OmniGen2:

<p align="center"> <img src="assets/teaser.jpg" width="95%"> <br> <em>Demonstrations.</em> </p> <p align="center"> <img src="assets/examples_edit.png" width="95%"> <br> <em> Good demonstrations of OmniGen2's image editing capabilities.</em> </p> <p align="center"> <img src="assets/examples_subject.png" width="95%"> <br> <em> Good demonstrations of OmniGen2's in-context generation capabilities.</em> </p>

📌 TODO

  • [x] Technical report.
  • [x] Support CPU offload and improve inference efficiency.
  • [x] In-context generation benchmark: OmniContext.
  • [ ] Integration of diffusers.
  • [x] Training datasets.
  • [ ] Training data construction pipeline.
  • [ ] ComfyUI Demo (community support will be greatly appreciated!).

🚀 Quick Start

🛠️ Environment Setup

✅ Recommended Setup

# 1. Clone the repo
git clone git@github.com:VectorSpaceLab/OmniGen2.git
cd OmniGen2

# 2. (Optional) Create a clean Python environment
conda create -n omnigen2 python=3.11
conda activate omnigen2

# 3. Install dependencies
# 3.1 Install PyTorch (choose correct CUDA version)
pip install torch==2.6.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu124

# 3.2 Install other required packages
pip install -r requirements.txt

# Note: Version 2.7.4.post1 is specified for compatibility with CUDA 12.4.
# Feel free to use a newer version if you are on CUDA 12.6 or once this compatibility issue is fixed.
# OmniGen2 runs even without flash-attn, though we recommend installing it for best performance.
pip install flash-attn==2.7.4.post1 --no-build-isolation

🌏 For users in Mainland China

# Install PyTorch from a domestic mirror
pip install torch==2.6.0 torchvision --index-url https://mirror.sjtu.edu.cn/pytorch-wheels/cu124

# Install other dependencies from Tsinghua mirror
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

# Note: Version 2.7.4.post1 is specified for compatibility with CUDA 12.4.
# Feel free to use a newer version if you are on CUDA 12.6 or once this compatibility issue is fixed.
# OmniGen2 runs even without flash-attn, though we recommend installing it for best performance.
pip install flash-attn==2.7.4.post1 --no-build-isolation -i https://pypi.tuna.tsinghua.edu.cn/simple

🧪 Run Examples

# Visual Understanding
bash example_understanding.sh

# Text-to-image generation
bash example_t2i.sh

# Instruction-guided image editing
bash example_edit.sh

# In-context generation
bash example_in_context_generation.sh

🌐 Gradio Demo

  • Online Demo: HF Spaces. Beyond Hugging Face Spaces, we are temporarily allocating additional GPU resources to ensure smooth access to the online demos. If you notice a long queue for a particular link, please try other links:

    Demo1, Demo2, Demo3, Demo4

    Chat-Demo1, Chat-Demo2, Chat-Demo3, Chat-Demo4

  • Web Application: You can also try the self-hosted OmniGen2 web application by visiting this link or scanning the QR code below:

<p align="center"> <img src="assets/qr-code.PNG" width="30%"> <br> <em> OmniGen2 web.</em> </p> <!-- [Available on Hugging Face Spaces 🚀](https://huggingface.co/spaces/Shitao/OmniGen2) -->
  • Run Locally:
    # For image generation only
    pip install gradio
    python app.py
    # Optional: share the demo via a public link (you need to be able to access huggingface)
    python app.py --share

    # For image or text generation
    pip install gradio
    python app_chat.py


💡 Usage Tips

To achieve optimal results with OmniGen2, you can adjust the following key hyperparameters based on your specific use case.

  • text_guidance_scale: Controls how strictly the output adheres to the text prompt (Classifier-Free Guidance).
  • image_guidance_scale: Controls how closely the final image resembles the input reference image.
    • The Trade-off: A higher value makes the output more faithful to the reference image's structure and style, but it may ignore parts of your text prompt. A lower value (~1.5) gives the text prompt more influence.
    • Tip: For image editing tasks, we recommend setting it between 1.2 and 2.0; for in-context generation tasks, a higher image_guidance_scale maintains more detail from the input images, and we recommend setting it between 2.5 and 3.0.
  • max_pixels: Automatically resizes input images when their total pixel count exceeds this limit.
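
To build intuition for how text_guidance_scale and image_guidance_scale interact, here is a minimal sketch of a common dual classifier-free-guidance combination (the InstructPix2Pix-style formulation). OmniGen2's exact combination may differ; the function and its toy inputs below are purely illustrative, not the model's actual denoiser API.

```python
def dual_cfg(pred_uncond, pred_img, pred_full,
             image_guidance_scale=1.6, text_guidance_scale=5.0):
    """Combine three denoiser predictions with dual classifier-free guidance.

    pred_uncond: prediction with neither text nor image conditioning
    pred_img:    prediction conditioned on the reference image only
    pred_full:   prediction conditioned on both image and text
    """
    return [
        u
        + image_guidance_scale * (i - u)   # pull toward the reference image
        + text_guidance_scale * (f - i)    # pull toward the text instruction
        for u, i, f in zip(pred_uncond, pred_img, pred_full)
    ]

# With both scales at 1.0, the result collapses to the fully conditioned prediction.
out = dual_cfg([0.0], [0.0], [1.0],
               image_guidance_scale=1.0, text_guidance_scale=1.0)  # -> [1.0]
```

Raising either scale amplifies the corresponding conditioning direction, which is why a high image_guidance_scale can overpower the text prompt.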
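
The max_pixels behavior can be sketched as an aspect-ratio-preserving downscale; the helper below is an illustrative assumption, and OmniGen2's actual implementation (e.g. rounding to patch-size multiples) may differ.

```python
import math

def fit_to_max_pixels(width, height, max_pixels=1024 * 1024):
    """Scale (width, height) down, preserving aspect ratio, so that
    width * height <= max_pixels. Images under the limit pass through."""
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# A 4096x4096 input (16 MP) is scaled by 0.25 to fit a 1 MP budget.
print(fit_to_max_pixels(4096, 4096))  # -> (1024, 1024)
```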
