BLIP3o
Official implementation of BLIP3o-Series
Install / Use
/learn @JiuhaiChen/BLIP3oREADME
BLIP3o-NEXT
<!-- </tiny><a href="https://jiuhaichen.github.io/BLIP3o-NEXT.github.io/" style="font-weight: normal;">Project Page</a></tiny> -->📖 Arxiv
<p align="center"> <img src="figure/blip3o_next.png" width="1342"> </p>AR + Diffusion Architecture: Similar with BLIP3o, BLIP3o-NEXT generates intermediate features via the autoregressive model and then conditions on these features to generate images through the diffusion model.
Discrete Image Token Supervision: We add discrete SigLIP-2 image token prediction as extra training supervision, jointly optimizing CrossEntropy and the diffusion objective. By having the AR model lay down a discrete "blueprint" and feeding their hidden representations into the diffusion model, we combine structural accuracy with high visual-fidelity image outputs.
RL with verified reward: The introduction of discrete image tokens unlocks seamless compatibility with existing language-model RL framework. Using Group Relative Policy Optimization (GRPO), we train the BLIP3o-NEXT to improve prompt alignment and text rendering in image generation.
Fully Open-Source:
- Pretraining Data: 27 Million Detailed Captions, 5 Million Short Captions
- Instruction Tuning Data: BLIP3o-60k, ShareGPT-4o-Image
- Image Editing Data: [BLIP3o-edit (TODO)]
- Model Weights (3B): Pretrain, Instruction Tuning, GRPO-Geneval, GRPO-Text
- Training Code: Pretrain, Instruction Tuning, GRPO
🔥 Welcome to discuss with us if you have any questions. Discord: https://discord.gg/SsVYdV84bw or Wechat
<p align="center"> <img src="figure/wechat_2.jpg" width="256"> </p>Install package for pretraining and instruction tuning
conda create -n blip3o-next python=3.11 -y
conda activate blip3o-next
pip install --upgrade pip setuptools
pip install -r requirements.txt
pip install -e .
Import slurm config and environment
sbatch scrips/run.sh
For the inference, change the model path in inference.py and
python inference.py
For GRPO, we recommend to install a new enviroment since some version conflicts for torch if using blip3o-next environment. Also you need to install the dependency from setup.py, please follow below
cd trl
conda create -n grpo python=3.11 -y
conda activate grpo
pip install -r requirements.txt
cd ..
pip install -e .
Related Skills
node-connect
341.0kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.4kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
341.0kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.4kCommit, push, and open a PR
