# GCPO
[ICLR 2026] Code for "Group Critical-token Policy Optimization for Autoregressive Image Generation"
## 🔈 News

- [2025-09] All weights are available on Hugging Face!
- [2025-09] Training and inference code are available.
- [2025-09] GCPO is released on arXiv.
## 🔍 Introduction

We propose a novel reinforcement learning framework, Group Critical-token Policy Optimization (GCPO), to achieve efficient policy optimization of critical tokens. We consider critical tokens from three perspectives:

- Causal dependency
- Entropy-induced spatial structure
- RLVR-focused token diversity

We select 30% of all tokens as critical tokens and combine them with Dynamic Advantage Weights to achieve precise optimization ✌
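As a rough illustrative sketch of the idea (not the paper's implementation: the scoring below uses only per-token entropy, whereas GCPO combines all three perspectives, and `alpha` is a hypothetical weighting hyperparameter), selecting the top 30% of tokens and reweighting their advantages might look like:

```python
import numpy as np

def select_critical_tokens(logits, keep_ratio=0.3):
    """Keep the top `keep_ratio` fraction of tokens by predictive entropy.
    Illustrative stand-in for GCPO's critical-token selection."""
    # Softmax over the vocabulary dimension, numerically stabilized.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Per-token entropy of the predictive distribution.
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=-1)
    k = max(1, int(keep_ratio * entropy.shape[0]))
    # Indices of the k highest-entropy tokens, returned in sequence order.
    return np.sort(np.argsort(entropy)[::-1][:k])

def dynamic_advantage_weights(advantages, critical_idx, alpha=2.0):
    """Upweight advantages on critical tokens; others keep weight 1.
    `alpha` is a hypothetical hyperparameter, not from the paper."""
    weights = np.ones_like(advantages)
    weights[critical_idx] = alpha
    return weights * advantages

rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 16))  # 10 generated tokens, 16-way codebook
adv = rng.normal(size=10)           # per-token group-relative advantages
idx = select_critical_tokens(logits)           # 3 of 10 tokens kept (30%)
weighted = dynamic_advantage_weights(adv, idx)
```

In a real policy-gradient loop, the reweighted advantages would then multiply the per-token log-probability ratios in the GRPO-style objective.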
<div style="text-align: center; width: 100%;"> <img src="assets/method.png" alt="Image 1" style="width: 100%;"> </div>

We validate the effectiveness of GCPO on multiple models (LlamaGen, Janus-Pro) and text-to-image benchmarks (GenEval, T2I-CompBench, DrawBench). Notably, GCPO achieves 0.90 on GenEval built on the original Janus-Pro-7B, and has strong generalization on T2I-CompBench.
<details><summary><b>CLICK for Detailed Results</b></summary>

Visualization Results
<div style="text-align: center; width: 100%;"> <img src="assets/compare.png" alt="Image 1" style="width: 100%;"> </div>

GenEval Results
<div style="text-align: center; width: 100%;"> <img src="assets/geneval.png" alt="Image 1" style="width: 100%;"> </div>

T2I-CompBench Results
<div style="text-align: center; width: 100%;"> <img src="assets/t2i-compbench.png" alt="Image 1" style="width: 100%;"> </div> </details>

## 🤗 Model List
| Model | Preference Alignment | GenEval |
|:--------------:|:-------------------:|:----------------:|
| LlamaGen-T2I | 🤗HPS | |
| Janus-Pro 1B | 🤗HPS | 🤗Geneval |
| Janus-Pro 7B | 🤗HPS | 🤗Geneval |
## 🔧 Environment Setup
1. Clone this repository and navigate to the folder:

```shell
git clone https://github.com/zghhui/GCPO.git
cd GCPO
```
2. Install the training package:

We provide training code for LlamaGen and Janus-Pro, and recommend installing a separate environment for each.

For LlamaGen:

```shell
conda create -n gcpo_llamagen python=3.10
conda activate gcpo_llamagen
pip install -r llamaGen/requirements.txt
```

For Janus-Pro:

```shell
conda create -n gcpo_janus python=3.10
conda activate gcpo_janus
pip install -r janus/requirements.txt
```
3. Download models:

For LlamaGen:

```shell
huggingface-cli download zghhui/LlamaGen-T2I
huggingface-cli download google/flan-t5-xl
wget https://huggingface.co/peizesun/llamagen_t2i/resolve/main/vq_ds16_t2i.pt
```

For Janus-Pro:

```shell
huggingface-cli download deepseek-ai/Janus-Pro-1B
huggingface-cli download deepseek-ai/Janus-Pro-7B
```

For the HPS reward:

```shell
huggingface-cli download xswu/HPSv2
```

For the GenEval reward:

- Please follow the instructions in Flow-GRPO and reward-server.
## 🚀 Training GCPO

### LlamaGen

```shell
cd llamaGen/src
bash scripts/rl_gcpo_hps.sh
```

> [!NOTE]
> Remember to modify the `t5_model` path in
> `gcpo/llamaGen/simpar/model/llama_model.py` (line 1244).
### Janus-Pro

```shell
cd janus/src
bash scripts/run_gcpo_hps.sh
bash scripts/run_gcpo_geneval.sh
```

> [!NOTE]
> Please run the GenEval server before using the GenEval reward. The reward function is located in
> `utils/reward_geneval.py`, and the IP of the server can be modified there.
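Since training fails mid-run if the reward server is unreachable, it can help to sanity-check connectivity first. Below is a generic TCP probe; the host and port are placeholders, so substitute whatever address you configured in `utils/reward_geneval.py`:

```python
import socket

def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder address; replace with your reward-server host/port.
print(server_reachable("127.0.0.1", 18085))
```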
## 💫 Inference

### LlamaGen

```shell
cd llamaGen
bash scripts/inference.sh
```

### Janus-Pro

```shell
cd janus/src
bash scripts/inference.sh
```
## 📧 Contact

If you have any comments or questions, please open a new issue.
## 🤗 Acknowledgments

Our training code is based on T2I-R1, SimpleAR, and Flow-GRPO.
Thanks to all the contributors!
## ⭐ Citation

If you find GCPO useful for your research or projects, we would greatly appreciate it if you could cite the following paper:

```bibtex
@article{zhang2025group,
  title={Group Critical-token Policy Optimization for Autoregressive Image Generation},
  author={Zhang, Guohui and Yu, Hu and Ma, Xiaoxiao and Zhang, JingHao and Pan, Yaning and Yao, Mingde and Xiao, Jie and Huang, Linjiang and Zhao, Feng},
  journal={arXiv preprint arXiv:2509.22485},
  year={2025}
}
```