Bizgen
[CVPR 2025] This is an official inference code of the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation" . Project page: https://bizgen-msra.github.io/
This repository supports article-level visual text rendering of business content (infographics and slides) based on ultra-dense layouts.
🌟 Features
- Long context length: Supports ultra-dense layouts with 50+ layers and article-level descriptive prompts of more than 1,000 tokens, and can generate high-quality business content at resolutions up to 2240×896.
- Powerful visual text rendering: Supports article-level visual text rendering in ten languages while maintaining high spelling accuracy.
- Image generation diversity and flexibility: Supports layer-wise detail refinement through layout-conditional classifier-free guidance (CFG).
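The last feature mentions layer-wise refinement through layout-conditional CFG. The repository's own implementation is not reproduced here, so the following is only a generic sketch of classifier-free guidance with a spatially varying guidance scale. It assumes noise predictions are NumPy arrays of shape (C, H, W) and that each layout layer is given as a boolean (H, W) mask; the function `regional_cfg` and its parameters are hypothetical and not part of BizGen.

```python
import numpy as np


def regional_cfg(eps_uncond, eps_cond, layer_masks, layer_scales, base_scale=7.5):
    """Hypothetical helper: CFG with a per-pixel scale built from layer masks.

    eps_uncond, eps_cond: arrays of shape (C, H, W).
    layer_masks: list of boolean arrays of shape (H, W), one per layout layer.
    layer_scales: list of guidance scales, one per layer.
    """
    # Every pixel starts at the global guidance scale.
    scale_map = np.full(eps_cond.shape[-2:], base_scale, dtype=eps_cond.dtype)
    # Later layers override earlier ones where their masks overlap.
    for mask, scale in zip(layer_masks, layer_scales):
        scale_map = np.where(mask, scale, scale_map)
    # Standard CFG combination eps_u + s * (eps_c - eps_u);
    # the (H, W) scale map broadcasts across the channel axis.
    return eps_uncond + scale_map * (eps_cond - eps_uncond)
```

Giving a layer its own scale lets one region (for example, a text box) be pushed harder toward its condition than the background, which is one plausible way to read "layer-wise detail refinement".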
🚧 TODO List
- [x] Release inference code and pretrained model
- [ ] Release training code
Environment Setup
1. Create Conda Environment
```bash
conda create -n bizgen python=3.10 -y
conda activate bizgen
```
2. Install Dependencies
```bash
git clone
cd bizgen
pip install -r requirements.txt
```
3. Login to Hugging Face
```bash
huggingface-cli login
```
Quick Start
Run inference.py for a quick try:
```bash
python inference.py
```
Testing BizGen
1. Download Checkpoints
Create the directory bizgen/checkpoints and download the following checkpoints into it.
| Name | Description|
|----------|-------------|
| byt5 | ByT5 model checkpoint |
| lora_infographic | UNet LoRA weights and finetuned ByT5 mapper checkpoint for infographics |
| lora_slides | UNet LoRA weights and finetuned ByT5 mapper checkpoint for slides |
| spo | Post-trained SDXL checkpoint (for aesthetic improvement) |
The downloaded checkpoints should be organized as follows:
```
checkpoints/
├── byt5/
│   ├── base.pt
│   └── byt5_model.pt
├── lora/
│   ├── infographic/
│   │   ├── byt5_mapper.pt
│   │   └── unet_lora.pt
│   └── slides/
│       ├── byt5_mapper.pt
│       └── unet_lora.pt
└── spo
```
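Before running inference, it can help to confirm that the directory tree matches the expected layout. The following is a minimal sketch that only checks whether each path exists (not whether the files are valid); the helper name `missing_checkpoints` is hypothetical and not part of the repository, and `spo` is checked as a single path because its contents are not listed.

```python
from pathlib import Path

# Expected paths, copied from the checkpoint tree above.
EXPECTED = [
    "byt5/base.pt",
    "byt5/byt5_model.pt",
    "lora/infographic/byt5_mapper.pt",
    "lora/infographic/unet_lora.pt",
    "lora/slides/byt5_mapper.pt",
    "lora/slides/unet_lora.pt",
    "spo",
]


def missing_checkpoints(root="checkpoints"):
    """Hypothetical helper: return expected paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All checkpoints found.")
```

Run it from the bizgen directory so that the default `checkpoints` path resolves correctly.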
2. Run the Testing Script
For infographics:
```bash
python inference.py \
  --ckpt_dir checkpoints/lora/infographic \
  --output_dir infographic \
  --sample_list meta/infographics.json
```
For slides:
```bash
python inference.py \
  --ckpt_dir checkpoints/lora/slides \
  --output_dir slide \
  --sample_list meta/slides.json
```
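To run both test configurations in sequence, a small wrapper can build the two commands from the same flags. This is a hypothetical convenience script, not part of the repository. It assumes it is run from the repository root, invokes `inference.py` with the current interpreter (`sys.executable`), and only prints the commands unless `--run` is passed.

```python
import subprocess
import sys

# (ckpt_dir, output_dir, sample_list), copied from the two commands above.
CONFIGS = {
    "infographic": ("checkpoints/lora/infographic", "infographic", "meta/infographics.json"),
    "slides": ("checkpoints/lora/slides", "slide", "meta/slides.json"),
}


def build_command(kind):
    """Return the argv list for inference.py for one content type."""
    ckpt_dir, output_dir, sample_list = CONFIGS[kind]
    return [
        sys.executable, "inference.py",
        "--ckpt_dir", ckpt_dir,
        "--output_dir", output_dir,
        "--sample_list", sample_list,
    ]


def run_all(dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for kind in CONFIGS:
        cmd = build_command(kind)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```

Passing the arguments as a list instead of a shell string avoids quoting issues if any path contains spaces.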
:mailbox_with_mail: Citation
If you find this code useful in your research, please consider citing:
```bibtex
@misc{peng2025bizgenadvancingarticlelevelvisual,
  title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation},
  author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
  year={2025},
  eprint={2503.20672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.20672},
}

@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}

@article{liu2024glyph,
  title={Glyph-byt5: A customized text encoder for accurate visual text rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}
```