Bizgen

[CVPR 2025] This is the official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/


<h1 align="center">BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation (Glyph-ByT5-v3)</h1>
<p align="center">
  <a href="https://arxiv.org/abs/2503.20672"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
  <a href='https://bizgen-msra.github.io'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
  <a href='https://huggingface.co/PYY2001/BizGen'><img src='https://img.shields.io/badge/Model-Huggingface-yellow?logo=huggingface&logoColor=yellow' alt='Model'></a>
</p>
<p align="center"><img src="assets/teaser_info.png" width="100%"></p>
<p align="center"><img src="assets/teaser_slide.png" width="100%"></p>

<span style="font-size: 16px; font-weight: 600;">This repository supports article-level visual text rendering of business content (infographics and slides) based on ultra-dense layouts.</span>

🌟 Features

Long context length: Supports ultra-dense layouts with 50+ layers and article-level descriptive prompts of more than 1,000 tokens, and generates high-quality business content at resolutions of up to 2240×896.
Powerful visual text rendering: Supports article-level visual text rendering in ten languages while maintaining high spelling accuracy.
Image generation diversity and flexibility: Supports layer-wise detail refinement through layout-conditional classifier-free guidance (CFG).
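This README does not spell out how layer-wise refinement works internally. As a rough illustration only, the sketch below treats each layout layer as a hypothetical `Layer` record (bounding box, caption, per-layer guidance scale) and applies ordinary classifier-free guidance with a spatially varying scale, so the region covered by a layer can be guided more or less strongly than the background. `Layer`, `guidance_map`, and `layerwise_cfg` are invented names and this is not BizGen's implementation; a real SDXL pipeline would operate on latent tensors (1/8 of the pixel resolution) rather than nested lists.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical layer record: pixel bounding box [x0, x1) x [y0, y1),
    # the text/caption for that region, and a per-layer CFG scale (assumption).
    x0: int
    y0: int
    x1: int
    y1: int
    caption: str
    guidance: float


def guidance_map(layers, width, height, default_scale):
    """Build a per-position guidance scale map from layer boxes."""
    # Every position starts at the global scale.
    scale = [[default_scale] * width for _ in range(height)]
    # Later layers overwrite earlier ones where boxes overlap.
    for layer in layers:
        for y in range(max(0, layer.y0), min(height, layer.y1)):
            for x in range(max(0, layer.x0), min(width, layer.x1)):
                scale[y][x] = layer.guidance
    return scale


def layerwise_cfg(eps_uncond, eps_cond, scale):
    """Standard CFG, eps = eps_u + w * (eps_c - eps_u), with w varying per position."""
    return [
        [u + w * (c - u) for u, c, w in zip(row_u, row_c, row_w)]
        for row_u, row_c, row_w in zip(eps_uncond, eps_cond, scale)
    ]
```

For example, a 2×2 title layer with scale 7.5 on a 4×4 canvas whose default scale is 5.0 yields 7.5 inside the box and 5.0 elsewhere when the conditional and unconditional predictions differ by 1.0.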

🚧 TODO List

[x] Release inference code and pretrained model
[ ] Release training code

Table of Contents

Environment Setup
Testing

Environment Setup

1. Create Conda Environment

conda create -n bizgen python=3.10 -y
conda activate bizgen
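A minimal sketch to confirm that the activated environment really runs Python 3.10 before installing anything. The exact-match check is an assumption that mirrors `python=3.10` in the command above; the code may well work on nearby versions, so relax `REQUIRED` if needed. `check_python` is a hypothetical helper.

```python
import sys

# Mirrors `python=3.10` in the conda command; an exact match is an assumption.
REQUIRED = (3, 10)


def check_python(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's (major, minor) equals `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    if not check_python():
        sys.exit(f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}, got {sys.version.split()[0]}")
    print("Python version OK")
```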

2. Install Dependencies

git clone
cd bizgen
pip install -r requirements.txt
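If `pip install` succeeds but imports still fail, the hypothetical helpers below compare `requirements.txt` with what `importlib.metadata` reports as installed. The parser is deliberately simple and is an assumption, not pip's full requirement grammar: it keeps only the project name at the start of each line and skips comments, pip options such as `-e`, and URL or VCS requirements.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract bare project names from requirements.txt content (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline and full-line comments
        # Skip blanks, pip options (-e, -r, --index-url) and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        # The name stops at the first version specifier, extra, marker or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(requirement_names(Path("requirements.txt").read_text()))
    print("All requirements installed" if not missing else f"Missing: {missing}")
```

Run it from the repository root, where `requirements.txt` lives.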

3. Login to Hugging Face

huggingface-cli login
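To confirm that the login step took effect without opening an interactive prompt, the sketch below looks for a token where recent `huggingface_hub` versions keep it by default: the `HF_TOKEN` environment variable first, then a `token` file under `HF_HOME` (default `~/.cache/huggingface`). Treat these locations as assumptions: older releases and a custom `HF_TOKEN_PATH` use other paths, and `hf_token_available` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os
from pathlib import Path


def hf_token_available(env=None, home=None):
    """Best-effort check that a Hugging Face token is configured.

    Assumes recent huggingface_hub defaults: HF_TOKEN env var, otherwise
    $HF_HOME/token with HF_HOME defaulting to ~/.cache/huggingface.
    """
    env = os.environ if env is None else env
    if env.get("HF_TOKEN"):
        return True
    default_home = Path(home or Path.home()) / ".cache" / "huggingface"
    token_file = Path(env.get("HF_HOME", default_home)) / "token"
    # A present but empty file does not count as logged in.
    return token_file.is_file() and token_file.read_text().strip() != ""


if __name__ == "__main__":
    print("Token found" if hf_token_available() else "No token found; run huggingface-cli login")
```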

Quick Start

To try BizGen quickly, run inference.py:

python inference.py

Testing BizGen

1. Download Checkpoints

Create the directory bizgen/checkpoints and download the following checkpoints into it.

| Name | Description |
|------|-------------|
| byt5 | ByT5 model checkpoint |
| lora_infographic | UNet LoRA weights and fine-tuned ByT5 mapper checkpoint for infographics |
| lora_slides | UNet LoRA weights and fine-tuned ByT5 mapper checkpoint for slides |
| spo | Post-trained SDXL checkpoint (for aesthetic improvement) |
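One way to fetch the weights programmatically is `huggingface_hub.snapshot_download`. This sketch assumes the checkpoints are hosted in the `PYY2001/BizGen` repository linked from the Model badge at the top of this README and that its folder structure matches the layout described below; both are assumptions, so inspect the downloaded files before running inference. `download_bizgen_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: the repository linked from the Model badge; its internal layout
# may differ from the checkpoints/ tree, so verify after downloading.
REPO_ID = "PYY2001/BizGen"


def download_bizgen_checkpoints(dest="checkpoints", repo_id=REPO_ID):
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    Path(dest).mkdir(parents=True, exist_ok=True)
    # Downloads every file in the repo into `dest` and returns that folder path.
    return snapshot_download(repo_id=repo_id, local_dir=dest)


if __name__ == "__main__":
    print(download_bizgen_checkpoints())
```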

The downloaded checkpoints should be organized as follows:

checkpoints/
├── byt5/
│   ├── base.pt
│   └── byt5_model.pt
├── lora/
│   ├── infographic/
│   │   ├── byt5_mapper.pt
│   │   └── unet_lora.pt
│   └── slides/
│       ├── byt5_mapper.pt
│       └── unet_lora.pt
└── spo
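A quick way to confirm the layout above before running inference is to check that every listed file exists. The relative paths are copied from the tree; `spo` is only checked for existence because its contents are not listed, and `missing_checkpoints` is a hypothetical helper.

```python
from pathlib import Path

# Files listed in the checkpoint tree, relative to the checkpoints/ root.
EXPECTED = [
    "byt5/base.pt",
    "byt5/byt5_model.pt",
    "lora/infographic/byt5_mapper.pt",
    "lora/infographic/unet_lora.pt",
    "lora/slides/byt5_mapper.pt",
    "lora/slides/unet_lora.pt",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    missing = [rel for rel in EXPECTED if not (root / rel).is_file()]
    # The contents of spo are not listed, so only its presence is checked.
    if not (root / "spo").exists():
        missing.append("spo")
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("Checkpoint layout OK" if not missing else "Missing:\n  " + "\n  ".join(missing))
```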

2. Run the Testing Script

For infographics:

python inference.py \
--ckpt_dir checkpoints/lora/infographic \
--output_dir infographic \
--sample_list meta/infographics.json

For slides:

python inference.py \
--ckpt_dir checkpoints/lora/slides \
--output_dir slide \
--sample_list meta/slides.json
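To run both presets in one go, a small Python wrapper can build the same command lines shown above and execute them with `subprocess.run`. The flags and values are taken directly from those two commands; the `PRESETS` dictionary and `build_command` are illustrative names, not part of the repository.

```python
import subprocess
import sys

# (ckpt_dir, output_dir, sample_list) copied from the two commands above.
PRESETS = {
    "infographic": ("checkpoints/lora/infographic", "infographic", "meta/infographics.json"),
    "slides": ("checkpoints/lora/slides", "slide", "meta/slides.json"),
}


def build_command(mode, python=sys.executable):
    """Return the argv list for one preset."""
    ckpt_dir, output_dir, sample_list = PRESETS[mode]
    return [
        python, "inference.py",
        "--ckpt_dir", ckpt_dir,
        "--output_dir", output_dir,
        "--sample_list", sample_list,
    ]


if __name__ == "__main__":
    for mode in PRESETS:
        # check=True stops at the first failing run instead of continuing silently.
        subprocess.run(build_command(mode), check=True)
```

Run it from the repository root so that `inference.py` and the relative paths resolve.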

:mailbox_with_mail: Citation

If you find this code useful in your research, please consider citing:

@misc{peng2025bizgenadvancingarticlelevelvisual,
  title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation}, 
  author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
  year={2025},
  eprint={2503.20672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.20672}, 
}

@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}

@article{liu2024glyph,
  title={Glyph-byt5: A customized text encoder for accurate visual text rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}


