# LightGen
An Efficient Text-to-Image Generation Pretraining Pipeline
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization <br><sub>Official PyTorch Implementation</sub>
<code>HF Checkpoint 🚀</code> | <code>Technical Report 📝</code> | <code>机器之心 🤩</code> | <code>量子位 🤩</code> | <code>HKUST AIS 🤩</code>
<p align="center"> LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization <br /> <a href="https://maradona10wxf.github.io/">Xianfeng Wu<sup>1, 2</sup><sup>#</sup></a> · <a href="https://scholar.google.com/citations?user=0bmTpcAAAAAJ&hl=en&oi=ao">Yajing Bai<sup>1, 2</sup><sup>#</sup></a> · <a href="https://sairlab.org/haozez/">Haoze Zheng<sup>1, 2</sup><sup>#</sup></a> · <a href="https://haroldchen19.github.io/">Harold (haodong) Chen<sup>1, 2</sup><sup>#</sup></a> · <a href="https://scholar.google.com/citations?user=Y8zBpcoAAAAJ&hl=zh-CN">Yexin Liu<sup>1, 2</sup><sup>#</sup></a> · <a href="https://scholar.google.com/citations?user=UhFbFCMAAAAJ&hl=en">Zihao Wang<sup>1, 2</sup></a> · <a href="">Xuran Ma<sup>1, 2</sup></a> · <a href="https://scholar.google.cz/citations?user=bM_lvLAAAAAJ&hl=zh-CN">Wenjie Shu<sup>1, 2</sup></a> · <a href="">Xianzu Wu<sup>1, 2</sup></a> · <a href="https://leehomyc.github.io/">Harry Yang<sup>1, 2</sup><sup>*</sup></a> · <a href="https://scholar.google.com/citations?user=HX0BfLYAAAAJ&hl=en">Sernam Lim<sup>2, 3</sup><sup>*</sup></a> <br /> <p align="center"> <sub><sup>1</sup> <a href="https://amc.hkust.edu.hk/">HKUST AMC</a>, <sup>2</sup> <a href="https://www.everlyn.ai/">Everlyn AI</a>, <sup>3</sup> <a href="https://www.cs.ucf.edu/">UCF CS</a>, <sup>#</sup>Equal contribution, <sup>*</sup> Corresponding Author</sub></p> </p> <p align="center"> <img src="demo/demo.png" width="720"> </p>

This is a PyTorch/GPU implementation of LightGen. The repo provides an efficient pre-training pipeline for text-to-image generation built on Fluid/MAR.
## 🦉 ToDo List
- [ ] Release DPO post-processing code.
- [ ] Release complete checkpoint.
- [ ] Add Accelerate module.
## Env

```bash
conda create -n everlyn_video python=3.10
conda activate everlyn_video
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121
# pip install -U xformers==0.0.26 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
## Prepare stage

```bash
huggingface-cli download --token hf_ur_token --resume-download stabilityai/stable-diffusion-3.5-large --local-dir stable-diffusion-3.5-large # Image VAE
huggingface-cli download --resume-download google/flan-t5-xxl --local-dir google/flan-t5-xxl # Text Encoder
huggingface-cli download --repo-type dataset --resume-download jackyhate/text-to-image-2M --local-dir text-to-image-2M # Dataset
```
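The text-to-image-2M dataset is distributed as webdataset-style tar shards, where each sample is a `.jpg` image paired with a `.json` sidecar sharing the same basename. As a rough illustration of that layout (the function name and the exact sidecar fields are assumptions, not part of this repo), a shard can be iterated without extracting it:

```python
import json
import tarfile

def iter_samples(shard_path: str):
    """Yield (key, image_bytes, metadata_dict) from one webdataset-style shard,
    pairing each sample's .jpg with its .json sidecar by shared basename.
    NOTE: illustrative sketch, not part of the LightGen codebase."""
    pending = {}  # basename -> partially collected sample
    with tarfile.open(shard_path) as tf:
        for member in tf:
            if not member.isfile():
                continue
            stem, _, ext = member.name.rpartition(".")
            data = tf.extractfile(member).read()
            entry = pending.setdefault(stem, {})
            if ext == "jpg":
                entry["jpg"] = data
            elif ext == "json":
                entry["json"] = json.loads(data)
            # Emit the sample once both halves have arrived
            if "jpg" in entry and "json" in entry:
                yield stem, entry["jpg"], entry["json"]
                del pending[stem]
```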
Untar script for text-to-image-2M:

```bash
#!/bin/bash
# Create the 'untar' output directory if it does not exist
mkdir -p untar
# Loop through all .tar shards
for tar_file in *.tar; do
    # Derive the directory name from the shard name, e.g. 00001, 00002, ...
    dir_name=$(basename "$tar_file" .tar)
    # Create the corresponding directory
    mkdir -p "untar/$dir_name"
    # Extract the shard into it
    tar -xvf "$tar_file" -C "untar/$dir_name"
    echo "Extraction completed: $tar_file to untar/$dir_name"
done
echo "All files have been extracted."
```
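The same extraction can be done from Python if shell is inconvenient. This is a minimal equivalent of the script above using the standard-library `tarfile` module (the function name `extract_all` is ours, not from this repo):

```python
import tarfile
from pathlib import Path

def extract_all(src_dir: str, out_dir: str = "untar") -> list:
    """Extract every *.tar shard in src_dir into out_dir/<shard stem>/,
    mirroring the bash untar script. Returns the created directories."""
    extracted = []
    for tar_path in sorted(Path(src_dir).glob("*.tar")):
        target = Path(out_dir) / tar_path.stem  # e.g. untar/00001
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(tar_path) as tf:
            tf.extractall(target)
        extracted.append(str(target))
    return extracted
```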
The dataset is too large to scan efficiently with an ordinary dataset loader, so generate a JSON index first to speed up loading: modify `scripts/generate_txt.py` as needed, then run it.

```bash
python generate_json.py
```
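The idea behind the index is to walk the extracted files once and cache the image/caption pairs, so training workers parse one small JSON file instead of listing millions of files. A minimal sketch of that pattern (the function name, output filename, and `.json`-sidecar layout are assumptions; the repo's own script may differ):

```python
import json
from pathlib import Path

def build_index(root: str, out_json: str = "data_index.json") -> int:
    """Walk the extracted dataset once and record every image that has a
    matching .json caption sidecar. Returns the number of indexed samples."""
    records = []
    for img in sorted(Path(root).rglob("*.jpg")):
        meta = img.with_suffix(".json")  # webdataset-style caption sidecar
        if meta.exists():
            records.append({"image": str(img), "meta": str(meta)})
    Path(out_json).write_text(json.dumps(records))
    return len(records)
```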
## Training

Script for the default setting; you can modify the settings in `scripts/run.sh`:

```bash
sh run.sh
```
<!-- `diffusion/__init__.py` maybe need reduce the time step -->
## Inference

Script for the default setting:

```bash
python pipeline_image.py
```
## Acknowledgements

A large portion of the code in this repo is based on MAR.
## ✨ Star History
## Cite

```bibtex
@article{wu2025lightgen,
  title={LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization},
  author={Wu, Xianfeng and Bai, Yajing and Zheng, Haoze and Chen, Harold Haodong and Liu, Yexin and Wang, Zihao and Ma, Xuran and Shu, Wen-Jie and Wu, Xianzu and Yang, Harry and others},
  journal={arXiv preprint arXiv:2503.08619},
  year={2025}
}
```
