HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding


<p align="center"> <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/logo.png" height=100> </p>


This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our project page.

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

🔥🔥🔥 News!!

Dec 17, 2024: :tada: Optimized LoRA training with refined gradient checkpointing and a low-bit optimizer. Just pass --lowbit-opt to get started.
Sep 13, 2024: 🎉 IPAdapter is officially supported by HunYuanDiT; see ./ipadapter for documentation. Scaled attention now replaces flash attention on V100 GPUs.
Aug 26, 2024: 🎉 HunYuanDiT ControlNet and LoRA are officially supported by ComfyUI; see ./comfyui for documentation.
Jul 15, 2024: 🚀 HunYuanDiT and Shakker.Ai have jointly launched a fine-tuning event based on the HunYuanDiT 1.2 model. By publishing a LoRA or fine-tuned model based on HunYuanDiT, you can earn a bonus of up to $230 from Shakker.Ai. See Shakker.Ai for more details.
Jul 15, 2024: :tada: Updated ComfyUI to support standardized workflows and to be compatible with weights from the t2i module and LoRA training for versions 1.1/1.2, as well as those trained by Kohya or the official script.
Jul 15, 2024: :zap: We offer Docker environments for CUDA 11/12, allowing you to bypass complex installations and play with a single click! See dockers for details.
Jul 08, 2024: :tada: HYDiT-v1.2 is released. Please check HunyuanDiT-v1.2 and Distillation-v1.2 for more details.
Jul 03, 2024: :tada: Kohya-hydit is now available for v1.1 and v1.2 models, with a GUI for inference. The official Kohya version is under review. See kohya for details.
Jun 27, 2024: :art: Hunyuan-Captioner is released, providing fine-grained captions for training data. See mllm for details.
Jun 27, 2024: :tada: LoRA and ControlNet are now supported in diffusers. See diffusers for details.
Jun 27, 2024: :tada: Inference scripts for GPUs with 6GB of VRAM are released. See lite for details.
Jun 19, 2024: :tada: ControlNet is released, supporting canny, pose and depth control. See the training/inference code for details.
Jun 13, 2024: :zap: HYDiT-v1.1 is released, which mitigates image oversaturation and alleviates the watermark issue. Please check HunyuanDiT-v1.1 and Distillation-v1.1 for more details.
Jun 13, 2024: :truck: The training code is released, offering full-parameter training and LoRA training.
Jun 06, 2024: :tada: Hunyuan-DiT is now available in ComfyUI. Please check ComfyUI for more details.
Jun 06, 2024: 🚀 We introduce a distillation version of Hunyuan-DiT, which achieves 50% acceleration on NVIDIA GPUs. Please check Distillation for more details.
Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! Please check the example below.
Jun 04, 2024: :globe_with_meridians: Tencent Cloud links are now available for downloading the pretrained models! Please check the links below.
May 22, 2024: 🚀 We introduce a TensorRT version of Hunyuan-DiT, which achieves 47% acceleration on NVIDIA GPUs. Please check TensorRT-libs for instructions.
May 22, 2024: 💬 We now support a demo of multi-turn text-to-image generation. Please check the script below.
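
As a rough illustration of the Diffusers support announced in the Jun 05 entry, the sketch below loads the model through the `HunyuanDiTPipeline` class in 🤗 Diffusers and generates a single image. The model ID, the half-precision setting, and the `generate` helper are assumptions made for illustration and are not taken from this section; check the official Diffusers documentation for the supported checkpoints and options.

```python
# Sketch only: MODEL_ID and the defaults below are assumptions, not part of this README section.
MODEL_ID = "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers"  # assumed Hugging Face repo name


def generate(prompt: str, model_id: str = MODEL_ID, device: str = "cuda"):
    """Load the pipeline and return one PIL image for `prompt` (hypothetical helper)."""
    # Imported lazily so this module can be imported without torch/diffusers installed.
    import torch
    from diffusers import HunyuanDiTPipeline

    # float16 is assumed here to reduce GPU memory use.
    pipe = HunyuanDiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to(device)

    # The pipeline call returns an output object whose `images` field is a list of PIL images.
    return pipe(prompt=prompt).images[0]
```

Calling `generate("画一只穿着西装的猪")` (one of the sample prompts in this README, "draw a pig in a suit") and then `.save("pig.png")` on the result would write the image to disk. The first call downloads the weights, so it needs network access and a CUDA-capable GPU.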

🤖 Try it on the web

Welcome to our web-based Tencent Hunyuan Bot, where you can explore our innovative products! Just input the suggested prompts below or any other imaginative prompts containing drawing-related keywords to activate the Hunyuan text-to-image generation feature. Unleash your creativity and create any picture you desire, all for free!

You can use simple prompts similar to natural language text

画一只穿着西装的猪

draw a pig in a suit

生成一幅画，赛博朋克风，跑车

generate a painting, cyberpunk style, sports car

or multi-turn language interactions to create the picture.

画一个木制的鸟

draw a wooden bird

变成玻璃的

turn into glass

🤗 Community Contribution Leaderboard

By @TTPlanetPig
- HunyuanDIT_v1.2 ControlNet models
  - Inpaint controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_inpainting
  - Tile controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_tile
  - Lineart controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_lineart
- HunyuanDIT_v1.2 ComfyUI nodes
  - Comfyui_TTP_CN_Preprocessor: https://github.com/TTPlanetPig/Comfyui_TTP_CN_Preprocessor
  - Comfyui_TTP_Toolset: https://github.com/TTPlanetPig/Comfyui_TTP_Toolset
By @sdbds (bilibili up 青龙圣者)
- Kohya_ss-hydit train tools: https://github.com/zml-ai/HunyuanDIT-PRE/tree/main/kohya_ss-hydit
By @CrazyBoyM (bilibili up 飞鸟白菜)
- ComfyUI support for HunyuanDIT_v1.2 Controlnet: https://github.com/comfyanonymous/ComfyUI/pull/4245
By @L_A_X
- HunyuanDIT_v1.2 base model for anime
  - Original hf: https://huggingface.co/Laxhar/Freeway_Animation_HunYuan_Demo
  - Converted ComfyUI model: https://huggingface.co/comfyanonymous/Freeway_Animation_Hunyuan_Demo_ComfyUI_Converted

📑 Open-source Plan

- Hunyuan-DiT (Text-to-Image Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] Distillation Version
  - [x] TensorRT Version
  - [x] Training
  - [x] LoRA
  - [x] ControlNet (Pose, Canny, Depth)
  - [x] 6GB GPU VRAM Inference
  - [x] IP-adapter
  - [ ] Hunyuan-DiT-S checkpoints (0.7B model)
- Mllm
  - Hunyuan-Captioner (Re-caption the raw image-text pairs)
    - [x] Inference
  - Hunyuan-DialogGen (Prompt Enhancement Model)
    - [x] Inference
- [x] Web Demo (Gradio)
- [x] Multi-turn T2I Demo (Gradio)
- [x] CLI Demo
- [x] ComfyUI
- [x] Diffusers
- [x] Kohya
- [ ] WebUI


