HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

<!-- ## **HunyuanDiT** --> <p align="center"> <img src="https://raw.githubusercontent.com/Tencent/HunyuanDiT/main/asset/logo.png" height=100> </p>

<div align="center"> <a href="https://github.com/Tencent-Hunyuan/HunyuanDiT"><img src="https://img.shields.io/static/v1?label=Hunyuan-DiT Code&message=Github&color=blue&logo=github-pages"></a> &ensp; <a href="https://dit.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a> &ensp; <a href="https://arxiv.org/pdf/2405.08748"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:HunYuan-DiT&color=red&logo=arxiv"></a> &ensp; <a href="https://arxiv.org/abs/2403.08857"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:DialogGen&color=red&logo=arxiv"></a> &ensp; <a href="https://huggingface.co/Tencent-Hunyuan/HunyuanDiT"><img src="https://img.shields.io/static/v1?label=Hunyuan-DiT&message=HuggingFace&color=yellow"></a> &ensp; <a href="https://yuanbao.tencent.com/chat/naQivTmsDa"><img src="https://img.shields.io/static/v1?label=Hunyuan Bot&message=Web&color=green"></a> &ensp; <a href="./comfyui"><img src="https://img.shields.io/static/v1?label=ComfyUI Support&message=ComfyUI&color=purple&logo=github-pages"></a> &ensp; </div>

This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our project page.

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

🔥🔥🔥 News!!

  • Dec 17, 2024: 🎉 LoRA training is optimized with refined gradient checkpointing and a low-bit optimizer. Use --lowbit-opt to get started.
  • Sep 13, 2024: 🎉 IP-Adapter is now officially supported by HunyuanDiT; see ./ipadapter for documentation. On V100 GPUs, scaled attention now replaces flash attention.
  • Aug 26, 2024: 🎉 HunyuanDiT ControlNet and LoRA are officially supported in ComfyUI; see ./comfyui for documentation.
  • Jul 15, 2024: 🚀 HunyuanDiT and Shakker.Ai have jointly launched a fine-tuning event based on the HunyuanDiT 1.2 model. Publish a LoRA or fine-tuned model based on HunyuanDiT to earn a bonus of up to $230 from Shakker.Ai. See Shakker.Ai for details.
  • Jul 15, 2024: 🎉 ComfyUI support is updated with standardized workflows and compatibility with weights from the t2i module and from LoRA training for versions 1.1/1.2, as well as weights trained with Kohya or the official script.
  • Jul 15, 2024: ⚡ We provide Docker environments for CUDA 11 and 12, so you can skip complex installation and get started with a single click. See dockers for details.
  • Jul 08, 2024: 🎉 HYDiT-v1.2 is released. See HunyuanDiT-v1.2 and Distillation-v1.2 for details.
  • Jul 03, 2024: 🎉 A Kohya-hydit version is now available for the v1.1 and v1.2 models, with a GUI for inference. The official Kohya version is under review. See kohya for details.
  • Jun 27, 2024: 🎨 Hunyuan-Captioner is released, providing fine-grained captions for training data. See mllm for details.
  • Jun 27, 2024: 🎉 LoRA and ControlNet are now supported in Diffusers. See diffusers for details.
  • Jun 27, 2024: 🎉 Inference scripts for GPUs with 6 GB of VRAM are released. See lite for details.
  • Jun 19, 2024: 🎉 ControlNet is released, supporting canny, pose and depth control. See the training/inference code for details.
  • Jun 13, 2024: ⚡ HYDiT-v1.1 is released, mitigating image oversaturation and alleviating the watermark issue. See HunyuanDiT-v1.1 and Distillation-v1.1 for details.
  • Jun 13, 2024: 🚚 The training code is released, supporting both full-parameter and LoRA training.
  • Jun 06, 2024: 🎉 Hunyuan-DiT is now available in ComfyUI. See ComfyUI for details.
  • Jun 06, 2024: 🚀 We introduce a Distillation version of Hunyuan-DiT that achieves 50% acceleration on NVIDIA GPUs. See Distillation for details.
  • Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! See the example below.
  • Jun 04, 2024: 🌐 Tencent Cloud links are now available for downloading the pretrained models. See the links below.
  • May 22, 2024: 🚀 We introduce a TensorRT version of Hunyuan-DiT that achieves 47% acceleration on NVIDIA GPUs. See TensorRT-libs for instructions.
  • May 22, 2024: 💬 We now provide a demo for multi-turn text-to-image generation. See the script below.

🤖 Try it on the web

Welcome to Tencent Hunyuan Bot on the web, where you can explore our products. Enter one of the suggested prompts below, or any other prompt containing drawing-related keywords, to activate Hunyuan's text-to-image generation. Let your creativity run free and create any picture you like, at no cost.

You can use simple prompts written in natural language:

画一只穿着西装的猪

draw a pig in a suit

生成一幅画,赛博朋克风,跑车

generate a painting, cyberpunk style, sports car

or refine a picture through multi-turn interaction:

画一个木制的鸟

draw a wooden bird

变成玻璃的

turn it into glass

🤗 Community Contribution Leaderboard

  1. By @TTPlanetPig

    • HunyuanDIT_v1.2 ControlNet models
      • Inpaint controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_inpainting
      • Tile controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_tile
      • Lineart controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_lineart
    • HunyuanDIT_v1.2 ComfyUI nodes
      • Comfyui_TTP_CN_Preprocessor: https://github.com/TTPlanetPig/Comfyui_TTP_CN_Preprocessor
      • Comfyui_TTP_Toolset: https://github.com/TTPlanetPig/Comfyui_TTP_Toolset
  2. By @sdbds (Bilibili creator 青龙圣者)

    • Kohya_ss-hydit train tools: https://github.com/zml-ai/HunyuanDIT-PRE/tree/main/kohya_ss-hydit
  3. By @CrazyBoyM (Bilibili creator 飞鸟白菜)

    • ComfyUI support for HunyuanDIT_v1.2 Controlnet: https://github.com/comfyanonymous/ComfyUI/pull/4245
  4. By @L_A_X

    • HunyuanDIT_v1.2 base model for anime
      • Original Hugging Face model: https://huggingface.co/Laxhar/Freeway_Animation_HunYuan_Demo
      • Converted ComfyUI model: https://huggingface.co/comfyanonymous/Freeway_Animation_Hunyuan_Demo_ComfyUI_Converted

📑 Open-source Plan

  • Hunyuan-DiT (Text-to-Image Model)
    • [x] Inference
    • [x] Checkpoints
    • [x] Distillation Version
    • [x] TensorRT Version
    • [x] Training
    • [x] LoRA
    • [x] ControlNet (Pose, Canny, Depth)
    • [x] 6 GB GPU VRAM Inference
    • [x] IP-Adapter
    • [ ] Hunyuan-DiT-S checkpoints (0.7B model)
  • MLLM
    • Hunyuan-Captioner (re-captions raw image-text pairs)
      • [x] Inference
    • Hunyuan-DialogGen (prompt enhancement model)
      • [x] Inference
  • [x] Web Demo (Gradio)
  • [x] Multi-turn T2I Demo (Gradio)
  • [x] CLI Demo
  • [x] ComfyUI
  • [x] Diffusers
  • [x] Kohya
  • [ ] WebUI
