HunyuanDiT
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
<div align="center">
<a href="https://github.com/Tencent-Hunyuan/HunyuanDiT"><img src="https://img.shields.io/static/v1?label=Hunyuan-DiT Code&message=Github&color=blue&logo=github-pages"></a>
<a href="https://dit.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>
<a href="https://arxiv.org/pdf/2405.08748"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv:HunYuan-DiT&color=red&logo=arxiv"></a>
<a href="https://arxiv.org/abs/2403.08857"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:DialogGen&color=red&logo=arxiv"></a>
<a href="https://huggingface.co/Tencent-Hunyuan/HunyuanDiT"><img src="https://img.shields.io/static/v1?label=Hunyuan-DiT&message=HuggingFace&color=yellow"></a>
<a href="https://yuanbao.tencent.com/chat/naQivTmsDa"><img src="https://img.shields.io/static/v1?label=Hunyuan Bot&message=Web&color=green"></a>
<a href="./comfyui"><img src="https://img.shields.io/static/v1?label=ComfyUI Support&message=ComfyUI&color=purple&logo=github-pages"></a>
</div>

This repo contains PyTorch model definitions, pre-trained weights and inference/sampling code for our paper exploring Hunyuan-DiT. You can find more visualizations on our project page.
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
🔥🔥🔥 News!!
- Dec 17, 2024: :tada: Optimized LoRA training with `refined grad checkpoint` and `low-bit optimizer`. Just use `--lowbit-opt` to get started.
- Sep 13, 2024: 🎉 IPAdapter is officially supported by HunYuanDiT; see ./ipadapter for documentation. Scaled attention now replaces flash attention on V100 GPUs.
- Aug 26, 2024: 🎉 HunYuanDIT ControlNet and LoRA are officially supported by ComfyUI. See ./comfyui for documentation.
- Jul 15, 2024: 🚀 HunYuanDiT and Shakker.Ai have jointly launched a fine-tuning event based on the HunYuanDiT 1.2 model. By publishing a LoRA or fine-tuned model based on HunYuanDiT, you can earn a bonus of up to $230 from Shakker.Ai. See Shakker.Ai for more details.
- Jul 15, 2024: :tada: Updated ComfyUI to support standardized workflows and compatibility with weights from the t2i module and LoRA training for versions 1.1/1.2, as well as those trained by Kohya or the official script.
- Jul 15, 2024: :zap: We offer Docker environments for CUDA 11/12, allowing you to bypass complex installations and play with a single click! See dockers for details.
- Jul 08, 2024: :tada: HYDiT-v1.2 version is released. Please check HunyuanDiT-v1.2 and Distillation-v1.2 for more details.
- Jul 03, 2024: :tada: Kohya-hydit version now available for v1.1 and v1.2 models, with GUI for inference. Official Kohya version is under review. See kohya for details.
- Jun 27, 2024: :art: Hunyuan-Captioner is released, providing fine-grained caption for training data. See mllm for details.
- Jun 27, 2024: :tada: LoRA and ControlNet are now supported in diffusers. See diffusers for details.
- Jun 27, 2024: :tada: 6GB GPU VRAM Inference scripts are released. See lite for details.
- Jun 19, 2024: :tada: ControlNet is released, supporting canny, pose and depth control. See training/inference codes for details.
- Jun 13, 2024: :zap: HYDiT-v1.1 version is released, which mitigates the issue of image oversaturation and alleviates the watermark issue. Please check HunyuanDiT-v1.1 and Distillation-v1.1 for more details.
- Jun 13, 2024: :truck: The training code is released, offering full-parameter training and LoRA training.
- Jun 06, 2024: :tada: Hunyuan-DiT is now available in ComfyUI. Please check ComfyUI for more details.
- Jun 06, 2024: 🚀 We introduce a Distillation version for Hunyuan-DiT acceleration, which achieves 50% acceleration on NVIDIA GPUs. Please check Distillation for more details.
- Jun 05, 2024: 🤗 Hunyuan-DiT is now available in 🤗 Diffusers! Please check the example below.
- Jun 04, 2024: :globe_with_meridians: Support Tencent Cloud links to download the pretrained models! Please check the links below.
- May 22, 2024: 🚀 We introduce a TensorRT version for Hunyuan-DiT acceleration, which achieves 47% acceleration on NVIDIA GPUs. Please check TensorRT-libs for instructions.
- May 22, 2024: 💬 We now support a demo for multi-turn text2image generation. Please check the script below.
🤖 Try it on the web
Welcome to our web-based Tencent Hunyuan Bot, where you can explore our innovative products! Just input the suggested prompts below or any other imaginative prompts containing drawing-related keywords to activate the Hunyuan text-to-image generation feature. Unleash your creativity and create any picture you desire, all for free!
You can use simple prompts written in natural language:
画一只穿着西装的猪
draw a pig in a suit
生成一幅画,赛博朋克风,跑车
generate a painting, cyberpunk style, sports car
or multi-turn language interactions to create the picture:
画一个木制的鸟
draw a wooden bird
变成玻璃的
turn into glass
🤗 Community Contribution Leaderboard
- By @TTPlanetPig
  - HunyuanDIT_v1.2 ControlNet models
    - Inpaint controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_inpainting
    - Tile controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_tile
    - Lineart controlnet: https://huggingface.co/TTPlanet/HunyuanDiT_Controlnet_lineart
  - HunyuanDIT_v1.2 ComfyUI nodes
    - Comfyui_TTP_CN_Preprocessor: https://github.com/TTPlanetPig/Comfyui_TTP_CN_Preprocessor
    - Comfyui_TTP_Toolset: https://github.com/TTPlanetPig/Comfyui_TTP_Toolset
- Kohya_ss-hydit train tools: https://github.com/zml-ai/HunyuanDIT-PRE/tree/main/kohya_ss-hydit
- By @CrazyBoyM (bilibili up 飞鸟白菜)
  - ComfyUI support for HunyuanDIT_v1.2 Controlnet: https://github.com/comfyanonymous/ComfyUI/pull/4245
- By @L_A_X
  - HunyuanDIT_v1.2 base model for anime
    - Original hf: https://huggingface.co/Laxhar/Freeway_Animation_HunYuan_Demo
    - Converted ComfyUI model: https://huggingface.co/comfyanonymous/Freeway_Animation_Hunyuan_Demo_ComfyUI_Converted
📑 Open-source Plan
- Hunyuan-DiT (Text-to-Image Model)
  - [x] Inference
  - [x] Checkpoints
  - [x] Distillation Version
  - [x] TensorRT Version
  - [x] Training
  - [x] Lora
  - [x] Controlnet (Pose, Canny, Depth)
  - [x] 6GB GPU VRAM Inference
  - [x] IP-adapter
  - [ ] Hunyuan-DiT-S checkpoints (0.7B model)
- Mllm
  - Hunyuan-Captioner (Re-caption the raw image-text pairs)
    - [x] Inference
  - Hunyuan-DialogGen (Prompt Enhancement Model)
    - [x] Inference
- [x] Web Demo (Gradio)
- [x] Multi-turn T2I Demo (Gradio)
- [x] CLI Demo
- [x] ComfyUI
- [x] Diffusers
- [x] Kohya
- [ ] WebUI
Contents
- Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
- 🔥🔥🔥 News!!
- 🤖 Try it on the web
- 🤗 Community Contribution Leaderboard
- 📑 Open-source Plan
- Contents
- Abstract
- 🎉 Hunyuan-DiT Key Features
- 📈 Comparisons
- 🎥 Visualization
- 📜 Requirements
- 🛠️ Dependencies and Installation
- 🧱 Download Pretrained Models
  - 1. Using HF-Mirror
  - 2. Resume Download
- :truck: Training
- 🔑 Inference
