# xdiffusion

A unified media (Image, Video, Audio, Text) diffusion repository, for education and learning.
If you are looking for just the lessons on image diffusion models, check out mindiffusion for more image diffusion model paper implementations.
## Requirements
This package is built using PyTorch and written in Python 3. Due to package dependencies, this repository requires Python 3.10 or greater. To set up an environment to run all of the lessons, we suggest using conda or venv:
> python3 -m venv xdiffusion_env
> source xdiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt
I find pyenv-virtualenv to be very helpful as well, for managing both the virtual environment and the Python version dependencies.
> pyenv install 3.10.15
> pyenv virtualenv 3.10.15 xdiffusion_env
> pyenv activate xdiffusion_env
> pip install --upgrade pip
> pip install -r requirements.txt
All lessons are designed to be run from the root of the repository, so you should set your `PYTHONPATH` to include the repository root:
> export PYTHONPATH=$(pwd)
If you have issues with PyTorch and different CUDA versions on your instance, make sure to install the correct version of PyTorch for the CUDA version on your machine. For example, if you have CUDA 11.8 installed, you can install PyTorch using:
> pip install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cu118
## Image Diffusion
### Training Datasets
In this repository, we will be working with the MNIST dataset because it is simple and can be trained in real time with minimal GPU power and memory. The main difference between MNIST and most other datasets is that MNIST imagery has a single channel, versus 3 channels in most other datasets. We will make sure that the models we build can easily accommodate 1- or 3-channel data, so that you can test them on other datasets.
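The 1- versus 3-channel point above can be sketched as a small preprocessing helper. This is an illustrative example, not code from the repository; in practice the models parameterize the input channel count, but simple channel repetition is one common way to feed single-channel MNIST data to a 3-channel model:

```python
import numpy as np

def to_three_channels(batch):
    """Expand single-channel image batches to 3 channels by repetition.

    batch: array of shape (N, C, H, W) with C == 1 or C == 3.
    """
    n, c, h, w = batch.shape
    if c == 1:
        # Repeat the grayscale channel so a 3-channel model accepts it.
        return np.repeat(batch, 3, axis=1)
    return batch

mnist_like = np.zeros((8, 1, 28, 28), dtype=np.float32)  # MNIST: 1 channel, 28x28
print(to_three_channels(mnist_like).shape)  # (8, 3, 28, 28)
```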
### Image Models
The following is a list of the supported image models, their current results, and a link to their configuration files and documentation.
| Date | Name | Paper | Config | Results | Instructions |
| :---- | :---- | ----- | ------ | ----- | ----- |
| June 2020 | DDPM | Denoising Diffusion Probabilistic Models | config | | instructions |
| November 2020 | Score-SDE | Score-Based Generative Modeling through Stochastic Differential Equations | config | | instructions |
| July 2021 | D3PM | Structured Denoising Diffusion Models in Discrete State-Spaces | | | |
| May 2022 | Imagen | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | config | | instructions |
| June 2022 | EDM | Elucidating the Design Space of Diffusion-Based Generative Models | config | | instructions |
| September 2022 | Rectified Flow | Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow | config | | instructions |
| December 2022 | LoRA | LoRA: Low-Rank Adaptation of Large Language Models | - | | instructions |
| December 2022 | DiT | Scalable Diffusion Models with Transformers | config | | instructions |
| March 2023 | Consistency Models | Consistency Models | config | | instructions |
| September 2023 | PixArt-α | PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | config | | instructions |
| November 2023 | DiffuSSM | Diffusion Models Without Attention | config | | instructions |
| March 2024 | Stable Diffusion 3 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | config | | instructions |
| July 2024 | AuraFlow | Introducing AuraFlow v0.1, an Open Exploration of Large Rectified Flow Models | config | | instructions |
| August 2024 | Flux | Flux Announcement | config | | instructions |
| October 2024 | Sana | SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | config | | instructions |
| October 2024 | Stable Diffusion 3.5 | Introducing Stable Diffusion 3.5 | config | | instructions |
| November 2024 | | Training-free Regional Prompting for Diffusion Transformers | | | |
| November 2024 | JanusFlow | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | | | |
| March 2025 | Dynamic Tanh | Transformers Without Normalization | config | | instructions |
## Video Diffusion
### Training Datasets
Due to the resource constraints of most models, we have decided to train on the Moving MNIST dataset. Moving MNIST is a simple dataset, similar to MNIST, consisting of digits that move around the screen. It is an unlabeled dataset, so we do not have access to text labels describing which digits are moving around the screen, but we will address that deficiency as well. We train at a reduced resolution of 32x32, again due to the resource requirements of most models. This allows us to train most diffusion models on a T4 instance, which is free to run on Google Colab. We limit training and sample generation to 16 frames, even though Moving MNIST clips contain 20 frames.
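The 16-frame, 32x32 setup described above can be sketched as a simple preprocessing step. This is an illustrative example rather than the repository's actual data pipeline, and it assumes the native Moving MNIST clip shape of 20 frames at 64x64; strided subsampling is just one simple way to reach 32x32:

```python
import numpy as np

def preprocess_clip(clip, num_frames=16, size=32):
    """Trim a video clip to `num_frames` and downsample each frame to `size` x `size`.

    clip: array of shape (T, H, W), e.g. (20, 64, 64) for a Moving MNIST clip.
    Downsampling is done by strided subsampling, purely for illustration.
    """
    clip = clip[:num_frames]            # keep only the first 16 frames
    stride = clip.shape[1] // size      # 64 // 32 == 2
    return clip[:, ::stride, ::stride]  # (16, 32, 32)

raw_clip = np.zeros((20, 64, 64), dtype=np.float32)  # native Moving MNIST shape
print(preprocess_clip(raw_clip).shape)  # (16, 32, 32)
```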