xdiffusion

A unified media (Image, Video, Audio, Text) diffusion repository, for education and learning.

If you are looking for just the lessons on image diffusion models, check out mindiffusion for more image diffusion model paper implementations.

Requirements

This package is built using PyTorch and written in Python 3. Due to package dependencies, this repo requires Python 3.10 or greater. To set up an environment to run all of the lessons, we suggest using conda or venv:

> python3 -m venv xdiffusion_env
> source xdiffusion_env/bin/activate
> pip install --upgrade pip
> pip install -r requirements.txt

I also find pyenv-virtualenv very helpful for managing both the virtual environment and the Python version dependencies.

> pyenv install 3.10.15
> pyenv virtualenv 3.10.15 xdiffusion_env
> pyenv activate xdiffusion_env
> pip install --upgrade pip
> pip install -r requirements.txt

All lessons are designed to be run from the root of the repository, so you should set your PYTHONPATH to include the repository root:

> export PYTHONPATH=$(pwd)

If you have issues with PyTorch and different CUDA versions on your instance, make sure to install the correct version of PyTorch for the CUDA version on your machine. For example, if you have CUDA 11.8 installed, you can install PyTorch using:

> pip install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cu118
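As a sketch of how the wheel index URL relates to the CUDA version (the helper below is hypothetical; the install selector on pytorch.org is the authoritative source):

```python
# Hypothetical helper: map a CUDA version string to the matching PyTorch
# wheel index URL, e.g. "11.8" -> ".../whl/cu118".
def torch_index_url(cuda_version: str) -> str:
    tag = "cu" + cuda_version.replace(".", "")
    return f"https://download.pytorch.org/whl/{tag}"

print(torch_index_url("11.8"))  # https://download.pytorch.org/whl/cu118
```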

Image Diffusion

Training Datasets

In this repository, we will be working with the MNIST dataset because it is simple and can be trained in real time with minimal GPU power and memory. The main difference between MNIST and other datasets is the single channel of the imagery, versus 3 channels in most other datasets. We will make sure that the models we build can easily accommodate 1- or 3-channel data, so that you can test the models we build on other datasets.
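To see why the channel count is such a small difference, consider the parameter count of a model's first convolution (a generic Conv2d formula, not code from this repo):

```python
# Parameters in a 2D convolution: out_channels * in_channels * k * k weights
# plus out_channels biases. Only the first layer depends on the input's
# channel count, so swapping MNIST (1 channel) for RGB (3 channels) only
# changes that one layer.
def conv2d_param_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    return out_channels * in_channels * kernel_size ** 2 + out_channels

print(conv2d_param_count(1, 64, 3))  # 640  -> first layer for MNIST
print(conv2d_param_count(3, 64, 3))  # 1792 -> first layer for RGB data
```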

Image Models

The following is a list of the supported image models, their current results, and a link to their configuration files and documentation.

| Date | Name | Paper | Config | Results | Instructions |
| :---- | :---- | ----- | ------ | ----- | ----- |
| June 2020 | DDPM | Denoising Diffusion Probabilistic Models | config | DDPM | instructions |
| November 2020 | Score-SDE | Score-Based Generative Modeling through Stochastic Differential Equations | config | Sub-VPE SDE | instructions |
| July 2021 | D3PM | Structured Denoising Diffusion Models in Discrete State-Spaces | | | |
| May 2022 | Imagen | Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding | config | Imagen | instructions |
| June 2022 | EDM | Elucidating the Design Space of Diffusion-Based Generative Models | config | EDM | instructions |
| September 2022 | Rectified Flow | Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow | config | Rectified Flow | instructions |
| December 2022 | LoRA | LoRA: Low-Rank Adaptation of Large Language Models | - | LoRA | instructions |
| December 2022 | DiT | Scalable Diffusion Models with Transformers | config | DiT | instructions |
| March 2023 | Consistency Models | Consistency Models | config | Consistency Model | instructions |
| September 2023 | PixArt-α | PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | config | Pixart-Alpha | instructions |
| November 2023 | DiffuSSM | Diffusion Models Without Attention | config | DiffuSSM | instructions |
| March 2024 | Stable Diffusion 3 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | config | SD3 | instructions |
| July 2024 | AuraFlow | Introducing AuraFlow v0.1, an Open Exploration of Large Rectified Flow Models | config | AuraFlow | instructions |
| August 2024 | Flux | Flux Announcement | config | Flux | instructions |
| October 2024 | Sana | SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | config | Sana | instructions |
| October 2024 | Stable Diffusion 3.5 | Introducing Stable Diffusion 3.5 | config | SD 3.5 | instructions |
| November 2024 | | Training-free Regional Prompting for Diffusion Transformers | | | |
| November 2024 | JanusFlow | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | | | |
| March 2025 | Dynamic Tanh | Transformers Without Normalization | config | Flux | instructions |

Video Diffusion

Training Datasets

Due to the resource constraints of most models, we have decided to use the Moving MNIST dataset for training. Moving MNIST is a simple dataset, similar to MNIST, of digits which move around the screen. It is an unlabeled dataset, so we do not have access to text labels to determine which digits are moving around the screen, but we will address that deficiency as well. We train at a reduced resolution of 32x32, which allows us to train most diffusion models on a T4 instance, which is free to run on Google Colab. We also limit training and sample generation to 16 frames.
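A back-of-the-envelope calculation (float32 tensors assumed; batch size and model activations not included) shows why 16 frames at 32x32 is manageable on a T4:

```python
# One Moving MNIST clip at the training settings above:
# 16 frames x 1 channel x 32 x 32 pixels, 4 bytes per float32 value.
frames, channels, height, width = 16, 1, 32, 32
clip_bytes = frames * channels * height * width * 4
print(clip_bytes)         # 65536 bytes
print(clip_bytes / 1024)  # 64.0 KiB per clip
```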
