UniDisc
UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, and inpainting.
Alexander Swerdlow<sup>1*</sup> Mihir Prabhudesai<sup>1*</sup> Siddharth Gandhi<sup>1</sup> Deepak Pathak<sup>1</sup> Katerina Fragkiadaki<sup>1</sup> <br>
<sup>1</sup> Carnegie Mellon University
Hugging Face models
The UniDisc checkpoints are available on Hugging Face:
Getting Started
To install the dependencies, run:
git submodule update --init --recursive
uv sync --no-group dev
uv sync
For a more detailed installation guide, please refer to INSTALL.md.
Data
See DATA.md for details on how to download and preprocess the datasets. We provide processing scripts and instructions for every dataset used. We also release a synthetic dataset, together with the scripts used to generate it and the raw data.
Training
See TRAIN.md for training commands.
Inference
Interactive demo:
mkdir -p ./ckpts/unidisc_interleaved
huggingface-cli download aswerdlow/unidisc_interleaved --local-dir ./ckpts/unidisc_interleaved
uv run demo/server.py experiments='[large_scale_train,large_scale_train_high_res_interleaved,eval_unified,large_scale_high_res_interleaved_inference]' trainer.load_from_state_dict="./ckpts/unidisc_interleaved/unidisc_interleaved.pt"
uv run demo/client.py
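The server command above expects the checkpoint file to already be on disk, so it can help to confirm the download finished before launching. The following is a minimal sketch, not part of the repository: `check_ckpt` is a hypothetical helper name, and the path in the usage comment is the one from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: check_ckpt is a hypothetical helper, not part of the UniDisc repo.
# It returns non-zero if the given checkpoint file is missing or empty.
check_ckpt() {
  local ckpt="$1"
  if [ ! -s "$ckpt" ]; then
    echo "missing or empty checkpoint: $ckpt" >&2
    return 1
  fi
  echo "found checkpoint: $ckpt"
}

# Example use (path taken from the download command above):
#   check_ckpt ./ckpts/unidisc_interleaved/unidisc_interleaved.pt
# Start demo/server.py only if the check succeeds.
```

The check only tests that the file exists and is non-empty; it does not validate the checkpoint contents.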
Training
See TRAINING.md for details.
Evaluation
See EVAL.md for details.
Citation
To cite our work, please use the following:
@article{swerdlow2025unidisc,
title = {Unified Multimodal Discrete Diffusion},
author = {Swerdlow, Alexander and Prabhudesai, Mihir and Gandhi, Siddharth and Pathak, Deepak and Fragkiadaki, Katerina},
journal = {arXiv preprint arXiv:2503.20853},
year = {2025},
doi = {10.48550/arXiv.2503.20853},
}
Credits
This repository is built on top of the following repositories:
