[NeurIPS 2025]
E2D2: Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
This repository contains code and scripts for reproducing experimental results from our work.
0. Getting started
Setup environment
Install conda:
# For conda: https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && \
bash miniconda.sh -b -p /opt/conda
Setup a conda environment and install dependencies using:
conda env create -f requirements.yaml
Activate the environment:
conda activate e2d2-env
We also include a setup_env.sh script that can be used to set up the
environment on a new machine.
Run the script using:
source setup_env.sh
You can also include this snippet in shell / slurm scripts to set up the environment on a compute node.
In this script, we set up WandB and HuggingFace tokens by sourcing a script that is
expected to be in the /home/<YOUR_USER_NAME>/ directory.
Copy the contents below into a shell script /home/<YOUR_USER_NAME>/setup_discdiff.sh
and replace the placeholder tokens with your own:
# W&B / HF Setup
export WANDB__SERVICE_WAIT=600
export _WANDB_STARTUP_DEBUG="true"
export WANDB_ENTITY="<WANDB_ENTITY>"
export WANDB_API_KEY="<WANDB_API_KEY>"
echo "Logging into W&B as '${WANDB_ENTITY}'."
# HF Setup
export HUGGINGFACE_TOKEN="<HF_TOKEN>"
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential
Contributing to the repo
We use GitHub issues to track bugs, features, and todos. To contribute to the repo, please create a new issue and assign it to yourself, then create a new branch from the issue and open a pull request.
We use pre-commit to run linters and formatters on the code. To install the pre-commit hooks, run:
pre-commit install
On every git commit, the pre-commit hooks run automatically and report any issues or automatic fixes that were applied.
1. Code Organization
- bash_scripts: These shell scripts can be used to reproduce the experiments from our work. See below.
- configs: We utilize hydra config files to organize experiments.
  - config.yaml: This config is the entry point for launching training experiments.
  - eval_config.yaml: This config is the entry point for evaluations.
- scripts: The main training and evaluation scripts.
  - scripts/composer_scripts/train_discrete_denoiser.py: This script is the main training entry point.
  - scripts/evals: These scripts run the evaluation for the translation, summarization, and math reasoning datasets, as well as any likelihood evaluation.
- src:
  - src/denoiser: During training, denoisers take in "noisy" inputs and predict clean signals. At inference, starting from a purely noisy signal, through iterative denoising, these classes produce samples that resemble data.
    - AR: We can view autoregressive models within this paradigm. Noise is applied by masking tokens one at a time from right-to-left. Denoising is done one token at a time, left-to-right.
    - Diffusion: We implement masked diffusion models:
      - MDLM: Standard masked diffusion.
      - BD3LM: Block diffusion models.
      - E2D2: Our encoder-decoder implementation.
  - src/backbone: These are the underlying neural networks that take in noisy inputs and produce logits. Each denoiser is parameterized by a backbone. The denoiser can optionally post-process the logit outputs of the backbone to produce log-probs over the clean sequence.
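To make the denoiser abstraction concrete, here is a toy sketch of iterative unmasking: start from an all-mask sequence and commit a fraction of the model's predictions per step. All names here (the mask id, the random "model") are illustrative stand-ins, not the repo's actual classes:

```python
import random

MASK = -1  # illustrative mask token id, not the repo's actual id


def toy_denoiser(seq):
    # Stand-in for a trained backbone: proposes a token for every masked position.
    return [t if t != MASK else random.randint(0, 9) for t in seq]


def iterative_denoise(length, steps=4, seed=0):
    # Start from a fully masked sequence and unmask a fraction of positions per step.
    random.seed(seed)
    seq = [MASK] * length
    masked = list(range(length))
    for step in range(steps):
        if not masked:
            break
        preds = toy_denoiser(seq)
        k = max(1, len(masked) // (steps - step))  # positions to commit this step
        for i in random.sample(masked, k):
            seq[i] = preds[i]
            masked.remove(i)
    return seq
```

Under this view, an AR model is the special case where exactly one position (the leftmost masked one) is committed per step.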
2. Reproducing Experiments
The shell scripts provided in bash_scripts can be used to reproduce
the training and evaluations from our work.
- For training, the file names follow a convention specifying the dataset and denoiser class. For example, to fine-tune the E2D2 model on the GSM8K dataset, use the shell script run_train_e2d2_gsm8k.sh.
- Once models have been trained, the provided evaluation scripts can be used to reproduce tables and figures from our work. For example, to evaluate models trained on the WMT translation dataset, use the shell script run_seq2seq_eval_wmt.sh. In that file, and similar ones for other evaluations, specify the path to the saved checkpoints, and uncomment the relevant section for a given denoiser class. We also provide scripts that produce the generation throughput numbers we report; these files have _tput at the end of the script name.
Below are the evaluation scripts provided for various tasks:
- Text summarization: run_seq2seq_eval_cnndm.sh, run_seq2seq_eval_cnndm_tput.sh
- Machine translation: run_seq2seq_eval_wmt.sh, run_seq2seq_eval_wmt_tput.sh
- Mathematical reasoning: run_lm_eval_harness.sh, run_lm_eval_harness_tput.sh, run_likelihood_eval_gsm8k.sh
- Likelihood estimation (trained on OpenWebText): run_likelihood_eval_owt.sh
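Generation throughput here means generated tokens divided by wall-clock generation time. A minimal sketch of that measurement (the actual _tput scripts' details, such as warmup and batching, may differ):

```python
import time


def tokens_per_second(generate_fn, n_tokens):
    # Time a generation callable and return its throughput in tokens/second.
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed


# Toy usage: a stand-in "generator" that sleeps instead of running a model.
tps = tokens_per_second(lambda: time.sleep(0.01), n_tokens=256)
```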
3. HuggingFace Integration
We release the following models on HuggingFace:
- 80M E2D2 for text summarization (trained from scratch): kuleshov-group/e2d2-cnndm
- 250M E2D2 for machine translation (trained from scratch): kuleshov-group/e2d2-wmt
- 1.7B E2D2 for mathematical reasoning (fine-tuned from Qwen3): kuleshov-group/e2d2-gsm8k-finetune-Qwen3-2B
- 170M E2D2 trained on OpenWebText (trained from scratch): kuleshov-group/e2d2-owt
To use these models, follow the snippet below:
from transformers import AutoModelForMaskedLM, AutoTokenizer
from transformers.generation.stopping_criteria import EosTokenCriteria
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# model_config_overrides = {}  # Use this to optionally override config parameters
model = AutoModelForMaskedLM.from_pretrained(
    "kuleshov-group/e2d2-gsm8k-finetune-Qwen3-2B",  # Use one of the repos from above
    trust_remote_code=True,
    # **model_config_overrides,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(
    "kuleshov-group/e2d2-gsm8k-finetune-Qwen3-2B",  # Use one of the repos from above
    trust_remote_code=True,
)

system_prompt = "Please reason step by step, and put your final answer within $\\boxed{}$."
user_prompt = (
    "Every day, Wendi feeds each of her chickens three cups of mixed "
    "chicken feed, containing seeds, mealworms and vegetables to help keep them "
    "healthy. She gives the chickens their feed in three separate meals. In "
    "the morning, she gives her flock of chickens 15 cups of feed. In the "
    "afternoon, she gives her chickens another 25 cups of feed. How many cups "
    "of feed does she need to give her chickens in the final meal of the day if "
    "the size of Wendi's flock is 20 chickens?"
)
inputs = tokenizer(
    tokenizer.eos_token + system_prompt + user_prompt + tokenizer.eos_token + "Answer:",
    return_tensors="pt",
).to(device)
with torch.inference_mode():
    output_ids = model.generate(
        inputs=inputs["input_ids"],
        max_new_tokens=256,  # Set a maximum length for the output
        stopping_criteria=EosTokenCriteria(tokenizer.eos_token_id),  # Stop generation after next EOS
    )
print(tokenizer.decode(output_ids[0]))
These models can also be used in the evaluation scripts by setting
pretrained_model_name_or_path= to one of the options above.
Citation
@inproceedings{
arriola2025e2d2,
title={Encoder-Decoder Diffusion Language Models for Efficient Training and Inference},
author={Marianne Arriola and Yair Schiff and Hao Phung and Aaron Gokaslan and Volodymyr Kuleshov},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://arxiv.org/abs/2510.22852}
}