
Blazedit

Making code editing up to 7.7x faster using multi-layer speculation

Install / Use

/learn @ise-uiuc/Blazedit

Blazing-Fast Code Editing via Multi-Layer Speculation

<p align="center"> <a href="#multi-layer-speculation">📙About</a> • <a href="#installation">🔥Installation</a> • <a href="#evaluation-commands">🚀Commands</a> • <a href="#citation">📜Citation</a> • <a href="#acknowledgement">🙏Acknowledgement</a> </p>

🚀 We propose Blazedit, an extremely simple yet general speculative decoding method that accelerates whole-file code editing by up to 7.7x across a comprehensive set of editing scenarios.

*(Demo animation)*

This README provides a quick overview and usage guide for our technique. A more detailed introduction to Blazedit can be found in our blog post.

Multi-Layer Speculation

*(Method overview figure)*

We begin with the limitations of existing methods:

  • High Overhead in Assisted Decoding: The draft model can generate meaningful draft tokens during real edits instead of simply copying, leading to higher acceptance rates. Nonetheless, draft generation is still autoregressive and thus incurs non-negligible overhead, especially when the draft length is long.
  • Low Acceptance Rate in Prompt Lookup Decoding (PLD): PLD is efficient because the cost of drafting is negligible. However, its "copying" mechanism can lead to a very low acceptance rate in the validation step when the target model is making real edits.

Blazedit addresses these limitations with an elegant multi-layer speculative decoding strategy. At a high level, like assisted decoding, Blazedit uses a draft model to propose draft tokens that are validated by the target model, yielding good acceptance rates. Meanwhile, Blazedit uses PLD to accelerate the draft model itself, reducing the overhead of draft generation. Specifically, the PLD step is performed multiple times to accumulate draft tokens before each target-model forward pass. This lets the draft model propose an adaptive number of draft tokens, which optimizes the target-model acceptance rate:

  • It detects copy-intensive scenarios, where the PLD layer gets a high acceptance rate, so the draft model proposes more draft tokens.
  • It detects edit-intensive scenarios, where the PLD layer gets a low acceptance rate, so the draft model proposes fewer draft tokens.
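The two-layer loop described above can be sketched as follows. This is a simplified illustration under toy stand-ins, not the repo's implementation: `draft_step` and `target_verify` are hypothetical placeholders for the PLD-accelerated draft step and the target model's batched verification, and the cap of 32 is an arbitrary illustrative bound.

```python
def two_layer_generate(draft_step, target_verify, prompt, max_new=50):
    """Toy two-layer speculation loop.

    draft_step(context) -> (chunk, pld_hit): draft tokens plus whether
        the inner PLD layer found a match (cheap copying succeeded).
    target_verify(context, draft) -> accepted prefix of the draft.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Layer 1: accumulate a variable-length draft. High PLD acceptance
        # (copy-intensive) keeps extending the draft; a PLD miss
        # (edit-intensive) stops early, so the draft length adapts.
        draft = []
        while True:
            chunk, pld_hit = draft_step(out + draft)
            draft += chunk
            if not pld_hit or len(draft) >= 32:  # illustrative cap
                break
        # Layer 2: a single target-model forward pass validates the
        # accumulated draft, keeping the longest accepted prefix.
        accepted = target_verify(out, draft)
        out += accepted
        if not accepted:
            break
    return out
```

The key point is that many cheap PLD steps run per expensive target pass, so target forwards are amortized over adaptively sized drafts.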

We evaluated Blazedit and baselines on A100 GPUs under a comprehensive set of editing scenarios.

| Target Model          |      | Regular | Assisted | PLD   | Ours  | Speedup (Worst) | Speedup (SOTA) |
|-----------------------|------|---------|----------|-------|-------|-----------------|----------------|
| Qwen2.5-Coder-32B     | Avg. | 74.6    | 134.2    | 379.3 | 434.8 | 5.8x            | 1.15x          |
|                       | P90  | 60.7    | 100.3    | 130.7 | 169.0 | 2.8x            | 1.29x          |
| DeepSeekCoder-33B     | Avg. | 55.3    | 123.4    | 364.2 | 424.5 | 7.7x            | 1.17x          |
|                       | P90  | 45.1    | 97.8     | 120.9 | 173.4 | 3.8x            | 1.43x          |
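Reading the table: the speedup columns are consistent with throughput ratios, "Worst" being Ours over the slowest baseline (Regular) and "SOTA" being Ours over the strongest baseline (PLD). A quick arithmetic check on the average rows:

```python
# Throughput figures from the table (DeepSeekCoder-33B, Avg. row).
regular, assisted, pld, ours = 55.3, 123.4, 364.2, 424.5

print(round(ours / regular, 1))   # speedup over worst baseline → 7.7
print(round(ours / pld, 2))       # speedup over best baseline  → 1.17
```

The Qwen2.5-Coder-32B averages check out the same way (434.8 / 74.6 ≈ 5.8x, 434.8 / 379.3 ≈ 1.15x).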

Installation

```bash
git clone git@github.com:ise-uiuc/blazedit.git --recurse-submodules
cd blazedit
conda create -n spec-edit python=3.12
conda activate spec-edit
pip install -e submodules/transformers
pip install -r requirements.txt
```

Evaluation Commands

Generating experiment configurations for grid searching:

```bash
export PYTHONPATH=$(pwd)

# Grid-searching Blazedit configurations
python eval/configs/gen_2layer_controlled_experiment.py     \
  --draft-model  "deepseek-ai/deepseek-coder-1.3b-instruct" \
  --target-model "deepseek-ai/deepseek-coder-33b-instruct"

# Grid-searching baseline configurations (PLD, regular, assisted)
python eval/configs/gen_baseline_controlled_experiment.py   \
  --draft-model  "deepseek-ai/deepseek-coder-1.3b-instruct" \
  --target-model "deepseek-ai/deepseek-coder-33b-instruct"
```

The commands above generate bash scripts partitioned across GPUs so that experiments run in a batched, parallelized, and load-balanced manner.

```bash
# Blazedit experiments
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_2layer_g0.sh   # GPU 0
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_2layer_g1.sh   # GPU 1
# ...

# Baseline experiments
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_baseline_g0.sh  # GPU 0
bash ./eval/configs/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct/control_baseline_g1.sh  # GPU 1
# ...
```

Visualize the results:

```bash
python eval/controlled_experiment.py results/deepseek-coder-33b-instruct-deepseek-coder-1.3b-instruct
```

Citation

```bibtex
@misc{blazedit,
  author = {Daita, Vijay and Lian, Xinyu and Zhang, Lingming and Liu, Jiawei},
  title = {Blazing-Fast Code Editing via Multi-Layer Speculation},
  year = {2025},
  howpublished = {\url{https://github.com/ise-uiuc/blazedit}}
}
```

Acknowledgement

The following resources have been helpful in developing this project:

We thank Jiankun Wang (UIUC) and Zhihao Zhang (CMU) for their insightful discussion.
