CodePlan
Data and evaluation scripts for "CodePlan: Repository-level Coding using LLMs and Planning", FSE 2024
Replication Package
For the paper titled "CodePlan: Repository-level Coding using LLMs and Planning", accepted FSE 2024.
Pre-print: CodePlan paper
Contents
This package contains the following:
data/: Benchmark of repository edits along with the outputs of our approach and baselines.
scripts/: Scripts to compute the key metrics presented in the paper.
Data
Data for each repository edit is in a separate directory, named as in the paper.
Each of these contains three sub-directories: (1) source: the repository before the edit, (2) target: the repository after the ground-truth edit, and (3) pred: the outputs of our approach and the baselines.
pred contains a sub-directory for each approach, where each sub-directory contains repo: the state of the repository after the edits, blocks: the matched, missed, and spurious blocks in the output, metrics.json: all metrics computed for the repo, and diff.html: a pretty-printed textual diff of the output against the source (along with the diff between source and target for comparison).
data
|-- t1
|   |-- source
|   |-- target
|   |-- pred
|       |-- codeplan
|       |   |-- repo
|       |   |-- blocks
|       |   |-- metrics.json
|       |   |-- diff.html
|       |-- repair
|           |-- repo
|           |-- blocks
|           |-- diff.html
|-- ...
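The matched, missed, and spurious counts under blocks can be understood as set operations over code blocks: a predicted block that also appears in the target is matched, a target block absent from the prediction is missed, and a predicted block not in the target is spurious. A minimal sketch of this accounting (the function and the exact-match comparison here are illustrative, not the repository's actual matching logic):

```python
def block_metrics(pred_blocks, target_blocks):
    """Compare predicted and ground-truth code blocks as sets.

    Illustrative sketch only: the real evaluation may match blocks
    more loosely than exact set membership.
    """
    pred, target = set(pred_blocks), set(target_blocks)
    return {
        "matched": len(pred & target),   # present in both
        "missed": len(target - pred),    # ground truth the approach failed to produce
        "spurious": len(pred - target),  # extra edits not in the ground truth
    }

# Hypothetical example with three predicted and three target blocks:
print(block_metrics(["a", "b", "c"], ["a", "b", "d"]))
# {'matched': 2, 'missed': 1, 'spurious': 1}
```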
Quick Start!
To inspect the metrics for a particular repository edit, just navigate to its directory and open metrics.json.
To inspect the code change, you can similarly open diff.html.
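A metrics.json file can also be loaded and summarized with a few lines of Python. The keys below are hypothetical placeholders; inspect an actual file (e.g. one under data/t1/pred/codeplan/) for the exact schema. The snippet creates a stand-in file so it is self-contained:

```python
import json
from pathlib import Path

# In the actual package you would point this at a file such as
# data/t1/pred/codeplan/metrics.json; here we create a stand-in
# file with hypothetical keys purely for illustration.
path = Path("metrics.json")
path.write_text(json.dumps({"matched": 10, "missed": 2, "spurious": 1}))

metrics = json.loads(path.read_text())
for name, value in metrics.items():
    print(f"{name}: {value}")
```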
Replicating results
Code Setup
The scripts require python>=3.11 along with the following packages -
tqdm
evaluate
textdistance
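Assuming a Python >= 3.11 interpreter is on the PATH, the packages listed above can be installed with pip (version pinning is optional):

```shell
# Install the dependencies required by the evaluation scripts
python -m pip install tqdm evaluate textdistance
```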
Computing metrics
The main script for computing metrics is scripts/eval.py with the following options -
usage: eval.py [-h] [--repo {ext1,ext2,t1,t2,t3}] [--approach APPROACH] [--all]
               [--levenstein] [--save_default] [--save_path SAVE_PATH]
               [--verbose] [--debug]

options:
  -h, --help            show this help message and exit
  --repo {ext1,ext2,t1,t2,t3}, -r {ext1,ext2,t1,t2,t3}
                        Repo name to compute metrics for.
  --approach APPROACH, -a APPROACH
                        Approach name to compute metrics for.
  --all                 Compute metrics for all approaches on all repos.
  --levenstein          Compute levenstein distance metric. Note that this may
                        take a really long time.
  --save_default        Save metrics to default location.
  --save_path SAVE_PATH, -s SAVE_PATH
                        Path to save json with all computed metrics to.
  --verbose, -v         Enable verbose (info) logging.
  --debug, -d           Enable debug logging.
Note that the approach name must match one of the sub-directories present within the pred directory of the repo being evaluated.
For example, to compute metrics for approach codeplan on repo t1 and save the results to t1_codeplan_stats.json, the following command can be used -
python scripts/eval.py --repo t1 --approach codeplan --save_path t1_codeplan_stats.json
This will compute text metrics (DiffBLEU, Levenshtein Distance) and block metrics (matched, missed, spurious), print out a summary corresponding to a row in Table 3 of the paper, and store detailed file-wise metrics at the provided path.
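For reference, the Levenshtein distance behind the --levenstein option is the minimum number of single-character insertions, deletions, and substitutions turning one string into another. The script presumably obtains it from the textdistance package; the sketch below is only the textbook dynamic-programming definition of the metric:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic DP edit distance over insert/delete/substitute operations."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

The quadratic running time of this computation is why the help text warns that --levenstein "may take a really long time" on whole files.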
