PatchScope
Annotates files and lines of diffs (patches) with their purpose and type, and performs statistical analysis on annotation data
PatchScope – A Modular Tool for Annotating and Analyzing Contributions
The PatchScope project consists of two parts: a set of command line tools and a web app. The command line tools annotate changed files and lines of diffs (patches) with their purpose and type, and perform statistical analysis on the generated annotation data. The web application visualizes project development using the analysis results that the command line tools generate and save to JSON files.
Note: earlier in its history this project was called 'python-diff-annotator' instead of 'PatchScope', and the Python package was called 'diffannotator' instead of 'patchscope'. Some references to the older names remain, for example in directory names in some Jupyter Notebooks.
You can find an early draft of the project documentation at https://ncusi.github.io/PatchScope/.
A demo of the web application can be found at https://patchscope.mat.umk.pl/ and at https://patchscope-9d05e7f15fec.herokuapp.com/.
Check out our video demonstration on YouTube.
Disambiguation
There are a few research projects with a similar name:
- Rongkai Liu, Heyuan Shi, Shuning Liu, Chao Hu, Sisheng Li, Yuheng Shen, Runzhe Wang, Xiaohai Shi, Yu Jiang: "PatchScope: LLM-Enhanced Fine-Grained Stable Patch Classification for Linux Kernel" (2025) DOI:10.1145/3728944
- Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva: "Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models" (2024) arXiv:2401.06102
- Lei Zhao, Yuncong Zhu, Jiang Ming, Yichen Zhang, Haotian Zhang, Heng Yin: "PatchScope: Memory Object Centric Patch Diffing" (2020) DOI:10.1145/3372297.3423342
Installation
Use the package manager pip to install patchscope.
To avoid dependency conflicts, it is strongly recommended to create
a virtual environment first, activate it, and install patchscope
into this environment.
python -m venv .venv
source .venv/bin/activate
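To confirm that the environment is active before installing anything, a quick check from Python can help. This is a minimal sketch that uses only the standard library; the helper name in_virtualenv is hypothetical and not part of PatchScope.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports False after running the activate command, pip would install into the base interpreter instead of .venv.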
To install the most recent version, use
python -m pip install patchscope@git+https://github.com/ncusi/PatchScope#egg=main
or (assuming that you can clone the repository with SSH)
python -m pip install patchscope@git+ssh://git@github.com/ncusi/PatchScope.git#egg=main
The commands above do not install the dependencies required to run the web app;
for this you need to install the optional [web] dependencies, for example with:
python -m pip install 'patchscope[web] @ git+https://github.com/ncusi/PatchScope#egg=main'
If you want to reproduce the examples available in this repository or on DagsHub, or if you want to modify the PatchScope code to better suit your needs, you can instead clone the PatchScope repository and install it from there.
git clone https://github.com/ncusi/PatchScope.git
cd PatchScope
python -m pip install --editable '.[dev,web]'
See also the "Development" section below.
Usage
This tool integrates four key components:
- extracting patches from a version control system or user-provided folders, either as a separate step with diff-generate, or integrated into the annotation step (diff-annotate)
- applying specified annotation rules for selected patches using diff-annotate, which generates one JSON data file per patch
- generating configurable reports or summaries with various subcommands of diff-gather-stats; each summary is saved as a single JSON file
- advanced visualization with a web application (dashboard), which you can run with panel serve; see the description below
You can use PatchScope to analyze individual patch files, enhance patch-based datasets, and monitor contributions to repositories. The process for those use cases differs in details; here is a quick start tutorial for the case of repository contribution analysis.
Quick start: analyzing repository
The first step is to clone the repository you want to analyze, if not already present. Let's assume that you want to get insights about tqdm project development.
git clone https://github.com/tqdm/tqdm.git repos/tqdm
The clone may be a bare repository.
The second step is to generate annotations with diff-annotate from-repo.
This might take a while for a larger repository, like the Linux kernel.
You can use Git revision selection
arguments to select changes to annotate; for example, you can use
--max-parents=1 to drop merges,
and --after=2020.01.01 to limit date range.
diff-annotate from-repo \
--output-dir=annotations/tqdm/since-2020 \
repos/tqdm --max-parents=1 --after=2020.01.01
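Because diff-annotate writes one JSON data file per patch, counting the files under the output directory gives a rough sanity check of how many patches were annotated. The following is a minimal sketch; the helper name count_annotation_files is hypothetical, and it assumes that annotation files carry a .json extension and may sit in nested subdirectories (the exact layout is not described here).

```python
from pathlib import Path


def count_annotation_files(output_dir: str) -> int:
    """Count JSON files anywhere below output_dir (0 if it does not exist)."""
    root = Path(output_dir)
    if not root.is_dir():
        return 0
    # rglob() walks the directory tree recursively.
    return sum(1 for path in root.rglob("*.json") if path.is_file())


if __name__ == "__main__":
    # Directory taken from the --output-dir option used above.
    print(count_annotation_files("annotations/tqdm/since-2020"))
```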
The third step is to generate a summary of the annotations with diff-gather-stats,
gathering statistics into a single JSON file.
For repository visualization, you will need at least the timeline.
diff-gather-stats --annotations-dir='' \
timeline \
--purpose-to-annotation=data \
--purpose-to-annotation=documentation \
--purpose-to-annotation=markup \
--purpose-to-annotation=other \
--purpose-to-annotation=project \
--purpose-to-annotation=test \
stats/tqdm.timeline.purpose-to-type.json \
annotations/tqdm/
If you want to see the Sankey flow diagram in the visualization, you also need to generate a summary of changed lines statistics.
diff-gather-stats --annotations-dir='' \
lines-stats \
--purpose-to-annotation=data \
--purpose-to-annotation=documentation \
--purpose-to-annotation=markup \
--purpose-to-annotation=other \
--purpose-to-annotation=project \
--purpose-to-annotation=test \
stats/tqdm.lines-stats.purpose-to-type.json \
annotations/tqdm/
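The timeline and lines-stats invocations differ only in the subcommand and the output file name, so when scripting them it can be convenient to build both from one helper. The sketch below only assembles and prints the command lines shown above; the helper name gather_stats_argv is hypothetical, and no options beyond those already shown are assumed.

```python
import shlex

# Values passed to --purpose-to-annotation in the commands above.
PURPOSES = ["data", "documentation", "markup", "other", "project", "test"]


def gather_stats_argv(subcommand, output_json, annotations):
    """Build the argument list for one diff-gather-stats invocation."""
    # --annotations-dir='' in the shell becomes an empty option value here.
    argv = ["diff-gather-stats", "--annotations-dir=", subcommand]
    argv += [f"--purpose-to-annotation={purpose}" for purpose in PURPOSES]
    argv += [output_json, annotations]
    return argv


if __name__ == "__main__":
    for subcommand in ("timeline", "lines-stats"):
        argv = gather_stats_argv(
            subcommand,
            f"stats/tqdm.{subcommand}.purpose-to-type.json",
            "annotations/tqdm/",
        )
        # Prints a shell command; pass argv to subprocess.run() to execute it.
        print(shlex.join(argv))
```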
Note that the web application assumes that generated timeline and lines-stats files follow the
*.timeline.*.json and *.lines-stats.*.json naming convention.
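A file that does not follow this convention would presumably not be picked up by the web application, so it may be worth checking names before starting the dashboard. The following is a minimal sketch using shell-style pattern matching from the standard library; the classify helper is hypothetical, and the web application's actual matching logic may differ.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Optional

# Patterns quoted from the naming convention above.
PATTERNS = {
    "timeline": "*.timeline.*.json",
    "lines-stats": "*.lines-stats.*.json",
}


def classify(filename: str) -> Optional[str]:
    """Return which kind of summary a file name denotes, or None."""
    for kind, pattern in PATTERNS.items():
        if fnmatchcase(filename, pattern):
            return kind
    return None


if __name__ == "__main__":
    for path in sorted(Path("stats").glob("*.json")):
        print(path.name, "->", classify(path.name) or "does not match")
```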
You can find more information about the annotation process in "Annotation process" documentation.
Running web app (dashboard)
This package also includes a web dashboard, created using the Panel
framework. To use it, you need to install the additional dependencies, denoted [web],
as described above.
To run this web app, you can use the diffinsights-web command,
providing the directory with *.timeline.*.json files.
For the previous example, with the diff-gather-stats timeline output
saved in the stats/ subdirectory, it would be:
diffinsights-web stats/
Then open http://localhost:7860/?repo=tqdm in a web browser.
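When several repositories have been analyzed, the dashboard URL for each one can be derived from its timeline file. The sketch below assumes that the repo query parameter equals the part of the file name before .timeline.; this is inferred only from the tqdm example above and is not confirmed behavior. Both helper names are hypothetical.

```python
from urllib.parse import urlencode


def repo_name_from_timeline(filename: str) -> str:
    """Return the part of the file name before '.timeline.'.

    Assumption: this prefix is what the dashboard expects as 'repo'.
    """
    return filename.split(".timeline.", 1)[0]


def dashboard_url(repo: str, base: str = "http://localhost:7860/") -> str:
    """Build the dashboard URL with the 'repo' query parameter."""
    return f"{base}?{urlencode({'repo': repo})}"


if __name__ == "__main__":
    repo = repo_name_from_timeline("tqdm.timeline.purpose-to-type.json")
    print(dashboard_url(repo))
```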
Currently, this web dashboard consists of two pages (two web apps), namely the Contributors Graph and the Author Statistics.
You can find the description of those two pages/apps, with screenshots, in the PatchScope documentation.
Web app demo with example projects
The PatchScope repository includes annotations and gathered statistics
for a few example repositories in the data/examples/
directory.
You can download data for more repos from https://dagshub.com/ncusi/PatchScope with DVC; see the "Examples and demos" section below.
There is a web app demo available for those repos at https://patchscope.mat.umk.pl/, and also a basic demo on Heroku.
The simplest way to run those demos locally is to clone the PatchScope repository and enter it:
git clone https://github.com/ncusi/PatchScope.git
cd PatchScope
Then, assuming that the required [web] dependencies are installed
(in the current virtual environment),
you can run the web dashboard with panel serve:
panel serve \
src/diffinsights_web/apps/contributors.py \
src/diffinsights_web/apps/author.py \
--index=contributors \
--reuse-sessions --global-loading-spinner
By default, this makes the web dashboard available at http://localhost:5006/.
There is also a Dockerfile if you want to run the web dashboard
using Docker (with example projects).
Examples and demos
The PatchScope repository also includes some examples demonstrating how this project works, and what it can be used for.
First time setup (for generating examples)
You can set up the environment for using this project, following
the recommended practices (described in the "Development"
section of this document), by running the examples-init.bash Bash script,
and following its instructions.
Note that this script assumes that it is run on Linux, or a Linux-like system. For other operating systems, you are probably better off following the steps described in this document manually.
This

