Duplo
Duplicates finder for various source code formats.
Duplo - Duplicate Source Code Block Finder
This tool is seriously fast - it will process your codebase in seconds!
Updates:
- 🔥 v2.1 has been released!
- add support for Erlang (thanks @mrrubinos!)
- v2.0: 🚀 Duplo just got a major speed boost! With new multithreading support, it now takes full advantage of modern CPUs to scan and detect duplicates faster than ever ⚡️ (thanks @cgkantidis!)
- v1.2 support for reporting in json format (thanks @cgkantidis!)
- Performance improvements: it now takes ~9s to process Quake 2 source! (thanks @cgkantidis!)
- refactored xml/json reports for improved maintainability
- Introducing duplo-action for using Duplo in GitHub Actions!
- 👉 This action is now used to analyse duplicates in this repository.
- v1.1 improve memory usage (grabbed from @nachose fork), also re-enabled tests and organized code
- 🚀 v1.0 add build on Windows (thanks @chausner!)
- v0.8 adds improved Java support
🙌 Help needed! See 8.3 on how to support more languages.
For the impatient:
Find duplicated code blocks in a C++ codebase. Ignore pre-processor directives and require minimum 20 line duplicates.
Linux
# download latest release
curl -s "https://api.github.com/repos/dlidstrom/Duplo/releases/latest" \
| grep "browser_download_url" \
| grep "duplo-linux" \
| cut -d : -f 2,3 \
| tr -d '"' \
| wget -qi -
unzip duplo-linux.zip
find . -type f \( -iname "*.cpp" -o -iname "*.h" \) | ./duplo -ml 20 -ip - -
macOS
# download latest release
curl -s "https://api.github.com/repos/dlidstrom/Duplo/releases/latest" \
| grep "browser_download_url" \
| grep "duplo-macos" \
| cut -d : -f 2,3 \
| tr -d '"' \
| wget -qi -
unzip duplo-macos.zip
find . -type f \( -iname "*.cpp" -o -iname "*.h" \) | ./duplo -ml 20 -ip - -
Windows
> $url = (Invoke-RestMethod https://api.github.com/repos/dlidstrom/Duplo/releases/latest).assets.browser_download_url `
| ? { $_ -match "windows" }
> Invoke-WebRequest -Uri $url -OutFile duplo-windows.zip
> Expand-Archive ./duplo-windows.zip -DestinationPath Duplo.exe
> Get-ChildItem -Include "*.cpp", "*.h" -Recurse | % { $_.FullName } | ./Duplo.exe -ml 20 -ip - -
Table of Contents:
- 1. General Information
- 2. Maintainer
- 3. File Format Support
- 4. Installation
- 5. Usage
- 6. Feedback and Bug Reporting
- 7. Algorithm Background
- 8. Developing
- 9. Changes
- 10. Accompanying Software
- 11. License
- 12. Stargazers over time
1. General Information
Duplicated source code blocks can harm maintainability of software systems. Duplo is a tool to find duplicated code blocks in large code bases. Duplo has special support for some programming languages, such as C, C++, Java, C#, and VB.NET, meaning it can filter out (multi-line) comments and compiler directives. Any other text format is also supported.
2. Maintainer
Duplo was originally developed by Christian M. Ammann and is now maintained and developed by Daniel Lidström.
3. File Format Support
Duplo has built-in support for the following file formats:
- C/C++ (.c, .cpp, .cxx, .h, .hpp)
- Java
- C#
- VB
- GCC assembly
- Ada
This means that Duplo will remove preprocessor directives, block comments, using statements, etc., so that only duplicates in actual code are considered. In addition, Duplo can be used as a general duplicate detector (without special support) for arbitrary text files, and it will even detect duplicates within the same file.
Sample output snippet:
...
src\engine\geometry\simple\TorusGeometry.cpp(56)
src\engine\geometry\simple\SphereGeometry.cpp(54)
pBuffer[currentIndex*size+3]=(i+1)/(float)subdsU;
pBuffer[currentIndex*size+4]=j/(float)subdsV;
currentIndex++;
pPrimitiveBuffer->unlock();
src\engine\geometry\subds\SubDsGeometry.cpp(37)
src\engine\geometry\SkinnedMeshGeometry.cpp(45)
pBuffer[i*size+0]=m_ct[0]->m_pColors[i*3];
pBuffer[i*size+1]=m_ct[0]->m_pColors[i*3+1];
pBuffer[i*size+2]=m_ct[0]->m_pColors[i*3+2];
...
4. Installation
4.1. Pre-built binaries
Duplo is also available as a pre-built binary for (Alpine) Linux, macOS and Windows. Grab the executable from the releases page.
You can of course build from source as well.
5. Usage
Duplo works with a list of files. You can either specify a file that contains
the list of files, or you can pass them using stdin.
Run duplo --help on the command line to see the detailed options.
5.1. Passing files using stdin
In each of the following commands, duplo will write the duplicated blocks into
out.txt in addition to the information written to stdout.
5.1.1. Bash
# unix
> find . -type f \( -iname "*.cpp" -o -iname "*.h" \) | duplo - out.txt
Let's break this down. The command
find . -type f \( -iname "*.cpp" -o -iname "*.h" \) searches recursively from
the current directory (the . part) for regular files (the -type f part) whose
names match *.cpp or *.h, ignoring case (the -iname part). The output from find
is piped into duplo, which reads the filenames from stdin (the first - tells
duplo to get the filenames from stdin, a common Unix convention in many
command-line applications). The result of the analysis is then written to
out.txt.
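The same pipeline can be driven from a script. The following is a minimal Python sketch, assuming a duplo executable is available on PATH. The helper names collect_sources and run_duplo are hypothetical and not part of Duplo. It mirrors the find command by walking the directory tree, matching extensions case-insensitively, and sending one filename per line to duplo on stdin.

```python
import os
import subprocess


def collect_sources(root, extensions=(".cpp", ".h")):
    """Return files under root whose extension matches, ignoring case."""
    # Plays the role of: find . -type f ( -iname "*.cpp" -o -iname "*.h" )
    wanted = tuple(ext.lower() for ext in extensions)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith(wanted) and os.path.isfile(path):
                matches.append(path)
    return sorted(matches)


def run_duplo(files, output="out.txt", executable="duplo"):
    """Pipe the file list to duplo on stdin ("-") and write the report to output.

    Assumption: a duplo binary can be found on PATH under the given name.
    """
    listing = "\n".join(files) + "\n"
    return subprocess.run(
        [executable, "-", output], input=listing, text=True, check=False
    )
```

Calling run_duplo(collect_sources(".")) would then correspond to the one-liner above. Any additional options would be inserted before the "-" argument, as in the commands shown in this document.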
5.1.2. Windows
# windows
> Get-ChildItem -Include "*.cpp", "*.h" -Recurse | % { $_.FullName } | Duplo.exe - out.txt
This works similarly to the Bash command, but uses PowerShell commands to achieve the same effect.
5.2. Passing files using file
duplo can analyze files specified in a separate file:
# unix
> find . -type f \( -iname "*.cpp" -o -iname "*.h" \) > files.lst
> duplo files.lst out.txt
# windows
> Get-ChildItem -Include "*.cpp", "*.h" -Recurse | % { $_.FullName } | Out-File -encoding ascii files.lst
> Duplo.exe files.lst out.txt
Again, the duplicated blocks are written to out.txt.
5.3. Json output
Using -json <filename> you can output the result as json. This may be useful
if you want to process the result further.
5.4. Xml output
Duplo can also output xml and there is a stylesheet that will format the result for viewing in a browser. This can be used as a report tab in your continuous integration tool (GitHub Actions, TeamCity, etc).
6. Feedback and Bug Reporting
Please open an issue to discuss feedback, feature requests and bug reports.
7. Algorithm Background
Duplo uses the same techniques as Duploc to detect duplicated code blocks. See Duca99bCodeDuplication for further information.
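As a rough illustration of the line-based matching idea behind this family of tools, here is a simplified Python sketch. It is not Duplo's actual implementation: the function names normalize and find_duplicate_blocks, and the choice to normalize by stripping all whitespace, are assumptions made for this example. It compares every line of one file with every line of another and reports diagonal runs of matching lines that reach a minimum length.

```python
def normalize(lines):
    """Strip all whitespace from each line and drop lines that end up empty.

    Returns (original_line_number, normalized_text) pairs, numbered from 1.
    """
    result = []
    for number, line in enumerate(lines, start=1):
        text = "".join(line.split())
        if text:
            result.append((number, text))
    return result


def find_duplicate_blocks(lines_a, lines_b, min_block_size=4):
    """Return (start_line_a, start_line_b, length) for each maximal matching run."""
    a = normalize(lines_a)
    b = normalize(lines_b)
    rows, cols = len(a), len(b)

    # run[i][j] holds the length of the matching run that ends at a[i-1], b[j-1].
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if a[i][1] == b[j][1]:
                run[i + 1][j + 1] = run[i][j] + 1

    blocks = []
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            length = run[i][j]
            # Skip runs that continue along the diagonal; only report their end.
            continues = i < rows and j < cols and run[i + 1][j + 1] > 0
            if length >= min_block_size and not continues:
                blocks.append((a[i - length][0], b[j - length][0], length))
    return blocks
```

The min_block_size parameter plays a role comparable to the -ml option used earlier in this document. The sketch keeps the whole comparison matrix in memory, which is fine for illustration but would not scale to a large code base.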
7.1. Performance Measurements
| System | Files | Lines of code | Time |
|-|-|-|-|
| Quake2 | 266 | 102740 | 9 s |
This was measured on modern hardware in 2025. It means Duplo is able to process more than 10,000 lines of code (or ~30 files) per second.
Note that this is single-threaded performance. Duplo now supports using multiple threads (-j option) with an almost linear performance improvement.
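The throughput figures follow directly from the Quake2 measurement. A small Python check, using only the numbers reported in the table:

```python
# Figures taken from the Quake2 row of the performance table.
files = 266
lines_of_code = 102740
seconds = 9

lines_per_second = lines_of_code / seconds
files_per_second = files / seconds

print(f"{lines_per_second:.0f} lines/s, {files_per_second:.1f} files/s")
# prints "11416 lines/s, 29.6 files/s"
```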
8. Developing
8.1. Unix
You need CMake and preferably fswatch for the best experience.
# build dependencies
/> brew install cmake
/> brew install fswatch
Compiling is best done using the continuous file watcher:
# CMake builds in the build folder
/> mkdir build
/> pushd build
build/> cmake ..
# now issue make
build/> make
build/> popd
# continuous build can now be used in root folder
# (needs fswatch)
> ./watch.sh
8.2. Windows
Use Visual Studio 2022 to open the included solution file (or try CMake).
8.3. Additional Language Support
Duplo can analyze all text files regardless of format, but it has special support for some programming languages (C++, C#, Java, for example). This allows Duplo to improve the duplication detection as it can ignore preprocessor directives and/or comments.
To implement support for a new language, there are a couple of options:
- Implement FileTypeBase, which has support for handling comments and preprocessor directives. You just need to decide what is a comment. With this option you need to implement a couple of methods, one of which is CreateLineFilter. This is to remove multiline comments. Look at CstyleCommentsFilter for an example.
- Implement the IFileType interface directly. This gives you the most freedom but is also the hardest option.
You can see an example of how Java support was added effortlessly. It involves copying an existing file type implementation and adjusting the lines that should be filtered and how comments should be removed.
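To make the idea of a line filter more concrete, here is a conceptual Python sketch that strips C-style comments line by line while remembering whether a block comment is still open. This is not Duplo's C++ code and it does not reproduce the FileTypeBase, CreateLineFilter or CstyleCommentsFilter interfaces. The class name BlockCommentFilter and its process_line method are hypothetical. For brevity it also ignores comment markers that appear inside string literals, which a real implementation would need to handle.

```python
class BlockCommentFilter:
    """Removes /* ... */ and // comments, one line at a time (hypothetical helper)."""

    def __init__(self):
        self.in_block = False  # True while inside an unterminated /* ... */

    def process_line(self, line):
        kept = []
        i = 0
        while i < len(line):
            if self.in_block:
                end = line.find("*/", i)
                if end == -1:
                    break  # the rest of the line is still inside the comment
                self.in_block = False
                i = end + 2
            elif line.startswith("/*", i):
                self.in_block = True
                i += 2
            elif line.startswith("//", i):
                break  # a single-line comment runs to the end of the line
            else:
                kept.append(line[i])
                i += 1
        return "".join(kept).strip()


def filter_lines(lines):
    """Return the non-empty code lines that remain after comment removal."""
    comment_filter = BlockCommentFilter()
    cleaned = (comment_filter.process_line(line) for line in lines)
    return [line for line in cleaned if line]
```

The filter carries state between lines because a multiline comment cannot be recognized from any single line on its own. The text above describes the same need as the reason for a dedicated line filter.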
