CCGrep
Code Clone Detector like grep
updated on 2019/11/15, written on 2018/12/06
CCGrep is an easy-to-use code clone detector with a grep-like command-line interface.
Description
Many code clone detectors already exist, but they are difficult to install, configure, and run. CCGrep is a simple clone detector built around the grep command interface, so you can start using it right away.
CCGrep can detect Type 1, Type 2 (parameterized or not), and Type 3 clones.
Document in Japanese
Demo
Sample output, with the Apache HTTP Server source as the target.
$ ccgrep -r -l c -p n 'T a = b();' httpd-2.4.33/
httpd-2.4.33/modules/arch/unix/mod_unixd.c:141: int rv = set_group_privs();
httpd-2.4.33/modules/arch/win32/mod_isapi.c:311: apr_status_t rv = apr_get_os_error();
httpd-2.4.33/modules/cache/cache_util.c:286: apr_time_t now = apr_time_now();
httpd-2.4.33/modules/cache/mod_socache_shmcb.c:583: apr_time_t now = apr_time_now();
httpd-2.4.33/modules/cache/mod_socache_shmcb.c:687: apr_time_t now = apr_time_now();
httpd-2.4.33/modules/cache/mod_socache_shmcb.c:865: apr_time_t now = apr_time_now();
httpd-2.4.33/modules/http2/h2_session.c:1738: apr_time_t now = apr_time_now();
$ ccgrep -r -l c -p nf 'if($$==$$){$$}' httpd-2.4.33/
httpd-2.4.33/modules/aaa/mod_access_compat.c
230: if (wl == dl) {
231: return 1; /* matched whole thing */
232: }
httpd-2.4.33/modules/aaa/mod_access_compat.c
322: if (a->order[method] == ALLOW_THEN_DENY) {
323: ret = HTTP_FORBIDDEN;
324: if (find_allowdeny(r, a->allows, method)) {
325: ret = OK;
326: }
327: if (find_allowdeny(r, a->denys, method)) {
328: ret = HTTP_FORBIDDEN;
329: }
330: }
Requirement
- Java 8
- Apache Maven (required only to build, not to use)
Install
Installation is optional.
If you do not install, ccgrep and CCGrep.jar must be in the same directory.
Unix
$ ./install
Windows
$ .\install.bat
Add C:\Program Files\ccgrep\bin to PATH if needed.
Usage
$ ccgrep [OPTIONS]... QUERY_CODE [TARGETS]...
$ ccgrep [OPTIONS]... -f QUERY_FILE [TARGETS]...
Language
- C (c, h)
- C++ (cpp, cc, c++, cxx, c, h, hpp)
- Java (java)
- Python3 (py)
Options
-b, --blind <LEVEL>: set blind level.
  - none (Type 1) / consistent (p-match, the default) / full (Type 2).
-e <PATTERN>: use PATTERN for matching.
--exclude <FILE_PATTERN>: skip files matching FILE_PATTERN.
-f, --file <FILES>: obtain query from file.
  - CANNOT be combined with a query given as a code string.
--fix <ID>: specify an identifier that must match exactly the same one.
-h, --help: show help.
--ignore-extension: search all files, ignoring file extensions.
--include <FILE_PATTERN>: search ONLY files that match FILE_PATTERN.
--json: print clones and execution information in JSON format.
-l, --language <LANG>: set target language.
  - c / c++ / java (the default) / python3.
  - With the -f option, the language can be inferred from the file extension.
-m, --max-count <NUM>: stop after NUM clones.
--no-messages: suppress error messages.
--no-overlap: search without overlap.
-p, --print <OPTION>: set printing options such as c/l/n/f/e, e.g. -p fn.
  - If c is given, print the count of clones file by file.
  - If C is given, print ONLY the count of all clones.
  - If l is given, print ONLY the file name of each matched file.
  - If h is given, do NOT print file names.
  - If n is given, print line numbers.
  - If N is given, print pairs of start and end line numbers.
  - If f is given, print the whole code of clones.
  - If r is given, print the whole code of clones on one line.
  - If o is given, print only the matched parts of a clone's code.
  - If e is given, comment out the file name and line numbers.
--parallel: search each file in parallel.
-r, --recursive: traverse directories recursively.
-s, --stdin-query: obtain query from standard input. CANNOT be combined with a query given as a code string.
-x, --file-match: force QUERY to match only a whole file.
--xml: print clones in XML format.
Example
- Search target/ recursively for query.java, and print the first line of each clone with its file name and line number.
$ ccgrep -r -p n -f query.java target/
- Search target/ recursively for 'int a = 1;', and print the whole code of each clone with its file name and line number.
$ ccgrep -r -p fn 'int a = 1;' target/
Note: when giving query code on the command line, enclose it in SINGLE quotes ' rather than DOUBLE quotes ", because shell variable expansion inside double quotes leads to unexpected results.
Files specified individually are searched only when they have one of the extensions listed under Language or are text files.
When -r is given, files in the specified directories are searched only when they have one of the extensions listed under Language.
ClonesToDetect
The clone type to detect is set with the command-line option -b LEVEL.
- none: each token matches exactly the same token (Type 1 clones).
- full: each identifier token matches any identifier token, and likewise each literal token matches any literal token (Type 2 clones).
- consistent: same as full, except that identical tokens in the query must match identical tokens in the target (p-match clones). This is the default.
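The three blind levels can be illustrated with a minimal Python sketch. This is not CCGrep's implementation: the tokenizer, the `matches` function, and its `blind` parameter are hypothetical simplifications (every word counts as an identifier, only integer literals are recognized, and the consistency check runs from query to target only).

```python
import re

# Simplified tokenizer (an assumption): words, integers, or single symbols.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split code into word, integer-literal, and single-character tokens."""
    return TOKEN.findall(code)


def kind(tok):
    """Classify a token as identifier, literal, or punctuation."""
    if re.fullmatch(r"[A-Za-z_]\w*", tok):
        return "id"
    if tok.isdigit():
        return "lit"
    return "punct"


def matches(query, target, blind="consistent"):
    """Return True if target matches query under the given blind level."""
    q, t = tokenize(query), tokenize(target)
    if len(q) != len(t):
        return False
    binding = {}  # query token -> target token, used in consistent mode
    for a, b in zip(q, t):
        if kind(a) != kind(b):
            return False
        if kind(a) == "punct" or blind == "none":
            if a != b:  # exact token match required
                return False
        elif blind == "consistent":
            # The same query token must always map to the same target token.
            if binding.setdefault(a, b) != b:
                return False
        # blind == "full": any token of the same kind is accepted
    return True


print(matches("a = a + 1;", "x = x + 1;", "none"))        # False
print(matches("a = a + 1;", "x = y + 2;", "full"))        # True
print(matches("a = a + 1;", "x = x + 1;", "consistent"))  # True
print(matches("a = a + 1;", "x = y + 1;", "consistent"))  # False
```

In the last call the query identifier a would have to stand for both x and y, which is what the consistent level rules out.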
FixedToken $token
In a query, identifiers and literals can start with $ (e.g. $value, $10, $"Hello").
These tokens match exactly the same tokens regardless of the blind level.
Note: No spaces are allowed between $ and token.
Note: You can escape $ by backslash to use it as just a character (e.g. \$ab).
You can also use --fix option to fix identifiers.
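As a rough illustration of fixed tokens, the following Python sketch blinds every identifier and integer literal except those written with a leading $ in the query. The `blind_match` function and its tokenizer are hypothetical and much simpler than a real lexer (no string literals, no backslash escape for $); they only show the idea that a $-prefixed token must match exactly.

```python
import re

# Simplified tokenizer (an assumption): an optional "$" prefix on words and
# integers, otherwise one symbol per token.
TOKEN = re.compile(r"\$?[A-Za-z_]\w*|\$?\d+|\S")
WORD_OR_INT = re.compile(r"[A-Za-z_]\w*|\d+")


def blind_match(query, target):
    """Match with identifiers and literals blinded, except $-fixed tokens."""
    q, t = TOKEN.findall(query), TOKEN.findall(target)
    if len(q) != len(t):
        return False
    for a, b in zip(q, t):
        if len(a) > 1 and a.startswith("$"):
            if a[1:] != b:  # fixed token: must be exactly the same
                return False
        elif WORD_OR_INT.fullmatch(a):
            # Blinded token: any token of the same class (word or integer) matches.
            if not WORD_OR_INT.fullmatch(b) or a[0].isdigit() != b[0].isdigit():
                return False
        elif a != b:  # punctuation must match exactly
            return False
    return True


print(blind_match("$value = x;", "value = y;"))  # True
print(blind_match("$value = x;", "total = y;"))  # False
print(blind_match("n = $10;", "m = 10;"))        # True
print(blind_match("n = $10;", "m = 20;"))        # False
```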
AnyTokenSequence $$
The special token $$ in a query matches any token sequence (of length 0 or more) that contains only balanced parentheses/braces/brackets
(e.g. { set(get()).value(); }).
Example: if($$){$$ return v;} detects if-statements that return a value at the end.
You can also use $# for any sequence that can contain unbalanced brackets.
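The balance condition on $$ can be sketched in Python as a simple stack check. The `is_balanced` helper and its tokenizer are hypothetical, and the sketch assumes that "balanced" means properly nested; it only shows which token sequences $$ could absorb under that reading, not how CCGrep searches for them.

```python
import re

# Closing bracket -> matching opening bracket.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_balanced(code):
    """Return True if every bracket in code is properly opened and closed."""
    stack = []
    for tok in re.findall(r"\w+|\S", code):
        if tok in OPENERS:
            stack.append(tok)
        elif tok in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[tok]:
                return False
    return not stack  # nothing may be left open


print(is_balanced("{ set(get()).value(); }"))  # True
print(is_balanced(""))                         # True
print(is_balanced("get())"))                   # False
print(is_balanced("if (x"))                    # False
```

The last two sequences are unbalanced, so under this reading they could be matched by $# but not by $$.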
RegularExpression
Special tokens provide regular-expression-style matching over tokens.
Each one is a regular expression operator prefixed with $.
- Grouping $(, $)
- Longest Selection $|
  - matches longest.
  - $(a$|aa$) matches the whole aa of aa.
- First Selection $/
  - matches first.
  - $(a$/aa$) matches only the first a of aa.
- 0-or-more $*
  - matches longest.
- 1-or-more $+
  - matches longest.
- 0-or-1 $?
- Any token $.
- Positive Look Ahead $=
  - looks ahead match.
  - a $= ; matches a ;, not a =, and its result is just a (; is discarded). With $(, $), $= $( a = b ; $) looks ahead the whole a = b ;.
- Negative Look Ahead $!
  - looks ahead NOT match.
  - a $! ; matches a =, not a ;.
Note: Identifiers that first appear inside a selection or repetition are NOT used for p-match.
e.g. a$*a matches both abc and aaa, while aa$* matches aaa but not abc.
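The difference between first and longest selection, and the behavior of the two lookaheads, can be illustrated by analogy with Python's character-level `re` module. This is only an analogy: CCGrep's operators work on tokens rather than characters, and the `longest_alternative` helper is hypothetical.

```python
import re


def longest_alternative(alternatives, text):
    """Try every alternative at the start of text and keep the longest match."""
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best


# First selection (like $/): Python's "|" takes the first alternative that matches.
first = re.match(r"a|aa", "aa").group()

# Longest selection (like $|): compare all alternatives and keep the longest.
longest = longest_alternative(["a", "aa"], "aa")

# Positive lookahead (like $=): ";" must follow but is not part of the result.
positive = re.match(r"a(?=;)", "a;").group()

# Negative lookahead (like $!): matches only when ";" does NOT follow.
negative_hit = re.match(r"a(?!;)", "a=")
negative_miss = re.match(r"a(?!;)", "a;")

print(first, longest, positive)                        # a aa a
print(negative_hit is not None, negative_miss is None)  # True True
```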
Build
$ mvn package
Licence
This software is released under the MIT License, see LICENSE.txt.
This project uses the following components.
- ANTLR 4 : BSD License
- Apache Commons CLI : Apache License, Version 2.0
- Apache Commons IO : Apache License, Version 2.0
Author
