Cbird
Command-line program for Content-Based Image Retrieval of images and videos. Includes tools for general search and de-duplication.

About cbird
cbird is a command-line program for finding duplicate images and videos that cannot be found by general methods such as file hashing. It uses Content-Based Image Retrieval (CBIR), which examines the pixels of files to get comparable features and "perceptual" hash codes.
The main features are:
- Image and video search
- Cached search index
- GUI for evaluating duplicates
- Huge index/database is possible
- Comprehensive file format support
- Search inside zip files
- Hardware video decoding including multiple decoders/gpus
Installing
Compile
Compile it yourself using my detailed notes.
Download
Linux AppImage
Add execute permission and run
chmod +x cbird-0.8.0-x86_64.AppImage
./cbird-0.8.0-x86_64.AppImage -install # optional install helper
cbird [...]
- Required packages: trash-cli
- Optional packages: ocenaudio, kdenlive
AppImage Issues
- AppImage won't run (debian, Ubuntu, etc)
apt install libfuse2
- Missing libOpenGL.so.0
  - debian: apt install libopengl0
  - redhat: yum install libglvnd-opengl
- Missing window titlebar (Fedora)
cbird -platform wayland-egl [...]
Mac OS X 11.0+ x86
- Unzip the distribution file and run
cbird/cbird-mac [...]
Windows 10+
- Unzip the distribution file and run the program
- Install helpers (optional): vlc, kdenlive
Windows PowerShell
Optional: create shortcuts for cbird
- Unzip into C:\ so you have C:\cbird\cbird.exe
- Enable script execution
  - Run PowerShell as administrator
  - Enter: Set-ExecutionPolicy RemoteSigned
  - Close PowerShell
- Create profile script (if you don't have one)
  - Run PowerShell normally
  - New-Item -Type File $PROFILE -Force
- Add cbird shortcut to profile
  - OpenWith $PROFILE
  - Set-Alias -Name cbird -Value C:\cbird\cbird.exe
- Shortcut for pictures folder
function cbird-pics {cbird -use $HOME\Pictures $args}
Getting Started
Get Help
- cbird -help is very detailed, copied here for your convenience
- github wiki has more details
- some commands that take a value will also accept "help" to get more info, e.g. -i.algos help
- -list-* commands give more info about settings/configuration
Index the files in <path>, caching into <path>/_index
cbird -use <path> -create -update
Index files in cwd
cbird -create -update
Update existing index in cwd
cbird -update
Show exact duplicates (MD5 checksums)
cbird -dups -show
Search cwd, default threshold
cbird -similar -show
Search cwd, lowest threshold
cbird -p.dht 1 -similar -show
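For the exact-duplicate search listed above (MD5 checksums), the following Python sketch shows the general idea of grouping files by digest. It is not cbird's code: the names md5_of_file and find_exact_duplicates are hypothetical, and it reads the files directly instead of using a cached index.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Map each MD5 digest to the sorted relative paths that share it."""
    groups = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            groups[md5_of_file(path)].append(os.path.relpath(path, root))
    # Only digests seen more than once are duplicates.
    return {d: sorted(p) for d, p in groups.items() if len(p) > 1}
```

A checksum match only catches byte-identical files, which is why the similarity searches exist for resized or edited copies.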
Using the GUI
This is lacking documentation at the moment. But for now...
- The GUI is displayed with -show if there is a selection or results.
- GUI windows have a context menu (right click) with all available actions.
- The two deletion actions ("Delete" and "Replace") use the trash/recycler by default. There is no way to permanently delete files (not even with batch deletion commands).
- You can modify keybindings in the config file (see -about for the location).
Use Cases
- Find exact duplicates (file checksums)
- Find modifications
  - General transforms: resize, rotate, crop
  - Image edits: blur, sharpen, noise, color-grade, grayscale
  - Video edits: clipping, fps change, letter boxing
- Evaluate matches
  - Compare attributes (resolution, file size, compression ratio)
  - Flip between matches
  - Zoom-in to see fine details, compression losses
  - False-color visualization of differences for fast evaluation
  - No-reference subjective quality estimate
  - Jpeg compression quality estimate
  - Align videos temporally, flip between them or play side-by-side
- File management
  - Sort/rename based on similarity
  - Rename files using regular expressions
  - Move/rename files/directories within the index
File Formats
Common formats are supported, as well as many obscure formats. The available formats will ultimately vary based on the configuration of Qt and FFmpeg.
cbird -about lists the image and video extensions. Note that video extensions are not checked against FFmpeg at runtime, so they could be unavailable.
Additionally, zip files are supported for images.
To get the most formats you will need to compile FFmpeg and Qt with the necessary options. Additional image formats are also available with kimageformats.
Link Handling
Links are ignored by default. To follow links, use the index option -i.links 1
If the search path contains links, they are only considered when scanning for changes (-update), otherwise there is no special treatment. For example, deleting a link is the same as any other deletion operation.
Duplicate inodes are not followed by default. If there are duplicate inodes in the tree, the first inode in breadth-first traversal is indexed. To follow all inodes, for example to find duplicate hard links, use -i.dups 1.
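The inode rule above can be sketched in Python as a breadth-first walk that keeps only the first path seen for each inode. This is an illustration under assumptions, not cbird's implementation: the function name and the follow_dups parameter are hypothetical (the latter loosely mirrors -i.dups 1), entries are visited in name order within a directory, and links are simply skipped.

```python
import os
from collections import deque


def list_files_breadth_first(root, follow_dups=False):
    """List files under root breadth-first, skipping repeated inodes."""
    seen = set()    # (device, inode) pairs already listed
    found = []
    queue = deque([root])
    while queue:
        folder = queue.popleft()
        with os.scandir(folder) as entries:
            ordered = sorted(entries, key=lambda e: e.name)
        for entry in ordered:
            if entry.is_symlink():
                continue  # links are ignored in this sketch
            if entry.is_dir():
                queue.append(entry.path)  # visited after this level
                continue
            info = os.stat(entry.path)
            key = (info.st_dev, info.st_ino)
            if key in seen and not follow_dups:
                continue  # hard link to a file already listed
            seen.add(key)
            found.append(os.path.relpath(entry.path, root))
    return found
```

With a file at the root and a hard link to it in a subdirectory, the root path is listed and the hard link is skipped unless follow_dups is set.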
The index stores paths relative to the indexed/root path, which keeps the index stable if the parent directory changes. However, if a path contains links, or is a link itself, it is stored as-is, which may be less stable than storing the link target. To store the resolved links instead, use -i.resolve 1. This is only possible if the link target is a child of the index root.
Note that cbird does not prevent broken links from occurring; the link check only happens during the index update.
Using Weeds
The "weed" feature allows fast deletion of deleted files that reappear in the future. A weed record is a pair of file checksums, one is the weed/deleted file, the other is the original/retained file. When the weed shows up again, it can be deleted without inspection (-nuke-weeds)
How weeds are recorded
- Two files are examined (matching pair) -- use -p.mm 1 or -p.eg 1 to force pairs
- Neither file is a zip member
- When one of the two files is deleted, it is marked as a weed of the other (retained) file
Broken weeds
There is nothing to prevent deletion of the original/retained file, so the weed record can become invalidated. If the original is no longer present, the association can be unset with the "Forget Weed" command.
cbird -weeds -show # show all weeds
cbird -nuke-weeds # delete all weeds
cbird -similar -with isWeed true # isolate weeds in search results
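To make the weed record concrete, here is a minimal Python sketch of a store that maps the checksum of a deleted file to the checksum of the file that was kept. The class and method names are hypothetical, MD5 is an assumption (the text above only says "file checksums"), and cbird's actual storage format is not shown here.

```python
import hashlib


def checksum(data):
    """Checksum of file contents (MD5 is an assumption in this sketch)."""
    return hashlib.md5(data).hexdigest()


class WeedList:
    """Pairs of checksums: weed (deleted file) -> original (retained file)."""

    def __init__(self):
        self._weeds = {}

    def record(self, deleted, retained):
        """Remember that `deleted` was removed in favor of `retained`."""
        self._weeds[checksum(deleted)] = checksum(retained)

    def is_weed(self, data):
        """True if a file with these contents was deleted before."""
        return checksum(data) in self._weeds

    def original_of(self, data):
        """Checksum of the retained file, or None if not a weed."""
        return self._weeds.get(checksum(data))

    def forget(self, data):
        """Drop the association, e.g. when the original is gone."""
        self._weeds.pop(checksum(data), None)
```

Because the record is keyed on content rather than on path, a reappearing copy is recognized wherever it shows up in the tree.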
Environment Variables
There are a few for power users.
- CBIRD_SETTINGS_FILE: overrides the path to the settings file (cbird -about shows the default)
- CBIRD_TRASH_DIR: overrides the path to trash folder, do not use the system trash bin
- CBIRD_CONSOLE_WIDTH: set character width of terminal console (default auto-detect)
- CBIRD_FORCE_COLORS: use colored output even if console is not detected
- CBIRD_NO_COLORS: disable colored output
- CBIRD_LOG_TIMESTAMP: add time delta to log messages
- CBIRD_NO_BUNDLED_PROGS: do not use bundled programs like ffmpeg in the appimage/binary distribution
- QT_IMAGE_ALLOC_LIMIT_MB: maximum memory allocation for image files (default 256)
- QT_SCALE_FACTOR: global scale factor for UI
- TMPDIR: override default directory for temporary files; used for opening zip file contents
- CBIRD_MAXIMIZE_HACK: set if window manager/qt is not restoring maximized windows (default auto-detect)
Wish List, Bugs, Etc
Check the development notes for known bugs and feature ideas.
Report bugs or request features on github
Search Algorithms
There are several algorithms; which one works best depends on the situation.
Discrete Cosine Transform (DCT) Hash -p.alg dct
Uses one 64-bit hash per image, similar to pHash. Very fast, good for rescaled images and lightly cropped images.
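For readers unfamiliar with this family of hashes, the sketch below shows a generic pHash-style DCT hash in pure Python. It is an assumption-laden illustration, not cbird's implementation: the input is taken to be a small square grayscale image (for example 32x32, already resized), the hash uses an 8x8 block of low-frequency coefficients compared against their median, and all function names are hypothetical.

```python
import math
import statistics


def dct_1d(values):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(pixels):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def dct_hash(pixels):
    """64-bit hash from an 8x8 block of low-frequency coefficients."""
    coeffs = dct_2d(pixels)
    # Skip row/column 0 so the DC term (overall brightness) is ignored.
    low = [coeffs[y][x] for y in range(1, 9) for x in range(1, 9)]
    median = statistics.median(low)
    bits = 0
    for value in low:
        bits = (bits << 1) | (1 if value > median else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two images are then compared by the Hamming distance between their hashes. Because only low frequencies are kept, rescaling or a brightness change tends to leave most bits unchanged, and a distance threshold decides what counts as a match.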
DCT Features -p.alg fdct
Uses DCT hashes centered on scale/rotation invariant features, up to 400 per image. Good for heavily cropped images, much faster than ORB.
Oriented FAST and Rotated BRIEF (ORB) Descriptors -p.alg orb
Uses 256-bit scale/rotation invariant feature descriptors, up to 400 per image. Good for rotated and cropped images, but slow.
Color Histogram -p.alg color
Uses a histogram of up to 32 colors (256 bytes) per image. Sometimes works when all else fails. This is the only algorithm that finds reflected images; the others require -p.refl and must rehash the reflected image (very slow).
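The reason a color histogram finds reflected images is that it counts colors without regard to position. The Python sketch below demonstrates this with a simple 32-bin quantization; the binning scheme (4 x 4 x 2 levels for red, green, blue), the distance measure and the function names are assumptions for illustration and are not taken from cbird.

```python
from collections import Counter


def color_bin(pixel):
    """Quantize an (r, g, b) pixel into one of 32 bins (4 x 4 x 2 levels)."""
    r, g, b = pixel
    return (r >> 6) * 8 + (g >> 6) * 2 + (b >> 7)


def color_histogram(image):
    """Count pixels per color bin; image is a list of rows of (r, g, b)."""
    return Counter(color_bin(p) for row in image for p in row)


def mirror(image):
    """Reflect the image horizontally."""
    return [list(reversed(row)) for row in image]


def histogram_distance(h1, h2):
    """Sum of absolute bin differences (0 means identical histograms)."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
```

A mirrored image has exactly the same pixel counts per bin, so its distance to the original is zero, whereas a position-dependent hash would have to be recomputed on the reflected pixels.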
DCT Video Index -p.alg video
Uses DCT hashes of video frames. Frames are preprocessed to remove letterboxing. Can also find video thumbnails in the source video since they have the same hash type.
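As a rough picture of the letterbox preprocessing mentioned above, the following Python sketch crops dark borders from a grayscale frame before it would be hashed. The function name, the brightness threshold of 16 and the bounding-box approach are assumptions; cbird's actual detection method is not described here.

```python
def remove_letterbox(frame, threshold=16):
    """Crop dark borders from a grayscale frame.

    frame is a list of rows of 0-255 values. Rows and columns count as
    border if every pixel in them is darker than threshold.
    """
    rows = [y for y, row in enumerate(frame) if max(row) >= threshold]
    if not rows:
        return []  # frame is entirely dark
    cols = [
        x for x in range(len(frame[0]))
        if max(frame[y][x] for y in rows) >= threshold
    ]
    # Keep the bounding box between the first and last bright row/column.
    return [
        [frame[y][x] for x in range(cols[0], cols[-1] + 1)]
        for y in range(rows[0], rows[-1] + 1)
    ]
```

Cropping first means a letterboxed copy and the original frame are hashed over the same picture area.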
Template Matcher -p.tm 1
Filters results with a high resolution secondary matcher that finds the exact overlap of an image pair. This is most useful to drop poor matches from fdct and orb. Since it requires decompressing the source/destination image it is extremely slow. It can help to reduce the maximum number of matches per image with -p.mm #
How it Performs
Indexing
Indexing happens when -update is used. It can take a while the first time, but subsequent updates only consider changes.
Unused algorithms can be disabled to speed up indexing. If you have large images, you may as well enable all algorithms because image decompression dominates the process.
Table 1: Indexing 1000 6000px images, 8 GB, SSD
Arguments | Note | Time (seconds)
--------------------|----------------|------
-update | all enabled | 46
-i.algos 0 -update | md5
