Qapyq
An image viewer and AI-assisted editing/captioning/masking tool that helps with curating datasets for generative AI models, finetunes and LoRA.
qapyq
<sup>(CapPic)</sup><br /> An image viewer and AI-assisted editing tool that helps with curating datasets for generative AI models, finetunes and LoRA.
<br clear="left"/> <br /><br />
<a href="https://camo.githubusercontent.com/059f5cef1671955473d5d3e096263cf85910a2d52094c389b2924cca1b1a33c5/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f647261672d6e2d64726f702e676966"><img alt="Edit captions quickly with drag-and-drop support" src="https://camo.githubusercontent.com/059f5cef1671955473d5d3e096263cf85910a2d52094c389b2924cca1b1a33c5/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f647261672d6e2d64726f702e676966" width="30%"></img></a> <a href="https://camo.githubusercontent.com/71df5556ba81a944f3a28ed3760644b6f7c0c455b4ed639a80418b51d0cae704/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f7461675f6d75742d6578636c75736976652e676966"><img alt="Select one-of-many" src="https://camo.githubusercontent.com/71df5556ba81a944f3a28ed3760644b6f7c0c455b4ed639a80418b51d0cae704/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f7461675f6d75742d6578636c75736976652e676966" width="30%"></img></a> <a href="https://camo.githubusercontent.com/9403e354708969d4c5f1262583294913bd238e8b38df640e2a7fc36a313bf686/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f72756c65732e676966"><img alt="Apply sorting and filtering rules" src="https://camo.githubusercontent.com/9403e354708969d4c5f1262583294913bd238e8b38df640e2a7fc36a313bf686/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f72756c65732e676966" width="30%"></img></a>
<a href="https://camo.githubusercontent.com/74122b177a2f5a1cd4add5d749b90a49ac2f0cec631363ef861199a7c90566d7/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f63726f702e676966"><img alt="Quick cropping" src="https://camo.githubusercontent.com/74122b177a2f5a1cd4add5d749b90a49ac2f0cec631363ef861199a7c90566d7/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f63726f702e676966" width="30%"></img></a> <a href="https://camo.githubusercontent.com/d15df56575d4d69fe2cc04c5ed822e6cc95c0208185df3464a21fc351c4b04fb/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f636f6d706172652e676966"><img alt="Image comparison" src="https://camo.githubusercontent.com/d15df56575d4d69fe2cc04c5ed822e6cc95c0208185df3464a21fc351c4b04fb/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f636f6d706172652e676966" width="30%"></img></a> <a href="https://camo.githubusercontent.com/1583a08a56e63f4d6dae0c59f6572310558bb7c9f8e7b79e7e4e53af6e2663ee/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f6d61736b2d322e676966"><img alt="Draw masks manually or apply automatic detection and segmentation" src="https://camo.githubusercontent.com/1583a08a56e63f4d6dae0c59f6572310558bb7c9f8e7b79e7e4e53af6e2663ee/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f6d61736b2d322e676966" width="30%"></img></a>
<a href="https://camo.githubusercontent.com/b6cf81d56d9d4e9e2bbc8fb031e03e9380bc0a5c5e47b4885edd8ff0cc043b6b/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f636f6e642d666f6f74776561722d686169722d666c6f6f722e676966"><img alt="Transform tags using conditional rules" src="https://camo.githubusercontent.com/b6cf81d56d9d4e9e2bbc8fb031e03e9380bc0a5c5e47b4885edd8ff0cc043b6b/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f636f6e642d666f6f74776561722d686169722d666c6f6f722e676966" width="30%"></img></a> <a href="https://camo.githubusercontent.com/b094b255ba1d18d83253dba4f7f813ac6d64ea6cedac7330437eb0791479f062/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f6d756c7469656469742d666f6375732d636f6d707265737365642e676966"><img alt="Multi-Edit and Focus Mode" src="https://camo.githubusercontent.com/b094b255ba1d18d83253dba4f7f813ac6d64ea6cedac7330437eb0791479f062/68747470733a2f2f7777772e616c6368656d697374732e63682f71617079712f6d756c7469656469742d666f6375732d636f6d707265737365642e676966" width="60%"></img></a>
Features
- Image Viewer: Display and navigate images
  - Quick-starting desktop application built with Qt
  - Runs smoothly with a million images
  - Modular interface that lets you place windows on different monitors
  - Open multiple tabs
  - Zoom/pan and fullscreen mode
  - Gallery with thumbnails and optional captions
  - Semantic image sorting with text prompts
  - Compare two images
  - Measure size, area and pixel distances
  - Slideshow
- Image/Mask Editor: Prepare images for training
  - Crop and save parts of images
  - Scale images, optionally using AI upscale models
  - Dynamic save paths with template variables
  - Manually edit masks with multiple layers
  - Generate masks with AI models
  - Record masking operations into macros
  - VAE-encode images and check their latent representation
- Captioning: Describe images with text
  - Edit captions manually with drag-and-drop support
  - Save multiple captions in a JSON file per image
  - Multi-Edit Mode: Edit captions of multiple images simultaneously
  - Focus Mode: Add the same tags to many files quickly
  - Tag grouping, merging, sorting, filtering and replacement rules
  - Colored text highlighting
  - Autocomplete with tags from your groups and CSV files
  - CLIP Token Counter
  - Automated captioning with support for grounding
  - Dynamic prompts with templates and text transformations
  - Multi-turn conversations with VLMs
  - Further refinement with LLMs
- Stats/Filters: Summarize your data and get an overview
- Batch Processing: Process whole folders at once
  - Flexible batch captioning, tagging and transformation
  - Batch scaling of images
  - Batch masking with user-defined macros
  - Batch cropping of images using your macros
  - Copy, move and rename files, create symlinks, ZIP captions for backups
- AI Assistance:
  - Support for state-of-the-art captioning and masking models
  - Model and sampling settings, GPU acceleration with CPU offload support
  - On-the-fly NF4 and INT8 quantization
  - Run inference locally and/or on multiple remote machines over SSH
  - Separate inference subprocess isolates potential crashes and allows complete VRAM cleanup
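
To illustrate the "multiple captions in a JSON file per image" idea from the Captioning features, here is a minimal Python sketch. The sidecar naming (a `.json` file next to the image), the `captions` key, and the helper names are assumptions made for this example only; they are not taken from qapyq and may differ from its actual file format.

```python
import json
from pathlib import Path


def sidecar_path(image_path: Path) -> Path:
    # Hypothetical convention: same file name as the image, with a .json suffix.
    return image_path.with_suffix(".json")


def save_caption(image_path: Path, key: str, text: str) -> None:
    """Store one named caption without overwriting the other entries."""
    path = sidecar_path(image_path)
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # "captions" is an assumed top-level key that maps caption names to text.
    data.setdefault("captions", {})[key] = text
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def load_captions(image_path: Path) -> dict[str, str]:
    """Return all named captions for an image, or an empty dict if none exist."""
    path = sidecar_path(image_path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8")).get("captions", {})
```

Keeping several named entries in one file means, for example, that a tag list and a full-sentence caption for the same image can live side by side and be updated independently.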
Supported Models
These are the supported architectures with links to the original models.<br> Find more specialized finetuned models on huggingface.co.
- Tagging<br> Models for generating keyword captions for images.
  - JoyTag
  - PixAI Tagger (onnx)
  - WD (onnx) (eva02 recommended)
- Captioning<br> Models for generating complete-sentence captions for images.
  - Florence-2
  - Gemma3 (GGUF)
  - InternVL2, InternVL2.5, [InternVL2.5-MPO](https://huggingface.co/collecti
