Sist2
Lightning-fast file system indexer and search tool
Demo: sist2.simon987.net
Community URL: Discord
sist2 (Simple incremental search tool)
Warning: sist2 is in early development

Features
- Fast, low memory usage, multi-threaded
- Manage & schedule scan jobs with simple web interface (Docker only)
- Mobile-friendly Web interface
- Extracts text and metadata from common file types *
- Generates thumbnails *
- Incremental scanning
- Manual tagging from the UI and automatic tagging based on file attributes via user scripts
- Recursive scan inside archive files **
- OCR support with tesseract ***
- Stats page & disk utilisation visualization
- Named-entity recognition (client-side) ****
* See format support
** See Archive files
*** See OCR
**** See Named-Entity Recognition
Getting Started
Using Docker Compose (Windows/Linux/Mac)
    services:
      elasticsearch:
        image: elasticsearch:7.17.9
        restart: unless-stopped
        volumes:
          # This directory must have 1000:1000 permissions (or update PUID & PGID below)
          - /data/sist2-es-data/:/usr/share/elasticsearch/data
        environment:
          - "discovery.type=single-node"
          - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
          - "PUID=1000"
          - "PGID=1000"
      sist2-admin:
        image: sist2app/sist2:x64-linux
        restart: unless-stopped
        volumes:
          - /data/sist2-admin-data/:/sist2-admin/
          - /<path to index>/:/host
        ports:
          - 4090:4090
          # NOTE: Don't expose this port publicly!
          - 8080:8080
        working_dir: /root/sist2-admin/
        entrypoint: python3
        command:
          - /root/sist2-admin/sist2_admin/app.py
Navigate to http://localhost:8080/ to configure sist2-admin.
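The compose file notes that the Elasticsearch data directory must have 1000:1000 permissions. A minimal sketch for checking this before starting the stack, assuming that "1000:1000" refers to the owning user and group IDs of the host directory (the helper name `owned_by` is hypothetical and not part of sist2):

```python
import os


def owned_by(path: str, uid: int = 1000, gid: int = 1000) -> bool:
    """Return True if `path` is owned by the given user ID and group ID."""
    info = os.stat(path)
    return info.st_uid == uid and info.st_gid == gid


if __name__ == "__main__":
    # Host path taken from the volume mapping in the compose file.
    data_dir = "/data/sist2-es-data/"
    if os.path.isdir(data_dir):
        print(data_dir, "owned by 1000:1000:", owned_by(data_dir))
    else:
        print(data_dir, "does not exist yet")
```

If the check reports False, either change the ownership of the directory or update PUID and PGID in the compose file, as the comment there suggests.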
Using the executable file (Linux/WSL only)
- Choose search backend (See comparison):
  - Elasticsearch: have an Elasticsearch (version >= 6.8.X, ideally >=7.14.0) instance running
    - Download from official website
    - (or) Run using docker:
      docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.17.9
  - SQLite: No installation required
- Download the latest sist2 release. Select the file corresponding to your CPU architecture and mark the binary as executable with chmod +x.
- See usage guide for command line usage.
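The version requirement for Elasticsearch (at least 6.8, ideally 7.14.0 or newer) can be checked with a few lines of Python. This is only a sketch: the function names and the three labels are made up for illustration, and `fetch_version` assumes the instance answers on its root endpoint with the usual JSON document containing `version.number`.

```python
import json
import re
import urllib.request


def parse_version(text: str) -> tuple:
    """Turn '7.17.9' into (7, 17, 9); missing or non-numeric parts become 0."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def classify(version: str) -> str:
    """Compare a version string against the thresholds stated in this README."""
    parsed = parse_version(version)
    if parsed < (6, 8, 0):
        return "unsupported"
    if parsed < (7, 14, 0):
        return "supported"
    return "recommended"


def fetch_version(es_url: str) -> str:
    """Read the version number from a running Elasticsearch instance."""
    with urllib.request.urlopen(es_url, timeout=5) as response:
        return json.load(response)["version"]["number"]


if __name__ == "__main__":
    # Version of the Docker image used in this README.
    print(classify("7.17.9"))
```

Against a live instance, `classify(fetch_version("http://localhost:9200"))` would perform the same check using the URL from the docker command above.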
Example usage:
- Scan a directory:
  sist2 scan ~/Documents --output ./documents.sist2
- Prepare search index:
  - Elasticsearch:
    sist2 index --es-url http://localhost:9200 ./documents.sist2
  - SQLite:
    sist2 sqlite-index --search-index ./search.sist2 ./documents.sist2
- Start web interface:
  - Elasticsearch:
    sist2 web ./documents.sist2
  - SQLite:
    sist2 web --search-index ./search.sist2 ./documents.sist2
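For automation, the three steps above (scan, index, web) can be assembled programmatically. The sketch below only builds the argument lists using the subcommands and flags shown in this section; `build_commands` and its parameter names are hypothetical, and nothing is executed.

```python
import os
import shlex


def build_commands(source_dir, index_file, backend="elasticsearch",
                   es_url="http://localhost:9200",
                   search_index="./search.sist2"):
    """Return the scan, index and web commands as argument lists."""
    # A leading ~ is expanded here because no shell will do it when the
    # list is later passed to subprocess.run().
    source_dir = os.path.expanduser(source_dir)
    scan = ["sist2", "scan", source_dir, "--output", index_file]
    if backend == "elasticsearch":
        index = ["sist2", "index", "--es-url", es_url, index_file]
        web = ["sist2", "web", index_file]
    elif backend == "sqlite":
        index = ["sist2", "sqlite-index", "--search-index", search_index,
                 index_file]
        web = ["sist2", "web", "--search-index", search_index, index_file]
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    return [scan, index, web]


if __name__ == "__main__":
    for command in build_commands("~/Documents", "./documents.sist2",
                                  backend="sqlite"):
        print(shlex.join(command))
```

Each list can be handed to `subprocess.run(command, check=True)`. Note that the web command keeps running until it is stopped, so it is usually started last or as a separate service.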
Format support
| File type | Library | Content | Thumbnail | Metadata |
|:--------------------------------------------------------------------------|:-----------------------------------------------------------------------------|:---------|:------------|:---------------------------------------------------------------------------------------------------------------------------------------|
| pdf,xps,fb2,epub | MuPDF | text+ocr | yes | author, title |
| cbz,cbr | libscan | - | yes | - |
| audio/* | ffmpeg | - | yes | ID3 tags |
| video/* | ffmpeg | - | yes | title, comment, artist |
| image/* | ffmpeg | ocr | yes | Common EXIF tags, GPS tags |
| raw, rw2, dng, cr2, crw, dcr, k25, kdc, mrw, pef, x3f, arw, sr2, srf, erf | LibRaw | no | yes | Common EXIF tags, GPS tags |
| ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, bmp | Name & style |
| text/plain | libscan | yes | no | - |
| html, xml | libscan | yes | no | - |
| tar, zip, rar, 7z, ar ... | Libarchive | yes* | - | no |
| docx, xlsx, pptx | libscan | yes | if embedded | creator, modified_by, title |
| doc (MS Word 97-2003) | antiword | yes | no | author, title |
| mobi, azw, azw3 | libmobi | yes | yes | author, title |
| wpd (WordPerfect) | libwpd | yes | no | planned |
| json, jsonl, ndjson | libscan | yes | - | - |
* See Archive files
Archive files
sist2 scans files stored inside archive files (zip, tar, 7z...) as if they were directly in the file system. Recursive scanning (archives inside archives) is also supported.
Limitations:
- Support for parsing media files with formats that require seek (e.g. .gif, .mp4 w/ fragmented metadata etc.) is limited (see --mem-buffer option)
- Archive files are scanned sequentially, by a single thread. On systems where sist2 is not I/O bound, scans might be faster when larger archives are split into smaller parts.
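As an illustration of the last point, a large zip archive could be split into smaller ones before scanning. The following is a rough sketch using only Python's standard `zipfile` module, not a sist2 feature: `split_zip`, the part naming scheme and the size threshold are all assumptions, it handles zip files only, and it reads each member fully into memory.

```python
import os
import tempfile
import zipfile


def split_zip(src_path, out_dir, max_bytes):
    """Copy the members of a zip into several smaller zip files.

    A new part is started whenever adding the next member would push the
    uncompressed size of the current part above max_bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_path))[0]
    parts = []
    part = None
    used = 0
    with zipfile.ZipFile(src_path) as src:
        for info in src.infolist():
            if part is None or (used > 0 and used + info.file_size > max_bytes):
                if part is not None:
                    part.close()
                path = os.path.join(out_dir, f"{base}.part{len(parts) + 1:03d}.zip")
                part = zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED)
                parts.append(path)
                used = 0
            # Read the data first, then reuse the member's ZipInfo so that
            # its name and timestamp carry over to the new archive.
            data = src.read(info)
            part.writestr(info, data)
            used += len(data)
    if part is not None:
        part.close()
    return parts


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    demo = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(demo, "w") as archive:
        for i in range(5):
            archive.writestr(f"file{i}.txt", "x" * 100)
    print(split_zip(demo, os.path.join(workdir, "parts"), max_bytes=250))
```

Whether this actually speeds up a scan depends on the system; as noted above, a gain is only expected where sist2 is not I/O bound.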
OCR
You can enable OCR support for ebook (pdf,xps,fb2,epub) or image file types with the
--ocr-lang <lang> option in combination with --ocr-images and/or --ocr-ebooks.
Download the language data files with your package manager (apt install tesseract-ocr-eng) or
directly from GitHub.
The sist2app/sist2 image comes with common languages
(hin, jpn, eng, fra, rus, spa, chi_sim, deu, pol) pre-installed.
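To make the combination of OCR options concrete, here is a small sketch that assembles a scan command with OCR enabled. It uses only the flags named in this section; the helper `ocr_scan_command` is hypothetical, and placing the options after the scanned path simply mirrors the scan example earlier in this README.

```python
def ocr_scan_command(source_dir, index_file, lang, images=False, ebooks=False):
    """Build a `sist2 scan` argument list with OCR enabled."""
    # --ocr-lang is used in combination with --ocr-images and/or
    # --ocr-ebooks, so at least one of the two must be requested.
    if not (images or ebooks):
        raise ValueError("enable OCR for images and/or ebooks")
    command = ["sist2", "scan", source_dir, "--output", index_file,
               "--ocr-lang", lang]
    if images:
        command.append("--ocr-images")
    if ebooks:
        command.append("--ocr-ebooks")
    return command


if __name__ == "__main__":
    print(" ".join(ocr_scan_command("./Documents", "./documents.sist2",
                                    "eng", ebooks=True)))
```

The language code passed as `lang` must match an installed tesseract language data file, for example `eng` after installing tesseract-ocr-eng.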
You c
