Sist2

Lightning-fast file system indexer and search tool

Demo: sist2.simon987.net

Community URL: Discord

sist2

sist2 (Simple incremental search tool)

Warning: sist2 is in early development

Features

  • Fast, low memory usage, multi-threaded
  • Manage & schedule scan jobs with a simple web interface (Docker only)
  • Mobile-friendly Web interface
  • Extracts text and metadata from common file types *
  • Generates thumbnails *
  • Incremental scanning
  • Manual tagging from the UI and automatic tagging based on file attributes via user scripts
  • Recursive scan inside archive files **
  • OCR support with tesseract ***
  • Stats page & disk utilisation visualization
  • Named-entity recognition (client-side) ****

* See format support
** See Archive files
*** See OCR
**** See Named-Entity Recognition

Getting Started

Using Docker Compose (Windows/Linux/Mac)

services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    volumes:
      # This directory must have 1000:1000 permissions (or update PUID & PGID below)
      - /data/sist2-es-data/:/usr/share/elasticsearch/data
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms2g -Xmx2g"
      - "PUID=1000"
      - "PGID=1000"
  sist2-admin:
    image: sist2app/sist2:x64-linux
    restart: unless-stopped
    volumes:
      - /data/sist2-admin-data/:/sist2-admin/
      - /<path to index>/:/host
    ports:
      - 4090:4090
      # NOTE: Don't expose this port publicly!
      - 8080:8080
    working_dir: /root/sist2-admin/
    entrypoint: python3
    command:
      - /root/sist2-admin/sist2_admin/app.py

Navigate to http://localhost:8080/ to configure sist2-admin.
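The compose file above notes that the Elasticsearch data directory must be owned by 1000:1000 (or that PUID and PGID must be changed to match). A minimal sketch of a pre-flight check, assuming a POSIX host; the helper name `owned_by` is hypothetical and not part of sist2:

```python
import os


def owned_by(path: str, uid: int = 1000, gid: int = 1000) -> bool:
    """Return True if `path` is owned by the given user and group IDs.

    The defaults mirror the 1000:1000 requirement from the compose comment.
    """
    st = os.stat(path)
    return st.st_uid == uid and st.st_gid == gid


if __name__ == "__main__":
    # Path taken from the compose example; adjust it to your own volume mapping.
    data_dir = "/data/sist2-es-data/"
    if os.path.isdir(data_dir):
        print(data_dir, "owned by 1000:1000:", owned_by(data_dir))
    else:
        print(data_dir, "does not exist yet")
```

If the check fails, either change the directory owner or set PUID and PGID to the IDs that own it.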

Using the executable file (Linux/WSL only)

  1. Choose a search backend (see comparison):

    • Elasticsearch: have an Elasticsearch (version >= 6.8.X, ideally >=7.14.0) instance running
      1. Download from official website
      2. (or) Run using docker:
        docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.17.9
        
    • SQLite: No installation required
  2. Download the latest sist2 release. Select the file corresponding to your CPU architecture and mark the binary as executable with chmod +x.

  3. See usage guide for command line usage.

Example usage:

  1. Scan a directory: sist2 scan ~/Documents --output ./documents.sist2
  2. Prepare search index:
    • Elasticsearch: sist2 index --es-url http://localhost:9200 ./documents.sist2
    • SQLite: sist2 sqlite-index --search-index ./search.sist2 ./documents.sist2
  3. Start web interface:
    • Elasticsearch: sist2 web ./documents.sist2
    • SQLite: sist2 web --search-index ./search.sist2 ./documents.sist2
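The three steps above can be sketched as a small helper that assembles the command lines for either backend. This is an illustrative wrapper, not part of sist2: the function name `build_pipeline` and its parameters are hypothetical, and it uses only the subcommands and flags shown in the example usage.

```python
import os
import shlex
from typing import List


def build_pipeline(
    source_dir: str,
    index_file: str,
    backend: str = "sqlite",
    search_index: str = "./search.sist2",
    es_url: str = "http://localhost:9200",
) -> List[List[str]]:
    """Return the scan, index and web commands as argument lists."""
    scan = ["sist2", "scan", source_dir, "--output", index_file]
    if backend == "elasticsearch":
        index = ["sist2", "index", "--es-url", es_url, index_file]
        web = ["sist2", "web", index_file]
    elif backend == "sqlite":
        index = ["sist2", "sqlite-index", "--search-index", search_index, index_file]
        web = ["sist2", "web", "--search-index", search_index, index_file]
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    return [scan, index, web]


if __name__ == "__main__":
    # Expand "~" here because the commands are built without a shell.
    docs = os.path.expanduser("~/Documents")
    for cmd in build_pipeline(docs, "./documents.sist2"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. The web step keeps running until it is interrupted, so it belongs last.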

Format support

| File type | Library | Content | Thumbnail | Metadata |
|:----------|:--------|:--------|:----------|:---------|
| pdf,xps,fb2,epub | MuPDF | text+ocr | yes | author, title |
| cbz,cbr | libscan | - | yes | - |
| audio/* | ffmpeg | - | yes | ID3 tags |
| video/* | ffmpeg | - | yes | title, comment, artist |
| image/* | ffmpeg | ocr | yes | Common EXIF tags, GPS tags |
| raw, rw2, dng, cr2, crw, dcr, k25, kdc, mrw, pef, xf3, arw, sr2, srf, erf | LibRaw | no | yes | Common EXIF tags, GPS tags |
| ttf,ttc,cff,woff,fnt,otf | Freetype2 | - | yes, bmp | Name & style |
| text/plain | libscan | yes | no | - |
| html, xml | libscan | yes | no | - |
| tar, zip, rar, 7z, ar ... | Libarchive | yes* | - | no |
| docx, xlsx, pptx | libscan | yes | if embedded | creator, modified_by, title |
| doc (MS Word 97-2003) | antiword | yes | no | author, title |
| mobi, azw, azw3 | libmobi | yes | yes | author, title |
| wpd (WordPerfect) | libwpd | yes | no | planned |
| json, jsonl, ndjson | libscan | yes | - | - |

* See Archive files

Archive files

sist2 scans files stored inside archive files (zip, tar, 7z...) as if they were directly in the file system. Recursive scanning (archives inside archives) is also supported.

Limitations:

  • Support for parsing media files with formats that require seeking (e.g. .gif, .mp4 with fragmented metadata, etc.) is limited (see the --mem-buffer option)
  • Archive files are scanned sequentially, by a single thread. On systems where sist2 is not I/O bound, scans might be faster when larger archives are split into smaller parts.
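To illustrate treating archive members as if they were regular files, including archives nested inside archives, here is a conceptual sketch using Python's standard `zipfile` module. It is not how sist2 is implemented (sist2 relies on Libarchive and handles many more formats). The function name `walk_zip` and the virtual path layout are assumptions made for this example, and nested archives are detected by the `.zip` extension only.

```python
import io
import zipfile
from typing import Iterator, Tuple


def walk_zip(data: bytes, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (virtual_path, size_in_bytes) for every file in a zip archive.

    Members ending in ".zip" are opened in memory and walked recursively,
    so their contents appear under the nested archive's own path.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Members are visited one after another, in archive order.
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            payload = zf.read(info)
            if info.filename.lower().endswith(".zip"):
                yield from walk_zip(payload, prefix=path + "/")
            else:
                yield path, len(payload)
```

The nested archive is read fully into memory before it is opened because the zip format needs random access. This is the same kind of constraint as the seek limitation listed above.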

OCR

You can enable OCR support for ebook (pdf,xps,fb2,epub) or image file types with the --ocr-lang <lang> option in combination with --ocr-images and/or --ocr-ebooks. Download the language data files with your package manager (apt install tesseract-ocr-eng) or directly from GitHub.

The sist2app/sist2 image comes with common languages (hin, jpn, eng, fra, rus, spa, chi_sim, deu, pol) pre-installed.
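As a sketch of how the OCR options fit together, the helper below first checks that the requested language appears in the output of `tesseract --list-langs`, then assembles a scan command using the flags named above. The function names are hypothetical, and the parser assumes the usual layout of that output: a "List of available languages" header followed by one language per line.

```python
from typing import List, Set


def parse_list_langs(output: str) -> Set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if line and not line.startswith("List of"):
            langs.add(line)
    return langs


def build_ocr_scan(
    source_dir: str,
    index_file: str,
    lang: str,
    available: Set[str],
    images: bool = True,
    ebooks: bool = False,
) -> List[str]:
    """Build a `sist2 scan` command with OCR enabled for images and/or ebooks."""
    if lang not in available:
        raise ValueError(f"language data for {lang!r} is not installed")
    if not (images or ebooks):
        raise ValueError("enable OCR for images, ebooks, or both")
    cmd = ["sist2", "scan", source_dir, "--output", index_file, "--ocr-lang", lang]
    if images:
        cmd.append("--ocr-images")
    if ebooks:
        cmd.append("--ocr-ebooks")
    return cmd
```

The listing can be captured with `subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)`. Some older Tesseract versions print it to stderr, so it is safer to parse stdout and stderr together.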

You c
