Rocketdocket

The fastest, cleanest, most reproducible ways to OCR a document.

Generate Convert Improve

Install / Use

/learn @newsdev/Rocketdocket

About this skill

Quality Score

0/100

README

ROCKETDOCKET

OCRing a PDF requires a few steps if you'd like to do it in parallel.

Currently, we're taking in a single large PDF and producing a single text file for each page.

Decisions should be made about what to do with those text files. Put 'em in Elasticsearch? Combine them back into a giant PDF? Both? Neither?

Ghostscript

time gs\
    -o file-%05d.png\
    -sDEVICE=pngmono\
    -dNumRenderingThreads=$(sysctl -n hw.ncpu)\
    -dBandHeight=100\
    -dBufferSpace=1000000000\
    -dBandBufferSpace=500000000\
    -sBandListStorage=memory\
    -dBATCH\
    -dNOPAUSE\
    -dNOGC\ 
    -r72\
    original.pdf

Explanation

-o file-%05d.png: Outputs file names like file-00405.png, one for each page.
-dNumRenderingThreads=$(sysctl -n hw.ncpu): Makes a thread for every CPU (or virtual CPU) your computer claims to have.
-sDEVICE=pngmono: The fastest output, in my testing, is with the pngmono engine, which is a black-and-white PNG with no transparency. PNG is also the fastest for Tesseract to process.
-dBandHeight=100 -dBufferSpace=1000000000 -dBandBufferSpace=500000000 -sBandListStorage=memory: Sets up a large amount of memory for creating PNGs.
-dBATCH -dNOPAUSE -dNOGC: From the internet, things you can turn off to increase speed.
-r72: Produces a 72-dpi PNG image. Smaller numbers here are faster to generate, but OCR quality decreases quite a bit as you descend below this.

Tesseract

time ls *.png | xargs -n 1 -P $(sysctl -n hw.ncpu) ./ocr.sh

Explanation

ls *.png: Produces output where each filename is on a single line.
| xargs: Takes the output of ls *.png and feeds it to xargs
-n 1: Each line contains something we'd like to process, aka, a PNG file path.
-P $(sysctl -n hw.ncpu): xargs should use one process per CPU (or virtual CPU) your computer claims to have.
./ocr.sh: Runs whatever is in ocr.sh with the name of the PNG file as the argument.

Related Skills

node-connect

351.2k

Diagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps

frontend-design

110.6k

Create distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.

openai-whisper-api

351.2k

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

qqbot-media

351.2k

QQBot 富媒体收发能力。使用 <qqmedia> 标签，系统根据文件扩展名自动识别类型（图片/语音/视频/文件）。