Rocketdocket
The fastest, cleanest, most reproducible ways to OCR a document.
OCRing a PDF requires a few steps if you'd like to do it in parallel.
Currently, we're taking in a single large PDF and producing a single text file for each page.
Decisions should be made about what to do with those text files. Put 'em in Elasticsearch? Combine them back into a giant PDF? Both? Neither?
Ghostscript
time gs \
  -o file-%05d.png \
  -sDEVICE=pngmono \
  -dNumRenderingThreads=$(sysctl -n hw.ncpu) \
  -dBandHeight=100 \
  -dBufferSpace=1000000000 \
  -dBandBufferSpace=500000000 \
  -sBandListStorage=memory \
  -dBATCH \
  -dNOPAUSE \
  -dNOGC \
  -r72 \
  original.pdf
Explanation
-o file-%05d.png: Writes one PNG per page, with names like file-00405.png. In Ghostscript, -o also implies -dBATCH and -dNOPAUSE.
-dNumRenderingThreads=$(sysctl -n hw.ncpu): Creates one rendering thread for every CPU (or virtual CPU) your computer reports. sysctl -n hw.ncpu is the macOS/BSD way to ask for that number.
-sDEVICE=pngmono: In my testing, the fastest output device is pngmono, which produces a black-and-white PNG with no transparency. PNG is also the fastest format for Tesseract to process.
-dBandHeight=100 -dBufferSpace=1000000000 -dBandBufferSpace=500000000 -sBandListStorage=memory: Gives Ghostscript a large amount of memory (about 1 GB of buffer space) for rendering the PNGs, and keeps the band list in RAM instead of in temporary files.
-dBATCH -dNOPAUSE -dNOGC: Speed tweaks picked up from the internet. -dBATCH exits when processing finishes instead of waiting at an interactive prompt, -dNOPAUSE skips the pause after each page, and -dNOGC turns off automatic garbage collection.
-r72: Renders each page at 72 dpi. Lower resolutions are faster to generate, but OCR quality drops quite a bit below this.
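Because sysctl -n hw.ncpu only works on macOS and the BSDs, the commands here fail on Linux as written. A minimal portable sketch, assuming a POSIX shell, tries nproc (GNU coreutils, present on most Linux systems), then sysctl, then getconf _NPROCESSORS_ONLN. The function name ncpu is hypothetical and not part of Rocketdocket.

```shell
#!/bin/sh
# ncpu: hypothetical helper that prints the number of online CPUs.
ncpu() {
  if command -v nproc >/dev/null 2>&1; then
    # GNU coreutils (most Linux systems)
    nproc
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    # macOS and the BSDs
    sysctl -n hw.ncpu
  else
    # Fallback understood by getconf on both glibc systems and macOS
    getconf _NPROCESSORS_ONLN
  fi
}
```

With the function defined in the shell, -dNumRenderingThreads=$(ncpu) and xargs -P $(ncpu) can stand in for the sysctl calls.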
Tesseract
time ls *.png | xargs -n 1 -P $(sysctl -n hw.ncpu) ./ocr.sh
Explanation
ls *.png: Lists the PNG files. When its output is piped, ls prints one filename per line.
| xargs: Feeds the output of ls *.png to xargs.
-n 1: Passes one argument, a single PNG file path, to each run of the command. xargs splits its input on whitespace, which is fine for names like file-00405.png but would break on filenames containing spaces.
-P $(sysctl -n hw.ncpu): Tells xargs to run up to one process per CPU (or virtual CPU) your computer reports.
./ocr.sh: Runs whatever is in ocr.sh, with the name of the PNG file as its argument.
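The contents of ocr.sh aren't included here. A minimal hypothetical version, assuming the tesseract command-line tool is on the PATH, could look like the sketch below. tesseract IMAGE OUTBASE writes its text to OUTBASE.txt, so each file-00405.png becomes file-00405.txt, which matches the one-text-file-per-page output described at the top. The OMP_THREAD_LIMIT=1 line is an assumption worth benchmarking: Tesseract 4 and later can spawn OpenMP threads of its own, which may oversubscribe the CPUs when xargs is already running one process per core. The function name ocr_page is also hypothetical.

```shell
#!/bin/sh
# ocr.sh: hypothetical sketch; the original script is not shown.
# Usage: ./ocr.sh file-00405.png   (writes file-00405.txt)
set -eu

# Assumption: xargs -P already runs one process per CPU, so limit each
# Tesseract process to a single OpenMP thread.
export OMP_THREAD_LIMIT=1

ocr_page() {
  png="$1"
  base="${png%.png}"   # strip the suffix: file-00405.png -> file-00405
  # tesseract <image> <outputbase> writes <outputbase>.txt
  tesseract "$png" "$base"
}

# Only do work when given an argument, as xargs -n 1 does.
if [ "$#" -gt 0 ]; then
  ocr_page "$1"
fi
```

The script has to be executable (chmod +x ocr.sh) for xargs to launch it as ./ocr.sh.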
