OCR4Linux

OCR CLI tool for extracting text from screenshots (images), built with Bash and Python scripts, for both X11 and Wayland

Version: 1.5.0

OCR4Linux is a versatile text extraction tool that allows you to take a screenshot of a selected area, extract text using OCR, and copy it to the clipboard. It supports both Wayland and X11 sessions and offers multiple language support.

Note: This script currently targets Arch Linux only. It may work on other Arch-based distributions, but that has not been tested.

Motivation

I couldn't find an easy tool on Linux that does what the PowerToys app does on Windows. This motivated me to create OCR4Linux, a simple and efficient tool that captures a screenshot, extracts its text, and copies it to the clipboard in one seamless process.

Features

  • Screenshot Capture

    • Wayland support via grimblast
    • X11 support via scrot
    • Configurable screenshot directory
  • Text Extraction

    • Interactive language selection via rofi
    • Multi-language OCR support with custom language combinations
    • Automatic language detection fallback
    • Image preprocessing for better accuracy
    • UTF-8 text output
  • Clipboard Integration

    • Wayland: wl-copy and cliphist
    • X11: xclip
  • Additional Features

    • Interactive language selection menu
    • Optional screenshot retention
    • Comprehensive logging system
    • Command-line interface

Requirements

System Requirements

  • Arch Linux or an Arch-based distribution

  • Python 3.x

  • yay package manager (will be installed if needed)

  • tesseract OCR engine

  • tesseract-data-eng English language pack

  • tesseract-data-ara Arabic language pack

  • rofi for the interactive language selection feature

  • If you need a language other than the two above, search for its package, replacing {lang} with the language code:

    sudo pacman -Ss tesseract-data-{lang}
    

Python Dependencies

  • python-pillow
  • python-pytesseract

Session-Specific Requirements

  • Wayland:
    • grimblast-git
    • wl-clipboard
    • cliphist
  • X11:
    • scrot
    • xclip
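
The split above follows the session type, so a dependency check can be keyed on it. The following is a minimal Python sketch, not code from OCR4Linux: it assumes the session is exposed through the `XDG_SESSION_TYPE` environment variable and that the packages above provide executables named `grimblast`, `wl-copy`, `cliphist`, `scrot`, and `xclip`. The names `REQUIRED_TOOLS`, `detect_session`, and `missing_tools` are hypothetical.

```python
import os
import shutil

# Executables assumed to be provided by the packages listed above.
REQUIRED_TOOLS = {
    "wayland": ["grimblast", "wl-copy", "cliphist"],
    "x11": ["scrot", "xclip"],
}


def detect_session(env=None):
    """Return 'wayland' or 'x11' based on XDG_SESSION_TYPE, else raise."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session not in REQUIRED_TOOLS:
        raise RuntimeError(f"unsupported session type: {session!r}")
    return session


def missing_tools(session, which=shutil.which):
    """List the required executables that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS[session] if which(tool) is None]
```

Calling `missing_tools(detect_session())` returns an empty list when everything needed for the current session is installed; the `which` parameter exists only so the lookup can be replaced in tests.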

Installation

Option 1: Install from AUR (Recommended)

The easiest way to install OCR4Linux on Arch Linux or any Arch-based distribution is directly from the AUR using any AUR helper (e.g., yay, paru):

yay -S ocr4linux-git

This will automatically install OCR4Linux and all its required dependencies.

Option 2: Build from Source (makepkg)

You can clone the repository and build the package manually using makepkg:

  1. Clone the repository:

    git clone https://github.com/moheladwy/OCR4Linux.git
    cd OCR4Linux
    
  2. Build and install the package:

    makepkg -si
    

Option 3: Manual Installation (setup.sh)

If you prefer a local installation in your home directory or want to use the automated setup script:

  1. Clone the repository:

    git clone https://github.com/moheladwy/OCR4Linux.git
    cd OCR4Linux
    
  2. Run the setup script:

    chmod +x setup.sh
    ./setup.sh
    

    Note: The setup script will:

    • Prompt you to confirm before proceeding with the manual installation
    • Install all required dependencies (tesseract, rofi, screenshot tools, etc.)
    • Copy all OCR4Linux files to ~/.config/OCR4Linux/
    • Set up the necessary directory structure

Usage

  1. Run the tool to take a screenshot, extract text, and copy it to the clipboard:

    If installed via AUR or makepkg:

    OCR4Linux
    

    If installed via setup.sh:

    ~/.config/OCR4Linux/OCR4Linux.sh
    

    Or if you're in the source directory:

    ./OCR4Linux.sh
    
  2. The script will:

    • With --lang option: Use specified languages directly (bypasses rofi menu)
    • Without --lang option: Display an interactive language selection menu via rofi
    • Allow you to select one or multiple languages for OCR processing
    • Take a screenshot of the selected area after language selection
    • Extract text from the image using the selected languages
    • Copy the extracted text to the clipboard

Language Selection

You have two options for language selection:

Option 1: Command Line (Direct)

Specify languages directly using the --lang option:

  • --lang all - Use all available languages
  • --lang eng - Use English only
  • --lang eng+ara+fra - Use multiple specific languages

Option 2: Interactive Menu (Rofi)

When you run the script without --lang, a rofi menu will appear with:

  • ALL: Select all available languages
  • Individual languages: Choose specific languages (e.g., eng, ara, fra, deu)
  • Multi-select: Hold Ctrl and click to select multiple languages

The selected languages will be used by Tesseract for more accurate text recognition in multi-language documents.
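
Both selection paths end in the same place: a single `+`-joined language string, which is the form Tesseract accepts. The sketch below shows one way such a value could be validated and normalized. It is an illustration under assumptions, not the project's actual logic: `resolve_langs` is a hypothetical helper, and the list of installed languages is passed in rather than queried from Tesseract.

```python
def resolve_langs(spec, available):
    """Turn a --lang value into a '+'-joined language string.

    spec: 'all' or language codes joined by '+', e.g. 'eng+ara+fra'.
    available: iterable of installed language codes.
    """
    installed = list(available)
    if spec.strip().lower() == "all":
        requested = installed
    else:
        requested = [code.strip() for code in spec.split("+") if code.strip()]
    if not requested:
        raise ValueError("no languages given")
    unknown = [code for code in requested if code not in installed]
    if unknown:
        raise ValueError("languages not installed: " + ", ".join(unknown))
    # Drop duplicates while keeping the order the user asked for.
    return "+".join(dict.fromkeys(requested))
```

With `["ara", "eng", "fra"]` installed, `resolve_langs("eng+ara", ...)` yields `eng+ara`, and a code that is not installed raises `ValueError` instead of being passed on to the OCR step.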

Workflow

The complete OCR4Linux workflow:

  1. Language Selection:
    • Command-line specified languages (with --lang) OR
    • Interactive rofi menu displays available languages (without --lang)
  2. Language Processing: Selected languages are validated and formatted
  3. Screenshot Capture: Area selection and image capture
  4. OCR Processing: Text extraction using selected languages
  5. Clipboard Integration: Extracted text copied to system clipboard
  6. Cleanup: Optional screenshot removal and logging
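
The capture, OCR, and clipboard steps can be sketched as a short chain of external commands. This is a simplified illustration rather than the project's implementation: it calls the `tesseract` command-line program directly instead of OCR4Linux.py, omits image preprocessing, logging, and cliphist, and the exact `grimblast` and `scrot` arguments used by OCR4Linux.sh are assumptions. All function names are hypothetical.

```python
import os
import subprocess


def capture_cmd(session, image_path):
    """Command that saves a user-selected screen region to image_path."""
    if session == "wayland":
        return ["grimblast", "save", "area", image_path]
    if session == "x11":
        return ["scrot", "-s", image_path]
    raise ValueError(f"unsupported session: {session}")


def ocr_cmd(image_path, langs):
    """Tesseract CLI call that prints the recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "-l", langs]


def clipboard_cmd(session):
    """Command that reads text on stdin and puts it on the clipboard."""
    if session == "wayland":
        return ["wl-copy"]
    if session == "x11":
        return ["xclip", "-selection", "clipboard"]
    raise ValueError(f"unsupported session: {session}")


def run_workflow(session, image_path, langs, remove_image=False):
    """Capture a region, extract its text, copy it, optionally clean up."""
    subprocess.run(capture_cmd(session, image_path), check=True)
    result = subprocess.run(
        ocr_cmd(image_path, langs), check=True, capture_output=True, text=True
    )
    subprocess.run(clipboard_cmd(session), input=result.stdout, text=True, check=True)
    if remove_image:
        os.remove(image_path)
    return result.stdout
```

Keeping the command builders separate from `run_workflow` makes the session-specific differences visible in one place and lets them be checked without taking a screenshot.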

Command Line Arguments


OCR4Linux.sh

| Option             | Description                           | Default                      |
| ------------------ | ------------------------------------- | ---------------------------- |
| -r                 | Remove screenshot after processing    | false                        |
| -d DIR             | Set screenshot directory              | $HOME/Pictures/screenshots   |
| -l                 | Keep logs                             | false                        |
| -n, --notify       | Show notification after screenshot    | false                        |
| --lang LANGUAGES   | Specify OCR languages (bypasses rofi) | Interactive selection        |
| -v, --version      | Print the package version, then exit  | -                            |
| -h, --help         | Show help message, then exit          | -                            |

Language Format for --lang:

  • Use all for all available languages
  • Use + to separate multiple languages (e.g., eng+ara+fra)
  • Single languages: eng, ara, fra, etc.

OCR4Linux.py

| Option                | Description                  | Required |
| --------------------- | ---------------------------- | -------- |
| image_path            | Path to input image          | Yes      |
| output_path           | Path to save extracted text  | Yes      |
| --langs <languages>   | Specify languages for OCR    | No       |
| -l, --list-langs      | List available OCR languages | No       |
| -h, --help            | Show help message            | No       |

Language Format: Use + to separate multiple languages (e.g., eng+ara+fra)
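
An interface like the one in the OCR4Linux.py table maps naturally onto `argparse`. The sketch below is an assumption about how such a parser could be laid out, not the script's real source: the positional arguments are declared optional so that `--list-langs` can run on its own, and a manual check then enforces them for normal runs. `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    """Parser mirroring the documented OCR4Linux.py options."""
    parser = argparse.ArgumentParser(
        prog="OCR4Linux.py",
        description="Extract text from an image with Tesseract.",
    )
    # Positionals are optional here so that --list-langs can be used alone.
    parser.add_argument("image_path", nargs="?", help="Path to input image")
    parser.add_argument("output_path", nargs="?", help="Path to save extracted text")
    parser.add_argument(
        "--langs",
        metavar="LANGUAGES",
        help="Languages for OCR, joined by '+' (e.g. eng+ara+fra)",
    )
    parser.add_argument(
        "-l", "--list-langs", action="store_true", help="List available OCR languages"
    )
    return parser


def parse_args(argv=None):
    """Parse argv and require both paths unless --list-langs is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_langs and (args.image_path is None or args.output_path is None):
        parser.error("image_path and output_path are required unless --list-langs is given")
    return args
```

For example, `parse_args(["input.png", "output.txt", "--langs", "eng+ara+fra"])` fills all three values, while `parse_args(["--list-langs"])` succeeds without any paths. `-h, --help` is provided by `argparse` automatically.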

Examples


Using OCR4Linux

# Basic usage (shows interactive rofi menu)
OCR4Linux

# Direct language specification (bypasses rofi)
OCR4Linux --lang eng
OCR4Linux --lang all
OCR4Linux --lang eng+ara+fra

# Save logs and remove screenshot after processing
OCR4Linux -l -r

# Custom screenshot directory with logging and notification
OCR4Linux -d ~/Documents/screenshots -l -n

# Combine language specification with other options
OCR4Linux --lang eng -l -r
OCR4Linux --lang all -d ~/screenshots -l

# Print version
OCR4Linux -v

# Show help
OCR4Linux -h

Note: If you are running the script manually without installation, replace OCR4Linux with ./OCR4Linux.sh.

Using OCR4Linux.py

# Basic usage (uses all available languages)
python OCR4Linux.py input.png output.txt

# Specify single language
python OCR4Linux.py input.png output.txt --langs eng

# Specify multiple languages
python OCR4Linux.py input.png output.txt --langs eng+ara+fra

# List available languages
python OCR4Linux.py --list-langs

# Show help
python OCR4Linux.py --help
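
Because OCR4Linux.py takes an input image and an output path, it can also be driven from another script. The following sketch builds one command per PNG file in a directory, using only the arguments documented above. `batch_commands` is a hypothetical helper, and the assumption that OCR4Linux.py sits in the current directory is illustrative.

```python
import sys
from pathlib import Path


def batch_commands(image_dir, output_dir, langs=None, script="OCR4Linux.py"):
    """Build one OCR4Linux.py command per PNG file in image_dir."""
    commands = []
    for image in sorted(Path(image_dir).glob("*.png")):
        output = Path(output_dir) / (image.stem + ".txt")
        cmd = [sys.executable, script, str(image), str(output)]
        if langs:
            # Same '+'-joined format as the --langs examples above.
            cmd += ["--langs", langs]
        commands.append(cmd)
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; building the commands first makes it easy to print and review them before anything is executed.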

Tips

  • Language Selection Options:

    • Command Line: Use --lang for automated/scripted usage

      • --lang all for maximum compatibility
      • --lang eng for English-only documents
      • --lang eng+ara for bilingual documents
    • Interactive Menu: Run without --lang for manual selection

      • Select "ALL" to use all available languages
      • Select specific languages for better performance
      • Use Ctrl+Click to select multiple languages
      • Press Escape to cancel the operation
  • Performance Optimization:

    • Use fewer specific languages for faster processing
    • Use --lang all only when document language is unknown
    • Command-line specification is faster than interactive selection
  • Keyboard Shortcuts: You can create a keyboard shortcut to run the script for easy access.

    Example for Hyprland users:

    • Put the following lines in your hyprland.conf file:

      # If installed via AUR/makepkg
      bind = $mainMod SHIFT, E, exec, OCR4Linux # OCR4Linux with interactive selection
      bind = $mainMod SHIFT, T, exec, OCR4Linux --lang eng # OCR4Linux with English only
      
      # If installed via setup.sh
      # bind = $mainMod SHIFT, E, exec, ~/.config/OCR4Linux/OCR4Linux.sh
      

    Example for dwm users:

    • Put the following lines in your config.h file:

      /* If installed via AUR/makepkg */
      static const char *ocr4linux[] = { "OCR4Linux", NULL };
      static const char *ocr4linux_eng[] = { "OCR4Linux", "--lang", "eng", NULL };
      
      { MODKEY | ShiftMask, XK_e, spawn, {.v = ocr4linux } },      // OCR4Linux interactive
      { MODKEY | ShiftMask, XK_t, spawn, {.v = ocr4l
      