OCR4Linux

Version: 1.5.0

OCR4Linux is a text extraction tool: it takes a screenshot of a selected area, extracts the text with OCR, and copies it to the clipboard. It supports both Wayland and X11 sessions and multiple OCR languages.

Note: This script currently targets Arch Linux only. It may work on other Arch-based distributions, but this has not been tested.

Motivation

I couldn't find an easy tool on Linux that does what the PowerToys app does on Windows. That motivated me to create OCR4Linux, a simple and efficient tool that captures a screenshot, extracts its text, and copies it to the clipboard in one step.

Features

Screenshot Capture
- Wayland support via grimblast
- X11 support via scrot
- Configurable screenshot directory
Text Extraction
- Interactive language selection via rofi
- Multi-language OCR support with custom language combinations
- Automatic language detection fallback
- Image preprocessing for better accuracy
- UTF-8 text output
Clipboard Integration
- Wayland: wl-copy and cliphist
- X11: xclip
Additional Features
- Interactive language selection menu
- Optional screenshot retention
- Comprehensive logging system
- Command-line interface

Requirements

System Requirements

Arch Linux or an Arch-based distribution
Python 3.x
yay package manager (will be installed if needed)
tesseract OCR engine
tesseract-data-eng English language pack
tesseract-data-ara Arabic language pack
rofi for the interactive language selection feature
If you need a language other than the two above, search the available language packs (searching does not require sudo):
```
pacman -Ss tesseract-data
```

Python Dependencies

python-pillow
python-pytesseract

Session-Specific Requirements

Wayland:
- grimblast-git
- wl-clipboard
- cliphist
X11:
- scrot
- xclip
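
The split between Wayland and X11 tools can be sketched as a small lookup keyed on the session type. This is a minimal illustration, not the script's actual logic: the tool names come from the list above, while the `XDG_SESSION_TYPE` lookup, the function name `tools_for_session`, and the exact command-line flags are assumptions.

```python
import os

# Tool names are taken from the requirements above.
# The flags shown here are assumptions, not necessarily what OCR4Linux.sh passes.
SESSION_TOOLS = {
    "wayland": {
        "screenshot": ["grimblast", "save", "area"],
        "clipboard": ["wl-copy"],
    },
    "x11": {
        "screenshot": ["scrot", "--select"],
        "clipboard": ["xclip", "-selection", "clipboard"],
    },
}


def tools_for_session(session_type=None):
    """Return the screenshot and clipboard commands for a session type.

    When no session type is given, fall back to XDG_SESSION_TYPE.
    """
    if session_type is None:
        session_type = os.environ.get("XDG_SESSION_TYPE", "")
    session_type = session_type.lower()
    if session_type not in SESSION_TOOLS:
        raise ValueError(f"unsupported session type: {session_type!r}")
    return SESSION_TOOLS[session_type]
```

Calling `tools_for_session("wayland")` returns the grimblast and wl-copy commands; an unknown session type raises `ValueError` rather than guessing.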

Installation

Option 1: Install from AUR (Recommended)

The easiest way to install OCR4Linux on Arch Linux or any Arch-based distribution is directly from the AUR using any AUR helper (e.g., yay, paru):

yay -S ocr4linux-git

This will automatically install OCR4Linux and all its required dependencies.

Option 2: Build from Source (makepkg)

You can clone the repository and build the package manually using makepkg:

Clone the repository:

git clone https://github.com/moheladwy/OCR4Linux.git
cd OCR4Linux

Build and install the package:
```
makepkg -si
```

Option 3: Manual Installation (setup.sh)

If you prefer a local installation in your home directory or want to use the automated setup script:

Clone the repository:

git clone https://github.com/moheladwy/OCR4Linux.git
cd OCR4Linux

Run the setup script:
```
chmod +x setup.sh
./setup.sh
```
Note: The setup script will:
- Prompt you to confirm before proceeding with the manual installation
- Install all required dependencies (tesseract, rofi, screenshot tools, etc.)
- Copy all OCR4Linux files to ~/.config/OCR4Linux/
- Set up the necessary directory structure

Usage

Run the tool to take a screenshot, extract text, and copy it to the clipboard:

If installed via AUR or makepkg:
```
OCR4Linux
```
If installed via setup.sh:
```
~/.config/OCR4Linux/OCR4Linux.sh
```
Or if you're in the source directory:
```
./OCR4Linux.sh
```
The script will:
- With --lang option: Use specified languages directly (bypasses rofi menu)
- Without --lang option: Display an interactive language selection menu via rofi
- Allow you to select one or multiple languages for OCR processing
- Take a screenshot of the selected area after language selection
- Extract text from the image using the selected languages
- Copy the extracted text to the clipboard

Language Selection

You have two options for language selection:

Option 1: Command Line (Direct)

Specify languages directly using the --lang option:

--lang all - Use all available languages
--lang eng - Use English only
--lang eng+ara+fra - Use multiple specific languages

Option 2: Interactive Menu (Rofi)

When you run the script without --lang, a rofi menu will appear with:

ALL: Select all available languages
Individual languages: Choose specific languages (e.g., eng, ara, fra, deu)
Multi-select: Hold Ctrl and click to select multiple languages

The selected languages will be used by Tesseract for more accurate text recognition in multi-language documents.
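
As a rough sketch of how a multi-select rofi menu could be turned into a Tesseract language string: the helper names below are hypothetical, and the mapping of the ALL entry to `all` is an assumption about the script's behaviour. The rofi flags used (`-dmenu`, `-multi-select`, `-p`) are standard dmenu-mode options.

```python
import subprocess


def rofi_command(prompt="OCR languages"):
    """Build a rofi invocation that allows selecting several rows."""
    return ["rofi", "-dmenu", "-multi-select", "-p", prompt]


def selection_to_lang_spec(menu_output):
    """Convert rofi's output (one selected entry per line) to a language spec.

    Returns None when nothing was selected, e.g. after pressing Escape.
    """
    chosen = [line.strip() for line in menu_output.splitlines() if line.strip()]
    if not chosen:
        return None
    if "ALL" in chosen:
        return "all"
    return "+".join(chosen)


def ask_languages(available):
    """Show the menu (requires rofi and a graphical session)."""
    entries = "\n".join(["ALL", *available])
    result = subprocess.run(
        rofi_command(), input=entries, capture_output=True, text=True
    )
    return selection_to_lang_spec(result.stdout)
```

Selecting `eng` and `ara` in the menu would yield `eng+ara`, the same format the --lang option accepts.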

Workflow

The complete OCR4Linux workflow:

Language Selection:
- Command-line specified languages (with --lang) OR
- Interactive rofi menu displays available languages (without --lang)
Language Processing: Selected languages are validated and formatted
Screenshot Capture: Area selection and image capture
OCR Processing: Text extraction using selected languages
Clipboard Integration: Extracted text copied to system clipboard
Cleanup: Optional screenshot removal and logging
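
The ordering of the steps above can be sketched as a single function. This is only an outline under stated assumptions: `run_workflow` and its parameters are hypothetical names, the languages are assumed to be already selected and validated, and the capture, OCR, and clipboard steps are passed in as callables so the sketch does not depend on any particular tool.

```python
from pathlib import Path
from typing import Callable


def run_workflow(
    langs: str,
    screenshot: Path,
    capture: Callable[[Path], None],
    extract: Callable[[Path, str], str],
    copy: Callable[[str], None],
    remove_screenshot: bool = False,
) -> str:
    """Run capture, OCR, clipboard, and cleanup in the documented order."""
    capture(screenshot)                # Screenshot Capture
    text = extract(screenshot, langs)  # OCR Processing
    copy(text)                         # Clipboard Integration
    if remove_screenshot:              # Cleanup (optional)
        screenshot.unlink(missing_ok=True)
    return text
```

Keeping each step behind a callable makes it easy to swap the Wayland and X11 tools without touching the order of operations.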

Command Line Arguments

OCR4Linux.sh

| Option | Description | Default |
| ------------------ | ------------------------------------- | ---------------------------- |
| -r | Remove screenshot after processing | false |
| -d DIR | Set screenshot directory | $HOME/Pictures/screenshots |
| -l | Keep logs | false |
| -n, --notify | Show notification after screenshot | false |
| --lang LANGUAGES | Specify OCR languages (bypasses rofi) | Interactive selection |
| -v, --version | Print the package version, then exit | - |
| -h, --help | Show help message, then exit | - |

Language Format for --lang:

Use all for all available languages
Use + to separate multiple languages (e.g., eng+ara+fra)
Single languages: eng, ara, fra, etc.
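
To illustrate the format, here is a minimal sketch of how a --lang value could be validated against the installed language packs. The function `resolve_langs` is hypothetical and is not taken from the script; in particular, expanding `all` to every available language joined with `+` is an assumption.

```python
def resolve_langs(spec, available):
    """Turn a --lang value into a Tesseract language string.

    `available` is the list of installed language codes, e.g. ["eng", "ara"].
    """
    if spec.strip().lower() == "all":
        return "+".join(available)
    requested = [code for code in spec.split("+") if code]
    unknown = [code for code in requested if code not in available]
    if not requested:
        raise ValueError("no language given")
    if unknown:
        raise ValueError(f"languages not installed: {', '.join(unknown)}")
    return "+".join(requested)
```

With `eng` and `ara` installed, `resolve_langs("eng+ara", ["eng", "ara"])` passes the value through unchanged, while `resolve_langs("eng+fra", ["eng", "ara"])` raises `ValueError` because `fra` is missing.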

OCR4Linux.py

| Option | Description | Required |
| --------------------- | ---------------------------- | -------- |
| image_path | Path to input image | Yes |
| output_path | Path to save extracted text | Yes |
| --langs <languages> | Specify languages for OCR | No |
| -l, --list-langs | List available OCR languages | No |
| -h, --help | Show help message | No |

Language Format: Use + to separate multiple languages (e.g., eng+ara+fra)
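
An interface like the one in the table could be declared with `argparse` roughly as follows. This is a sketch, not the actual contents of OCR4Linux.py: the positional arguments are made optional at the parser level so that `--list-langs` can run on its own, and the check that enforces them otherwise is an assumption about how the script behaves.

```python
import argparse


def build_parser():
    """Declare the options listed in the table above."""
    parser = argparse.ArgumentParser(
        prog="OCR4Linux.py", description="Extract text from an image."
    )
    # nargs="?" lets --list-langs be used without the two paths.
    parser.add_argument("image_path", nargs="?", help="Path to input image")
    parser.add_argument("output_path", nargs="?", help="Path to save extracted text")
    parser.add_argument("--langs", metavar="LANGUAGES",
                        help="Languages for OCR, separated by + (e.g. eng+ara+fra)")
    parser.add_argument("-l", "--list-langs", action="store_true",
                        help="List available OCR languages")
    return parser


def parse_args(argv=None):
    """Parse arguments and require both paths unless --list-langs is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_langs and (args.image_path is None or args.output_path is None):
        parser.error("image_path and output_path are required unless --list-langs is given")
    return args
```

`argparse` adds `-h, --help` automatically, so it does not need to be declared.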

Examples

Using OCR4Linux

# Basic usage (shows interactive rofi menu)
OCR4Linux

# Direct language specification (bypasses rofi)
OCR4Linux --lang eng
OCR4Linux --lang all
OCR4Linux --lang eng+ara+fra

# Save logs and remove screenshot after processing
OCR4Linux -l -r

# Custom screenshot directory with logging and notification
OCR4Linux -d ~/Documents/screenshots -l -n

# Combine language specification with other options
OCR4Linux --lang eng -l -r
OCR4Linux --lang all -d ~/screenshots -l

# Print version
OCR4Linux -v

# Show help
OCR4Linux -h

Note: If you are running the script manually without installation, replace OCR4Linux with ./OCR4Linux.sh.
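
For scripted use, the invocations above can also be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of OCR4Linux; it only uses the flags documented in the options table.

```python
def build_command(lang=None, directory=None, remove=False,
                  keep_logs=False, notify=False, executable="OCR4Linux"):
    """Build an OCR4Linux argument list from the documented options."""
    cmd = [executable]
    if lang:
        cmd += ["--lang", lang]
    if directory:
        # Note: "~" is not expanded when the list is run without a shell.
        cmd += ["-d", str(directory)]
    if keep_logs:
        cmd.append("-l")
    if remove:
        cmd.append("-r")
    if notify:
        cmd.append("-n")
    return cmd
```

The resulting list can be passed to `subprocess.run(build_command(lang="eng"), check=True)`. For a source checkout, pass `executable="./OCR4Linux.sh"`.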

Using OCR4Linux.py

# Basic usage (uses all available languages)
python OCR4Linux.py input.png output.txt

# Specify single language
python OCR4Linux.py input.png output.txt --langs eng

# Specify multiple languages
python OCR4Linux.py input.png output.txt --langs eng+ara+fra

# List available languages
python OCR4Linux.py --list-langs

# Show help
python OCR4Linux.py --help
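
For readers who want to see what such a run might look like in Python, here is a minimal sketch using Pillow and pytesseract, the two listed dependencies. It is not the code of OCR4Linux.py: the function names are hypothetical, the grayscale and autocontrast preprocessing is an assumed stand-in for whatever preprocessing the script applies, and falling back to every installed language when none is given is inferred from the "basic usage" example above.

```python
from pathlib import Path


def extract_text(image_path, langs=None):
    """Run Tesseract on an image and return the recognized text."""
    # Imported lazily so this module loads even without the OCR stack installed.
    import pytesseract
    from PIL import Image, ImageOps

    with Image.open(image_path) as img:
        # Assumed preprocessing: grayscale plus contrast stretch.
        prepared = ImageOps.autocontrast(ImageOps.grayscale(img))
        if langs is None:
            # Use every installed language pack (assumption).
            langs = "+".join(pytesseract.get_languages(config=""))
        return pytesseract.image_to_string(prepared, lang=langs)


def save_text(text, output_path):
    """Write the extracted text as UTF-8."""
    Path(output_path).write_text(text, encoding="utf-8")
```

A call such as `save_text(extract_text("input.png", "eng+ara"), "output.txt")` would mirror the multi-language example above.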

Tips

Language Selection Options:
- Command Line: Use --lang for automated/scripted usage
  - --lang all for maximum compatibility
  - --lang eng for English-only documents
  - --lang eng+ara for bilingual documents
- Interactive Menu: Run without --lang for manual selection
  - Select "ALL" to use all available languages
  - Select specific languages for better performance
  - Use Ctrl+Click to select multiple languages
  - Press Escape to cancel the operation
Performance Optimization:
- Use fewer specific languages for faster processing
- Use --lang all only when document language is unknown
- Command-line specification is faster than interactive selection

Keyboard Shortcuts: You can create a keyboard shortcut to run the script for easy access.

Example for `Hyprland` users:

Put the following lines in your hyprland.conf file:

# If installed via AUR/makepkg
bind = $mainMod SHIFT, E, exec, OCR4Linux # OCR4Linux with interactive selection
bind = $mainMod SHIFT, T, exec, OCR4Linux --lang eng # OCR4Linux with English only

# If installed via setup.sh
# bind = $mainMod SHIFT, E, exec, ~/.config/OCR4Linux/OCR4Linux.sh

Example for `dwm` users:

Put the following lines in your config.h file:

/* If installed via AUR/makepkg */
static const char *ocr4linux[] = { "OCR4Linux", NULL };
static const char *ocr4linux_eng[] = { "OCR4Linux", "--lang", "eng", NULL };

{ MODKEY | ShiftMask, XK_e, spawn, {.v = ocr4linux } },      // OCR4Linux interactive
{ MODKEY | ShiftMask, XK_t, spawn, {.v = ocr4l
