FenScribe - A Smart PDF Layout Optimizer
Reduce printing costs by automatically detecting and removing blank space in PDF documents.
Installation Dependencies
pip install PyMuPDF Pillow python-docx tkinterdnd2
Usage
- Run `gui.pyw` to use the graphical interface (a CLI version is not currently available).
- After processing, use the provided Office macro (`图片溢出缩小.bas`, "shrink overflowing images") to resize images that overflow the column.
- The macro automatically detects the width of the first column in Word and scales down oversized images.
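The macro's resizing rule can be illustrated with a minimal Python sketch. It assumes the macro shrinks each image wider than the column proportionally (preserving the aspect ratio) to exactly the column width and leaves narrower images untouched. This is not the macro itself (which is VBA). `ImageSize` and `fit_to_column` are hypothetical names, and reading the real column width from Word is out of scope.

```python
from dataclasses import dataclass


@dataclass
class ImageSize:
    """Width and height in any consistent unit (e.g. points)."""
    width: float
    height: float


def fit_to_column(img: ImageSize, column_width: float) -> ImageSize:
    """Scale an image down proportionally so it fits the column width.

    Images that already fit are returned unchanged; nothing is enlarged.
    """
    if column_width <= 0:
        raise ValueError("column_width must be positive")
    if img.width <= column_width:
        return img
    scale = column_width / img.width  # same factor for both axes keeps the aspect ratio
    return ImageSize(column_width, img.height * scale)


if __name__ == "__main__":
    column = 425.0  # hypothetical column width in points
    for size in [ImageSize(300, 200), ImageSize(850, 400)]:
        print(size, "->", fit_to_column(size, column))
```

Scaling both dimensions by the same factor avoids distorting the image; only the width is constrained because the column limits horizontal space, while the page flows vertically.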
Configuration Parameters
| Parameter | Description |
|--------------|-------------|
| threshold | Brightness threshold for blank-line detection (0-255). Each RGB pixel is converted to grayscale by averaging its channels; rows whose average grayscale is ≥ threshold are considered blank. |
| dpi | Image resolution used when converting PDF pages to images |
| min_height | Content validity filter (in pixels): only content blocks with height ≥ this value are preserved |
| blank_height | Paragraph separation baseline: content is treated as separate paragraphs when the run of blank lines between blocks is ≥ this value |
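The `threshold`, `min_height` and `blank_height` rules above can be sketched as follows. This is a minimal pure-Python illustration, assuming a rendered page is given as a list of pixel rows of `(R, G, B)` tuples; the function names are hypothetical, and FenScribe's actual code may differ (for example, in whether short blocks are dropped before or after paragraph grouping).

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255
Row = Sequence[Pixel]
Block = Tuple[int, int]  # (first_row, end_row), end_row exclusive


def is_blank_row(row: Row, threshold: int) -> bool:
    """Blank if the row's average grayscale is >= threshold.

    Each pixel's grayscale value is the plain average of its R, G, B channels.
    """
    gray = [(r + g + b) / 3 for r, g, b in row]
    return sum(gray) / len(gray) >= threshold


def find_blocks(rows: Sequence[Row], threshold: int, min_height: int) -> List[Block]:
    """Group consecutive non-blank rows into blocks and drop short ones.

    Blocks shorter than min_height rows are discarded as noise.
    """
    blocks: List[Block] = []
    start = None
    for y, row in enumerate(rows):
        if not is_blank_row(row, threshold):
            if start is None:
                start = y  # a new content block begins
        elif start is not None:
            blocks.append((start, y))  # a blank row closes the current block
            start = None
    if start is not None:
        blocks.append((start, len(rows)))  # block runs to the bottom edge
    return [(s, e) for s, e in blocks if e - s >= min_height]


def group_paragraphs(blocks: Sequence[Block], blank_height: int) -> List[Block]:
    """Merge blocks whose vertical gap is smaller than blank_height rows.

    A gap of blank_height rows or more starts a new paragraph.
    """
    paragraphs: List[Block] = []
    for s, e in blocks:
        if paragraphs and s - paragraphs[-1][1] < blank_height:
            paragraphs[-1] = (paragraphs[-1][0], e)  # extend the current paragraph
        else:
            paragraphs.append((s, e))
    return paragraphs


if __name__ == "__main__":
    WHITE, BLACK = (255, 255, 255), (0, 0, 0)
    # Hypothetical 16-row strip: '#' = dark row, '.' = white row
    pattern = ".###..##..#..##."
    rows = [[BLACK if c == "#" else WHITE] * 4 for c in pattern]
    blocks = find_blocks(rows, threshold=200, min_height=2)
    print(blocks)
    print(group_paragraphs(blocks, blank_height=3))
```

Running the demo prints the blocks `[(1, 4), (6, 8), (13, 15)]` (the one-row speck at row 10 is dropped by `min_height=2`) and the paragraphs `[(1, 8), (13, 15)]`. Because short blocks are filtered before grouping in this sketch, the rows of a dropped speck count toward the gap between paragraphs.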
Important Notes
⚠️ Potential Issues:
- Small images/geometric shapes may be accidentally removed or cropped
- Narrow images and color blocks may be misidentified; adjust the configuration parameters if this happens
- Text in scanned documents must be horizontally aligned; deskew tilted pages before processing
- Headers and footers are not removed automatically; trim page margins manually to remove them
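One possible way to do the margin trimming by hand is to drop the header and footer bands from each rendered page before blank-line detection. PDF page sizes are expressed in points (1/72 inch), so a length of `points` becomes roughly `points × dpi / 72` pixels at the rendering `dpi`. The sketch below assumes you know the header and footer heights in points; `pt_to_px` and `trim_margins` are hypothetical helpers, not part of FenScribe.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def pt_to_px(points: float, dpi: int) -> int:
    """Convert a length in PDF points to pixels at the given rendering DPI."""
    return round(points * dpi / POINTS_PER_INCH)


def trim_margins(rows: Sequence[T], dpi: int, top_pt: float, bottom_pt: float) -> List[T]:
    """Drop the header band (top_pt) and footer band (bottom_pt) from a page.

    rows is the page image as a sequence of pixel rows rendered at `dpi`.
    """
    top = pt_to_px(top_pt, dpi)
    bottom = pt_to_px(bottom_pt, dpi)
    if top + bottom >= len(rows):
        raise ValueError("margins would remove the whole page")
    return list(rows[top:len(rows) - bottom])


if __name__ == "__main__":
    # A4 is 842 pt tall, i.e. 1754 pixel rows at 150 dpi
    page = list(range(pt_to_px(842, 150)))  # stand-in for real pixel rows
    trimmed = trim_margins(page, dpi=150, top_pt=36, bottom_pt=36)  # 0.5 in each
    print(len(page), len(trimmed))
```

At 150 dpi a 36 pt (half-inch) band is 75 rows, so trimming both bands leaves 1604 of the 1754 rows. The same conversion shows why pixel-based settings such as `min_height` and `blank_height` may need retuning when `dpi` changes.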
License
MIT License
Development Notes
This third-generation version features:
- GUI implementation for user-friendly operation
- Partly developed with AI-assisted development tools
- Refined through multiple iterations
