PDF2EPUB 📚

Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection.

✨ Features

📖 Smart layout detection for books and academic papers
🔍 Advanced text extraction and OCR capabilities
📊 Table detection and formatting
🖼️ Image extraction and optimization
📝 Clean markdown output with preserved structure
📱 EPUB generation with customizable styling
🌍 Multi-language support
🚀 GPU acceleration support (NVIDIA & AMD)
🍎 Apple Silicon support

🛠️ Dependencies

Python 3.9+
PyTorch (with CUDA/ROCm support for GPU acceleration)
marker-pdf==0.3.10
transformers==4.45.2
markdown==3.7
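
To see whether an environment matches the pins above, a small check like the following can help. This is a sketch and not part of the project: `PINS` simply copies the versions listed above, and `check_pins` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
PINS = {"marker-pdf": "0.3.10", "transformers": "4.45.2", "markdown": "3.7"}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Map each package to "ok", "missing", or "found <installed version>"."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"found {installed}"
    return report


for name, status in check_pins(PINS).items():
    print(f"{name}: {status}")
```

Running it inside the virtual environment used for the conversion shows at a glance which pinned packages are missing or at a different version.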

💻 Installation

Install Python dependencies:

pip install -r requirements.txt

Install PyTorch:

For NVIDIA GPUs, install with CUDA support:

pip install torch torchvision torchaudio

For AMD GPUs, install with ROCm support:

pip3 uninstall torch torchvision torchaudio
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

For Apple Silicon, install with MPS support:

pip3 uninstall torch torchvision torchaudio
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Verify GPU support:

import torch
print(torch.__version__)                  # PyTorch version
print(torch.cuda.is_available())          # True with a working NVIDIA (CUDA) or AMD (ROCm) setup
print(torch.backends.mps.is_available())  # True on Apple Silicon with MPS
print(torch.version.hip)                  # ROCm/HIP version on AMD builds, None otherwise
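
The checks above can be folded into a single device choice. The following is a minimal sketch, assuming you want a device string to report or to pass to PyTorch yourself; `pick_device` is a hypothetical helper, not something the project is known to provide.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no accelerator is usable

    # ROCm builds of PyTorch also report their GPUs through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds may not ship the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch build most likely does not match the hardware, and reinstalling with the matching command above is the usual fix.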

🚀 Usage

Basic Usage

Convert a single PDF file:

python main.py input.pdf

Convert all PDFs in a directory:

python main.py input_directory/

Advanced Options

python main.py [input_path] [output_path] [options]

Options:
  --batch-multiplier INT    Batch size multiplier for memory/speed tradeoff (default: 2)
  --max-pages INT           Maximum number of pages to process
  --start-page INT          Page number to start from
  --langs STRING            Comma-separated list of languages in document
  --skip-epub               Skip EPUB generation, only create Markdown
  --skip-md                 Skip Markdown generation, use existing Markdown files

Examples

Process a specific range of pages:

python main.py book.pdf --start-page 10 --max-pages 50

Process a multi-language document:

python main.py paper.pdf --langs "English,German"

Convert to markdown only:

python main.py thesis.pdf --skip-epub
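
The command lines above can also be assembled from a script, for example when converting many files with different settings. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, it uses only the flags documented above, and it assumes `main.py` is run from the repository root.

```python
import sys
from typing import Optional


def build_command(
    input_path: str,
    output_path: Optional[str] = None,
    start_page: Optional[int] = None,
    max_pages: Optional[int] = None,
    langs: Optional[list[str]] = None,
    skip_epub: bool = False,
) -> list[str]:
    """Assemble an argument list for main.py from the documented options."""
    cmd = [sys.executable, "main.py", input_path]
    if output_path is not None:
        cmd.append(output_path)
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if langs:
        # The CLI expects a single comma-separated string.
        cmd += ["--langs", ",".join(langs)]
    if skip_epub:
        cmd.append("--skip-epub")
    return cmd


# Same arguments as the page-range example above.
print(build_command("book.pdf", start_page=10, max_pages=50)[1:])
```

Passing the resulting list to `subprocess.run(cmd, check=True)` runs the conversion. Using a list instead of a shell string avoids quoting problems with file names that contain spaces.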

Output Structure

output_directory/
├── document_name/
│   ├── document_name.md
│   ├── document_name.epub
│   ├── document_name_metadata.json
│   └── images/
│       ├── image1.png
│       ├── image2.jpg
│       └── ...
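
The layout above can be turned into paths for post-processing or for checking that a run produced everything. This is a minimal sketch: `expected_outputs` is a hypothetical helper, and it assumes that `document_name` is the PDF file name without its extension.

```python
from pathlib import Path


def expected_outputs(output_dir: str, pdf_path: str) -> dict[str, Path]:
    """Return the paths the layout above implies for one converted PDF."""
    name = Path(pdf_path).stem  # assumption: the folder is named after the PDF's stem
    doc_dir = Path(output_dir) / name
    return {
        "markdown": doc_dir / f"{name}.md",
        "epub": doc_dir / f"{name}.epub",
        "metadata": doc_dir / f"{name}_metadata.json",
        "images": doc_dir / "images",
    }


for kind, path in expected_outputs("output_directory", "book.pdf").items():
    print(f"{kind}: {path} (exists: {path.exists()})")
```

With `--skip-epub`, the `epub` entry is expected to be absent, so a completeness check should take the chosen options into account.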

🤝 Contributing

Contributions are welcome! Here's how you can help:

Fork the repository
Create a new branch for your feature
Commit your changes
Push to your branch
Create a Pull Request

Please ensure your code follows the existing style and includes appropriate tests.

Development Setup

Clone the repository:

git clone https://github.com/yourusername/pdf2epub.git
cd pdf2epub

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

Install development dependencies:

pip install -r requirements.txt

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🐛 Known Issues

Embedded images may need manual adjustment
Complex mathematical equations may not convert perfectly
Multi-column PDF layouts may require manual adjustment
Font detection may be imperfect in some cases

🙏 Acknowledgments

This project builds upon several excellent open-source libraries:

marker-pdf for PDF processing
mark2epub for Markdown-to-EPUB conversion
PyTorch for GPU acceleration
Transformers for advanced text processing
