PDF2EPUB 📚
Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection.
✨ Features
- 📖 Smart layout detection for books and academic papers
- 🔍 Advanced text extraction and OCR capabilities
- 📊 Table detection and formatting
- 🖼️ Image extraction and optimization
- 📝 Clean markdown output with preserved structure
- 📱 EPUB generation with customizable styling
- 🌍 Multi-language support
- 🚀 GPU acceleration support (NVIDIA & AMD)
- 🍎 Apple Silicon support
🛠️ Dependencies
- Python 3.9+
- PyTorch (with CUDA/ROCm support for GPU acceleration)
- marker-pdf==0.3.10
- transformers==4.45.2
- markdown==3.7
💻 Installation
- Install Python dependencies:
pip install -r requirements.txt
- Install PyTorch:
- For NVIDIA GPUs, install with CUDA support:
pip install torch torchvision torchaudio
- For AMD GPUs, install with ROCm support:
pip3 uninstall torch torchvision torchaudio
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
- For Apple Silicon, install with MPS support:
pip3 uninstall torch torchvision torchaudio
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
- Verify GPU support:
import torch
print(torch.__version__)                  # PyTorch version
print(torch.cuda.is_available())          # True when a CUDA (NVIDIA) or ROCm (AMD) GPU is usable
print(torch.backends.mps.is_available())  # True on Apple Silicon when MPS is usable
print(torch.version.hip)                  # HIP (ROCm) version string on ROCm builds, None otherwise
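The checks above can be folded into one helper that reports which backend is usable. This is a minimal sketch, not code from this project: `pick_device` is a hypothetical name, and it falls back to `"cpu"` when PyTorch is missing or no accelerator is found.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" for the current PyTorch install.

    Hypothetical helper for illustration; it is not part of PDF2EPUB.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU can be used.
        return "cpu"

    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API too.
    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing in older PyTorch releases, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch wheel probably does not match the hardware, and the matching install command above is worth re-running.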
🚀 Usage
Basic Usage
Convert a single PDF file:
python main.py input.pdf
Convert all PDFs in a directory:
python main.py input_directory/
Advanced Options
python main.py [input_path] [output_path] [options]
Options:
  --batch-multiplier INT   Batch size multiplier for memory/speed tradeoff (default: 2)
  --max-pages INT          Maximum number of pages to process
  --start-page INT         Page number to start from
  --langs STRING           Comma-separated list of languages in document
  --skip-epub              Skip EPUB generation, only create markdown
  --skip-md                Skip markdown generation, use existing markdown files
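To show how these options fit together, here is a sketch of an `argparse` parser that accepts the same command line. It is an assumption for illustration and may differ from the project's actual `main.py`. The helper names `build_parser` and `split_langs` are hypothetical, as are all defaults except the documented `--batch-multiplier` value of 2.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (illustrative, not the real main.py)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("input_path", help="PDF file or directory of PDFs")
    parser.add_argument("output_path", nargs="?", default=None,
                        help="Output directory (optional)")
    parser.add_argument("--batch-multiplier", type=int, default=2,
                        help="Batch size multiplier for memory/speed tradeoff")
    parser.add_argument("--max-pages", type=int, default=None,
                        help="Maximum number of pages to process")
    parser.add_argument("--start-page", type=int, default=None,
                        help="Page number to start from")
    parser.add_argument("--langs", type=str, default=None,
                        help="Comma-separated list of languages in document")
    parser.add_argument("--skip-epub", action="store_true",
                        help="Only create markdown")
    parser.add_argument("--skip-md", action="store_true",
                        help="Reuse existing markdown files")
    return parser


def split_langs(value):
    """Turn "English, German" into ["English", "German"]; empty input gives []."""
    if not value:
        return []
    return [lang.strip() for lang in value.split(",") if lang.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args, split_langs(args.langs))
```

Because `output_path` is optional, options can follow the input path directly, as in `python main.py book.pdf --start-page 10`.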
Examples
Process a specific range of pages:
python main.py book.pdf --start-page 10 --max-pages 50
Process a multi-language document:
python main.py paper.pdf --langs "English,German"
Convert to markdown only:
python main.py thesis.pdf --skip-epub
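When the same options are needed for many files, the command lines above can be assembled from Python. This is a sketch only: `build_command` is a hypothetical helper, and it assumes `main.py` is in the current working directory. It uses only the flags documented above.

```python
import sys


def build_command(pdf, start_page=None, max_pages=None, langs=None, skip_epub=False):
    """Build the argument list for one `python main.py ...` call (illustrative helper)."""
    # sys.executable keeps the call inside the active virtual environment.
    cmd = [sys.executable, "main.py", str(pdf)]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if langs:
        # The CLI expects one comma-separated string, e.g. "English,German".
        cmd += ["--langs", ",".join(langs)]
    if skip_epub:
        cmd.append("--skip-epub")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually start a conversion.
    print(build_command("paper.pdf", langs=["English", "German"]))
```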
Output Structure
output_directory/
├── document_name/
│   ├── document_name.md
│   ├── document_name.epub
│   ├── document_name_metadata.json
│   └── images/
│       ├── image1.png
│       ├── image2.jpg
│       └── ...
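For scripts that post-process the results, the layout above can be expressed as paths. The sketch below assumes that `document_name` is the PDF file name without its extension. That is an inference from the tree, not something the tool guarantees, and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(pdf_path, output_root):
    """Map a PDF to the output paths shown in the tree above (illustrative).

    Assumes document_name is the PDF's stem, e.g. "thesis" for "thesis.pdf".
    """
    name = Path(pdf_path).stem
    doc_dir = Path(output_root) / name
    return {
        "markdown": doc_dir / f"{name}.md",
        "epub": doc_dir / f"{name}.epub",
        "metadata": doc_dir / f"{name}_metadata.json",
        "images": doc_dir / "images",
    }


if __name__ == "__main__":
    for kind, path in expected_outputs("thesis.pdf", "output_directory").items():
        print(f"{kind}: {path}")
```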
🤝 Contributing
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a new branch for your feature
- Commit your changes
- Push to your branch
- Create a Pull Request
Please ensure your code follows the existing style and includes appropriate tests.
Development Setup
- Clone the repository:
git clone https://github.com/yourusername/pdf2epub.git
cd pdf2epub
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
- Install development dependencies:
pip install -r requirements.txt
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🐛 Known Issues
- Embedded images may need manual adjustment after conversion
- Complex mathematical equations may not convert perfectly
- Some multi-column PDF layouts may require manual adjustment
- Font detection may be imperfect in some cases
🙏 Acknowledgments
This project builds upon several excellent open-source libraries:
- marker-pdf for PDF processing
- mark2epub for Markdown-to-EPUB conversion
- PyTorch for GPU acceleration
- Transformers for advanced text processing
