# Markdrop

A Python package for converting PDFs to structured Markdown and interactive HTML, with AI-powered image and table descriptions across six major LLM providers. Available on PyPI.
## Features
- [x] PDF → Markdown conversion with formatting preservation (via Docling)
- [x] Automatic image extraction using XRef IDs
- [x] Table detection using Microsoft's Table Transformer
- [x] PDF URL support
- [x] AI-powered image and table descriptions — 6 providers: Gemini, OpenAI, Anthropic Claude, Groq, OpenRouter, LiteLLM
- [x] Interactive HTML output with downloadable Excel tables
- [x] Customisable image resolution and UI elements
- [x] Structured logging (never pollutes your app's root logger)
- [ ] Support for DOCX / PPTX input
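The structured-logging feature follows the standard named-logger pattern, where a library logs to its own logger and your app opts in with a handler. A minimal sketch of consuming logs that way, assuming the logger is named `markdrop` (an unverified assumption for illustration):

```python
import logging

# Generic pattern for a library that logs to its own named logger
# instead of the root logger. The logger name "markdrop" is an
# assumption for illustration, not taken from Markdrop's source.
records = []

class ListHandler(logging.Handler):
    """Collect formatted log messages in a list for inspection."""
    def emit(self, record):
        records.append(record.getMessage())

lib_logger = logging.getLogger("markdrop")
lib_logger.setLevel(logging.INFO)
lib_logger.addHandler(ListHandler())
lib_logger.propagate = False  # keep messages out of the root logger

lib_logger.info("conversion started")
# records now holds the message; the root logger is untouched
```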
## Installation

Core install (PDF conversion + Gemini/OpenAI):

```shell
pip install markdrop
```

With Anthropic Claude:

```shell
pip install "markdrop[anthropic]"
```

With Groq:

```shell
pip install "markdrop[groq]"
```

With LiteLLM (routes to 100+ providers):

```shell
pip install "markdrop[litellm]"
```

Everything (including local HuggingFace models):

```shell
pip install "markdrop[all]"
```

OpenRouter is accessed through the `openai` package (already included in the core install), so no extra install is needed.
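Packages with optional extras like these typically guard the imports lazily, failing with a message that names the extra to install. A generic sketch of that pattern (this is not Markdrop's actual code, and `require_extra` is a hypothetical helper):

```python
import importlib

def require_extra(module: str, extra: str):
    """Import `module`, or fail with a message naming the pip extra
    that provides it (e.g. pip install "markdrop[groq]")."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f'Missing optional dependency "{module}". '
            f'Install it with: pip install "markdrop[{extra}]"'
        ) from exc
```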
## Supported AI Providers

| Provider | `--ai_provider` | Default model | Vision |
|---|---|---|---|
| Google Gemini | `gemini` | `gemini-3.1-flash-lite` | ✅ |
| OpenAI | `openai` | `gpt-5.4` | ✅ |
| Anthropic Claude | `anthropic` | `claude-opus-4-6` | ✅ |
| Groq | `groq` | `meta-llama/llama-4-maverick-17b-128e-instruct` | ✅ |
| OpenRouter | `openrouter` | `google/gemini-3.1-flash-lite` (any model) | ✅ |
| LiteLLM | `litellm` | `openai/gpt-5.4` (configurable) | ✅ |

All models are configurable: use `--model` to override for any provider, or set `model_name_override` in `ProcessorConfig`.
## Quick Start

### CLI Usage

#### 1. Convert PDF → Markdown + HTML

```shell
markdrop convert <input_path> --output_dir <dir> [--add_tables]
```

```shell
# Example
markdrop convert report.pdf --output_dir out --add_tables

# Also works with URLs:
markdrop convert https://arxiv.org/pdf/1706.03762 --output_dir out
```
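URL support implies a download step before conversion. A minimal sketch of what such a helper could look like with the standard library (`download_pdf` is a hypothetical name, not part of Markdrop's public API):

```python
import urllib.request
from pathlib import Path

def download_pdf(url: str, dest_dir: str) -> Path:
    """Fetch a PDF URL into dest_dir and return the local file path."""
    # Derive a filename from the last URL path segment.
    dest = Path(dest_dir) / (Path(url).name or "download.pdf")
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```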
#### 2. Generate AI Descriptions for Images & Tables

```shell
markdrop describe <markdown_file> --ai_provider <provider> [--output_dir <dir>] [--remove_images] [--remove_tables]
```

| Provider | `--ai_provider` |
|---|---|
| Google Gemini 2.0 Flash | `gemini` |
| OpenAI GPT-4o | `openai` |
| Anthropic Claude Opus | `anthropic` |
| Groq Llama-4 Scout | `groq` |
| OpenRouter | `openrouter` |
| LiteLLM | `litellm` |

```shell
# Gemini (default)
markdrop describe doc.md --ai_provider gemini

# Anthropic Claude
markdrop describe doc.md --ai_provider anthropic --remove_images

# Groq (fastest inference)
markdrop describe doc.md --ai_provider groq

# OpenRouter (any model)
markdrop describe doc.md --ai_provider openrouter

# LiteLLM (unified gateway)
markdrop describe doc.md --ai_provider litellm
```
#### 3. Set Up API Keys

```shell
markdrop setup <provider>
```

Keys are stored in `<package-root>/.env` with `0o600` permissions on POSIX systems.

```shell
markdrop setup gemini      # → GEMINI_API_KEY
markdrop setup openai     # → OPENAI_API_KEY
markdrop setup anthropic  # → ANTHROPIC_API_KEY
markdrop setup groq       # → GROQ_API_KEY
markdrop setup openrouter # → OPENROUTER_API_KEY
markdrop setup litellm    # → LITELLM_API_KEY
```
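The `0o600` permission mode means owner read/write only. A minimal standard-library sketch of writing a key to a `.env` file that way (a generic pattern, not Markdrop's actual implementation):

```python
import os
from pathlib import Path

def save_key(env_path: str, name: str, value: str) -> None:
    """Append NAME=value to a .env file, then restrict the file to
    its owner (rw-------) on POSIX systems."""
    path = Path(env_path)
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")
    if os.name == "posix":     # chmod bits behave differently on Windows
        os.chmod(path, 0o600)  # 0o600 == owner read/write only
```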
#### 4. Analyze Images in a PDF

```shell
markdrop analyze report.pdf --output_dir pdf_analysis --save_images
```
#### 5. Batch Image Description Generation

```shell
markdrop generate images/ --output_dir descriptions/ --prompt "Describe in detail." \
    --llm_client gemini openai
```

Available `--llm_client` values: `qwen`, `gemini`, `openai`, `llama-vision`, `molmo`, `pixtral`
### Python API

#### PDF Conversion

```python
from markdrop import markdrop, MarkDropConfig, add_downloadable_tables
import logging

config = MarkDropConfig(
    image_resolution_scale=2.0,
    download_button_color='#444444',
    log_level=logging.INFO,
    log_dir='logs',
    excel_dir='markdrop-excel-tables',
)

html_path = markdrop("path/to/input.pdf", "output", config)
downloadable_html = add_downloadable_tables(html_path, config)
```
#### AI Descriptions

```python
from markdrop import process_markdown, ProcessorConfig, AIProvider, setup_keys

# One-time key setup (writes to .env)
setup_keys('anthropic')

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.ANTHROPIC,  # GEMINI | OPENAI | ANTHROPIC | GROQ | OPENROUTER | LITELLM
    remove_images=False,
    remove_tables=False,
    table_descriptions=True,
    image_descriptions=True,
    max_retries=3,
    retry_delay=2,
    # Override default models (all providers have matching config fields):
    anthropic_model_name="claude-sonnet-4-5",  # faster / cheaper
    anthropic_text_model_name="claude-sonnet-4-5",
)

output_path = process_markdown(config)
```
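The `max_retries` and `retry_delay` fields suggest each model call is wrapped in a retry loop. A generic sketch of that pattern (an illustration of the semantics, not Markdrop's actual code):

```python
import time

def with_retries(fn, max_retries: int = 3, retry_delay: float = 2.0):
    """Call fn(), retrying up to max_retries attempts in total and
    sleeping retry_delay seconds between attempts. Assumes
    max_retries >= 1; re-raises the last error if all attempts fail."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if attempt < max_retries - 1:
                time.sleep(retry_delay)
    raise last_exc
```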
#### Using OpenRouter to access any model

```python
config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.OPENROUTER,
    openrouter_model_name="meta-llama/llama-4-scout",  # any model on openrouter.ai/models
    openrouter_text_model_name="anthropic/claude-sonnet-4-5",
    openrouter_site_url="https://yoursite.com",
    openrouter_site_name="My App",
)
```
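OpenRouter's API accepts optional `HTTP-Referer` and `X-Title` request headers to attribute traffic to your app, which is presumably where `openrouter_site_url` and `openrouter_site_name` end up. A minimal sketch of building such headers (`openrouter_headers` is a hypothetical helper, not Markdrop's API):

```python
def openrouter_headers(api_key: str, site_url: str = "",
                       site_name: str = "") -> dict:
    """Build OpenRouter request headers; the HTTP-Referer and X-Title
    pair is optional and identifies your app on openrouter.ai."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if site_url:
        headers["HTTP-Referer"] = site_url  # your app's URL
    if site_name:
        headers["X-Title"] = site_name      # your app's display name
    return headers
```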
#### Using LiteLLM to reach 100+ providers

```python
import os
os.environ["ANTHROPIC_API_KEY"] = "..."  # set any provider's key

config = ProcessorConfig(
    input_path="doc.md",
    output_dir="output",
    ai_provider=AIProvider.LITELLM,
    litellm_model_name="anthropic/claude-opus-4-6",
    litellm_text_model_name="groq/llama-3.3-70b-versatile",
)
```
#### Batch Image Description Generation

```python
from markdrop import generate_descriptions

generate_descriptions(
    input_path='images/',
    output_dir='output/',
    prompt='Give a highly detailed description of this image.',
    llm_client=['gemini', 'llama-vision'],
)
```
## API Reference

### ProcessorConfig – AI Provider Fields

| Field | Default | Notes |
|---|---|---|
| `gemini_model_name` | `gemini-2.0-flash` | Vision model |
| `gemini_text_model_name` | `gemini-2.0-flash` | Text model |
| `openai_model_name` | `gpt-4o` | Vision + text |
| `openai_text_model_name` | `gpt-4o` | |
| `anthropic_model_name` | `claude-opus-4-6` | Vision |
| `anthropic_text_model_name` | `claude-sonnet-4-5` | Text (cheaper) |
| `groq_model_name` | `meta-llama/llama-4-scout-17b-16e-instruct` | Vision |
| `groq_text_model_name` | `llama-3.3-70b-versatile` | Text |
| `openrouter_model_name` | `google/gemini-2.0-flash-001` | Any model string from openrouter.ai/models |
| `openrouter_text_model_name` | `anthropic/claude-sonnet-4-5` | |
| `litellm_model_name` | `openai/gpt-4o` | `provider/model` format |
| `litellm_text_model_name` | `openai/gpt-4o` | |
### MarkDropConfig

| Field | Default | Notes |
|---|---|---|
| `image_resolution_scale` | `2.0` | Scale factor for extracted images |
| `download_button_color` | `'#444444'` | HTML button colour |
| `log_level` | `logging.INFO` | |
| `log_dir` | `'logs'` | |
| `excel_dir` | `'markdrop_excel_tables'` | |
## Contributing

We welcome contributions! See CONTRIBUTING.md.

```shell
git clone https://github.com/shoryasethia/markdrop.git
cd markdrop
python -m venv venv && source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e ".[all]"
```
## Project Structure

```
markdrop/
├── setup.py
├── requirements.txt
├── README.md
└── markdrop/
    ├── __init__.py
    ├── main.py             ← CLI entry-point
    ├── process.py          ← PDF conversion
    ├── parse.py            ← AI description engine (all 6 providers)
    ├── helper.py           ← PDF image analysis
    ├── utils.py            ← PDF download helpers
    ├── setup_keys.py       ← Interactive API key manager
    ├── ignore_warnings.py
    ├── src/
    │   └── markdrop-logo.png
    └── models/
        ├── img_descriptions.py
        ├── model_loader.py ← Local HF model loader
        ├── responder.py
        └── logger.py
```
## License

GPL-3.0 — see LICENSE.

## Changelog

See CHANGELOG.md.
