Docpull

Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output

docpull

Pull documentation from any website and convert it to clean, AI-ready Markdown.

Install

pip install docpull

Usage

# Basic fetch
docpull https://docs.example.com

# With options
docpull https://aptos.dev --max-pages 100 --output-dir ./docs

# Filter paths
docpull https://docs.example.com --include-paths "/api/*" --exclude-paths "/changelog/*"

# Enable caching for incremental updates
docpull https://docs.example.com --cache

# JavaScript-heavy sites
pip install "docpull[js]"   # quoted so shells like zsh don't treat [js] as a glob
docpull https://spa-site.com --js
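The crawl options used above (`--max-pages`, `--max-depth`, `--include-paths`, `--exclude-paths`) can be pictured as a breadth-first walk with limits. The sketch below is a minimal illustration under stated assumptions, not docpull's implementation: it walks a hypothetical in-memory link graph instead of fetching pages, it assumes path patterns are shell-style globs matched against the URL path with `fnmatch`, and it assumes filters apply to discovered links while the start URL is always crawled. `path_allowed` and `crawl_order` are hypothetical names.

```python
from collections import deque
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def path_allowed(url, include=(), exclude=()):
    """Hypothetical filter: glob patterns (e.g. "/api/*") matched against the URL path."""
    path = urlparse(url).path or "/"
    if include and not any(fnmatchcase(path, p) for p in include):
        return False
    return not any(fnmatchcase(path, p) for p in exclude)


def crawl_order(start, links, max_pages, max_depth, include=(), exclude=()):
    """Breadth-first crawl over `links` (url -> list of linked urls).

    Stops after max_pages pages and does not follow links found on pages
    at max_depth. The start URL is always crawled (an assumption).
    """
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue and len(fetched) < max_pages:
        url, depth = queue.popleft()
        fetched.append(url)  # a real crawler would download and convert the page here
        if depth >= max_depth:
            continue
        for nxt in links.get(url, []):
            if nxt not in seen and path_allowed(nxt, include, exclude):
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return fetched
```

For example, `crawl_order("https://d.ex/", links, 10, 2, exclude=["/changelog/*"])` visits pages level by level and never enqueues anything under `/changelog/`.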

Profiles

docpull https://site.com --profile rag      # Optimized for RAG/LLM (default)
docpull https://site.com --profile mirror   # Full site archive with caching
docpull https://site.com --profile quick    # Fast sampling (50 pages, depth 2)
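A profile is a named bundle of crawl defaults. The sketch below models that idea with only what this README states (`quick` means 50 pages at depth 2, `mirror` enables caching, `rag` is the default); every other value is a placeholder. It also assumes that explicitly passed options override the profile's defaults, which is an assumption about precedence, not documented behavior. `CrawlSettings`, `PROFILES`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CrawlSettings:
    max_pages: Optional[int] = None  # None = no limit stated in this sketch
    max_depth: Optional[int] = None
    cache: bool = False


# Only the values the README states; everything else stays at the sketch's defaults.
PROFILES = {
    "rag": CrawlSettings(),
    "mirror": CrawlSettings(cache=True),
    "quick": CrawlSettings(max_pages=50, max_depth=2),
}


def resolve(profile: str = "rag", **overrides) -> CrawlSettings:
    """Start from a profile, then apply explicitly passed options (assumed precedence)."""
    base = PROFILES[profile]
    explicit = {k: v for k, v in overrides.items() if v is not None}
    return replace(base, **explicit)
```

Under this model, `resolve("quick", max_pages=10)` keeps depth 2 from the profile but raises nothing beyond the explicit page cap.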

Options

Crawl:
  --max-pages N           Maximum pages to fetch
  --max-depth N           Maximum crawl depth
  --include-paths P       Only crawl matching URL patterns
  --exclude-paths P       Skip matching URL patterns
  --js                    Enable JavaScript rendering

Cache:
  --cache                 Enable caching for incremental updates
  --cache-dir DIR         Cache directory (default: .docpull-cache)
  --cache-ttl DAYS        Days before cache expires (default: 30)

Content:
  --streaming-dedup       Real-time duplicate detection
  --language CODE         Filter by language (e.g., en)

Output:
  --output-dir, -o DIR    Output directory (default: ./docs)
  --dry-run               Show what would be fetched
  --verbose, -v           Verbose output

See docpull --help for all options.
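`--cache-ttl DAYS` (default 30) controls when a cached page stops being reused. A minimal sketch of that expiry rule follows, assuming each cache entry records when the page was fetched and that a page exactly at the TTL counts as expired. The in-memory `cache` dict stands in for the on-disk `.docpull-cache` directory, and `is_fresh` and `plan_fetches` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_DAYS = 30  # matches the documented --cache-ttl default


def is_fresh(fetched_at: datetime, now: datetime, ttl_days: int = DEFAULT_TTL_DAYS) -> bool:
    """True if a cached page is younger than the TTL and can be reused."""
    return now - fetched_at < timedelta(days=ttl_days)


def plan_fetches(cache: dict, urls, now: datetime, ttl_days: int = DEFAULT_TTL_DAYS):
    """Split URLs into (refetch, reuse) using a hypothetical url -> fetched_at map."""
    refetch, reuse = [], []
    for url in urls:
        fetched_at = cache.get(url)
        if fetched_at is not None and is_fresh(fetched_at, now, ttl_days):
            reuse.append(url)
        else:
            refetch.append(url)  # never cached, or cached entry has expired
    return refetch, reuse


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cache = {"https://docs.example.com/a": now - timedelta(days=2)}
    print(plan_fetches(cache, ["https://docs.example.com/a", "https://docs.example.com/b"], now))
```

This is what makes incremental updates cheap: on a re-run, only missing or expired pages need to go back over the network.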

Python API

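import asyncio
from docpull import Fetcher, DocpullConfig, ProfileName, EventType

async def main():
    # RAG profile, capped at 100 pages, with caching enabled
    config = DocpullConfig(
        url="https://docs.example.com",
        profile=ProfileName.RAG,
        crawl={"max_pages": 100},
        cache={"enabled": True},
    )

    # Fetcher is used as an async context manager; run() is consumed as an async stream of events
    async with Fetcher(config) as fetcher:
        async for event in fetcher.run():
            # Report progress events only; other event types are ignored here
            if event.type == EventType.FETCH_PROGRESS:
                print(f"{event.current}/{event.total}: {event.url}")

        print(f"Done: {fetcher.stats.pages_fetched} pages")

if __name__ == "__main__":
    asyncio.run(main())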
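Because the `fetcher.run()` loop above is just an async iterator filtered by event type, an event handler can be exercised offline against a stand-in stream. In the sketch below, `EventType`, `Event`, and `fake_run` are local hypothetical stand-ins, not docpull's classes: only the `FETCH_PROGRESS` name and the `current`, `total`, and `url` attributes come from the example, and `OTHER` is a placeholder for any other event kind.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum, auto


class EventType(Enum):  # local stand-in; only FETCH_PROGRESS appears in the README
    FETCH_PROGRESS = auto()
    OTHER = auto()  # hypothetical placeholder for any other event kind


@dataclass
class Event:
    type: EventType
    url: str = ""
    current: int = 0
    total: int = 0


async def fake_run(urls):
    """Hypothetical stand-in for fetcher.run(): one progress event per URL, then another event."""
    for i, url in enumerate(urls, start=1):
        await asyncio.sleep(0)  # hand control back to the loop, as a network-bound stream would
        yield Event(EventType.FETCH_PROGRESS, url, i, len(urls))
    yield Event(EventType.OTHER)


async def collect_progress(events):
    """Keep only progress events, formatted like the print() call in the example."""
    lines = []
    async for event in events:
        if event.type == EventType.FETCH_PROGRESS:
            lines.append(f"{event.current}/{event.total}: {event.url}")
    return lines


if __name__ == "__main__":
    urls = ["https://docs.example.com/a", "https://docs.example.com/b"]
    for line in asyncio.run(collect_progress(fake_run(urls))):
        print(line)
```

Keeping the handler separate from the fetcher lets the same `collect_progress` logic run against the real event stream once the stand-ins are swapped for docpull's types.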

Output

Each page becomes a Markdown file with YAML frontmatter:

---
title: "Getting Started"
source: https://docs.example.com/guide
---

# Getting Started
...
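A file in the frontmatter layout shown above could be produced as sketched below. This is an illustration, not docpull's writer: the URL-to-filename mapping (`/guide` to `guide.md`, `/` to `index.md`) is an assumption, and the title is emitted as a JSON string, which is also a valid YAML double-quoted scalar. `output_path` also shows one common form of path traversal protection: resolve the target and refuse anything outside the output directory. Both function names are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urlparse


def output_path(output_dir: str, url: str) -> Path:
    """Map a page URL to a .md file under output_dir; refuse paths that escape it."""
    rel = unquote(urlparse(url).path).strip("/") or "index"
    root = Path(output_dir).resolve()
    candidate = (root / f"{rel}.md").resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"refusing to write outside {root}: {url}")
    return candidate


def render_page(title: str, url: str, markdown_body: str) -> str:
    """Build a page with title/source frontmatter followed by the Markdown body."""
    return (
        "---\n"
        f"title: {json.dumps(title, ensure_ascii=False)}\n"
        f"source: {url}\n"
        "---\n\n"
        f"{markdown_body}"
    )
```

A caller would create `path.parent` with `mkdir(parents=True, exist_ok=True)` before calling `path.write_text(...)`, since nested URL paths map to nested directories.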

Security

HTTPS-only, mandatory robots.txt compliance
Blocks private/internal network IPs
Path traversal and XXE protection
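The first two guarantees can be pictured as a gate every URL passes before it is fetched. The sketch below is an illustration, not docpull's code: it checks the scheme, rejects `localhost` and private, loopback, link-local, or reserved IP literals with the standard `ipaddress` module, and asks `urllib.robotparser` whether an already-downloaded robots.txt allows the URL. A real crawler must also resolve hostnames and check every resolved address, which this sketch does not do. The `"docpull"` user-agent string and the function names are assumptions.

```python
import ipaddress
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def host_is_internal(host: str) -> bool:
    """Flag localhost and private/loopback/link-local/reserved IP literals.

    Only inspects the host as written in the URL; it does not do DNS lookups.
    """
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved


def may_fetch(url: str, robots_txt: str, user_agent: str = "docpull") -> bool:
    """Hypothetical pre-fetch gate: HTTPS only, no internal hosts, robots.txt must allow."""
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    if host_is_internal(parts.hostname):
        return False
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # robots.txt body fetched beforehand
    return robots.can_fetch(user_agent, url)
```

With `robots_txt = "User-agent: *\nDisallow: /private/\n"`, `may_fetch("https://docs.example.com/guide", robots_txt)` is allowed, while `/private/` pages, `http://` URLs, and hosts such as `127.0.0.1` or `[::1]` are rejected.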

Troubleshooting

docpull --doctor              # Check installation
docpull URL --verbose         # Verbose output
docpull URL --dry-run         # Test without downloading

License

MIT

raintree-technology

GitHub stars: 20

Category: Customer

Updated: 16d ago

Forks: 1

raintree-technology/docpull

Languages

Python

Security Score

95/100

Audited on Mar 16, 2026

No findings
