Markdocify

🤖 Transform any documentation site into clean, LLM-ready markdown

Install / Use

/learn @vladkampov/Markdocify

About this skill

Quality Score

0/100

Supported Platforms

Universal

README

📚 markdocify

Comprehensively scrape documentation sites into beautiful, LLM-ready Markdown

markdocify is a CLI tool that crawls a documentation website and converts it into a single, well-formatted Markdown file. It is suited to building LLM training data, offline documentation, and knowledge bases.

✨ Features

  • 🎯 Comprehensive Coverage: Scrapes deep hierarchical documentation (8 levels by default)
  • 🧠 Intelligent Content Detection: Auto-detects documentation patterns across popular frameworks
  • 🚫 Smart Filtering: Automatically excludes navigation, ads, and non-documentation content
  • ⚡ High Performance: Concurrent scraping with configurable workers and delays
  • 📊 Progress Reporting: Real-time progress updates for long scrapes
  • 🔧 Zero Configuration: Works out-of-the-box for most documentation sites
  • 🎨 Clean Output: Generates well-formatted Markdown with table of contents
  • 🛡️ Respectful Scraping: Built-in rate limiting and robots.txt compliance
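The "configurable workers and delays" feature describes a familiar pattern: a fixed pool of goroutines pulls URLs from a queue and pauses between requests. The sketch below illustrates that pattern only. It is not markdocify's actual code, and `fetchAll` and its `fetch` callback are hypothetical names standing in for real HTTP requests.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// fetchAll processes urls with a fixed number of worker goroutines.
// Each worker sleeps for delay after every fetch, which acts as a
// simple per-worker politeness delay. fetch is a hypothetical stand-in
// for a real HTTP request.
func fetchAll(urls []string, workers int, delay time.Duration, fetch func(string) string) map[string]string {
	jobs := make(chan string)
	results := make(map[string]string, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				body := fetch(u)
				mu.Lock()
				results[u] = body
				mu.Unlock()
				time.Sleep(delay)
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	urls := []string{
		"https://example.com/docs/a",
		"https://example.com/docs/b",
		"https://example.com/docs/c",
	}
	pages := fetchAll(urls, 3, 10*time.Millisecond, func(u string) string {
		return "content of " + u
	})

	// Map iteration order is random, so sort the keys before printing.
	keys := make([]string, 0, len(pages))
	for k := range pages {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, "->", pages[k])
	}
}
```

Raising the worker count shortens a crawl but increases load on the target site, which is why the delay is applied per worker rather than dropped.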

🚀 Quick Start

Installation

🍺 Homebrew (macOS/Linux) - Recommended

# Add our tap and install
brew tap vladkampov/tap
brew install markdocify

# Or install directly
brew install vladkampov/tap/markdocify

⬇️ Direct Download

# Download latest release for your platform
curl -L https://github.com/vladkampov/markdocify/releases/latest/download/markdocify-linux-amd64 -o markdocify
chmod +x markdocify

# Or for macOS
curl -L https://github.com/vladkampov/markdocify/releases/latest/download/markdocify-darwin-amd64 -o markdocify
chmod +x markdocify

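🐳 Docker

# Run directly with Docker (quote the path so directories with spaces work)
docker run --rm -v "$(pwd)":/workspace ghcr.io/vladkampov/markdocify:latest https://example.com/docs

# Or use as a base image (this line belongs in a Dockerfile, not the shell)
FROM ghcr.io/vladkampov/markdocify:latest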

🔧 Build from Source

git clone https://github.com/vladkampov/markdocify.git
cd markdocify
make build

Go Install

go install github.com/vladkampov/markdocify/cmd/markdocify@latest

Basic Usage

# Comprehensive scrape (recommended) - captures full documentation
markdocify https://vercel.com/docs

# Quick scrape - lighter, faster
markdocify https://docs.example.com -d 3

# Custom output file
markdocify https://react.dev/docs -o react-complete-docs.md

# Adjust performance settings
markdocify https://site.com/docs -d 5 --concurrency 4

💡 Use Cases

📖 LLM Training Data

Create comprehensive, clean Markdown datasets from documentation sites:

markdocify https://nextjs.org/docs -o nextjs-training-data.md
markdocify https://docs.python.org -o python-docs.md  
markdocify https://kubernetes.io/docs -o k8s-complete.md

📚 Offline Documentation

Generate complete offline documentation archives:

markdocify https://docs.aws.amazon.com/ec2 -o aws-ec2-offline.md
markdocify https://tailwindcss.com/docs -o tailwind-offline.md

🔍 Knowledge Bases

Create searchable, comprehensive knowledge bases:

markdocify https://docs.github.com -o github-docs-complete.md
markdocify https://api.stripe.com/docs -o stripe-api-complete.md

🎯 Supported Sites

markdocify works great with most documentation sites, including:

  • Frameworks: React, Vue, Angular, Next.js, Nuxt, SvelteKit, Astro
  • Platforms: Vercel, Netlify, AWS, Google Cloud, Azure
  • Languages: Python, Go, Rust, JavaScript, TypeScript docs
  • Tools: Docker, Kubernetes, Terraform, GitHub, GitLab
  • Databases: PostgreSQL, MongoDB, Redis documentation
  • And many more!

⚙️ Configuration

Command Line Options

markdocify [URL] [flags]

Flags:
  -c, --config string      Configuration file path
  -o, --output string      Output file path
  -d, --depth int          Maximum crawl depth (default 8)
      --concurrency int    Number of concurrent workers (default 3)
  -h, --help               Help for markdocify
  -v, --version            Version information
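To make the effect of `-d, --depth` concrete, here is a minimal sketch of a depth-limited breadth-first crawl over an in-memory link graph. It assumes the start URL counts as depth 0; how markdocify numbers depth internally is not stated here, and `crawl` and the `links` map are hypothetical names used only for illustration.

```go
package main

import "fmt"

// crawl returns the pages reachable from start within maxDepth link
// hops, in breadth-first order. links is an in-memory stand-in for the
// pages a real crawler would download and parse.
func crawl(start string, maxDepth int, links map[string][]string) []string {
	type item struct {
		url   string
		depth int
	}

	visited := map[string]bool{start: true}
	queue := []item{{start, 0}}
	var order []string

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)

		if cur.depth == maxDepth {
			continue // at the limit: record the page but do not follow its links
		}
		for _, next := range links[cur.url] {
			if !visited[next] {
				visited[next] = true
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	links := map[string][]string{
		"/docs":        {"/docs/a", "/docs/b"},
		"/docs/a":      {"/docs/a/deep"},
		"/docs/a/deep": {"/docs/a/deep/deeper"},
	}
	fmt.Println(crawl("/docs", 1, links)) // [/docs /docs/a /docs/b]
	fmt.Println(crawl("/docs", 2, links)) // [/docs /docs/a /docs/b /docs/a/deep]
}
```

A lower depth trades completeness for speed, which matches the "quick scrape" usage of `-d 3` shown earlier.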

Advanced Configuration

For complex sites, use YAML configuration files:

# custom-config.yml
name: "Custom Documentation"
base_url: "https://example.com"
output_file: "custom-docs.md"

start_urls:
  - "https://example.com/docs"
  - "https://example.com/api"

follow_patterns:
  - "^https://example\\.com/docs/.*"
  - "^https://example\\.com/api/.*"

processing:
  max_depth: 10
  concurrency: 5
  delay: 0.5
  preserve_code_blocks: true
  generate_toc: true

selectors:
  title: "h1, .page-title"
  content: "main, .documentation"
  exclude:
    - "nav"
    - ".sidebar"
    - "footer"

Use with: markdocify -c custom-config.yml
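The `follow_patterns` entries are regular expressions. Inside a double-quoted YAML string, `\\.` is read as `\.`, so the first pattern above becomes `^https://example\.com/docs/.*`. The sketch below shows how such patterns could gate which links get followed. It assumes Go's standard `regexp` syntax, and `newFilter` is a hypothetical helper, not part of markdocify.

```go
package main

import (
	"fmt"
	"regexp"
)

// newFilter compiles the patterns once and returns a predicate that
// reports whether a URL matches at least one of them.
func newFilter(exprs []string) (func(string) bool, error) {
	compiled := make([]*regexp.Regexp, 0, len(exprs))
	for _, e := range exprs {
		re, err := regexp.Compile(e)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", e, err)
		}
		compiled = append(compiled, re)
	}
	return func(url string) bool {
		for _, re := range compiled {
			if re.MatchString(url) {
				return true
			}
		}
		return false
	}, nil
}

func main() {
	// The same patterns as in custom-config.yml, after YAML unescaping.
	follow, err := newFilter([]string{
		`^https://example\.com/docs/.*`,
		`^https://example\.com/api/.*`,
	})
	if err != nil {
		panic(err)
	}

	for _, u := range []string{
		"https://example.com/docs/intro",
		"https://example.com/blog/post",
		"https://exampleXcom/docs/intro", // rejected because the dot is escaped
	} {
		fmt.Printf("%v\t%s\n", follow(u), u)
	}
}
```

Anchoring each pattern with `^` and escaping the dots keeps the crawl from wandering onto look-alike hosts or unrelated sections of the same site.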

📊 Performance & Output

Typical Results

| Site | Pages Scraped | Output Size | Time |
|------|---------------|-------------|------|
| Vercel Docs | 100+ pages | 2-5MB | 3-5 min |
| Next.js Docs | 80+ pages | 1-3MB | 2-4 min |
| React Docs | 50+ pages | 800KB-2MB | 1-3 min |

Output Quality

markdocify generates:

  • 📑 Table of Contents with deep linking
  • 🏷️ Metadata including source URLs and timestamps
  • 🎨 Clean formatting with preserved code blocks
  • 🔗 Resolved links and proper heading hierarchy
  • 🧹 Filtered content with navigation/ads removed
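As an illustration of what a table of contents with deep linking involves, the sketch below builds a nested link list from the ATX headings of a Markdown document. The slug rule (lowercase, spaces to hyphens, punctuation dropped) is a simplified assumption in the style of GitHub anchors. `slugify` and `buildTOC` are hypothetical names, and markdocify's own aggregator may work differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify lowercases a heading, drops punctuation, and turns
// whitespace into hyphens (a simplified anchor rule).
func slugify(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(strings.TrimSpace(heading)) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_':
			b.WriteRune(r)
		case unicode.IsSpace(r):
			b.WriteByte('-')
		}
	}
	return b.String()
}

// buildTOC returns one indented list item per ATX heading ("# ..." to
// "###### ..."), skipping lines inside fenced code blocks.
func buildTOC(markdown string) string {
	fence := strings.Repeat("`", 3) // the code-fence marker
	inFence := false
	var toc strings.Builder

	for _, line := range strings.Split(markdown, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence
			continue
		}
		if inFence {
			continue
		}
		level := 0
		for level < len(line) && line[level] == '#' {
			level++
		}
		// A heading needs 1-6 '#' characters followed by a space.
		if level == 0 || level > 6 || level >= len(line) || line[level] != ' ' {
			continue
		}
		title := strings.TrimSpace(line[level:])
		indent := strings.Repeat("  ", level-1)
		fmt.Fprintf(&toc, "%s- [%s](#%s)\n", indent, title, slugify(title))
	}
	return toc.String()
}

func main() {
	doc := "# Guide\n\n## Getting Started\n\nText.\n\n### API Reference (v2)\n"
	fmt.Print(buildTOC(doc))
}
```

Skipping fenced blocks matters for scraped documentation, because shell comments such as `# Install` inside a code sample would otherwise be mistaken for headings.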

🛠️ Development

Prerequisites

  • Go 1.21+
  • Make

Building

# Clone repository
git clone https://github.com/vladkampov/markdocify.git
cd markdocify

# Download dependencies
go mod tidy

# Build
make build

# Run tests
make test

# Cross-platform build
make build-all

Project Structure

markdocify/
├── cmd/markdocify/          # CLI application
├── internal/
│   ├── config/             # Configuration handling
│   ├── scraper/            # Web scraping engine
│   ├── converter/          # HTML to Markdown conversion
│   ├── aggregator/         # Document aggregation & TOC
│   └── types/              # Shared types
├── configs/examples/       # Example configurations
└── README.md

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes with tests
  4. Test thoroughly: make test && make lint
  5. Commit with clear messages
  6. Submit a pull request

Areas We Need Help

  • 🌐 JavaScript rendering support (ChromeDP integration)
  • 🔍 More content selectors for different documentation frameworks
  • 🎨 Output formats (JSON, HTML, etc.)
  • 🚀 Performance optimizations
  • 📚 Documentation improvements
  • 🧪 Test coverage expansion

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with Colly for web scraping
  • Powered by html-to-markdown for conversion
  • CLI built with Cobra
  • Inspired by the need for high-quality LLM training data

📞 Support


<p align="center"> <strong>Made with ❤️ for the developer community</strong><br> Star ⭐ this repo if you find it useful! </p>

GitHub Stars: 21
Category: Development
Updated: 6d ago
Forks: 2

Languages

Go

Security Score

90/100

Audited on Mar 26, 2026

No findings