LibreCrawl
Free desktop SEO crawler - open source alternative to Screaming Frog and similar tools. Crawl websites, analyze links, extract SEO data, and export results without subscription fees. Fully customizable and extensible!
Install / Use
/learn @PhialsBasement/LibreCrawlREADME
LibreCrawl
A web-based multi-tenant crawler for SEO analysis and website auditing.
🌐 Website: librecrawl.com
Try the Live Demo: https://librecrawl.com/app/
API Documentation: https://librecrawl.com/api/docs/
Plugin Workshop https://librecrawl.com/workshop/
LibreCrawl will always be free and open source. If it's replacing your $259/year Screaming Frog license, deepcrawl license or sitebulb license, buy me a coffee.
What it does
LibreCrawl crawls websites and gives you detailed information about pages, links, SEO elements, and performance. It's built as a web application using Python Flask with a modern web interface supporting multiple concurrent users.
Features
- 🚀 Multi-tenancy - Multiple users can crawl simultaneously with isolated sessions
- 🎨 Custom CSS styling - Personalize the UI with your own CSS themes
- 💾 Browser localStorage persistence - Settings saved per browser
- 🔄 JavaScript rendering for dynamic content (React, Vue, Angular, etc.)
- 📊 SEO analysis - Extract titles, meta descriptions, headings, etc.
- 🔗 Link analysis - Track internal and external links with detailed relationship mapping
- 📈 PageSpeed Insights integration - Analyze Core Web Vitals
- 💾 Multiple export formats - CSV, JSON, or XML
- 🔍 Issue detection - Automated SEO issue identification
- ⚡ Real-time crawling progress with live statistics
Getting started
Quick Start (Automatic Installation)
The easiest way to run LibreCrawl - just run the startup script and it handles everything:
Windows:
start-librecrawl.bat
Linux/Mac:
chmod +x start-librecrawl.sh
./start-librecrawl.sh
What it does automatically:
- Checks for Docker - if found, runs LibreCrawl in a container (recommended)
- If no Docker, checks for Python - if not found, downloads and installs it (Windows only temporairly disabled since it causes some bat issues)
- Installs all dependencies automatically (
pip install -r requirements.txt) - Installs Playwright browsers for JavaScript rendering
- Starts LibreCrawl in local mode (no authentication)
- Opens your browser to
http://localhost:5000
Manual Installation
If you prefer to install manually or want more control:
Option 1: Docker (Recommended)
Requirements:
- Docker and Docker Compose
Steps:
# Clone the repository
git clone https://github.com/PhialsBasement/LibreCrawl.git
cd LibreCrawl
# Copy environment file
cp .env.example .env
# Start LibreCrawl
docker-compose up -d
# Open browser to http://localhost:5000
By default, LibreCrawl runs in local mode for easy personal use. The .env file controls this:
# .env file
LOCAL_MODE=true
HOST_BINDING=127.0.0.1
REGISTRATION_DISABLED=false
For production deployment with user authentication, edit your .env file:
# .env file
LOCAL_MODE=false
HOST_BINDING=0.0.0.0
REGISTRATION_DISABLED=false
Option 2: Python
- Python 3.8 or later
- Modern web browser (Chrome, Firefox, Safari, Edge)
Installation
-
Clone or download this repository
-
Install dependencies:
pip install -r requirements.txt
- For JavaScript rendering support (optional):
playwright install chromium
- Run the application:
# Standard mode (with authentication and tier system)
python main.py
# Local mode (all users get admin tier, no rate limits)
python main.py --local
# or
python main.py -l
- Open your browser and navigate to:
- Local:
http://localhost:5000 - Network:
http://<your-ip>:5000
- Local:
LibreCrawl Plugins
Drop your custom plugin files in /web/static/plugins/! Each .js file will automatically create a new tab in LibreCrawl.
🔌 Quick Start
- Create a new
.jsfile in this folder (e.g.,my-plugin.js) - Register your plugin using the LibreCrawl Plugin API
- Refresh the app - your new tab appears automatically!
📝 Example Plugin Structure
LibreCrawlPlugin.register({
// Required: Unique ID (used for tab identification)
id: 'my-plugin',
// Required: Display name
name: 'My Plugin',
// Required: Tab configuration
tab: {
label: 'My Tab',
icon: '🔥', // Optional emoji
},
// Called when your tab is activated
onTabActivate(container, data) {
// data contains: { urls, links, issues, stats }
container.innerHTML = `
<div class="plugin-content" style="padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);">
<h2>My Custom Analysis</h2>
<p>Found ${data.urls.length} URLs!</p>
</div>
`;
},
// Optional: Called during live crawls when data updates
onDataUpdate(data) {
if (this.isActive) {
// Update your UI
}
}
});
🎯 Available Data
Your plugin receives the same data as built-in tabs:
urls- Array of all crawled URLs with full metadatalinks- All discovered links (internal/external)issues- Detected SEO issuesstats- Crawl statistics (discovered, crawled, depth, speed)
📚 Full API Reference
Plugin Configuration
{
id: string, // Unique identifier
name: string, // Display name
version: string, // Optional version
author: string, // Optional author
description: string, // Optional description
tab: {
label: string, // Tab button text
icon: string, // Optional emoji/icon
position: number // Optional position (default: append to end)
}
}
Lifecycle Hooks
onLoad()- Called when plugin loadsonTabActivate(container, data)- Called when tab becomes activeonTabDeactivate()- Called when user switches awayonDataUpdate(data)- Called during live crawlsonCrawlComplete(data)- Called when crawl finishes
Utilities
Access built-in utilities via this.utils:
this.utils.showNotification(message, type) // 'success', 'error', 'info'
this.utils.formatUrl(url)
this.utils.escapeHtml(text)
🎨 Styling
Use these CSS classes to match LibreCrawl's design:
.plugin-content- Main container.plugin-header- Header section.data-table- Tables (auto-styled).stat-card- Statistic cards.score-good/.score-needs-improvement/.score-poor- Score indicators
Important: Always add these styles to your main plugin container for proper scrolling:
container.innerHTML = `
<div class="plugin-content" style="padding: 20px; overflow-y: auto; max-height: calc(100vh - 280px);">
<!-- Your content here -->
</div>
`;
The max-height: calc(100vh - 280px) ensures your content scrolls properly within the tab pane.
Example Plugins
Check out these example plugins to get started:
_example-plugin.js- Basic template (ignored by loader)e-e-a-t.js- E-E-A-T analyzer example
Running Modes
Standard Mode (default):
- Full authentication system with login/register
- Tier-based access control (Guest, User, Extra, Admin)
- Guest users limited to 3 crawls per 24 hours (IP-based)
- Ideal for public-facing demos or shared hosting
Local Mode (--local or -l):
- All users automatically get admin tier access
- No rate limits or tier restrictions
- Perfect for personal use or single-user self-hosting
- Recommended for local development and testing
Configuration
Click "Settings" to configure:
- Crawler settings: depth (up to 5M URLs), delays, external links
- Request settings: user agent, timeouts, proxy, robots.txt
- JavaScript rendering: browser engine, wait times, viewport size
- Filters: file types and URL patterns to include/exclude
- Export options: formats and fields to export
- Custom CSS: personalize the UI appearance with custom styles
- Issue exclusion: patterns to exclude from SEO issue detection
For PageSpeed analysis, add a Google API key in Settings > Requests for higher rate limits (25k/day vs limited).
Export formats
- CSV: Spreadsheet-friendly format
- JSON: Structured data with all details
- XML: Markup format for other tools
Multi-tenancy
LibreCrawl supports multiple concurrent users with isolated sessions:
- Each browser session gets its own crawler instance and data
- Settings are stored in browser localStorage (persistent across restarts)
- Custom CSS themes are per-browser
- Sessions expire after 1 hour of inactivity
- Crawl data is isolated between users
Known limitations
- PageSpeed API has rate limits (works better with API key)
- Large sites may take time to crawl completely
- JavaScript rendering is slower than HTTP-only crawling
- Settings stored in localStorage (cleared if browser data is cleared)
Files
main.py- Main application and Flask serversrc/crawler.py- Core crawling enginesrc/settings_manager.py- Configuration managementweb/- Frontend interface files
License
MIT License - see LICENSE file for details.
Related Skills
bluebubbles
342.0kUse when you need to send or manage iMessages via BlueBubbles (recommended iMessage integration). Calls go through the generic message tool with channel="bluebubbles".
claude-opus-4-5-migration
84.7kMigrate prompts and code from Claude Sonnet 4.0, Sonnet 4.5, or Opus 4.1 to Opus 4.5
bear-notes
342.0kCreate, search, and manage Bear notes via grizzly CLI.
model-usage
342.0kUse CodexBar CLI local cost usage to summarize per-model usage for Codex or Claude, including the current (most recent) model or a full model breakdown. Trigger when asked for model-level usage/cost data from codexbar, or when you need a scriptable per-model summary from codexbar cost JSON.
