Teracrawl

High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. A Firecrawl alternative.

<div align="center"> <h1>⭐ Teracrawl</h1> <p> <strong>High-performance web crawler & scraper API optimized for LLMs.</strong> </p> <p> Powered by <a href="https://browser.cash/developers">Browser.cash</a> remote browsers. </p> <p> <a href="#features">Features</a> • <a href="#quick-start">Quick Start</a> • <a href="#api-reference">API Reference</a> • <a href="#configuration">Configuration</a> • <a href="#docker">Docker</a> </p> <p> <img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"> <img src="https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen" alt="Node.js Version"> <img src="https://img.shields.io/badge/typescript-5.6-blue" alt="TypeScript"> <img src="https://img.shields.io/badge/powered%20by-browser.cash-orange" alt="Visit Browser.cash"> </p> <p> <a href="https://x.com/aibrowsers"> <img src="https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" /> </a> <a href="https://linkedin.com/company/megatera"> <img src="https://img.shields.io/badge/Follow%20on%20LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Follow on LinkedIn" /> </a> <a href="https://discord.gg/F9afFJPtYb"> <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" /> </a> </p> <br> <p> ⚠️ <strong>Important:</strong> Search functionality (`/crawl`) requires a running instance of <a href="https://github.com/BrowserCash/browser-serp"><strong>browser-serp</strong></a>. </p> </div>

📊 Benchmarks

<div align="center"> <img src="scrape-evals.png" alt="Teracrawl ranks #1 for coverage on the scrape-evals benchmark" width="700"> <p><strong>Teracrawl</strong> ranks <strong>#1 for coverage (84.2%)</strong> among 14 scraping providers on the <a href="https://github.com/firecrawl/scrape-evals/pull/13">scrape-evals</a> benchmark, an open evaluation framework that tests web scrapers against 1,000 diverse URLs for success rate and content quality.</p> </div>

🚀 What is Teracrawl?

Teracrawl is a production-ready API that turns websites into clean, LLM-ready Markdown. It handles JavaScript rendering, anti-bot measures, and parallel execution, so AI systems can access real-time data quickly.

Unlike simple HTML scrapers, Teracrawl uses real, managed Chrome browsers, which keeps success rates high even on protected sites.

Why use Teracrawl?

- 🤖 LLM-Optimized Output: Converts complex HTML into clean, semantic Markdown perfect for RAG and context windows.
- ⚡ Smart Two-Phase Crawling:
  - Fast Mode: Optimized for static/SSR pages (reuses contexts, blocks heavy assets).
  - Dynamic Mode: Automatic fallback for complex SPAs (waits for hydration/rendering).
- 🔍 Search & Scrape: Single endpoint to query Google and scrape the top results in parallel.
- 🏎️ High Concurrency: Built on a robust <a href="https://github.com/BrowserCash/browser-pool">session pool</a> to handle multiple pages simultaneously.
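
The two-phase strategy described above can be pictured as a fast attempt followed by a dynamic retry. The sketch below is not Teracrawl's actual implementation: `Renderer`, `crawlTwoPhase`, and the `looksComplete` heuristic (a minimum character count) are hypothetical names chosen for illustration, and the README does not say what actually triggers the fallback.

```typescript
// Hypothetical sketch of a fast-then-dynamic fallback; not Teracrawl's real code.
type Renderer = (url: string) => Promise<string>; // resolves to Markdown

interface TwoPhaseResult {
  markdown: string;
  mode: "fast" | "dynamic";
}

// Assumed heuristic: very short output suggests an un-hydrated SPA shell.
const looksComplete = (markdown: string, minChars = 200): boolean =>
  markdown.trim().length >= minChars;

async function crawlTwoPhase(
  url: string,
  fast: Renderer,
  dynamic: Renderer,
): Promise<TwoPhaseResult> {
  try {
    const markdown = await fast(url);
    if (looksComplete(markdown)) {
      return { markdown, mode: "fast" };
    }
  } catch {
    // Fast mode failed (for example, a timeout); fall through to dynamic mode.
  }
  // Dynamic mode errors are left to propagate to the caller.
  return { markdown: await dynamic(url), mode: "dynamic" };
}
```

Passing the two renderers in as functions keeps the fallback logic independent of any particular browser library.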

<a name="features"></a>✨ Features

- Search + Scrape: Query Google and scrape the top N results in a single API call.
- Direct Scraping: Convert any specific URL to Markdown.
- Smart Content Extraction: Automatically detects main content areas (article, main, etc.) and removes clutter (scripts, styles, navs).
- Safety & Performance:
  - Blocks ads, trackers, and analytics.
  - Removes base64-encoded images to reduce token count.
  - Automatic timeout handling and error recovery.
- Docker Ready: Deploy anywhere with a lightweight container.
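
To illustrate why removing base64 images matters for token count, here is a minimal sketch that drops inline data-URI images from Markdown. The `stripBase64Images` function and its regular expression are assumptions made for this example; Teracrawl's own cleanup logic may work differently.

```typescript
// Illustrative only: matches Markdown images whose source is an inline base64 data URI.
const BASE64_IMAGE = /!\[[^\]]*\]\(data:image\/[a-zA-Z0-9.+-]+;base64,[^)]*\)/g;

// Removes inline base64 images and leaves regular (URL-based) images untouched.
function stripBase64Images(markdown: string): string {
  return markdown.replace(BASE64_IMAGE, "");
}

const before =
  "Intro ![logo](data:image/png;base64,iVBORw0KGgo=) text ![pic](https://example.com/a.png)";
const after = stripBase64Images(before);
// The data-URI image is gone; the linked image remains.
```

A single embedded image can run to tens of thousands of characters, so this kind of filter tends to shrink LLM input substantially.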

<a name="quick-start"></a>🛠️ Quick Start

Prerequisites

- Node.js 18+ installed.
- A Browser.cash API Key.
- A running SERP service such as browser-serp on port 8080 (optional; needed only for the /crawl and /serp/search endpoints).

Installation

# Clone the repository
git clone https://github.com/BrowserCash/teracrawl.git
cd teracrawl

# Install dependencies
npm install

Configuration

Copy the example environment file and configure your settings:

cp .env.example .env

Open .env and set your BROWSER_API_KEY:

BROWSER_API_KEY=your_browser_cash_api_key_here
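
A startup check can catch a missing or placeholder key early. The sketch below is a hypothetical helper (`readEnv` and the `TeracrawlEnv` shape are not part of Teracrawl); it only assumes the two variable names that appear in this README, `BROWSER_API_KEY` and `SERP_SERVICE_URL`.

```typescript
// Hypothetical config check; only the variable names come from this README.
interface TeracrawlEnv {
  browserApiKey: string;
  serpServiceUrl: string | undefined; // needed only for search-backed endpoints
}

function readEnv(env: Record<string, string | undefined>): TeracrawlEnv {
  const browserApiKey = (env["BROWSER_API_KEY"] ?? "").trim();
  if (browserApiKey === "" || browserApiKey === "your_browser_cash_api_key_here") {
    throw new Error("BROWSER_API_KEY is missing or still set to the placeholder value");
  }
  const rawSerpUrl = (env["SERP_SERVICE_URL"] ?? "").trim();
  return {
    browserApiKey,
    serpServiceUrl: rawSerpUrl === "" ? undefined : rawSerpUrl,
  };
}

// In a Node.js process this would typically be called as readEnv(process.env).
```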

Running the Server

# Development mode
npm run dev

# Production build & start
npm run build
npm start

The server listens on 0.0.0.0:8085 and is reachable locally at http://localhost:8085.

<a name="api-reference"></a>📚 API Reference

1. Search & Crawl

Performs a Google search and scrapes the content of the top results.

Endpoint: POST /crawl

CURL Request:

curl -X POST http://localhost:8085/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "q": "What is the capital of France?",
    "count": 3
  }'

Response:

{
  "query": "What is the capital of France?",
  "results": [
    {
      "url": "https://en.wikipedia.org/wiki/Paris",
      "title": "Paris - Wikipedia",
      "markdown": "# Paris\n\nParis is the capital and most populous city of France...",
      "status": "success"
    },
    {
      "url": "https://...",
      "status": "error",
      "error": "Timeout exceeded"
    }
  ]
}
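
Because a single response can mix successful and failed results, a client usually needs to separate them. The TypeScript sketch below models the response using only the fields visible in the example above; the type names and the `splitCrawlResults` and `toContext` helpers are hypothetical, and the real API may return additional fields.

```typescript
// Types inferred from the sample response above; the actual schema may differ.
interface CrawlSuccess {
  url: string;
  title: string;
  markdown: string;
  status: "success";
}

interface CrawlFailure {
  url: string;
  status: "error";
  error: string;
}

type CrawlResult = CrawlSuccess | CrawlFailure;

interface CrawlResponse {
  query: string;
  results: CrawlResult[];
}

// Separates scraped pages from failures using the `status` discriminant.
function splitCrawlResults(response: CrawlResponse): {
  pages: CrawlSuccess[];
  failures: CrawlFailure[];
} {
  const pages: CrawlSuccess[] = [];
  const failures: CrawlFailure[] = [];
  for (const result of response.results) {
    if (result.status === "success") {
      pages.push(result);
    } else {
      failures.push(result);
    }
  }
  return { pages, failures };
}

// Joins successful pages into one context string, e.g. for an LLM prompt.
function toContext(pages: CrawlSuccess[]): string {
  return pages.map((p) => `Source: ${p.url}\n\n${p.markdown}`).join("\n\n---\n\n");
}
```

Checking `status` per result, rather than relying on the HTTP status alone, lets a caller use the pages that succeeded even when others timed out.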

2. Single Page Scrape

Scrapes a specific URL and converts it to Markdown.

Endpoint: POST /scrape

CURL Request:

curl -X POST http://localhost:8085/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/post-1"
  }'

Response:

{
  "url": "https://example.com/blog/post-1",
  "title": "My Blog Post",
  "markdown": "# My Blog Post\n\nContent of the post...",
  "status": "success"
}
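
The same call can be made from TypeScript. The sketch below only builds the request, so it can be handed to any HTTP client; `buildScrapeRequest` and `JsonPostRequest` are hypothetical names, and the client-side URL check is an added safeguard rather than documented API behavior.

```typescript
// Hypothetical helper that mirrors the curl example above.
interface JsonPostRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildScrapeRequest(baseUrl: string, targetUrl: string): JsonPostRequest {
  // Assumed safeguard: reject anything that is not an http(s) URL before sending.
  if (!/^https?:\/\//i.test(targetUrl)) {
    throw new Error(`Expected an http(s) URL, got: ${targetUrl}`);
  }
  return {
    // Trim trailing slashes so the path is not doubled.
    endpoint: `${baseUrl.replace(/\/+$/, "")}/scrape`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage with the global fetch available in Node.js 18+:
//   const { endpoint, init } = buildScrapeRequest("http://localhost:8085", "https://example.com/blog/post-1");
//   const page = await (await fetch(endpoint, init)).json();
```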

3. SERP Search Only

Proxies a search request to the underlying SERP service without scraping content.

Endpoint: POST /serp/search

CURL Request:

curl -X POST http://localhost:8085/serp/search \
  -H "Content-Type: application/json" \
  -d '{
    "q": "browser automation",
    "count": 5
  }'

Response:

{
  "results": [
    {
      "url": "https://...",
      "title": "Result Title",
      "description": "Result description..."
    }
  ]
}
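
Since this endpoint returns links without page content, a client can filter the results before deciding which URLs to send to `/scrape`. The sketch below assumes the result shape shown in the example above; `SerpResult` and `pickUrlsToScrape` are hypothetical names for illustration.

```typescript
// Result shape inferred from the sample response above.
interface SerpResult {
  url: string;
  title: string;
  description: string;
}

// Returns up to `max` unique http(s) URLs, preserving search ranking order.
function pickUrlsToScrape(results: SerpResult[], max: number): string[] {
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const { url } of results) {
    if (picked.length >= max) break;
    if (!/^https?:\/\//i.test(url) || seen.has(url)) continue;
    seen.add(url);
    picked.push(url);
  }
  return picked;
}
```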

4. Health Check

Endpoint: GET /health

CURL Request:

curl http://localhost:8085/health

Response:

{
  "ok": true
}

<a name="configuration"></a>⚙️ Configuration

Server & Infrastructure

Crawler Tuning

<a name="docker"></a>🐳 Docker

You can run Teracrawl easily using Docker.

Build & Run

# Build the image
docker build -t teracrawl .

# Run with env file
docker run -p 8085:8085 --env-file .env teracrawl

Docker Compose

version: "3.8"
services:
  teracrawl:
    build: .
    ports:
      - "8085:8085"
    environment:
      - BROWSER_API_KEY=${BROWSER_API_KEY}
      - SERP_SERVICE_URL=http://serp:8080
    depends_on:
      - serp

  serp:
    image: ghcr.io/mega-tera/browser-serp:latest
    ports:
      - "8080:8080"

🤝 Contributing

Contributions are welcome! We appreciate your help in making Teracrawl better.

How to Contribute

Fork the Project: click the 'Fork' button at the top right of this page.
Create your Feature Branch: git checkout -b feature/AmazingFeature
Commit your Changes: git commit -m 'Add some AmazingFeature'
Push to the Branch: git push origin feature/AmazingFeature
Open a Pull Request: Submit your changes for review.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
