Teracrawl
High-performance web crawler API optimized for LLMs. Turn any search or website into clean Markdown using remote browsers. A Firecrawl alternative.
📊 Benchmarks
<div align="center"> <img src="scrape-evals.png" alt="Teracrawl coverage on the scrape-evals benchmark" width="700"> <p><strong>Teracrawl</strong> achieves <strong>#1 coverage (84.2%)</strong> across 14 scraping providers on the <a href="https://github.com/firecrawl/scrape-evals/pull/13">scrape-evals</a> benchmark, an open evaluation framework that tests web scrapers against 1,000 diverse URLs for success rate and content quality.</p> </div>

🚀 What is Teracrawl?
Teracrawl is a production-ready API designed to turn websites into clean, LLM-ready Markdown. It handles the complexity of JavaScript rendering, anti-bot measures, and parallel execution, allowing AI systems to access real-time data quickly.
Unlike simple HTML scrapers, Teracrawl uses real managed Chrome browsers, ensuring high success rates even on protected sites.
Why use Teracrawl?
- 🤖 LLM-Optimized Output: Converts complex HTML into clean, semantic Markdown perfect for RAG and context windows.
- ⚡ Smart Two-Phase Crawling:
- Fast Mode: Optimized for static/SSR pages (reuses contexts, blocks heavy assets).
- Dynamic Mode: Automatic fallback for complex SPAs (waits for hydration/rendering).
- 🔍 Search & Scrape: Single endpoint to query Google and scrape the top results in parallel.
- 🏎️ High Concurrency: Built on a robust <a href="https://github.com/BrowserCash/browser-pool">session pool</a> to handle multiple pages simultaneously.
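The two-phase strategy above boils down to a simple decision: try the fast path first, and fall back to the dynamic path when the extracted content looks too thin. The sketch below is illustrative only, not Teracrawl's actual implementation; the threshold mirrors the `CRAWL_MIN_CONTENT_LENGTH` setting described in the Configuration section.

```javascript
// Illustrative sketch of a two-phase crawl decision (not Teracrawl's real code).
// A fast, asset-blocking scrape is attempted first; if the result is too short
// to be meaningful content, the page is retried in a slower dynamic mode.

const MIN_CONTENT_LENGTH = 200; // mirrors CRAWL_MIN_CONTENT_LENGTH

function needsDynamicFallback(markdown) {
  // A failed result or a near-empty page suggests an SPA that had not
  // hydrated yet, so the slow dynamic path is worth trying.
  return markdown == null || markdown.trim().length < MIN_CONTENT_LENGTH;
}

async function twoPhaseScrape(url, fastScrape, dynamicScrape) {
  const fast = await fastScrape(url);
  if (!needsDynamicFallback(fast)) return { mode: "fast", markdown: fast };
  const slow = await dynamicScrape(url);
  return { mode: "dynamic", markdown: slow };
}
```

Here `fastScrape` and `dynamicScrape` stand in for whatever page-fetching functions the caller provides; only the fallback decision itself is being illustrated.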
<a name="features"></a>✨ Features
- Search + Scrape: Query Google and scrape top N results in a single API call.
- Direct Scraping: Convert any specific URL to Markdown.
- Smart Content Extraction: Automatically detects main content areas (article, main, etc.) and removes clutter (scripts, styles, navs).
- Safety & Performance:
- Blocks ads, trackers, and analytics.
- Removes base64 images to save token count.
- Automatic timeout handling and error recovery.
- Docker Ready: Deploy anywhere with a lightweight container.
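As an illustration of the clean-up features above, a minimal version of base64-image stripping might look like the following. This is a regex-based sketch, not the project's actual extractor:

```javascript
// Sketch: remove inline base64 images from Markdown so they don't burn tokens.
// Matches Markdown images whose source is a data: URI, e.g.
// ![logo](data:image/png;base64,iVBOR...), and drops them entirely.
function stripBase64Images(markdown) {
  return markdown.replace(/!\[[^\]]*\]\(data:image\/[^)]*\)/g, "");
}
```

Regular HTTP(S) image links are left untouched; only `data:` URIs, which can run to tens of thousands of characters, are removed.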
<a name="quick-start"></a>🛠️ Quick Start
Prerequisites
- Node.js 18+ installed.
- A Browser.cash API Key.
- A running SERP service such as browser-serp on port 8080 (optional; required only for the /crawl endpoint).
Installation
```bash
# Clone the repository
git clone https://github.com/BrowserCash/teracrawl.git
cd teracrawl

# Install dependencies
npm install
```
Configuration
Copy the example environment file and configure your settings:
```bash
cp .env.example .env
```
Open `.env` and set your `BROWSER_API_KEY`:

```
BROWSER_API_KEY=your_browser_cash_api_key_here
```
Running the Server
```bash
# Development mode
npm run dev

# Production build & start
npm run build
npm start
```
The server will start at http://0.0.0.0:8085.
<a name="api-reference"></a>📚 API Reference
1. Search & Crawl
Performs a Google search and scrapes the content of the top results.
Endpoint: POST /crawl
CURL Request:
```bash
curl -X POST http://localhost:8085/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "q": "What is the capital of France?",
    "count": 3
  }'
```
| Field | Type | Default | Description |
| :------ | :------- | :----------- | :------------------------------------ |
| q | string | Required | The search query. |
| count | number | 3 | Number of results to scrape (max 20). |
Response:
```json
{
  "query": "What is the capital of France?",
  "results": [
    {
      "url": "https://en.wikipedia.org/wiki/Paris",
      "title": "Paris - Wikipedia",
      "markdown": "# Paris\n\nParis is the capital and most populous city of France...",
      "status": "success"
    },
    {
      "url": "https://...",
      "status": "error",
      "error": "Timeout exceeded"
    }
  ]
}
```
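Calling /crawl from Node might look like the sketch below. `buildCrawlPayload` clamps `count` to the documented 1–20 range before sending; the base URL assumes a local server on the default port from the Quick Start, and the helper names are illustrative, not part of the API.

```javascript
// Sketch of a Node 18+ client for POST /crawl (uses the global fetch).

function buildCrawlPayload(q, count = 3) {
  if (!q || typeof q !== "string") throw new Error("q (search query) is required");
  // The API caps results at 20; clamp rather than let the request fail.
  return { q, count: Math.min(Math.max(1, Math.floor(count)), 20) };
}

async function crawl(q, count, baseUrl = "http://localhost:8085") {
  const res = await fetch(`${baseUrl}/crawl`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCrawlPayload(q, count)),
  });
  if (!res.ok) throw new Error(`crawl failed: HTTP ${res.status}`);
  return res.json(); // { query, results: [{ url, title, markdown, status }, ...] }
}
```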
2. Single Page Scrape
Scrapes a specific URL and converts it to Markdown.
Endpoint: POST /scrape
CURL Request:
```bash
curl -X POST http://localhost:8085/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/blog/post-1"
  }'
```
Response:
```json
{
  "url": "https://example.com/blog/post-1",
  "title": "My Blog Post",
  "markdown": "# My Blog Post\n\nContent of the post...",
  "status": "success"
}
```
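Since both /crawl and /scrape report a per-page `status` field, callers typically separate successes from failures before feeding markdown to an LLM. A minimal helper for this (illustrative, not part of the API) might be:

```javascript
// Split scrape results into usable documents and failures by their status field.
function partitionResults(results) {
  const ok = [];
  const failed = [];
  for (const r of results) {
    (r.status === "success" ? ok : failed).push(r);
  }
  return { ok, failed };
}
```

Failed entries keep their `error` message, which makes them easy to retry or log separately.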
3. SERP Search Only
Proxies a search request to the underlying SERP service without scraping content.
Endpoint: POST /serp/search
CURL Request:
```bash
curl -X POST http://localhost:8085/serp/search \
  -H "Content-Type: application/json" \
  -d '{
    "q": "browser automation",
    "count": 5
  }'
```
Response:
```json
{
  "results": [
    {
      "url": "https://...",
      "title": "Result Title",
      "description": "Result description..."
    }
  ]
}
```
4. Health Check
Endpoint: GET /health
CURL Request:
```bash
curl http://localhost:8085/health
```
Response:
```json
{
  "ok": true
}
```
<a name="configuration"></a>⚙️ Configuration
Server & Infrastructure
| Variable | Default | Description |
| :----------------- | :---------------------- | :-------------------------------------------------------------------- |
| BROWSER_API_KEY | Required | Your Browser.cash API key. |
| PORT | 8085 | Port for the API server. |
| HOST | 0.0.0.0 | Host to bind to. |
| SERP_SERVICE_URL | http://localhost:8080 | URL of the upstream SERP/Search service. |
| POOL_SIZE | 1 | Number of concurrent browser sessions to maintain. |
| DEBUG_LOG | false | Enable verbose logging for debugging. |
| DATALAB_API_KEY | Optional | Datalab API key for PDF-to-Markdown conversion. |
Crawler Tuning
| Variable | Default | Description |
| :---------------------------- | :------ | :--------------------------------------------------------------- |
| CRAWL_TABS_PER_SESSION | 8 | Max concurrent tabs per browser session. |
| CRAWL_MIN_CONTENT_LENGTH | 200 | Minimum markdown char length to consider a scrape successful. |
| CRAWL_NAVIGATION_TIMEOUT_MS | 10000 | Timeout for "Fast" scraping mode (ms). |
| CRAWL_SLOW_TIMEOUT_MS | 20000 | Timeout for "Slow" scraping mode (ms). |
| CRAWL_JITTER_MS | 0 | Max random delay (ms) between requests to avoid thundering herd. |
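The CRAWL_JITTER_MS setting spreads requests out in time so parallel scrapes don't all hit a target at the same instant. A sketch of how such a delay is typically applied (illustrative, not the project's code):

```javascript
// Sketch: delay each request by a random amount up to maxMs, so a burst of
// parallel scrapes is spread out instead of landing simultaneously.
function jitterDelay(maxMs) {
  return maxMs > 0 ? Math.floor(Math.random() * maxMs) : 0;
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withJitter(maxMs, task) {
  await sleep(jitterDelay(maxMs));
  return task();
}
```

With the default of 0, requests run immediately; setting a small value (e.g. a few hundred ms) trades a little latency for gentler load on the target site.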
<a name="docker"></a>🐳 Docker
You can run Teracrawl easily using Docker.
Build & Run
```bash
# Build the image
docker build -t teracrawl .

# Run with env file
docker run -p 8085:8085 --env-file .env teracrawl
```
Docker Compose
```yaml
version: "3.8"

services:
  teracrawl:
    build: .
    ports:
      - "8085:8085"
    environment:
      - BROWSER_API_KEY=${BROWSER_API_KEY}
      - SERP_SERVICE_URL=http://serp:8080
    depends_on:
      - serp

  serp:
    image: ghcr.io/mega-tera/browser-serp:latest
    ports:
      - "8080:8080"
```
🤝 Contributing
Contributions are welcome! We appreciate your help in making Teracrawl better.
How to Contribute
- Fork the Project: click the "Fork" button at the top right of this page.
- Create your Feature Branch: `git checkout -b feature/AmazingFeature`
- Commit your Changes: `git commit -m 'Add some AmazingFeature'`
- Push to the Branch: `git push origin feature/AmazingFeature`
- Open a Pull Request: submit your changes for review.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.