GoScraper 🚀

Enterprise-Grade Web Scraping Library & Microservice for Go

A modern, fast, and stealthy web scraping library with AI-powered extraction, anti-detection measures, and a microservice architecture. Well suited to e-commerce, news, and large-scale data extraction.


🌟 Key Features

🤖 AI-Powered Smart Extraction

  • Multiple AI Providers: OpenAI GPT-4, Anthropic Claude, Local models
  • Smart Content Detection: Automatically identifies and extracts structured data
  • Confidence Scoring: Quality assurance for extracted data
  • Fallback Chain: CSS/XPath extraction when AI fails

🏗️ Microservice Architecture

  • HTTP API Server: RESTful endpoints for scraping operations
  • Docker Support: Container-ready with Docker Compose
  • Kubernetes Ready: Production deployment manifests included
  • Load Balancing: Nginx configuration for horizontal scaling

⚙️ Flexible Configuration System

  • JSON Configuration: File-based configuration management
  • Environment Variables: 12-factor app compliance
  • CLI Tools: Interactive setup and validation
  • Hot Reloading: Runtime configuration updates

🌐 Multi-Engine Browser Support

  • ChromeDP: High-performance Chrome automation
  • Rod: Lightning-fast browser control
  • Stealth Mode: Advanced anti-detection techniques
  • Headless & GUI: Flexible rendering options

🚀 Production Features

  • Rate Limiting: Configurable request throttling
  • Caching: Redis and in-memory caching
  • Proxy Support: IP rotation and geo-targeting
  • Health Checks: Monitoring and observability
  • Graceful Shutdown: Clean resource management

📦 Installation

go get github.com/ramusaaa/goscraper

🚀 Quick Start

Method 1: Interactive Setup (Recommended)

# 1. Initialize configuration
make init-config

# 2. Interactive setup wizard
make setup
# Follow prompts to configure AI keys, caching, etc.

# 3. Validate configuration
make validate-config

# 4. Start the server
make run

Method 2: Environment Variables

# Set your API keys
export OPENAI_API_KEY="your-openai-key"
export GOSCRAPER_AI_ENABLED=true

# Start the server
go run ./cmd/api

Method 3: Manual Configuration

# Create config file
cp goscraper.example.json goscraper.json

# Edit configuration
vim goscraper.json

# Start server
go run ./cmd/api

💻 Usage Examples

Basic Library Usage

package main

import (
    "fmt"
    "log"
    
    "github.com/ramusaaa/goscraper"
)

func main() {
    // Simple scraping
    scraper := goscraper.New()
    
    resp, err := scraper.Get("https://example.com")
    if err != nil {
        log.Fatal(err)
    }
    
    title := resp.Document.Find("title").Text()
    fmt.Printf("Page title: %s\n", title)
}

Advanced Configuration

scraper := goscraper.New(
    goscraper.WithTimeout(30*time.Second),
    goscraper.WithUserAgent("MyBot/1.0"),
    goscraper.WithHeaders(map[string]string{
        "Accept-Language": "en-US,en;q=0.9",
    }),
    goscraper.WithRateLimit(500*time.Millisecond),
    goscraper.WithMaxRetries(3),
    goscraper.WithProxy("http://proxy.example.com:8080"),
    goscraper.WithStealth(true),
)

HTTP API Usage

# Health check
curl http://localhost:8080/health

# Get configuration
curl http://localhost:8080/config

# Scrape a website
curl -X POST http://localhost:8080/api/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Smart AI-powered scraping
curl -X POST http://localhost:8080/api/smart-scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://shop.example.com/products"}'

Client SDK Usage

package main

import (
    "fmt"
    "log"
    
    "github.com/ramusaaa/goscraper/client"
)

func main() {
    // Create a client for the remote scraper service.
    // (Named sc to avoid shadowing the imported client package.)
    sc := client.NewScraperClient("http://localhost:8080")
    
    // Health check
    if err := sc.Health(); err != nil {
        log.Fatal("Service unavailable: ", err)
    }
    
    // Scrape a website
    data, err := sc.Scrape("https://example.com")
    if err != nil {
        log.Fatal("Scraping failed: ", err)
    }
    
    fmt.Printf("Title: %s\n", data.Title)
    fmt.Printf("Status: %d\n", data.StatusCode)
}

📋 Configuration Reference

Configuration File Structure

{
  "server": {
    "port": "8080",
    "host": "0.0.0.0",
    "read_timeout": "30s",
    "write_timeout": "30s"
  },
  "ai": {
    "enabled": true,
    "provider": "openai",
    "confidence_threshold": 0.8,
    "fallback_chain": ["openai", "css", "xpath"],
    "models": {
      "openai": {
        "api_key": "your-openai-key",
        "model": "gpt-4"
      },
      "anthropic": {
        "api_key": "your-anthropic-key",
        "model": "claude-3-sonnet-20240229"
      }
    }
  },
  "browser": {
    "engine": "chromedp",
    "headless": true,
    "stealth": true,
    "pool_size": 5
  },
  "cache": {
    "enabled": true,
    "type": "redis",
    "ttl": "1h",
    "redis": {
      "host": "localhost",
      "port": 6379
    }
  },
  "rate_limit": {
    "requests_per_second": 10,
    "delay": "100ms"
  }
}

Environment Variables

# Server Configuration
GOSCRAPER_PORT=8080
GOSCRAPER_HOST=0.0.0.0

# AI Configuration
GOSCRAPER_AI_ENABLED=true
GOSCRAPER_AI_PROVIDER=openai
OPENAI_API_KEY=your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key

# Browser Configuration
GOSCRAPER_BROWSER_ENGINE=chromedp
GOSCRAPER_BROWSER_HEADLESS=true
GOSCRAPER_BROWSER_STEALTH=true

# Cache Configuration
GOSCRAPER_CACHE_ENABLED=true
GOSCRAPER_CACHE_TYPE=redis
REDIS_HOST=localhost
REDIS_PORT=6379

# Rate Limiting
GOSCRAPER_RATE_LIMIT_RPS=10
GOSCRAPER_RATE_LIMIT_DELAY=100ms

🛠️ CLI Tools

Available Commands

# Configuration Management
make init-config          # Create default configuration
make setup                # Interactive setup wizard
make validate-config      # Validate configuration
make show-config          # Display current configuration

# Development
make build                # Build binaries
make run                  # Start API server
make test                 # Run tests

# Docker
make docker-build         # Build Docker image
make docker-compose-up    # Start with Docker Compose
make docker-compose-down  # Stop Docker services

# Kubernetes
make k8s-deploy          # Deploy to Kubernetes
make k8s-delete          # Remove from Kubernetes

CLI Usage Examples

# Initialize new project
goscraper init

# Interactive setup
goscraper setup

# Validate configuration
goscraper validate

# Show current configuration
goscraper config

🏗️ Architecture Overview

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Load Balancer │    │   API Gateway   │    │  Web Dashboard  │
│     (Nginx)     │    │   (Optional)    │    │   (Optional)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  Scraper Node 1 │    │  Scraper Node 2 │    │  Scraper Node N │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │HTTP API     │ │    │ │HTTP API     │ │    │ │HTTP API     │ │
│ │Server       │ │    │ │Server       │ │    │ │Server       │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │Browser Pool │ │    │ │Browser Pool │ │    │ │Browser Pool │ │
│ │+ AI Engine  │ │    │ │+ AI Engine  │ │    │ │+ AI Engine  │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
    ┌─────────────────────────────────────────────────────────┐
    │                Infrastructure Layer                      │
    │                                                         │
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
    │  │    Redis    │  │  Config     │  │   Proxy     │     │
    │  │   Cache     │  │  Storage    │  │  Rotation   │     │
    │  └─────────────┘  └─────────────┘  └─────────────┘     │
    │                                                         │
    │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
    │  │   OpenAI    │  │ Anthropic   │  │   Local     │     │
    │  │    API      │  │    API      │  │   Models    │     │
    │  └─────────────┘  └─────────────┘  └─────────────┘     │
    └─────────────────────────────────────────────────────────┘

🚀 Deployment Options

1. Standalone Binary

# Build and run
go build -o goscraper ./cmd/api
./goscraper

2. Docker Container

# Build image
docker build -t goscraper:latest .

# Run container
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=your-key \
  -e GOSCRAPER_AI_ENABLED=true \
  goscraper:latest

3. Docker Compose

# Start services
docker-compose up -d

# View logs
docker-compose logs -f scraper-api

# Stop services
docker-compose down

4. Kubernetes
