🤖 Browser4

Docker Pulls License: APACHE2


English | Simplified Chinese | China Mirror


🌟 Introduction

💖 Browser4: a lightning-fast, coroutine-safe browser engine for your AI 💖

✨ Key Capabilities

  • 👽 Browser Agents — Fully autonomous browser agents that reason, plan, and execute end-to-end tasks.
  • 🤖 Browser Automation — High-performance automation for workflows, navigation, and data extraction.
  • ⚙️ Machine Learning Agent - Learns field structures across complex pages without consuming tokens.
  • Extreme Performance — Fully coroutine-safe; supports 100k ~ 200k complex page visits per machine per day.
  • 🧬 Data Extraction — Hybrid of LLM, ML, and selectors for clean data across chaotic pages.

⚡ Quick Example: Agentic Workflow

// Give your Agent a mission, not just a script.
val agent = AgenticContexts.getOrCreateAgent()

// The Agent plans, navigates, and executes using Browser4 as its hands and eyes.
val result = agent.run("""
    1. Go to amazon.com
    2. Search for '4k monitors'
    3. Analyze the top 5 results for price/performance ratio
    4. Return the best option as JSON
""")

🎥 Demo Videos

🎬 YouTube: Watch the video

📺 Bilibili: https://www.bilibili.com/video/BV1fXUzBFE4L


🚀 Quick Start

Prerequisites: Java 17+

  1. Clone the repository

    git clone https://github.com/platonai/browser4.git
    cd browser4
    
  2. Configure your LLM API key

    Edit application.properties and add your API key.

  3. Build the project

    ./mvnw -DskipTests
    
  4. Run examples

    ./mvnw -pl examples/browser4-examples exec:java -D"exec.mainClass=ai.platon.pulsar.examples.agent.Browser4AgentKt"
    

    If you run into encoding problems on Windows:

    ./bin/run-examples.ps1
    

    Explore and run examples in the browser4-examples module to see Browser4 in action.

For Docker deployment, see our Docker Hub repository.

Windows Users: You can also build Browser4 as a standalone Windows installer. See the Windows Installer Guide for details.


💡 Usage Examples

Browser Agents

Autonomous agents that understand natural language instructions and execute complex browser workflows.

val agent = AgenticContexts.getOrCreateAgent()

val task = """
    1. go to amazon.com
    2. search for pens to draw on whiteboards
    3. compare the first 4 ones
    4. write the result to a markdown file
    """

agent.run(task)

Workflow Automation

Low-level browser automation & data extraction with fine-grained control.

Features:

  • Both live DOM access and offline snapshot parsing
  • Direct and full Chrome DevTools Protocol (CDP) control, coroutine safe
  • Precise element interactions (click, scroll, input)
  • Fast data extraction using CSS selectors/XPath

val session = AgenticContexts.getOrCreateSession()
val agent = session.companionAgent
val driver = session.getOrCreateBoundDriver()

// Load the initial page referenced by your input URL
var page = session.open(url)

// Drive the browser with natural-language instructions
agent.act("scroll to the comment section")
// Read the first matching comment node directly from the live DOM
val content = driver.selectFirstTextOrNull("#comments")

// Snapshot the page to an in-memory document for offline parsing
var document = session.parse(page)
// Map CSS selectors to structured fields in one call
var fields = session.extract(document, mapOf("title" to "#title"))

// Let the companion agent execute a multi-step navigation/search flow
val history = agent.run(
    "Go to amazon.com, search for 'smart phone', open the product page with the highest ratings"
)

// Capture the updated browser state back into a PageSnapshot
page = session.capture(driver)
document = session.parse(page)
// Extract additional attributes from the captured snapshot
fields = session.extract(document, mapOf("ratings" to "#ratings"))

LLM + X-SQL

Ideal for high-complexity data-extraction pipelines with dozens of entities and several hundred fields per entity.

Benefits:

  • Extract 10x more entities and 100x more fields compared to traditional methods
  • Combine LLM intelligence with precise CSS selectors/XPath
  • SQL-like syntax for familiar data queries

val context = AgenticContexts.create()
val sql = """
select
  llm_extract(dom, 'product name, price, ratings') as llm_extracted_data,
  dom_first_text(dom, '#productTitle') as title,
  dom_first_text(dom, '#bylineInfo') as brand,
  dom_first_text(dom, '#price tr td:matches(^Price) ~ td, #corePrice_desktop tr td:matches(^Price) ~ td') as price,
  dom_first_text(dom, '#acrCustomerReviewText') as ratings,
  str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as score
from load_and_select('https://www.amazon.com/dp/B08PP5MSVB -i 1s -njr 3', 'body');
"""
val rs = context.executeQuery(sql)
println(ResultSetFormatter(rs, withHeader = true))
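
When a query has many selector-based columns, the column list can be generated from a map instead of being written by hand. The following is a minimal sketch under stated assumptions: `buildXSql` is a hypothetical helper, not part of Browser4; it only reuses the `dom_first_text` and `load_and_select` functions shown in the query above, and it assumes standard SQL quoting (a single quote inside a literal is doubled).

```kotlin
// Hypothetical helper (not a Browser4 API): builds an X-SQL query string
// from a map of column names to CSS selectors.
fun buildXSql(url: String, fields: Map<String, String>, scope: String = "body"): String {
    require(fields.isNotEmpty()) { "At least one field is required" }

    // Assumption: standard SQL escaping, where a single quote is doubled
    // inside a string literal.
    fun quote(s: String) = "'" + s.replace("'", "''") + "'"

    // One dom_first_text(...) column per entry; column names must be valid SQL identifiers.
    val columns = fields.entries.joinToString(",\n") { (name, selector) ->
        "  dom_first_text(dom, ${quote(selector)}) as $name"
    }
    return "select\n$columns\nfrom load_and_select(${quote(url)}, ${quote(scope)});"
}

fun main() {
    val sql = buildXSql(
        url = "https://www.amazon.com/dp/B08PP5MSVB -i 1s -njr 3",
        fields = linkedMapOf(
            "title" to "#productTitle",
            "brand" to "#bylineInfo",
            "ratings" to "#acrCustomerReviewText",
        ),
    )
    println(sql)
}
```

The generated string can then be passed to `context.executeQuery(sql)` in the same way as the handwritten query.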


High-Speed Parallel Processing

Achieve extreme throughput with parallel browser control and smart resource optimization.

Performance:

  • 100k ~ 200k complex page visits per machine per day
  • Concurrent session management
  • Resource blocking for faster page loads

val args = "-refresh -dropContent -interactLevel fastest"
val blockingUrls = listOf("*.png", "*.jpg")
val links = LinkExtractors.fromResource("urls.txt")
    .map { ListenableHyperlink(it, "", args = args) }
    .onEach {
        it.eventHandlers.browseEventHandlers.onWillNavigate.addLast { page, driver ->
            driver.addBlockedURLs(blockingUrls)
        }
    }

session.submitAll(links)
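
To reason about which requests a pattern list such as `blockingUrls` would cover, it can help to test the patterns offline first. The sketch below assumes the patterns are simple globs in which `*` matches any run of characters; Browser4's actual matching rules may differ, and `globToRegex` and `isBlocked` are hypothetical helpers written for illustration only.

```kotlin
// Assumption: "*" matches any run of characters and everything else is literal.
// Browser4's real matching rules may differ; this is illustration only.
fun globToRegex(glob: String): Regex =
    glob.split("*").joinToString(".*") { Regex.escape(it) }.toRegex()

// A URL counts as blocked when any pattern matches the whole URL.
fun isBlocked(url: String, patterns: List<String>): Boolean =
    patterns.any { globToRegex(it).matches(url) }

fun main() {
    val blockingUrls = listOf("*.png", "*.jpg")
    val requests = listOf(
        "https://example.com/logo.png",
        "https://example.com/index.html",
        "https://example.com/photo.jpg",
    )
    for (url in requests) {
        println("$url blocked=${isBlocked(url, blockingUrls)}")
    }
}
```

Under this whole-URL glob assumption, a URL with a query string such as `logo.png?v=1` is not matched by `*.png`; a pattern like `*.png*` would be needed to cover it.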

🎬 YouTube: Watch the video

📺 Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC


Auto Extraction

Automatic, large-scale, high-precision field discovery and extraction powered by self-/unsupervised machine learning — no LLM API calls, no tokens, deterministic and fast.

What it does:

  • Learns every extractable field on item/detail pages (often dozens to hundreds) with high precision.
  • Will be open-sourced once Browser4 reaches 10K stars on GitHub.

Why not just LLMs?

  • LLM extraction adds latency, cost, and token limits.
  • ML-based auto extraction is local, reproducible, and scalable to 100k ~ 200k pages/day.
  • You can still combine both: use Auto Extraction for structured baseline + LLM for semantic enrichment.
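
One way to combine the two sources is to treat the auto-extracted fields as the baseline and let LLM output fill only the gaps. This is a minimal sketch, not a Browser4 or PulsarRPAPro API: `mergeFields`, the plain string maps, and the sample field names are all assumptions chosen for illustration.

```kotlin
// Hypothetical merge step: deterministic baseline fields win;
// LLM output only fills fields the baseline left missing or blank.
fun mergeFields(baseline: Map<String, String?>, llm: Map<String, String?>): Map<String, String> {
    val merged = linkedMapOf<String, String>()
    for (key in baseline.keys + llm.keys) {
        val value = baseline[key]?.takeIf { it.isNotBlank() }
            ?: llm[key]?.takeIf { it.isNotBlank() }
        // Fields that are empty in both sources are left out of the result.
        if (value != null) merged[key] = value
    }
    return merged
}

fun main() {
    // Sample data invented for this sketch.
    val baseline = mapOf("title" to "Dry-erase markers", "price" to "7.99 USD", "summary" to null)
    val llm = mapOf("price" to "about 8 dollars", "summary" to "Low-odor markers for whiteboards")
    println(mergeFields(baseline, llm))
}
```

Preferring the baseline keeps repeated runs reproducible, while free-text fields such as a summary can still come from the LLM.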

Quick Commands (PulsarRPAPro):

# NOTE: MongoDB required
curl -L -o PulsarRPAPro.jar https://github.com/platonai/PulsarRPAPro/releases/download/v4.6.0/PulsarRPAPro.jar

Integration Status:

  • Available today via the companion project PulsarRPAPro.
  • Native Browser4 API exposure is planned; follow releases for updates.

Key Advantages:

  • High precision: more than 95% of fields discovered, most of them with over 99% accuracy (indicative figures on tested domains).
  • Resilient to selector churn & HTML noise.
  • Zero external dependency (no API key) → cost-efficient at scale.
  • Explainable: generated selectors & SQL are transparent and auditable.

👽 Extract data with machine learning agents:

Auto Extraction Result Snapshot

(Coming soon: richer in-repo examples and direct API hooks.)


📦 Modules Overview

| Module            | Description                                             |
|-------------------|---------------------------------------------------------|
| pulsar-core       | Core engine: sessions, scheduling, DOM, browser control |
| pulsar-agentic    | Agent implementation, MCP, and skill registration       |
| pulsar-rest       | Spring Boot REST layer & command endpoints              |
| browser4-spa      | Single Page Application for browser agents              |
| browser4-agents   | Agent & crawler orchestration with product packaging    |
| sdks              | Kotlin/Python SDKs plus tests and examples              |
| examples          | Runnable examples and demos                             |
| pulsar-tests      | E2E & heavy integration & scenario tests                |


📜 SDK

SDKs are
