Arc

High-performance columnar analytical database. 18M+ records/sec ingestion, 6M+ rows/sec queries. Built on DuckDB + Parquet + Arrow. Use for product analytics, observability, AI agents, IoT, logs, or data warehousing. Single binary. No vendor lock-in. AGPL-3.0


The Problem

Modern applications generate massive amounts of data that need fast ingestion and fast analytical queries:

  • Product Analytics: Events, clickstreams, user behavior, A/B testing
  • Observability: Metrics, logs, traces from distributed systems
  • AI Agent Memory: Conversation history, context, RAG, embeddings
  • Industrial IoT: Manufacturing telemetry, sensors, equipment monitoring
  • Security & Compliance: Audit logs, SIEM, security events
  • Data Warehousing: Analytics, BI, reporting on time-series or event data

Traditional solutions have problems:

  • Expensive: Cloud data warehouses cost thousands per month at scale
  • Complex: ClickHouse/Druid require cluster management expertise
  • Vendor lock-in: Proprietary formats trap your data
  • Slow ingestion: Most analytical DBs struggle with high-throughput writes
  • Overkill: Many workloads need a simple deployment, not Kubernetes orchestration

Arc solves this: 18M+ records/sec ingestion, 6M+ rows/sec queries, portable Parquet files you own, single binary deployment.

-- Product analytics: user events
SELECT
  user_id,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT session_id) as sessions
FROM analytics.events
WHERE timestamp > NOW() - INTERVAL '7 days'
  AND event_type IN ('page_view', 'click', 'purchase')
GROUP BY user_id, event_type
HAVING COUNT(*) > 100;

-- Observability: error rate by service
SELECT
  service_name,
  DATE_TRUNC('hour', timestamp) as hour,
  COUNT(*) as total_requests,
  SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) as errors,
  (SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END)::FLOAT / COUNT(*)) * 100 as error_rate
FROM logs.http_requests
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY service_name, hour
HAVING error_rate > 1.0;

-- AI agent memory: conversation search
SELECT
  agent_id,
  conversation_id,
  user_message,
  assistant_response,
  created_at
FROM ai.conversations
WHERE agent_id = 'support-bot-v2'
  AND created_at > NOW() - INTERVAL '30 days'
  AND user_message ILIKE '%refund%'
ORDER BY created_at DESC
LIMIT 100;

Standard DuckDB SQL. Window functions, CTEs, joins. No proprietary query language.


Live Demo

See Arc in action: https://basekick.net/demos


Performance

Benchmarked on Apple MacBook Pro M3 Max (14 cores, 36GB RAM, 1TB NVMe). Test config: 12 concurrent workers, 1000-record batches, columnar data.

Ingestion

| Protocol | Throughput | p50 Latency | p99 Latency |
|----------|------------|-------------|-------------|
| MessagePack Columnar | 18.6M rec/s | 0.46ms | 3.68ms |
| MessagePack + Zstd | 16.8M rec/s | 0.55ms | 3.23ms |
| MessagePack + GZIP | 15.4M rec/s | 0.63ms | 3.17ms |
| Line Protocol | 3.7M rec/s | 2.63ms | 10.63ms |
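
The difference between the columnar and row-oriented protocols in the table above is easier to picture with a small example. The sketch below pivots row-oriented records into a column-oriented batch using only the Python standard library. The field names and the batch layout are illustrative assumptions, not Arc's actual wire schema, and JSON stands in for MessagePack so the example has no dependencies.

```python
import json

def to_columnar(rows):
    """Pivot a list of row dicts into a dict of equal-length column lists.

    Hypothetical layout for illustration; Arc's real MessagePack schema
    may differ.
    """
    columns = {}
    for i, row in enumerate(rows):
        for key in row:
            if key not in columns:
                # Backfill earlier rows that lacked this key.
                columns[key] = [None] * i
        for key, values in columns.items():
            values.append(row.get(key))
    return columns

rows = [
    {"time": 1700000000, "host": "web-1", "cpu": 0.42},
    {"time": 1700000001, "host": "web-2", "cpu": 0.37},
    {"time": 1700000002, "host": "web-1", "cpu": 0.51},
]

batch = to_columnar(rows)
print(json.dumps(batch))
```

In the columnar form each field name appears once per batch instead of once per record, and values of the same type sit next to each other, which is the general reason columnar batches tend to be cheaper to encode and parse.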

Compaction

Automatic background compaction merges small Parquet files into optimized larger files:

| Metric | Before | After | Reduction |
|--------|--------|-------|-----------|
| Files | 43 | 1 | 97.7% |
| Size | 372 MB | 36 MB | 90.4% |

Benefits:

  • 10x storage reduction via better compression and encoding
  • Faster queries - scan 1 file vs 43 files
  • Lower cloud costs - less storage, fewer API calls
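
As a rough illustration of the merge planning described above, the sketch below groups small files by the UTC hour of their data and reports which hours hold enough files to be worth merging. It is a simplified model, not Arc's compaction code: the file names, the `(path, timestamp)` input shape, and the `min_files` threshold are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

def plan_hourly_merges(files, min_files=2):
    """Group small files by the UTC hour of their timestamp.

    files: iterable of (path, unix_timestamp) pairs (hypothetical inputs).
    Returns {hour_label: sorted paths} for hours holding at least
    min_files files, i.e. hours where a merge would reduce the file count.
    """
    by_hour = defaultdict(list)
    for path, ts in files:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H")
        by_hour[hour].append(path)
    return {
        hour: sorted(paths)
        for hour, paths in by_hour.items()
        if len(paths) >= min_files
    }

files = [
    ("cpu_0001.parquet", 1700000000),  # 2023-11-14 22:13 UTC
    ("cpu_0002.parquet", 1700000100),  # 2023-11-14 22:15 UTC
    ("cpu_0003.parquet", 1700003600),  # 2023-11-14 23:13 UTC
]

plan = plan_hourly_merges(files)
print(plan)
```

Here the two files from the 22:00 hour form one merge group, while the single file from 23:00 is left alone until more files arrive for that hour.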

Query (March 2026)

Arrow IPC format provides up to 2.75x the throughput of JSON for large result sets; for small result sets and aggregations the two formats perform about the same:

| Query | Arrow (ms) | JSON (ms) | Speedup |
|-------|------------|-----------|---------|
| COUNT(*) - 1.88B rows | 1.9 | 1.8 | 0.95x |
| SELECT LIMIT 10K | 70 | 75 | 1.07x |
| SELECT LIMIT 100K | 88 | 106 | 1.20x |
| SELECT LIMIT 500K | 127 | 253 | 1.99x |
| SELECT LIMIT 1M | 159 | 438 | 2.75x |
| Time Range (7d) LIMIT 10K | 45 | 51 | 1.13x |
| Time Bucket (1h, 7d) | 986 | 1089 | 1.10x |
| Date Trunc (day, 30d) | 2013 | 2190 | 1.09x |

Best throughput:

  • Arrow: 6.29M rows/sec (1M row SELECT)
  • JSON: 2.28M rows/sec (1M row SELECT)
  • COUNT(*): ~1.0T rows/sec (1.88B rows, 1.8ms)
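
The rows/sec figures follow directly from the latencies in the query table: rows returned divided by elapsed time. A short Python check, using the 1M-row SELECT latencies as input:

```python
def rows_per_sec(rows, latency_ms):
    """Convert a row count and a latency in milliseconds to rows per second."""
    return rows / (latency_ms / 1000.0)

# Latencies for "SELECT LIMIT 1M" taken from the query table.
arrow = rows_per_sec(1_000_000, 159)
json_ = rows_per_sec(1_000_000, 438)

print(f"Arrow: {arrow / 1e6:.2f}M rows/sec")    # Arrow: 6.29M rows/sec
print(f"JSON:  {json_ / 1e6:.2f}M rows/sec")    # JSON:  2.28M rows/sec
print(f"Speedup: {arrow / json_:.2f}x")         # Speedup: 2.75x
```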

Why Go

  • Stable memory: Go's GC returns memory to OS. No leaks.
  • Single binary: Deploy one executable. No dependencies.
  • Native concurrency: Goroutines handle thousands of connections efficiently.
  • Production GC: Sub-millisecond pause times at scale.

Quick Start

# Build
make build

# Run
./arc

# Verify
curl http://localhost:8000/health
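
For scripted setups, the verify step can be wrapped in a small polling loop that waits for the server to come up. The sketch below uses only the Python standard library and assumes that the `/health` endpoint answers with HTTP 200 once Arc is ready; the function names, attempt count, and delay are illustrative choices, not part of Arc.

```python
import time
import urllib.error
import urllib.request

def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False

def wait_until_healthy(url, attempts=10, delay=0.5, probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `attempts` run out.

    Returns the 1-based attempt number that succeeded, or None.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

A typical call would be `wait_until_healthy("http://localhost:8000/health")`, which returns `None` if the server never responds within the allotted attempts.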

Installation

Docker

docker run -d \
  -p 8000:8000 \
  -v arc-data:/app/data \
  ghcr.io/basekick-labs/arc:latest

Debian/Ubuntu

wget https://github.com/basekick-labs/arc/releases/download/v26.03.1/arc_26.03.1_amd64.deb
sudo dpkg -i arc_26.03.1_amd64.deb
sudo systemctl enable arc && sudo systemctl start arc

RHEL/Fedora

wget https://github.com/basekick-labs/arc/releases/download/v26.03.1/arc-26.03.1-1.x86_64.rpm
sudo rpm -i arc-26.03.1-1.x86_64.rpm
sudo systemctl enable arc && sudo systemctl start arc

Kubernetes (Helm)

helm install arc https://github.com/basekick-labs/arc/releases/download/v26.03.1/arc-26.03.1.tgz

Build from Source

# Prerequisites: Go 1.26+

# Clone and build
git clone https://github.com/basekick-labs/arc.git
cd arc
make build

# Or build directly with Go (the duckdb_arrow tag is required)
go build -tags=duckdb_arrow ./cmd/arc

# Run
./arc

Ecosystem & Integrations

| Tool | Description | Link |
|------|-------------|------|
| VS Code Extension | Browse databases, run queries, visualize results | Marketplace |
| Grafana Data Source | Native Grafana plugin for dashboards and alerting | GitHub |
| Telegraf Output Plugin | Ship data from 300+ Telegraf inputs directly to Arc | Docs |
| Python SDK | Query and ingest from Python applications | PyPI |
| Superset Dialect (JSON) | Apache Superset connector using JSON transport | GitHub |
| Superset Dialect (Arrow) | Apache Superset connector using Arrow transport | GitHub |


Features

Core Capabilities

  • Columnar storage: Parquet format with DuckDB query engine

  • Multi-use-case: Product analytics, observability, AI, IoT, logs, data warehousing

  • Ingestion: MessagePack columnar (fastest), InfluxDB Line Protocol

  • Query: DuckDB SQL engine, JSON and Apache Arrow IPC responses

  • Storage: Local filesystem, S3, MinIO

  • Auth: Token-based authentication with in-memory caching

  • Durability: Optional write-ahead log (WAL)

  • Compaction: Tiered (hourly/daily) automatic file merging

  • Data Management: Retention policies, continuous queries, GDPR-compliant delete

  • Observability: Prometheus metrics, structured logging, graceful shutdown

  • Reliability: Circuit breakers, retry with exponential backoff
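
To make the "retry with exponential backoff" item concrete, here is a generic sketch of the pattern in Python. It is not Arc's implementation (which is written in Go), and the parameter names and defaults are assumptions chosen for illustration. Circuit breaking and jitter are left out to keep the example short and deterministic.

```python
import time

def backoff_delays(base=0.1, factor=2.0, cap=5.0, count=5):
    """Yield `count` delays that grow by `factor` and never exceed `cap`."""
    delay = base
    for _ in range(count):
        yield min(delay, cap)
        delay *= factor

def retry(fn, attempts=5, base=0.1, factor=2.0, cap=5.0, sleep=time.sleep):
    """Call fn() up to `attempts` times, sleeping between failures.

    Returns fn()'s result on the first success and re-raises the last
    exception if every attempt fails.
    """
    delays = backoff_delays(base, factor, cap, attempts - 1)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(next(delays))

# Delays double until they hit the cap.
print(list(backoff_delays(base=1.0, factor=2.0, cap=5.0, count=5)))
```

In production code the delay is usually randomized (jitter) so that many clients recovering from the same outage do not retry in lockstep.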


Configuration

Arc uses TOML configuration with environment variable overrides.

[server]
host = "0.0.0.0"
port = 8000

[storage]
backend = "local"        # local, s3, minio
local_path = "./data/arc"

[ingest]
flush_interval = "5s"
max_buffer_size = 50000

[auth]
enabled = true

Environment variables use the ARC_ prefix, followed by the section and key names in upper case:

export ARC_SERVER_PORT=8000
export ARC_STORAGE_BACKEND=s3
export ARC_AUTH_ENABLED=true

See arc.toml for complete configuration reference.
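
The override behavior can be modeled in a few lines. The Python sketch below assumes the naming rule `ARC_<SECTION>_<KEY>`, which is inferred from the three example variables and not confirmed for every key, and it coerces each override to the type of the value it replaces. The `DEFAULTS` dict mirrors the TOML sample; the set of strings treated as true is an assumption.

```python
import os

# Mirrors the sample TOML configuration, already parsed into a dict.
DEFAULTS = {
    "server": {"host": "0.0.0.0", "port": 8000},
    "storage": {"backend": "local", "local_path": "./data/arc"},
    "ingest": {"flush_interval": "5s", "max_buffer_size": 50000},
    "auth": {"enabled": True},
}

def coerce(raw, like):
    """Convert the env string `raw` to the type of the existing value `like`."""
    if isinstance(like, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in ("1", "true", "yes", "on")
    if isinstance(like, int):
        return int(raw)
    return raw

def apply_env_overrides(config, environ=os.environ, prefix="ARC"):
    """Return a copy of `config` with ARC_<SECTION>_<KEY> overrides applied."""
    merged = {section: dict(values) for section, values in config.items()}
    for section, values in merged.items():
        for key, current in values.items():
            name = f"{prefix}_{section}_{key}".upper()
            if name in environ:
                values[key] = coerce(environ[name], current)
    return merged

env = {"ARC_SERVER_PORT": "9000", "ARC_STORAGE_BACKEND": "s3", "ARC_AUTH_ENABLED": "false"}
config = apply_env_overrides(DEFAULTS, env)
print(config["server"]["port"], config["storage"]["backend"], config["auth"]["enabled"])
```

Looking up each known key by its expected variable name, instead of splitting variable names on underscores, avoids ambiguity for keys such as `local_path` that contain an underscore themselves.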


Project Structure

arc/
├── cmd/arc/              # Application entry point
├── internal/
│   ├── api/              # HTTP handlers (Fiber) — query, write, import, TLE, admin
│   ├── audit/            # Audit logging for API operations
│   ├── auth/             # Token authentication and RBAC
│   ├── backup/           # Backup and restore (data, metadata, config)
│   ├── circuitbreaker/   # Resilience patterns (retry, backoff)
│   ├── cluster/          # Raft consensus, node roles, WAL replication
│   ├── compaction/       # Tiered hourly/daily Parquet file merging
│   ├── config/           # TOML configuration with env var overrides
│   ├── database/         # DuckDB connection pool
│   ├── governance/       # Per-token query quotas and rate limiting
│   ├── ingest/           # MessagePack, Line Protocol, TLE, Arrow writer
│   ├── license/          # License validation and feature gating
│   ├── logger/           # Structured logging (zerolog)
│   ├── metrics/          # Prometheus metrics
│   ├── mqtt/             # MQTT subscriber — topic-to-measurement ingestion
│   ├── pruning/          # Query-time partition pruning
│   ├── query/            # Parallel partition executor
│   ├── queryregistry/    # Active/completed query tracking
│   ├── scheduler/        # Continuous queries and retention policies
│  