<div align="center"> <img src="assets/porter-logo.svg" alt="Porter Logo" width="140" height="140"> <h1>Porter</h1> <p>A streaming-first Arrow server for DuckDB — Flight SQL and WebSocket, simple and built for motion.</p> </div>

🧭 Overview

Porter is a DuckDB-backed Arrow server with two transport protocols:

Flight SQL — gRPC-based Arrow Flight SQL
WebSocket — HTTP-based Arrow streaming

SQL goes in. Arrow streams out. Everything else is detail.

Both transports share the same execution engine, ensuring identical query semantics.

Summary Benchmark Results

| Metric | WebSocket | FlightSQL (gRPC) | | ----------- | ------------ | ---------------- | | Ops | 12 | 12 | | Success | 12 | 12 | | Errors | 0 | 0 | | Rows/sec | 130,712,427 | 121,704,008 | | Throughput | 1014.32 MB/s | 928.53 MB/s | | Latency p50 | 26 ms | 17 ms | | Latency p95 | 41 ms | 60 ms | | Latency p99 | 41 ms | 60 ms |

See the Benchmark Report for details.

⚡ Key Characteristics

Streaming-first execution model (Arrow RecordBatch streams)
Dual transport support: Flight SQL + WebSocket
Bulk Ingest — Arrow RecordBatch → DuckDB with transactional semantics
Shared execution engine for semantic parity
Native DuckDB execution via ADBC
Full prepared statement lifecycle with parameter binding
TTL-based handle management with background GC
Live status surface with pipeline flow, pressure, and backpressure visibility

🏗️ Architecture

           +-------------------+
           |   Flight Client   |  <-- ADBC / Flight SQL
           +-------------------+
                     |
               gRPC / Flight
                     |
           +-------------------+
           |   Porter Server   |
           |-------------------|
           | Shared Engine     |  <-- BuildStream()
           +-------------------+
                     |
           +-------------------+
           |     DuckDB        |
           |   (via ADBC)     |
           +-------------------+
                     |
           +-------------------+
           | Arrow RecordBatches|
           +-------------------+

The server is intentionally thin: routing, lifecycle, and streaming glue only. DuckDB does the heavy lifting.

🚀 Getting Started

You have three ways to run Porter:

Docker (fastest path)
go install (clean local toolchain)
Build from source (full control)

🐳 Option 1 — Run with Docker

docker build -t porter .
docker run -p 32010:32010 -p 8080:8080 porter --ws

Run with a persistent database:

docker run -p 32010:32010 -p 8080:8080 -v $(pwd)/data:/data porter --db /data/porter.duckdb --ws

Defaults:

Flight SQL: 0.0.0.0:32010
WebSocket: 0.0.0.0:8080 (when --ws enabled)
Status: 0.0.0.0:9091 (enabled by default)
Database: in-memory (:memory:)

Prerequisites

Install dbc and required ADBC drivers:

curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install duckdb
dbc install flightsql

⚙️ Option 2 — Install via `go install`

1. Install Porter

go install github.com/TFMV/porter/cmd/porter@latest

This installs porter into your $GOBIN.

🛠 Option 3 — Build from Source

1. Clone

git clone https://github.com/TFMV/porter.git
cd porter

2. Run

go run ./cmd/porter serve

💻 CLI Usage

porter --help

Quick Start

porter              # Start Flight SQL server on :32010
porter serve        # Same as above

With WebSocket

porter --ws                        # Flight SQL + WebSocket
porter serve --ws                   # Same as above
porter serve --ws --ws-port 9090   # Custom WebSocket port
porter serve --status-port 9191    # Custom status surface
porter serve --ducklake --ducklake-catalog-type duckdb --ducklake-catalog-dsn ./metadata.ducklake
porter serve --ducklake --ducklake-catalog-type sqlite --ducklake-catalog-dsn ./catalog.sqlite --ducklake-data-path ./ducklake-data

Full Flags

| Flag | Description | Default | |------|-------------|---------| | --db | DuckDB file path | :memory: | | --port | Flight SQL port | 32010 | | --ws | Enable WebSocket | false | | --ws-port | WebSocket port | 8080 | | --status | Enable live status surface | true | | --status-port | Status server port | 9091 | | --ducklake | Enable DuckLake during server startup | false | | --ducklake-catalog-type | DuckLake metadata backend: duckdb, sqlite, postgres, mysql | duckdb | | --ducklake-catalog-dsn | DuckLake metadata DSN or file path | metadata.ducklake | | --ducklake-data-path | DuckLake Parquet/object storage path | empty | | --ducklake-name | Attached DuckLake catalog name | my_ducklake |

Execute a query

porter query "SELECT 1 AS value"

REPL

porter repl

Load Parquet

porter load data.parquet

Inspect schema

porter schema table_name

Environment variables

PORTER_DB
PORTER_PORT
PORTER_WS
PORTER_WS_PORT
PORTER_STATUS
PORTER_STATUS_PORT
PORTER_DUCKLAKE
PORTER_DUCKLAKE_CATALOG_TYPE
PORTER_DUCKLAKE_CATALOG_DSN
PORTER_DUCKLAKE_DATA_PATH
PORTER_DUCKLAKE_NAME

DuckLake Startup

When --ducklake is enabled, Porter initializes DuckLake during server startup and keeps the existing FlightSQL/Arrow execution path unchanged. DuckLake is treated as database configuration, not as a separate query mode.

Supported catalog backends:

duckdb
sqlite
postgres
mysql

Examples:

porter serve --ducklake \
  --ducklake-catalog-type duckdb \
  --ducklake-catalog-dsn ./metadata.ducklake

porter serve --ducklake \
  --ducklake-catalog-type sqlite \
  --ducklake-catalog-dsn ./catalog.sqlite \
  --ducklake-data-path ./ducklake-data

porter serve --ducklake \
  --ducklake-catalog-type postgres \
  --ducklake-catalog-dsn postgres://user:pass@host/db \
  --ducklake-data-path s3://bucket/prefix \
  --ducklake-name my_ducklake

Startup initialization:

INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:<catalog>' AS my_ducklake (DATA_PATH '...');
USE my_ducklake;

Per-connection initialization:

LOAD ducklake;
LOAD <catalog-extension>;
USE my_ducklake;

DuckLake inspection and maintenance functions are available through the existing SQL path, for example:

FROM ducklake_snapshots('my_ducklake');
SELECT * FROM ducklake_table_info('my_ducklake');
SELECT * FROM my_table AT (VERSION => 2);
CALL ducklake_merge_adjacent_files('my_ducklake');
CALL ducklake_expire_snapshots('my_ducklake', dry_run => true);
CALL ducklake_cleanup_old_files('my_ducklake', dry_run => true, cleanup_all => true);

Live Status Surface

Porter now exposes a dedicated status server with a living cross-section of the pipeline:

/status — live instrument panel UI
/status/live — current JSON snapshot
/status/stream — SSE stream of snapshots
/status/history — rolling snapshot history
/status/health — deterministic health status

The flow view tracks:

ingress -> transport -> execution -> egress
rows/sec and MB/sec per stage
queue depth and pressure buildup
p50/p95/p99 latency divergence
live structured activity feed
WebSocket vs FlightSQL vs ingest path comparison

Porter Status

🌐 Wire Contract

Flight SQL

| Operation | Behavior | | -------------------- | ----------------------------------- | | SQL Query | Raw SQL → FlightInfo → DoGet stream | | Prepared Statements | Handle-based execution with binding | | Schema Introspection | Lightweight probe execution | | ExecuteUpdate | DDL/DML via DoPutCommandStatementUpdate |

WebSocket

Send JSON query request:

{"query": "SELECT * FROM table"}

Receive:

Schema message: {"type": "schema", "fields": ["col1", "col2"]}
Binary IPC frames containing Arrow RecordBatches

📥 Bulk Ingest

Porter supports high-throughput Arrow RecordBatch ingestion via Flight SQL's DoPut:

// Engine interface
IngestStream(ctx, table, reader, opts) (int64, error)

Features:

| Feature | Description | |---------|-------------| | Transactional | One stream = one DB transaction | | Schema validation | Incoming Arrow schema must match target table | | Backpressure | Configurable MaxUncommittedBytes (default 64MB) | | Table locking | Per-table mutex prevents concurrent writes to same table | | Auto-commit | Automatically commits on successful ingest, rolls back on failure |

IngestOptions:

| Option | Description | |--------|-------------| | Catalog | Target catalog name | | DBSchema | Target schema name | | Temporary | Create as temporary table | | IngestMode | Append, replace, or create | | MaxUncommittedBytes | Memory limit before fail-fast (default 64MB) |

Flow:

Client → DoPut (Arrow RecordBatch stream) → Engine.IngestStream → SegmentWriter → Commit → DuckDB

The SegmentWriter accumulates RecordBatches in memory, then atomically publishes them on commit. If MaxUncommittedBytes is exceeded, ingestion fails fast with rollback.

🌊 Streaming Core

Both transports use the same execution primitive:

BuildStream(ctx, sql, params) (*arrow.Schema, <-chan StreamChunk, error)

DuckDB → Arrow RecordReader → Channel → StreamChunk

Backpressure is enforced naturally via the channel boundary.

🛣️ Roadmap

[x] Streaming Flight SQL execution
[x] WebSocket transport
[x] Shared execution engine
[x] Bulk Ingest (DoPut)
[x] Prepared statements
[x] TTL-based lifecycle
[x] Background GC
[ ] Se

Porter

Install / Use

README

🧭 Overview

Summary Benchmark Results

⚡ Key Characteristics

🏗️ Architecture

🚀 Getting Started

🐳 Option 1 — Run with Docker

Prerequisites

⚙️ Option 2 — Install via `go install`

1. Install Porter

🛠 Option 3 — Build from Source

1. Clone

2. Run

💻 CLI Usage

Quick Start

With WebSocket

Full Flags

Execute a query

REPL

Load Parquet

Inspect schema

Environment variables

DuckLake Startup

Live Status Surface

🌐 Wire Contract

Flight SQL

WebSocket

📥 Bulk Ingest

🌊 Streaming Core

🛣️ Roadmap

Porter

Install / Use

README

🧭 Overview

Summary Benchmark Results

⚡ Key Characteristics

🏗️ Architecture

🚀 Getting Started

🐳 Option 1 — Run with Docker

Prerequisites

⚙️ Option 2 — Install via go install

1. Install Porter

🛠 Option 3 — Build from Source

1. Clone

2. Run

💻 CLI Usage

Quick Start

With WebSocket

Full Flags

Execute a query

REPL

Load Parquet

Inspect schema

Environment variables

DuckLake Startup

Live Status Surface

🌐 Wire Contract

Flight SQL

WebSocket

📥 Bulk Ingest

🌊 Streaming Core

🛣️ Roadmap

⚙️ Option 2 — Install via `go install`