Duckarrow
DuckDB extension for querying Apache Arrow Flight SQL servers
Install / Use
/learn @JC1738/DuckarrowREADME
DuckArrow - DuckDB Flight SQL Extension
A DuckDB extension written in Go that enables querying remote Apache Arrow Flight SQL servers directly from DuckDB SQL.
Features
- Simple syntax: Query remote tables with
SELECT * FROM duckarrow."TableName" - DDL/DML support: Execute CREATE, DROP, INSERT, UPDATE, DELETE via
duckarrow_execute() - Column projection pushdown: Only fetches requested columns (7-9x speedup)
- Connection pooling: Reuses gRPC connections across queries
- Full type support: 20+ Arrow types including DECIMAL, LIST, STRUCT, MAP
- Security: SQL injection prevention, TLS support, input validation
- Multi-platform: Builds for Linux, macOS, and Windows (x86_64 and ARM64)
Quick Start
Note: This extension requires a Flight SQL server to connect to. It was developed and tested with GizmoSQL, but should work with any Arrow Flight SQL compliant server.
# Clone with submodules
git clone --recursive <repo-url>
cd duckarrow
# Build the extension
make build
# Load in DuckDB (requires -unsigned flag)
duckdb -unsigned
-- Load the extension
LOAD './build/linux_amd64/duckarrow.duckdb_extension';
-- Configure your Flight SQL server
SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', 'password');
-- Query remote tables
SELECT * FROM duckarrow."Orders" LIMIT 10;
SELECT id, customer_name FROM duckarrow."Orders" WHERE status = 'COMPLETED';
Installation
Download Pre-built Release (Recommended)
Download the latest release for your platform from GitHub Releases:
# Using gh CLI (recommended)
gh release download --repo JC1738/duckarrow --pattern "*linux_amd64*" # Linux x86_64
gh release download --repo JC1738/duckarrow --pattern "*osx_arm64*" # macOS Apple Silicon
gh release download --repo JC1738/duckarrow --pattern "*osx_amd64*" # macOS Intel
gh release download --repo JC1738/duckarrow --pattern "*windows_amd64*" # Windows
# Or download specific version with curl (replace VERSION with desired tag)
VERSION=v0.0.3
curl -LO "https://github.com/JC1738/duckarrow/releases/download/${VERSION}/duckarrow-${VERSION}-linux_amd64.duckdb_extension"
Build from Source
Prerequisites
- Go 1.24.0+
- Python 3 (for metadata embedding)
- CGO enabled
Clone and Build
# Clone with submodules
git clone --recursive <repo-url>
cd duckarrow
# Or if already cloned without --recursive:
git submodule update --init
# Build for current platform
make build
# Output: build/<platform>/duckarrow.duckdb_extension
Multi-Platform Builds
# Auto-detect current platform
make build
# Build for specific platforms
make build-linux-amd64 # Linux x86_64
make build-linux-arm64 # Linux ARM64
make build-darwin-amd64 # macOS Intel
make build-darwin-arm64 # macOS Apple Silicon
make build-windows-amd64 # Windows x86_64
make build-windows-arm64 # Windows ARM64
# Build all platforms for an OS
make build-linux # Both Linux platforms
make build-darwin # Both macOS platforms
make build-windows # Both Windows platforms
Note: Cross-compilation requires appropriate C toolchains. Native builds on each platform are recommended.
Verify Installation
duckdb -unsigned -c "LOAD './build/linux_amd64/duckarrow.duckdb_extension';"
# Should complete without error
Usage
Check Version
SELECT duckarrow_version();
-- Returns: v0.0.3 (or "dev" for local builds)
Configuration
Configure your Flight SQL server credentials once per session:
SELECT duckarrow_configure(uri, username, password);
-- For self-signed certificates, skip TLS verification:
SELECT duckarrow_configure(uri, username, password, true);
| Parameter | Type | Required | Default | Example |
|-----------|------|----------|---------|---------|
| uri | VARCHAR | Yes | - | 'grpc+tls://localhost:31337' |
| username | VARCHAR | No | '' | 'admin' |
| password | VARCHAR | No | '' | 'secret' |
| skip_verify | BOOLEAN | No | false | true |
Supported URI schemes:
grpc://- Unencrypted gRPCgrpc+tls://- TLS-encrypted gRPC (recommended)
TLS Certificate Verification:
- By default, TLS certificates are verified (
skip_verify = false) - Set
skip_verify = trueonly for development/testing with self-signed certificates - For production, use properly signed certificates and keep verification enabled
Password Security
⚠️ Security Notice: DuckDB CLI displays all function parameters in plain text. To avoid exposing passwords in terminal history or screen sharing:
Option 1 - Environment Variable (Recommended):
export DUCKARROW_PASSWORD='your_secret_password'
export DUCKARROW_USERNAME='your_username' # Optional
duckdb <<EOF
LOAD './build/duckarrow.duckdb_extension';
SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', '', true);
SELECT * FROM duckarrow."TableName" LIMIT 10;
EOF
Option 2 - Traditional (Password visible in CLI):
SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', 'password');
-- ⚠️ Password will be visible in terminal and duckdb history
Environment Variables:
DUCKARROW_PASSWORD: Password fallback when password parameter is empty stringDUCKARROW_USERNAME: Username fallback when username parameter is empty string
Priority Order: Function parameter > Environment variable > Empty string
Query Syntax
Replacement scan (recommended):
-- Simple select
SELECT * FROM duckarrow."TableName";
-- With column projection (only fetches needed columns)
SELECT id, name FROM duckarrow."Orders";
-- Filtering, aggregation, joins all work
SELECT COUNT(*) FROM duckarrow."Orders" WHERE status = 'COMPLETED';
Direct table function:
SELECT * FROM duckarrow_query(
'grpc+tls://server:port',
'SELECT * FROM "TableName"'
);
DDL/DML Execution
For statements that don't return results (CREATE, DROP, INSERT, UPDATE, DELETE), use duckarrow_execute():
-- Create a table on the remote server
SELECT duckarrow_execute('CREATE TABLE "my_table" (id INTEGER, name VARCHAR)');
-- Insert data
SELECT duckarrow_execute('INSERT INTO "my_table" VALUES (1, ''Alice'')');
-- Drop a table
SELECT duckarrow_execute('DROP TABLE "my_table"');
The function returns the number of affected rows (or -1 if the server doesn't provide this information).
Note: Unlike duckarrow.* syntax which only works for SELECT queries, duckarrow_execute() is required for DDL/DML because DuckDB's replacement scan only intercepts table references in FROM clauses.
Examples
-- Basic queries
SELECT * FROM duckarrow."Orders" LIMIT 5;
SELECT id, customer_name, total FROM duckarrow."Orders";
-- Aggregation
SELECT status, COUNT(*) as count, SUM(total) as revenue
FROM duckarrow."Orders"
GROUP BY status;
-- Join with local data
CREATE TEMP TABLE local_ids AS SELECT 1 as id UNION SELECT 2;
SELECT o.* FROM duckarrow."Orders" o JOIN local_ids l ON o.id = l.id;
-- Subqueries
SELECT * FROM (
SELECT id, name FROM duckarrow."Products" LIMIT 100
) sub WHERE id > 50;
-- Inspect schema
DESCRIBE SELECT * FROM duckarrow."Orders";
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ User SQL Query │
│ SELECT id, name FROM duckarrow."Orders" │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Replacement Scan │
│ • Validates table name (SQL injection prevention) │
│ • Rewrites to duckarrow_query() table function │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Table Function - Bind Phase │
│ • Connect to Flight SQL server (via connection pool) │
│ • Execute schema query: SELECT * FROM "Orders" WHERE 1=0 │
│ • Store column metadata for projection pushdown │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Table Function - Init Phase │
│ • DuckDB provides list of needed columns │
│ • Build optimized query: SELECT id, name FROM "Orders" │
│ • Execute query with Flight SQL client │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Table Function - Scan Phase │
│ • Stream Arrow RecordBatches from Flight SQL │
│ • Convert Arrow → DuckDB (type mapping, NULL handling) │
│ • Return data chunks to DuckDB │
└─────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DuckDB Result │
└────────────────────────────────────────────────────
