SkillAgentSearch skills...

Duckarrow

DuckDB extension for querying Apache Arrow Flight SQL servers

Install / Use

/learn @JC1738/Duckarrow
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

DuckArrow - DuckDB Flight SQL Extension

CI Website

A DuckDB extension written in Go that enables querying remote Apache Arrow Flight SQL servers directly from DuckDB SQL.

Features

  • Simple syntax: Query remote tables with SELECT * FROM duckarrow."TableName"
  • DDL/DML support: Execute CREATE, DROP, INSERT, UPDATE, DELETE via duckarrow_execute()
  • Column projection pushdown: Only fetches requested columns (7-9x speedup)
  • Connection pooling: Reuses gRPC connections across queries
  • Full type support: 20+ Arrow types including DECIMAL, LIST, STRUCT, MAP
  • Security: SQL injection prevention, TLS support, input validation
  • Multi-platform: Builds for Linux, macOS, and Windows (x86_64 and ARM64)

Quick Start

Note: This extension requires a Flight SQL server to connect to. It was developed and tested with GizmoSQL, but should work with any Arrow Flight SQL compliant server.

# Clone with submodules
git clone --recursive <repo-url>
cd duckarrow

# Build the extension
make build

# Load in DuckDB (requires -unsigned flag)
duckdb -unsigned
-- Load the extension
LOAD './build/linux_amd64/duckarrow.duckdb_extension';

-- Configure your Flight SQL server
SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', 'password');

-- Query remote tables
SELECT * FROM duckarrow."Orders" LIMIT 10;
SELECT id, customer_name FROM duckarrow."Orders" WHERE status = 'COMPLETED';

Installation

Download Pre-built Release (Recommended)

Download the latest release for your platform from GitHub Releases:

# Using gh CLI (recommended)
gh release download --repo JC1738/duckarrow --pattern "*linux_amd64*"    # Linux x86_64
gh release download --repo JC1738/duckarrow --pattern "*osx_arm64*"      # macOS Apple Silicon
gh release download --repo JC1738/duckarrow --pattern "*osx_amd64*"      # macOS Intel
gh release download --repo JC1738/duckarrow --pattern "*windows_amd64*"  # Windows

# Or download specific version with curl (replace VERSION with desired tag)
VERSION=v0.0.3
curl -LO "https://github.com/JC1738/duckarrow/releases/download/${VERSION}/duckarrow-${VERSION}-linux_amd64.duckdb_extension"

Build from Source

Prerequisites

  • Go 1.24.0+
  • Python 3 (for metadata embedding)
  • CGO enabled

Clone and Build

# Clone with submodules
git clone --recursive <repo-url>
cd duckarrow

# Or if already cloned without --recursive:
git submodule update --init

# Build for current platform
make build
# Output: build/<platform>/duckarrow.duckdb_extension

Multi-Platform Builds

# Auto-detect current platform
make build

# Build for specific platforms
make build-linux-amd64        # Linux x86_64
make build-linux-arm64        # Linux ARM64
make build-darwin-amd64       # macOS Intel
make build-darwin-arm64       # macOS Apple Silicon
make build-windows-amd64      # Windows x86_64
make build-windows-arm64      # Windows ARM64

# Build all platforms for an OS
make build-linux              # Both Linux platforms
make build-darwin             # Both macOS platforms
make build-windows            # Both Windows platforms

Note: Cross-compilation requires appropriate C toolchains. Native builds on each platform are recommended.

Verify Installation

duckdb -unsigned -c "LOAD './build/linux_amd64/duckarrow.duckdb_extension';"
# Should complete without error

Usage

Check Version

SELECT duckarrow_version();
-- Returns: v0.0.3 (or "dev" for local builds)

Configuration

Configure your Flight SQL server credentials once per session:

SELECT duckarrow_configure(uri, username, password);

-- For self-signed certificates, skip TLS verification:
SELECT duckarrow_configure(uri, username, password, true);

| Parameter | Type | Required | Default | Example | |-----------|------|----------|---------|---------| | uri | VARCHAR | Yes | - | 'grpc+tls://localhost:31337' | | username | VARCHAR | No | '' | 'admin' | | password | VARCHAR | No | '' | 'secret' | | skip_verify | BOOLEAN | No | false | true |

Supported URI schemes:

  • grpc:// - Unencrypted gRPC
  • grpc+tls:// - TLS-encrypted gRPC (recommended)

TLS Certificate Verification:

  • By default, TLS certificates are verified (skip_verify = false)
  • Set skip_verify = true only for development/testing with self-signed certificates
  • For production, use properly signed certificates and keep verification enabled

Password Security

⚠️ Security Notice: DuckDB CLI displays all function parameters in plain text. To avoid exposing passwords in terminal history or screen sharing:

Option 1 - Environment Variable (Recommended):

export DUCKARROW_PASSWORD='your_secret_password'
export DUCKARROW_USERNAME='your_username'  # Optional
duckdb <<EOF
  LOAD './build/duckarrow.duckdb_extension';
  SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', '', true);
  SELECT * FROM duckarrow."TableName" LIMIT 10;
EOF

Option 2 - Traditional (Password visible in CLI):

SELECT duckarrow_configure('grpc+tls://localhost:31337', 'username', 'password');
-- ⚠️ Password will be visible in terminal and duckdb history

Environment Variables:

  • DUCKARROW_PASSWORD: Password fallback when password parameter is empty string
  • DUCKARROW_USERNAME: Username fallback when username parameter is empty string

Priority Order: Function parameter > Environment variable > Empty string

Query Syntax

Replacement scan (recommended):

-- Simple select
SELECT * FROM duckarrow."TableName";

-- With column projection (only fetches needed columns)
SELECT id, name FROM duckarrow."Orders";

-- Filtering, aggregation, joins all work
SELECT COUNT(*) FROM duckarrow."Orders" WHERE status = 'COMPLETED';

Direct table function:

SELECT * FROM duckarrow_query(
    'grpc+tls://server:port',
    'SELECT * FROM "TableName"'
);

DDL/DML Execution

For statements that don't return results (CREATE, DROP, INSERT, UPDATE, DELETE), use duckarrow_execute():

-- Create a table on the remote server
SELECT duckarrow_execute('CREATE TABLE "my_table" (id INTEGER, name VARCHAR)');

-- Insert data
SELECT duckarrow_execute('INSERT INTO "my_table" VALUES (1, ''Alice'')');

-- Drop a table
SELECT duckarrow_execute('DROP TABLE "my_table"');

The function returns the number of affected rows (or -1 if the server doesn't provide this information).

Note: Unlike duckarrow.* syntax which only works for SELECT queries, duckarrow_execute() is required for DDL/DML because DuckDB's replacement scan only intercepts table references in FROM clauses.

Examples

-- Basic queries
SELECT * FROM duckarrow."Orders" LIMIT 5;
SELECT id, customer_name, total FROM duckarrow."Orders";

-- Aggregation
SELECT status, COUNT(*) as count, SUM(total) as revenue
FROM duckarrow."Orders"
GROUP BY status;

-- Join with local data
CREATE TEMP TABLE local_ids AS SELECT 1 as id UNION SELECT 2;
SELECT o.* FROM duckarrow."Orders" o JOIN local_ids l ON o.id = l.id;

-- Subqueries
SELECT * FROM (
    SELECT id, name FROM duckarrow."Products" LIMIT 100
) sub WHERE id > 50;

-- Inspect schema
DESCRIBE SELECT * FROM duckarrow."Orders";

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         User SQL Query                          │
│            SELECT id, name FROM duckarrow."Orders"              │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Replacement Scan                             │
│  • Validates table name (SQL injection prevention)              │
│  • Rewrites to duckarrow_query() table function                 │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Table Function - Bind Phase                     │
│  • Connect to Flight SQL server (via connection pool)           │
│  • Execute schema query: SELECT * FROM "Orders" WHERE 1=0       │
│  • Store column metadata for projection pushdown                │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Table Function - Init Phase                     │
│  • DuckDB provides list of needed columns                       │
│  • Build optimized query: SELECT id, name FROM "Orders"         │
│  • Execute query with Flight SQL client                         │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Table Function - Scan Phase                     │
│  • Stream Arrow RecordBatches from Flight SQL                   │
│  • Convert Arrow → DuckDB (type mapping, NULL handling)         │
│  • Return data chunks to DuckDB                                 │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                       DuckDB Result                             │
└────────────────────────────────────────────────────
View on GitHub
GitHub Stars5
CategoryData
Updated3d ago
Forks0

Languages

Go

Security Score

85/100

Audited on Apr 2, 2026

No findings