IceRunner
IceRunner is a PyArrow Flight server implementation for Apache Iceberg tables. It lets clients read from and write to Iceberg tables over the Arrow Flight protocol.
Features
- Flight Server: Exposes Iceberg tables through a PyArrow Flight interface
- Concurrent Access: Supports multiple readers and writers simultaneously
- DuckDB Integration: Uses DuckDB for efficient SQL queries on Iceberg tables
- Strong Typing: Full type support between PyArrow and Iceberg schemas
- Simple CLI: Easy-to-use command-line interface for server, reader, and writer operations
- Flight-to-Flight Replication: Sync data between Flight servers across regions or data platforms
Installation
Prerequisites
- Python 3.8+
- PyArrow 10.0.0+
- PyIceberg
- DuckDB
Install Dependencies
pip install pyarrow pyiceberg duckdb
Usage
IceRunner can be run in four modes:
1. Server Mode
Starts a Flight server that exposes Iceberg tables:
python -m icerunner serve -w /path/to/warehouse -p 8816 -n my_table
2. Reader Mode
Starts a client that continuously reads data from the server:
python -m icerunner read -p 8816 -n my_table -i 2
3. Writer Mode
Starts a client that continuously writes data to the server:
python -m icerunner write -p 8816 -n my_table -i 3
4. Mirror Mode
Starts a client that replicates data from a remote Flight server to a local Iceberg table:
python -m icerunner mirror -s grpc://remotehost:8815/source_table -n target_table -i 60
For a one-time sync instead of continuous replication:
python -m icerunner mirror -s grpc://remotehost:8815/source_table -n target_table --one-time
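The `-s` value in the mirror commands above packs the server location and the table name into one URL. As a minimal sketch of how such a value could be split (the helper name `split_source` is hypothetical and is not taken from IceRunner's code), the standard library is enough:

```python
from urllib.parse import urlsplit


def split_source(source):
    """Split 'grpc://host:port/table' into (location, table_name).

    Hypothetical helper for illustration; IceRunner's own parsing may differ.
    table_name is None when the URL carries no path component.
    """
    parts = urlsplit(source)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected scheme://host:port/table, got {source!r}")
    table = parts.path.strip("/")
    location = f"{parts.scheme}://{parts.netloc}"
    return location, (table or None)


location, table = split_source("grpc://remotehost:8815/source_table")
print(location, table)
```

Returning `None` for a missing table name leaves room for the caller to decide what to do when only a server address is given.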
Command Line Options
- -w, --warehouse-path: Path to the warehouse directory (default: "warehouse")
- -p, --port: Port for the Flight server (default: 8816)
- -n, --table-name: Name of the table to read/write (default: "icerunner_test")
- -i, --interval: Interval in seconds between operations (default: 1)
- -s, --source: Source Flight server URL for mirror mode (e.g., grpc://hostname:port/table)
- -b, --batch-size: Number of rows to process in each batch during mirroring (default: 1000)
- --one-time: Perform a one-time sync rather than continuous (for mirror mode)
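To make the options and their defaults concrete, here is a rough `argparse` sketch that accepts the same flags. It is an illustration built only from the option list above, not IceRunner's actual parser; the function name `build_parser` and the choice of a positional `mode` argument are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="icerunner")
    # The four modes described under Usage.
    parser.add_argument("mode", choices=["serve", "read", "write", "mirror"])
    parser.add_argument("-w", "--warehouse-path", default="warehouse")
    parser.add_argument("-p", "--port", type=int, default=8816)
    parser.add_argument("-n", "--table-name", default="icerunner_test")
    parser.add_argument("-i", "--interval", type=int, default=1)
    parser.add_argument("-s", "--source", default=None)
    parser.add_argument("-b", "--batch-size", type=int, default=1000)
    parser.add_argument("--one-time", action="store_true")
    return parser


# Same arguments as the one-time mirror command shown earlier.
args = build_parser().parse_args(
    ["mirror", "-s", "grpc://remotehost:8815/source_table",
     "-n", "target_table", "--one-time"]
)
print(args)
```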
Architecture
IceRunner consists of these main components:
- IceRunnerConnector: Manages the connection to Iceberg tables, handling schema conversion, table creation, and data operations.
- IceRunnerFlightServer: Implements the PyArrow Flight interface, exposing endpoints for reading and writing data.
- Client Utilities: Provides utilities for reading from and writing to the Flight server.
- Mirror Service: Enables flight-to-flight replication between Flight servers and Iceberg tables.
Architecture Diagram
graph TD
subgraph "Client Layer"
Writer[Writer Client]
Reader[Reader Client]
Mirror[Mirror Client]
end
subgraph "Server Layer"
FS[Flight Server]
RFS[Remote Flight Server]
style FS fill:#b3e0ff,stroke:#0066cc,stroke-width:2px
style RFS fill:#b3e0ff,stroke:#0066cc,stroke-width:2px
end
subgraph "Data Layer"
IRC[IceRunnerConnector]
style IRC fill:#d9f2d9,stroke:#339933,stroke-width:2px
DDB[(DuckDB)]
style DDB fill:#ffe6cc,stroke:#ff9933,stroke-width:2px
IB[(Iceberg Tables)]
style IB fill:#e6ccff,stroke:#8000ff,stroke-width:2px
end
Writer -->|do_put| FS
Reader -->|do_get| FS
Mirror -->|do_get| RFS
Mirror -->|do_put| FS
FS <-->|query/insert| IRC
IRC <-->|SQL queries| DDB
DDB <-->|read/write| IB
classDef client fill:#ffcccc,stroke:#ff6666,stroke-width:2px
class Writer,Reader,Mirror client
Data Flow
Standard Operation
Writer → PyArrow Flight Client → Flight Server → IceRunnerConnector → Iceberg Table
Reader ← PyArrow Flight Client ← Flight Server ← IceRunnerConnector ← Iceberg Table
Flight-to-Flight Replication
Remote Flight Server → Mirror Client → Local Flight Server → IceRunnerConnector → Iceberg Table
Flight-to-Flight Replication
The mirror mode enables replication between Flight servers, allowing you to:
- Sync data between regions
- Migrate data between different storage systems
- Create local copies of remote datasets
- Build cross-platform data integration pipelines
Features
- Batch Processing: Efficiently processes data in configurable batch sizes
- Continuous or One-time: Run as a continuous sync service or one-time migration
- Automatic Table Discovery: Can discover source tables if not explicitly specified
- Schema Compatibility: Handles schema translation between Flight and Iceberg formats
- Robust Error Handling: Continues operation even when errors occur
- Progress Tracking: Logs detailed progress and statistics
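The batching, continuous versus one-time, and error-handling behaviour listed above can be sketched as a small loop. This is an illustrative outline only: `read_rows` and `write_batch` are hypothetical stand-ins for the Flight `do_get` and `do_put` calls, and the names `chunked`, `mirror`, and `max_cycles` are assumptions rather than IceRunner's API.

```python
import time


def chunked(rows, batch_size):
    """Yield lists of at most batch_size items from rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def mirror(read_rows, write_batch, batch_size=1000, interval=1,
           one_time=False, max_cycles=None, sleep=time.sleep):
    """Copy rows from a source to a target in batches.

    read_rows() returns an iterable of rows; write_batch(batch) stores one
    batch. A failed cycle is logged and the loop carries on. Returns the
    number of rows written across all cycles.
    """
    rows_written = 0
    cycles = 0
    while True:
        try:
            for batch in chunked(read_rows(), batch_size):
                write_batch(batch)
                rows_written += len(batch)
                print(f"cycle {cycles}: wrote {len(batch)} rows")
        except Exception as exc:
            # Keep running after an error, as the feature list describes.
            print(f"cycle {cycles} failed: {exc}")
        cycles += 1
        if one_time or (max_cycles is not None and cycles >= max_cycles):
            return rows_written
        sleep(interval)


source = [{"id": i} for i in range(2500)]
target = []
total = mirror(lambda: source, target.extend, batch_size=1000, one_time=True)
```

The `max_cycles` and `sleep` parameters exist only so the continuous mode can be exercised in a test without waiting or looping forever.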
Use Cases
- Multi-Region Replication: Keep data synchronized across geographical regions
- Cloud Migration: Migrate data from on-premises to cloud storage
- Data Integration: Connect disparate data platforms
- Disaster Recovery: Maintain backup copies of critical datasets
- Edge-to-Cloud Sync: Replicate data from edge servers to central cloud storage
Type Conversion
IceRunner automatically converts between PyArrow and Iceberg types:
| PyArrow Type | Iceberg Type  |
|--------------|---------------|
| Int64        | LongType      |
| String       | StringType    |
| Timestamp    | TimestampType |
| Boolean      | BooleanType   |
| Float64      | DoubleType    |
| Float32      | FloatType     |
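The table above amounts to a lookup from one type system to the other. As a simplified sketch, the mapping can be written as a dictionary keyed by type name. It works on plain strings so that it runs without PyArrow or PyIceberg installed; a real converter would operate on the libraries' type objects, and the names `ARROW_TO_ICEBERG`, `iceberg_type_for`, and `convert_schema` are hypothetical.

```python
# Type names copied from the table above; strings only, for illustration.
ARROW_TO_ICEBERG = {
    "Int64": "LongType",
    "String": "StringType",
    "Timestamp": "TimestampType",
    "Boolean": "BooleanType",
    "Float64": "DoubleType",
    "Float32": "FloatType",
}


def iceberg_type_for(arrow_type):
    """Return the Iceberg type name for a PyArrow type name."""
    try:
        return ARROW_TO_ICEBERG[arrow_type]
    except KeyError:
        raise TypeError(f"no Iceberg mapping listed for {arrow_type!r}") from None


def convert_schema(fields):
    """Map {column_name: arrow_type_name} to {column_name: iceberg_type_name}."""
    return {name: iceberg_type_for(t) for name, t in fields.items()}


schema = convert_schema({"id": "Int64", "name": "String", "score": "Float64"})
print(schema)
```

Raising on an unlisted type, rather than silently falling back to a default, makes gaps in the mapping visible early.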
Example
# Start the server in one terminal
python -m icerunner serve
# In another terminal, start a writer
python -m icerunner write
# In a third terminal, start a reader
python -m icerunner read
# In a fourth terminal, mirror from another Flight server
python -m icerunner mirror -s grpc://otherserver:8815/remote_table
License
This project is licensed under the MIT License - see the LICENSE file for details.
