Gigapi
GigAPI is a Timeseries lakehouse for real-time data and sub-second queries, powered by DuckDB OLAP + Parquet Query Engine, Compactor w/ Cloud-Native Storage. Drop-in FDAP alternative ⭐
<img src="https://github.com/user-attachments/assets/5b0a4a37-ecab-4ca6-b955-1a2bbccad0b4" />
<img src="https://github.com/user-attachments/assets/74a1fa93-5e7e-476d-93cb-be565eca4a59" height=25 /> GigAPI: The Infinite Timeseries Lakehouse
Like a durable parquet floor, GigAPI provides a rock-solid data foundation for your queries and analytics.
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> Problem
Traditional "always-on" OLAP databases such as ClickHouse are fast but expensive to operate and complex to manage and scale, and they often exist to promote a cloud product. Data lakes and lakehouses are cheaper but can't always handle real-time ingestion or compaction, and querying growing datasets such as timeseries brings back costly operations and complexity. The market is also crowded with poisonous "open core" solutions.
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> Solution
GigAPI is a timeseries-optimized "lakehouse" designed for real-time data, lots of it, and for returning queries as fast as possible. By combining DuckDB's performance, FlightSQL's efficiency and Parquet's reliability with smart metadata, we've created a simple, lightweight solution ready to decimate complexity and infrastructure costs for ourselves and others. GigAPI is 100% open source: no open core or cloud product gimmicks.
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> GigAPI Features
- Fast: DuckDB SQL + Parquet powered OLAP API Engine
- Flexible: Schema-less Parquet Ingestion & Compaction
- Simple: Low Maintenance, Portable Catalog, Infinitely Scalable
- Smart: Independent storage/write and compute/read components
- Extensible: Built-In Query Engine (DuckDB) or BYODB (ClickHouse, Datafusion, etc)
[!WARNING]
GigAPI is an open beta developed in public. Bugs and changes should be expected. Use at your own risk.
<img src="https://github.com/user-attachments/assets/74a1fa93-5e7e-476d-93cb-be565eca4a59" height=20 /> Usage
Here's the most basic example. For more complex usage samples, see the examples directory.
services:
  gigapi:
    image: ghcr.io/gigapi/gigapi:latest
    container_name: gigapi
    hostname: gigapi
    restart: unless-stopped
    volumes:
      - ./data:/data
    ports:
      - "7971:7971"
    environment:
      - GIGAPI_ROOT=/data
      - GIGAPI_LAYERS_0_NAME=default
      - GIGAPI_LAYERS_0_TYPE=fs
      - GIGAPI_LAYERS_0_URL=file:///data
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> Settings
| Env Var Name | Description | Default Value |
|----------------------------|---------------------------------------------------------------------|---------------|
| GIGAPI_ROOT | Root folder for all the data files | |
| GIGAPI_MERGE_TIMEOUT_S | Base timeout between merges (in seconds) | 10 |
| GIGAPI_SAVE_TIMEOUT_S | Timeout before saving the new data to the disk (in seconds) | 1 |
| GIGAPI_NO_MERGES | Disable merging | false |
| GIGAPI_UI | Enable UI for querier | true |
| GIGAPI_MODE | Execution mode (readonly, writeonly, compaction, aio) | "aio" |
| GIGAPI_METADATA_TYPE | Metadata Type (json for local, redis for distributed) | "json" |
| GIGAPI_METADATA_URL | Metadata URL for redis (e.g. redis://redis:6379/0) | |
| HTTP_PORT | Port to listen on for HTTP server | 7971 |
| HTTP_HOST | Host to bind to for HTTP server | "0.0.0.0" |
| HTTP_BASIC_AUTH_USERNAME | Username for HTTP basic authentication | |
| HTTP_BASIC_AUTH_PASSWORD | Password for HTTP basic authentication | |
| FLIGHTSQL_PORT | Port to run FlightSQL server | 8082 |
| FLIGHTSQL_ENABLE | Enable FlightSQL server | true |
| LOGLEVEL | Log level (debug, info, warn, error, fatal) | "info" |
| DUCKDB_MEM_LIMIT | DuckDB memory limit (e.g. 1GB) | "1GB" |
| DUCKDB_THREAD_LIMIT | DuckDB thread limit (int) | 1 |
| GIGAPI_LAYER_X_NAME | Unique layer name (X is the layer index, starting from 0) | |
| GIGAPI_LAYER_X_TYPE | Layer type: fs for file system, s3 for S3 | |
| GIGAPI_LAYER_X_GLOBAL | true if every node in the cluster has access to the layer | |
| GIGAPI_LAYER_X_URL | File path or S3 URL | |
| GIGAPI_LAYER_X_TTL | Time before data is sent to the next layer or dropped (0 = never drop) | 0 |
<br>You can override the defaults by setting these environment variables before starting the service.
<img src="https://github.com/user-attachments/assets/74a1fa93-5e7e-476d-93cb-be565eca4a59" height=20 /> Write Support
As write requests come in to GigAPI, they are parsed and progressively appended to parquet files alongside their metadata. The ingestion buffer is flushed to disk at configurable intervals using a hive partitioning schema. Generated parquet files and their respective metadata are progressively compacted and sorted over time based on configuration parameters.
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> API
GigAPI provides an HTTP API for clients to write data, currently supporting the InfluxDB Line Protocol format:
cat <<EOF | curl -X POST "http://localhost:7971/write?db=mydb" --data-binary @/dev/stdin
weather,location=us-midwest,season=summer temperature=82
weather,location=us-east,season=summer temperature=80
weather,location=us-west,season=summer temperature=99
EOF
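The same write can be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names `to_line` and `write_points` are hypothetical, and the formatter assumes numeric field values and skips the escaping and timestamp rules of the Line Protocol.

```python
import urllib.parse
import urllib.request


def to_line(measurement, tags, fields):
    """Format one point as InfluxDB Line Protocol (no timestamp, no escaping)."""
    tag_part = ",".join(f"{k}={v}" for k, v in tags.items())
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part}"


def write_points(lines, db="mydb", base_url="http://localhost:7971"):
    """POST newline-separated lines to /write and return the HTTP status code."""
    url = f"{base_url}/write?" + urllib.parse.urlencode({"db": db})
    body = "\n".join(lines).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


points = [
    to_line("weather", {"location": "us-midwest", "season": "summer"}, {"temperature": 82}),
    to_line("weather", {"location": "us-east", "season": "summer"}, {"temperature": 80}),
]
print("\n".join(points))
# write_points(points)  # uncomment when a GigAPI instance is listening on port 7971
```

Keeping the formatting step separate from the HTTP call makes it easy to inspect the payload before sending it.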
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> FlightSQL
[!NOTE] FlightSQL ingestion is coming soon!
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> Data Schema
GigAPI is a schema-on-write database managing databases, tables and schemas on the fly. New columns can be added or removed over time, leaving reconciliation up to readers.
/data
  /mydb
    /weather
      /date=2025-04-10
        /hour=14
          *.parquet
          metadata.json
        /hour=15
          *.parquet
          metadata.json
GigAPI managed parquet files use the following naming schema:
{UUID}.{LEVEL}.parquet
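To make the layout and naming schema concrete, here is a small Python sketch that builds a partition directory for a timestamp and splits a file name into its parts. Both helpers are hypothetical illustrations, not GigAPI code; the use of UTC and a zero-padded hour are assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def partition_dir(root, db, table, ts):
    """Build the hive-style partition directory for a timestamp (UTC assumed)."""
    return PurePosixPath(root) / db / table / f"date={ts:%Y-%m-%d}" / f"hour={ts:%H}"


def parse_file_name(name):
    """Split '{UUID}.{LEVEL}.parquet' into (uuid, level)."""
    uuid_part, level, ext = name.rsplit(".", 2)
    if ext != "parquet" or not level.isdigit():
        raise ValueError(f"not a '{{UUID}}.{{LEVEL}}.parquet' name: {name!r}")
    return uuid_part, int(level)


ts = datetime(2025, 4, 10, 14, 30, tzinfo=timezone.utc)
print(partition_dir("/data", "mydb", "weather", ts))
print(parse_file_name("3f2b8c1e-0000-4000-8000-000000000000.1.parquet"))
```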
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=18 /> Parquet Compactor
GigAPI files are progressively compacted based on the following logic (subject to future changes):
| Merge Level | Source | Target | Frequency | Max Size |
|---------------|--------|--------|------------------------|----------|
| Level 1 -> 2 | .1 | .2 | MERGE_TIMEOUT_S = 10 | 100 MB |
| Level 2 -> 3 | .2 | .3 | MERGE_TIMEOUT_S * 10 | 400 MB |
| Level 3 -> 4 | .3 | .4 | MERGE_TIMEOUT_S * 10 * 10 | 4 GB |
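The frequency column grows by a factor of ten per level. As a worked example, this Python sketch computes the interval for each source level from the base `GIGAPI_MERGE_TIMEOUT_S` value; the function name `merge_interval_s` is hypothetical and the sketch only mirrors the table, not the compactor's actual scheduling code.

```python
def merge_interval_s(source_level, merge_timeout_s=10):
    """Seconds between merges from `source_level` into the next level."""
    if source_level not in (1, 2, 3):
        raise ValueError("the table lists source levels 1 to 3 only")
    # Level 1 uses the base timeout; each further level multiplies it by 10.
    return merge_timeout_s * 10 ** (source_level - 1)


for level in (1, 2, 3):
    print(f"level {level} -> {level + 1}: every {merge_interval_s(level)} s")
```

With the default base of 10 seconds, the three merges run roughly every 10, 100 and 1000 seconds.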
<img src="https://github.com/user-attachments/assets/74a1fa93-5e7e-476d-93cb-be565eca4a59" height=20 /> Read Support
As read requests come in to GigAPI, they are parsed and transpiled using the GigAPI metadata catalog to resolve data locations based on the database, table and time range in each request.
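To illustrate why a time range narrows the data to read, the sketch below lists the hourly partitions that overlap a range, assuming the `date=`/`hour=` layout shown under Data Schema. GigAPI resolves locations through its own metadata catalog; `hour_partitions` is a hypothetical helper and not the actual implementation.

```python
from datetime import datetime, timedelta, timezone


def hour_partitions(start, end):
    """Yield 'date=YYYY-MM-DD/hour=HH' for every hour bucket overlapping [start, end]."""
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor <= end:
        yield f"date={cursor:%Y-%m-%d}/hour={cursor:%H}"
        cursor += timedelta(hours=1)


start = datetime(2025, 4, 10, 14, 30, tzinfo=timezone.utc)
end = datetime(2025, 4, 10, 15, 10, tzinfo=timezone.utc)
print(list(hour_partitions(start, end)))
```

A query without a time range has no such bound and would have to consider every partition of the table.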
Query Data
$ curl -X POST "http://localhost:7971/query?db=mydb" \
-H "Content-Type: application/json" \
-d "{\"query\": \"SELECT time, temperature FROM weather WHERE time >= epoch_ns('2025-04-24T00:00:00'::TIMESTAMP)\"}"
Series can be used with or without time ranges, e.g. for counting or calculating averages:
$ curl -X POST "http://localhost:7971/query?db=mydb" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT count(*), avg(temperature) FROM weather"}'
{"results":[{"avg(temperature)":87.025,"count_star()":"40"}]}
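For scripted access, the same request can be built with the Python standard library. This is a minimal sketch: `build_query_request` and `run_query` are hypothetical helpers, and the base URL is an assumption to be replaced with the address your instance serves HTTP on. Note that the sample response above returns the count as a string, so it is converted explicitly.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, db, sql):
    """Build a POST request for the /query endpoint with a JSON body."""
    url = f"{base_url}/query?" + urllib.parse.urlencode({"db": db})
    body = json.dumps({"query": sql}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def run_query(base_url, db, sql):
    """Send the query and return the list under the 'results' key."""
    with urllib.request.urlopen(build_query_request(base_url, db, sql)) as response:
        return json.load(response)["results"]


# Parsing the sample response shown above; count_star() arrives as a string.
sample = '{"results":[{"avg(temperature)":87.025,"count_star()":"40"}]}'
row = json.loads(sample)["results"][0]
print(int(row["count_star()"]), row["avg(temperature)"])
# rows = run_query("http://localhost:7971", "mydb", "SELECT count(*) FROM weather")
```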
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=24 /> FlightSQL
GigAPI data can be accessed using FlightSQL gRPC clients in any language:
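# Requires the flightsql-dbapi package (pip install flightsql-dbapi)
from flightsql import connect, FlightSQLClient

# Connect without TLS to the FlightSQL port (8082 by default)
client = FlightSQLClient(host='localhost', port=8082, insecure=True, metadata={'bucket': 'hep'})
conn = connect(client)
cursor = conn.cursor()
# Run an aggregate query and print every returned row
cursor.execute('SELECT count(*), avg(temperature) FROM weather')
print("rows:", [r for r in cursor])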
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=24 /> GigAPI UI
The embedded GigAPI UI can be used to explore and query data using SQL with advanced features.
<img src="https://github.com/user-attachments/assets/a9aa3ebd-9164-476d-aedf-97b817078350" width=24 /> Grafana
GigAPI can be used from Grafana using the InfluxDB3 Flight gRPC Datasource.
GigAPI readers can be imple
