TimeFusion

<p align="center"> <img src="https://img.shields.io/badge/Built%20with-Rust-dca282?style=for-the-badge&logo=rust" alt="Built with Rust"> <img src="https://img.shields.io/badge/PostgreSQL-Compatible-4169E1?style=for-the-badge&logo=postgresql&logoColor=white" alt="PostgreSQL Compatible"> <img src="https://img.shields.io/badge/Storage-S3-569A31?style=for-the-badge&logo=amazons3&logoColor=white" alt="S3 Storage"> </p> <p align="center"> <a href="#-features">Features</a> • <a href="#-quick-start">Quick Start</a> • <a href="#-architecture">Architecture</a> • <a href="#-configuration">Configuration</a> • <a href="#-usage">Usage</a> • <a href="#-performance">Performance</a> • <a href="#-contributing">Contributing</a> </p>

TimeFusion is a specialized time-series database engineered for high-performance storage and querying of events, logs, traces, and metrics. Built on Apache Arrow and Delta Lake, it speaks the PostgreSQL wire protocol while storing data in your own S3-compatible object storage.

🎯 Why TimeFusion?

Traditional time-series databases force you to choose between performance, cost, and data ownership. TimeFusion eliminates these trade-offs:

You Own Your Data: All data is stored in your S3 bucket - no vendor lock-in
PostgreSQL Compatible: Use any PostgreSQL client, driver, or tool you already know
Blazing Fast: Leverages Apache Arrow's columnar format and intelligent caching
Cost Effective: Pay only for S3 storage and compute - no expensive proprietary storage
Multi-Tenant Ready: Built-in project isolation and partitioning

✨ Features

Core Capabilities

🚀 High-Performance Ingestion: Batch processing with configurable intervals
🔍 Fast Queries: Columnar storage with predicate pushdown and partition pruning
🔒 ACID Compliance: Full transactional guarantees via Delta Lake
📊 Rich Query Support: SQL aggregations, filters, and time-based operations
🌐 Multi-Tenant: First-class support for project isolation
🔄 Auto-Optimization: Background compaction and vacuuming

Storage & Caching

💾 S3-Compatible Storage: Works with AWS S3, MinIO, R2, and more
⚡ Intelligent Caching: Two-tier cache (memory + disk) with configurable TTLs
🗜️ Compression: Zstandard compression with tunable levels
📦 Parquet Format: Industry-standard columnar storage

Operations

🔐 Distributed Locking: DynamoDB-based locking for multi-instance deployments
📈 Connection Limiting: Built-in proxy to prevent overload
🛡️ Graceful Degradation: Continues operating even under extreme load
📝 Comprehensive Logging: Structured logs with tracing support

🚀 Quick Start

Prerequisites

Rust 1.75+ (for building from source)
S3-compatible object storage (AWS S3, MinIO, etc.)
(Optional) DynamoDB table for distributed locking

Installation

Using Docker

docker run -d \
  -p 5432:5432 \
  -e AWS_S3_BUCKET=your-bucket \
  -e AWS_ACCESS_KEY_ID=your-key \
  -e AWS_SECRET_ACCESS_KEY=your-secret \
  timefusion/timefusion:latest

Building from Source

git clone https://github.com/apitoolkit/timefusion.git
cd timefusion
cargo build --release
./target/release/timefusion

Connect with Any PostgreSQL Client

psql "postgresql://postgres:postgres@localhost:5432/postgres"

Insert Data

INSERT INTO otel_logs_and_spans (
  name, id, project_id, timestamp, date, hashes
) VALUES (
  'api.request',
  '550e8400-e29b-41d4-a716-446655440000',
  'prod-api-001',
  '2025-01-17 14:25:00',
  '2025-01-17',
  ARRAY[]::text[]
);

Query Data

-- Get recent logs for a project
SELECT name, id, timestamp
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
  AND timestamp >= '2025-01-17 14:00:00' AND timestamp < '2025-01-17 15:00:00'
ORDER BY timestamp DESC
LIMIT 100;

-- Aggregate by name
SELECT name, COUNT(*) as count
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
  AND date = '2025-01-17'
GROUP BY name
ORDER BY count DESC;

Complete psql Example Session

Here's a real-world example showing TimeFusion's capabilities with API trace data:

$ psql "postgresql://postgres:postgres@localhost:5432/postgres"

psql (16.8 (Homebrew), server 0.28.0)
WARNING: psql major version 16, server major version 0.28.
         Some psql features might not work.
Type "help" for help.

postgres=> -- Insert sample API trace data
postgres=> INSERT INTO otel_logs_and_spans (
    name, id, project_id, timestamp, date, hashes,
    duration, attributes___http___response___status_code,
    attributes___user___id, attributes___error___type, kind
) VALUES
('POST /api/v1/users', '550e8400-e29b-41d4-a716-446655440001', 'prod-api-001', '2025-01-17 14:30:00', '2025-01-17', ARRAY['trace_123'], 245000000, 200, 'u_123', NULL, 'SERVER'),
('POST /api/v1/users', '550e8400-e29b-41d4-a716-446655440002', 'prod-api-001', '2025-01-17 14:35:00', '2025-01-17', ARRAY['trace_124'], 1523000000, 500, NULL, 'database_timeout', 'SERVER'),
('GET /api/v1/users/:id', '550e8400-e29b-41d4-a716-446655440003', 'prod-api-001', '2025-01-17 14:40:00', '2025-01-17', ARRAY['trace_125'], 89000000, 200, 'u_456', NULL, 'SERVER'),
('POST /api/v1/payments', '550e8400-e29b-41d4-a716-446655440004', 'prod-api-001', '2025-01-17 14:45:00', '2025-01-17', ARRAY['trace_126'], 3421000000, 200, NULL, NULL, 'SERVER'),
('GET /api/v1/users/:id', '550e8400-e29b-41d4-a716-446655440005', 'prod-api-001', '2025-01-17 14:50:00', '2025-01-17', ARRAY['trace_127'], 2100000000, 408, NULL, 'timeout', 'SERVER');
INSERT 0 5

postgres=> -- Find slow API endpoints (>1 second response time)
postgres=> SELECT
    name as endpoint,
    COUNT(*) as request_count,
    AVG(duration / 1000000)::INT as avg_duration_ms,
    MAX(duration / 1000000)::INT as max_duration_ms,
    ARRAY_AGG(DISTINCT attributes___http___response___status_code::TEXT) as status_codes
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
    AND timestamp >= '2025-01-17 14:00:00' AND timestamp < '2025-01-17 15:00:00'
    AND duration > 1000000000  -- 1 second in nanoseconds
GROUP BY name
ORDER BY avg_duration_ms DESC;

         endpoint         | request_count | avg_duration_ms | max_duration_ms | status_codes
--------------------------+---------------+-----------------+-----------------+--------------
 POST /api/v1/payments    |             1 |            3421 |            3421 | {200}
 GET /api/v1/users/:id    |             1 |            2100 |            2100 | {408}
 POST /api/v1/users       |             1 |            1523 |            1523 | {500}
(3 rows)

postgres=> -- Analyze error rates by endpoint over time windows
postgres=> SELECT
    name as endpoint,
    date_trunc('hour', timestamp) as hour,
    COUNT(*) as total_requests,
    COUNT(*) FILTER (WHERE attributes___http___response___status_code >= 400) as errors,
    ROUND(100.0 * COUNT(*) FILTER (WHERE attributes___http___response___status_code >= 400) / COUNT(*), 2) as error_rate
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
    AND timestamp >= '2025-01-16 15:00:00' AND timestamp < '2025-01-17 15:00:00'
GROUP BY name, date_trunc('hour', timestamp)
HAVING COUNT(*) > 0
ORDER BY hour DESC, error_rate DESC;

         endpoint         |         hour         | total_requests | errors | error_rate
--------------------------+----------------------+----------------+--------+------------
 POST /api/v1/users       | 2025-01-17 15:00:00 |              2 |      1 |      50.00
 GET /api/v1/users/:id    | 2025-01-17 15:00:00 |              2 |      1 |      50.00
 POST /api/v1/payments    | 2025-01-17 15:00:00 |              1 |      0 |       0.00
(3 rows)

postgres=> -- Find traces with specific characteristics using hash lookups
postgres=> SELECT
    id as trace_id,
    name as endpoint,
    timestamp,
    (duration / 1000000)::INT as duration_ms,
    attributes___error___type as error_type
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
    AND 'trace_124' = ANY(hashes)
    AND timestamp >= '2025-01-17 14:00:00' AND timestamp < '2025-01-17 15:00:00';

               trace_id               |    endpoint     |         timestamp          | duration_ms |     error_type
--------------------------------------+-----------------+----------------------------+-------------+--------------------
 550e8400-e29b-41d4-a716-446655440002 | POST /api/v1/users | 2025-01-17 14:35:00.000000 |        1523 | database_timeout
(1 row)

postgres=> -- Time-series aggregation using TimescaleDB's time_bucket function
postgres=> SELECT
    time_bucket(INTERVAL '5 minutes', timestamp) as bucket,
    COUNT(*) as requests,
    AVG(duration / 1000000)::INT as avg_duration_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration / 1000000) as p95_duration_ms
FROM otel_logs_and_spans
WHERE project_id = 'prod-api-001'
    AND timestamp >= '2025-01-17 14:00:00' AND timestamp < '2025-01-17 15:00:00'
GROUP BY bucket
ORDER BY bucket DESC;

        bucket          | requests | avg_duration_ms | p95_duration_ms
------------------------+----------+-----------------+-----------------
 2025-01-17 14:45:00   |        2 |            2760 |            3421
 2025-01-17 14:40:00   |        1 |              89 |              89
 2025-01-17 14:35:00   |        1 |            1523 |            1523
 2025-01-17 14:30:00   |        1 |             245 |             245
(3 rows)

postgres=> -- Advanced time-series: Moving averages with time_bucket
postgres=> WITH time_series AS (
    SELECT
        time_bucket(INTERVAL '1 minute', timestamp) as minute,
        name as endpoint,
        COUNT(*) as requests,
        AVG(duration / 1000000) as avg_duration_ms
    FROM otel_logs_and_spans
    WHERE project_id = 'prod-api-001'
        AND timestamp >= '2025-01-17 14:30:00' AND timestamp < '2025-01-17 15:00:00'
    GROUP BY minute, endpoint
)
SELECT
    minute,
    endpoint,
    requests,

Timefusion

Install / Use

README