Minidb

MiniDB is a high-performance analytical database system built on Lakehouse architecture principles, combining the flexibility of data lakes with the performance and reliability of data warehouses.

Generate Convert Improve

Install / Use

/learn @yyun543/Minidb

About this skill

Quality Score

0/100

README

MiniDB

High-performance Lakehouse Database Engine · Built on Apache Arrow and Parquet

English | 中文 | Quick Start | Documentation | Architecture

</div>

📖 Project Overview

MiniDB is a production-grade Lakehouse database engine that implements 72% of the core capabilities from the Delta Lake paper (PVLDB 2020), and achieves a 1000x write amplification improvement for UPDATE/DELETE operations beyond what's described in the paper. The project is written in Go, built on the Apache Arrow vectorized execution engine and Parquet columnar storage, providing complete ACID transaction guarantees.

🌟 Core Features

✅ Full ACID Transactions - Atomicity/Consistency/Isolation/Durability guarantees based on Delta Log
⚡ Vectorized Execution - Apache Arrow batch processing delivers 10-100x acceleration for analytical queries
🔄 Merge-on-Read - Innovative MoR architecture reduces UPDATE/DELETE write amplification by 1000x
📊 Intelligent Optimization - Z-Order multidimensional clustering, predicate pushdown, automatic compaction
🕐 Time Travel - Complete version control and snapshot isolation, supporting historical data queries
🔍 System Tables Bootstrap - Innovative SQL-queryable metadata system (sys.*)
🎯 Dual Concurrency Control - Pessimistic + optimistic locks available, suitable for different deployment scenarios

📊 Performance Metrics

| Scenario | Performance Improvement | Description | |------|---------|------| | Vectorized Aggregation | 10-100x | GROUP BY + aggregation functions vs row-based execution | | Predicate Pushdown | 2-10x | Data skipping based on Min/Max statistics | | Z-Order Queries | 50-90% | File skip rate for multidimensional queries | | UPDATE Write Amplification | 1/1000 | MoR vs traditional Copy-on-Write | | Checkpoint Recovery | 10x | vs scanning all logs from the beginning |

🚀 Quick Start

System Requirements

Go 1.21+
Operating System: Linux/macOS/Windows
Memory: ≥4GB (8GB+ recommended)
Disk: ≥10GB available space

10-Second Installation

# Clone repository
git clone https://github.com/yyun543/minidb.git
cd minidb

# Install dependencies
go mod download

# Build binary
go build -o minidb ./cmd/server

# Start server
./minidb

The server will start on localhost:7205.

First Query

# Connect to MiniDB
nc localhost 7205

# Or use telnet
telnet localhost 7205

-- Create database and table
CREATE DATABASE ecommerce;
USE ecommerce;

CREATE TABLE products (
    id INT,
    name VARCHAR,
    price INT,
    category VARCHAR
);

-- Insert data
INSERT INTO products VALUES (1, 'Laptop', 999, 'Electronics');
INSERT INTO products VALUES (2, 'Mouse', 29, 'Electronics');
INSERT INTO products VALUES (3, 'Desk', 299, 'Furniture');

-- Vectorized analytical query
SELECT category, COUNT(*) as count, AVG(price) as avg_price
FROM products
GROUP BY category
HAVING count > 0
ORDER BY avg_price DESC;

-- Query transaction history (system table bootstrap feature)
SELECT version, operation, table_id, file_path
FROM sys.delta_log
ORDER BY version DESC
LIMIT 10;

📚 Core Architecture

Lakehouse Three-Layer Architecture

┌─────────────────────────────────────────────────────┐
│           SQL Layer (ANTLR4 Parser)                 │
│   DDL/DML/DQL · WHERE/JOIN/GROUP BY/ORDER BY        │
└─────────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────┐
│        Execution Layer (Dual Engines)               │
│                                                     │
│  ┌─────────────────┐    ┌──────────────────────┐    │
│  │ Vectorized      │    │ Regular Executor     │    │
│  │ Executor        │    │ (Fallback)           │    │
│  │ (Arrow Batch)   │    │                      │    │
│  └─────────────────┘    └──────────────────────┘    │
│                                                     │
│         Cost-Based Optimizer (Statistics)           │
└─────────────────────────────────────────────────────┘
                         ↓
┌─────────────────────────────────────────────────────┐
│         Storage Layer (Lakehouse)                   │
│                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────┐   │
│  │ Delta Log    │  │ Parquet      │  │ Object   │   │
│  │ Manager      │  │ Engine       │  │ Store    │   │
│  │ (ACID)       │  │ (Columnar)   │  │ (Local)  │   │
│  └──────────────┘  └──────────────┘  └──────────┘   │
│                                                     │
│  Features: MoR · Z-Order · Compaction · Pushdown    │
└─────────────────────────────────────────────────────┘

Delta Log Transaction Model

MiniDB implements two concurrency control mechanisms:

1. Pessimistic Lock Mode (Default)

type DeltaLog struct {
    entries    []LogEntry
    mu         sync.RWMutex  // Global read-write lock
    currentVer atomic.Int64
}

Use Case: Single-instance deployment, high-throughput writes
Advantages: Simple implementation, zero conflicts
Disadvantages: Doesn't support multi-client concurrency

2. Optimistic Lock Mode (Optional)

type OptimisticDeltaLog struct {
    conditionalStore ConditionalObjectStore
}

// Atomic operation: PUT if not exists
func (s *Store) PutIfNotExists(path string, data []byte) error

Use Case: Multi-client concurrency, cloud object storage
Advantages: High concurrency, no global locks
Disadvantages: Requires retry on conflict (default max 5 attempts)

Selecting Concurrency Mode:

// Enable optimistic locking
engine, _ := storage.NewParquetEngine(
    basePath,
    storage.WithOptimisticLock(true),
    storage.WithMaxRetries(5),
)

Storage File Structure

minidb_data/
├── sys/                          # System database
│   └── delta_log/
│       └── data/
│           └── *.parquet         # Transaction log persistence
│
├── ecommerce/                    # User database
│   ├── products/
│   │   └── data/
│   │       ├── products_xxx.parquet      # Base data files
│   │       ├── products_xxx_delta.parquet # Delta files (MoR)
│   │       └── zorder_xxx.parquet        # Z-Order optimized files
│   │
│   └── orders/
│       └── data/
│           └── *.parquet
│
└── logs/
    └── minidb.log               # Structured logs

💡 Core Features Explained

1. ACID Transaction Guarantees

MiniDB implements complete ACID properties through Delta Log:

-- Atomicity: Multi-row inserts either all succeed or all fail
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1, 100, '2024-01-01');
INSERT INTO orders VALUES (2, 200, '2024-01-02');
COMMIT;  -- Atomic commit to Delta Log

-- Consistency: Constraint checking
CREATE UNIQUE INDEX idx_id ON products (id);
INSERT INTO products VALUES (1, 'Item1', 100);
INSERT INTO products VALUES (1, 'Item2', 200);  -- Violates unique constraint, rejected

-- Isolation: Snapshot isolation
-- Session 1: Reading snapshot version=10
-- Session 2: Concurrently writing to create version=11
-- Session 1 still reads consistent version=10 data

-- Durability: fsync guarantee
-- Data is immediately persisted to Parquet files
INSERT INTO products VALUES (3, 'Item3', 150);
-- After server crash and restart, data still exists

Test Coverage: test/delta_acid_test.go - 6 ACID scenario tests ✅ 100% passing

2. Merge-on-Read (MoR) Architecture

Traditional Copy-on-Write Problem:

UPDATE products SET price=1099 WHERE id=1;

Traditional approach:
1. Read 100MB Parquet file
2. Modify 1 row
3. Rewrite the entire 100MB file  ❌ 100MB write amplification

MiniDB MoR approach:
1. Write 1KB Delta file     ✅ Only 1KB written
2. Merge at read time

MoR Implementation Principle:

Product table query flow:
┌──────────────┐
│ Base Files   │  ← Base data (immutable)
│ 100MB        │
└──────────────┘
       +
┌──────────────┐
│ Delta Files  │  ← UPDATE/DELETE increments
│ 1KB          │
└──────────────┘
       ↓
   Read-Time
    Merge
       ↓
┌──────────────┐
│ Merged View  │  ← Latest data as seen by users
└──────────────┘

Code Example:

// internal/storage/merge_on_read.go
type MergeOnReadEngine struct {
    baseFiles  []ParquetFile   // Base files
    deltaFiles []DeltaFile     // Delta files
}

func (m *MergeOnReadEngine) Read() []Record {
    // 1. Read base files
    baseRecords := readBaseFiles(m.baseFiles)

    // 2. Apply delta updates
    for _, delta := range m.deltaFiles {
        baseRecords = applyDelta(baseRecords, delta)
    }

    return baseRecords
}

Performance Comparison: | Operation | Copy-on-Write | Merge-on-Read | Improvement Factor | |------|---------------|---------------|----------| | UPDATE 1 row (100MB file) | 100MB written | 1KB written | 100,000x | | DELETE 10 rows (1GB file) | 1GB rewritten | 10KB written | 100,000x | | Read latency | 0ms | 1-5ms | Slightly increased |

Test Coverage: test/merge_on_read_test.go - 3 MoR scenario tests ✅

3. Z-Order Multidimensional Clustering

Problem: Network security log query scenario

-- Scenario 1: Query by source IP
SELECT * FROM network_logs WHERE source_ip = '192.168.1.100';

-- Scenario 2: Query by destination IP
SELECT * FROM network_logs WHERE dest_ip = '10.0.0.50';

-- Scenario 3: Query by time
SELECT * FROM network_logs WHERE timestamp > '2024-01-01';

Traditional Single-Dimension Sorting:

Related Skills

feishu-drive

339.3k

things-mac

339.3k

Manage Things 3 via the `things` CLI on macOS (add/update projects+todos via URL scheme; read/search/list from the local Things database)

clawhub

339.3k

Use the ClawHub CLI to search, install, update, and publish agent skills from clawhub.com

yu-ai-agent

2.0k

编程导航 2025 年 AI 开发实战新项目，基于 Spring Boot 3 + Java 21 + Spring AI 构建 AI 恋爱大师应用和 ReAct 模式自主规划智能体YuManus，覆盖 AI 大模型接入、Spring AI 核心特性、Prompt 工程和优化、RAG 检索增强、向量数据库、Tool Calling 工具调用、MCP 模型上下文协议、AI Agent 开发（Manas Java 实现）、Cursor AI 工具等核心知识。用一套教程将程序员必知必会的 AI 技术一网打尽，帮你成为 AI 时代企业的香饽饽，给你的简历和求职大幅增加竞争力。

yyun543

View profile

View on GitHub

GitHub Stars77

CategoryData

Updated25d ago

Forks8

yyun543/minidb

Languages

Security Score

80/100

Audited on Mar 2, 2026

No findings