Olake
OLake - Fastest database, Kafka & S3 replication to Apache Iceberg or plain Parquet. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supported sources: Postgres, MongoDB, MySQL, Oracle, MSSQL, DB2, Kafka, S3.
OLake — Super-fast Sync to Apache Iceberg
OLake replicates data from transactional databases such as PostgreSQL, MySQL, MongoDB, Oracle, DB2, and MSSQL, event-streaming systems such as Apache Kafka, and object stores such as S3 into open data lakehouse formats such as Apache Iceberg or plain Parquet, delivering blazing-fast performance with minimal infrastructure cost.
<h1 align="center" style="border-bottom: none"> <a href="https://datazip.io/olake" target="_blank"> <img width="3840" height="1920" alt="image" src="https://github.com/user-attachments/assets/e59edc8c-38b6-4d59-ac79-63bf4e0b3a1e" /> </a> </h1>

🚀 Why OLake?
- 🧠 Smart sync: Full + CDC replication with automatic schema discovery & schema evolution
- ⚡ High throughput: 580K RPS (Postgres) & 338K RPS (MySQL)
- ➡️ Exactly-once delivery & Arrow writes: accuracy with speed.
- 💾 Iceberg-native: Supports Glue, Hive, JDBC, REST catalogs
- 🖥️ Self-serve UI: Deploy via Docker Compose and sync in minutes
- 💸 Infra-light: No Spark, no Flink, no Kafka, no Debezium
- 🗜️ Iceberg Table Optimization (Coming soon): Compaction tailored for CDC ingestion
📊 Benchmarks & possible connections
Full Load
| Source → Destination | Full Load | Relative Performance (Full Load) | Full Report |
|----------------------|-----------------|--------------------------------------|--------------|
| Postgres → Iceberg | 580,113 RPS | 12.5× faster than Fivetran | Full Report |
| MySQL → Iceberg | 338,005 RPS | 2.83× faster than Fivetran | Full Report |
| MongoDB → Iceberg | 37,879 RPS | - | Full Report |
| Oracle → Iceberg | 526,337 RPS | - | Full Report |
| Kafka → Iceberg | 154,320 RPS (Bounded Incremental) | 1.8× faster than Flink | Full Report |
CDC
| Source → Destination | CDC | Relative Performance (CDC) | Full Report |
|----------------------|-----------------|--------------------------------------|--------------|
| Postgres → Iceberg | 55,555 RPS | 2× faster than Fivetran | Full Report |
| MySQL → Iceberg | 51,867 RPS | 1.85× faster than Fivetran | Full Report |
| MongoDB → Iceberg | 10,692 RPS | - | Full Report |
| Oracle → Iceberg | - | - | Full Report |
*These are preliminary results. Fully reproducible benchmark scores will be published soon.
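
As a rough illustration of what these throughput figures mean in practice, the sketch below converts a row count and an RPS value into an estimated full-load duration. It assumes RPS means records (rows) per second and that throughput stays constant for the whole load, which real workloads will not guarantee. The helper name `estimate_load_seconds` is hypothetical and not part of OLake.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an OLake command): estimate load time in seconds
# as rows / throughput, assuming a constant rate in records per second.
estimate_load_seconds() {
  local rows="$1" rps="$2"
  awk -v rows="$rows" -v rps="$rps" 'BEGIN {
    if (rps <= 0) { exit 1 }          # reject zero or negative throughput
    printf "%.1f\n", rows / rps
  }'
}

# One billion rows at the preliminary Postgres full-load figure (580,113 RPS)
estimate_load_seconds 1000000000 580113   # prints 1723.8 (about 29 minutes)
```

Because the published numbers are preliminary, treat any estimate like this as an order-of-magnitude guide rather than a capacity plan.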
🔧 Supported Sources and Destinations
Sources (Databases and S3)
| Source | Full Load | CDC | Incremental | Notes | Documentation |
|---------------|--------------|---------------|-------------------|-----------------------------|-----------------------------|
| PostgreSQL | ✅ | ✅ pgoutput | ✅ |wal2json deprecated |Postgres Docs |
| MySQL | ✅ | ✅ | ✅ | Binlog-based CDC | MySQL Docs |
| MongoDB | ✅ | ✅ | ✅ | Oplog-based CDC |MongoDB Docs |
| Oracle | ✅ | WIP | ✅ | JDBC-based Full Load & Incremental | Oracle Docs |
| DB2 | ✅ | - | ✅ | JDBC-based Full Load & Incremental | DB2 Docs |
| MSSQL | ✅ | ✅ | ✅ | Full Load, CDC & Incremental | MSSQL Docs |
| S3 | ✅ | - | ✅ | Ingests from Amazon S3 or S3-compatible (MinIO, LocalStack) | S3 Docs |
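
The support matrix above can be encoded as a small lookup when scripting around OLake, for example to decide how to keep a table up to date after the initial full load. The sketch below transcribes only the CDC column of the table. The function names `cdc_status` and `pick_sync_mode`, and the lowercase mode labels they print, are hypothetical and are not OLake configuration values.

```shell
#!/usr/bin/env bash
# CDC availability per source, transcribed from the table above.
cdc_status() {
  case "$1" in
    PostgreSQL|MySQL|MongoDB|MSSQL) echo "supported" ;;
    Oracle)                         echo "wip" ;;
    DB2|S3)                         echo "unsupported" ;;
    *)                              echo "unknown"; return 1 ;;
  esac
}

# Hypothetical helper: prefer CDC where the table marks it supported,
# otherwise fall back to incremental sync (listed for every source above).
pick_sync_mode() {
  local status
  status="$(cdc_status "$1")" || { echo "unknown source: $1" >&2; return 1; }
  if [ "$status" = "supported" ]; then
    echo "cdc"
  else
    echo "incremental"
  fi
}

pick_sync_mode PostgreSQL   # prints cdc
pick_sync_mode Oracle       # prints incremental (CDC is marked WIP)
```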
Sources (Kafka)
| Source | Bounded Incremental | Notes | Documentation |
|--------|--------------------|-----------------------------------|---------------|
| Kafka | ✅ | Latest offset bounded incremental sync | Kafka Docs |
Destinations
| Destination | Format | Supported Catalogs |
|----------------|-----------|---------------------------------------------------------------|
| Iceberg | ✅ | Glue, Hive, JDBC, REST (Nessie, Polaris, Unity, Lakekeeper, AWS S3 Tables) |
| Parquet | ✅ | Filesystem |
| Other formats | 🔜 | Planned: Delta Lake, Hudi |
Writer Docs
- Catalogs
- Azure ADLS Gen2
- Google Cloud Storage (GCS)
- MinIO (local)
- Iceberg Table Management
- Parquet Writer
🧪 Quickstart (UI + Docker)
OLake UI is a web-based interface for managing OLake jobs, sources, destinations, and configurations. You can run the entire OLake stack (UI, Backend, and all dependencies) using Docker Compose. This is the recommended way to get started. Run the UI, connect your source DB, and start syncing in minutes.
curl -sSL https://raw.githubusercontent.com/datazip-inc/olake-ui/master/docker-compose.yml | docker compose -f - up -d
Access the UI:
* OLake UI: http://localhost:8000
* Log in with default credentials: admin / password.
Detailed getting started using OLake UI can be found here.
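
Because `docker compose up -d` returns before the containers finish starting, it can help to poll the UI address before opening it or running automation against it. The sketch below is one way to do that with `curl`. It assumes the UI answers at the root URL with a non-error HTTP status once it is ready; `wait_for_ui` and its default retry values are hypothetical and not part of OLake.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL (default http://localhost:8000), attempts (30), delay in seconds (2).
wait_for_ui() {
  local url="${1:-http://localhost:8000}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (>= 400) as failure; -s: silent; cap each try at 5 seconds
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "UI is reachable at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Timed out waiting for $url" >&2
  return 1
}
```

Running `wait_for_ui` with no arguments after the Docker Compose command polls http://localhost:8000 for roughly a minute of retries before giving up.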
Creating Your First Job
With the UI running, you can create a data pipeline in a few steps:
- Create a Job: Navigate to the Jobs tab and click Create Job.
- Configure Source: Set up your source connection (e.g., PostgreSQL, MySQL, MongoDB).
- Configure Destination: Set up your destination (e.g., Apache Iceberg with a Glue, REST, Hive, or JDBC catalog).
- Select Streams: Choose which tables to sync and configure their sync mode (CDC or Full Refresh).
- Configure & Run: Give your job a name, set a schedule, and click Create Job to finish.
For a detailed walkthrough, refer to the Jobs documentation.
🛠️ CLI Usage (Advanced)
For advanced users and automation, OLake's core logic is exposed via a powerful CLI. The core framework handles state management, configuration validation, logging, and type detection. It interacts with drivers using four main commands:
spec: Re
