Delta


Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, Ruby, and Python.

See the Delta Lake Documentation for details.
See the Quick Start Guide to get started with Scala, Java and Python.
Note that this repo is one of many Delta Lake repositories in the delta.io organization, including delta, delta-rs, delta-sharing, kafka-delta-ingest, and website.

The following are some of the more popular Delta Lake integrations; refer to delta.io/integrations for the complete list:

Apache Spark™: This connector allows Apache Spark™ to read from and write to Delta Lake.
Apache Flink (Preview): This connector allows Apache Flink to write to Delta Lake.
PrestoDB: This connector allows PrestoDB to read from Delta Lake.
Trino: This connector allows Trino to read from and write to Delta Lake.
Delta Standalone: This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
Apache Hive: This connector allows Apache Hive to read from Delta Lake.
Delta Rust API: This library gives Rust (with Python and Ruby bindings) low-level access to Delta tables and is intended to be used with data processing frameworks like datafusion, ballista, rust-dataframe, vega, etc.

<details> <summary>Table of Contents</summary>

Latest binaries
API Documentation
Compatibility
- API Compatibility
- Data Storage Compatibility
Roadmap
Building
Transaction Protocol
Requirements for Underlying Storage Systems
Concurrency Control
Reporting issues
Contributing
License
Community

</details>

Latest Binaries

See the online documentation for the latest release.

API Documentation

Compatibility

The Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables. Specifically, this library provides APIs to interact with a table’s metadata in the transaction log, implementing the Delta Transaction Log Protocol to achieve the transactional guarantees of the Delta Lake format.

API Compatibility

There are two types of APIs provided by the Delta Lake project.

Direct Java/Scala/Python APIs - The classes and methods documented in the API docs are considered stable public APIs. All other classes, interfaces, and methods that may be directly accessible in code are considered internal and are subject to change across releases.
Spark-based APIs - You can read and write Delta tables through the DataFrameReader/Writer (i.e. spark.read, df.write, spark.readStream and df.writeStream). Options to these APIs will remain stable within a major release of Delta Lake (e.g., 1.x.x).
See the online documentation for the releases and their compatibility with Apache Spark versions.
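As a rough illustration of the Spark-based APIs, the sketch below wraps the batch read and write paths in two small helpers. The helper names `read_delta` and `write_delta` are hypothetical and not part of Delta Lake; the sketch assumes `spark` is a SparkSession that has already been configured with the Delta Lake package, and `df` is a Spark DataFrame.

```python
def write_delta(df, path, mode="overwrite"):
    """Write a Spark DataFrame to `path` as a Delta table.

    `df` is assumed to be a pyspark.sql.DataFrame; `mode` is passed straight
    to DataFrameWriter.mode (for example "overwrite" or "append").
    """
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Load the Delta table stored at `path` as a Spark DataFrame.

    `spark` is assumed to be a SparkSession with Delta Lake on its classpath.
    """
    return spark.read.format("delta").load(path)
```

With such a session in hand, `write_delta(spark.range(5), "/tmp/delta-table")` followed by `read_delta(spark, "/tmp/delta-table").show()` would round-trip a small table; the path is only an example.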

Data Storage Compatibility

Delta Lake guarantees backward compatibility for all Delta Lake tables (i.e., newer versions of Delta Lake will always be able to read tables written by older versions of Delta Lake). However, we reserve the right to break forward compatibility as new features are introduced to the transaction protocol (i.e., an older version of Delta Lake may not be able to read a table produced by a newer version).

Breaking changes in the protocol are indicated by incrementing the minimum reader/writer version in the Protocol action.
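To make the version check concrete, here is a minimal sketch of how a client might compare its own supported versions against a table's Protocol action. It assumes the action is serialized as a JSON object with `minReaderVersion` and `minWriterVersion` fields under a `protocol` key; the function names are hypothetical, and the sketch ignores anything beyond the two version numbers (such as per-table feature lists), so it is not a substitute for the protocol specification.

```python
import json


def can_read(protocol_action_json: str, supported_reader_version: int) -> bool:
    """Return True if a client supporting `supported_reader_version` may read the table."""
    protocol = json.loads(protocol_action_json)["protocol"]
    return protocol["minReaderVersion"] <= supported_reader_version


def can_write(protocol_action_json: str, supported_writer_version: int) -> bool:
    """Return True if a client supporting `supported_writer_version` may write to the table."""
    protocol = json.loads(protocol_action_json)["protocol"]
    return protocol["minWriterVersion"] <= supported_writer_version


# Example action line; the version numbers are illustrative only.
action = '{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}'
print(can_read(action, supported_reader_version=1))   # client meets the reader minimum
print(can_write(action, supported_writer_version=1))  # client is below the writer minimum
```

An older client that fails either check should refuse the operation rather than risk misreading or corrupting the table, which is how the forward-compatibility break described above shows up in practice.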

Roadmap

For the high-level Delta Lake roadmap, see Delta Lake 2022H1 roadmap.
For the detailed timeline, see the project roadmap.

Transaction Protocol

The Delta Transaction Log Protocol document provides a specification of the transaction protocol.

Requirements for Underlying Storage Systems

Delta Lake ACID guarantees are predicated on the atomicity and durability guarantees of the storage system. Specifically, we require the storage system to provide the following.

Atomic visibility: There must be a way for a file to be visible in its entirety or not visible at all.
Mutual exclusion: Only one writer must be able to create (or rename) a file at the final destination.
Consistent listing: Once a file has been written in a directory, all future listings for that directory must return that file.

See the online documentation on Storage Configuration for details.
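The first two requirements can be illustrated on a local POSIX-style filesystem. The sketch below is not how Delta Lake implements its storage layer; it is a hypothetical `put_if_absent` helper that writes the full contents to a temporary file and then publishes it with a hard link, which fails if the destination already exists. Other storage systems need their own mechanisms to provide the same guarantees.

```python
import os
import tempfile


def put_if_absent(path: str, data: bytes) -> bool:
    """Create `path` containing `data` only if `path` does not exist yet.

    Returns True if this caller created the file, and False if another
    writer had already created it.
    """
    directory = os.path.dirname(path) or "."
    # Write everything to a temporary file in the same directory first, so a
    # reader never sees a partially written file at `path` (atomic visibility).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        try:
            # os.link raises FileExistsError when `path` already exists, so
            # only one writer can publish the file (mutual exclusion).
            os.link(tmp_path, path)
            return True
        except FileExistsError:
            return False
    finally:
        # The temporary name is no longer needed whether or not we won.
        os.unlink(tmp_path)
```

Two writers racing to call `put_if_absent` on the same destination would see exactly one `True`; the loser can then re-read the current state and retry under a different file name.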

Concurrency Control

Delta Lake ensures serializability for concurrent reads and writes. Please see Delta Lake Concurrency Control for more details.

Reporting issues

We use GitHub Issues to track community-reported issues. You can also contact the community to get answers.

Contributing

We welcome contributions to Delta Lake. See our CONTRIBUTING.md for more details.

We also adhere to the Delta Lake Code of Conduct.

Building

Delta Lake is compiled using SBT. Ensure that your Java version is at least 17 (you can verify with java -version).
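If you want to script the Java check, the following sketch parses the text printed by `java -version` and extracts the major version. The function names are hypothetical, and the sketch assumes the usual `version "..."` line format, including the legacy `1.x` numbering used by Java 8 and earlier.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a version string in the output")
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_392" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major_version() -> int:
    """Run `java -version` and return the major version of the JDK on PATH."""
    # `java -version` prints its report to stderr rather than stdout.
    completed = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(completed.stderr)
```

A build script could then compare `installed_java_major_version()` against the minimum stated above before invoking `build/sbt`.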

To compile, run

build/sbt compile

To generate artifacts, run

build/sbt package

To execute tests, run

build/sbt test

To execute a single test suite, run

build/sbt spark/'testOnly org.apache.spark.sql.delta.optimize.OptimizeCompactionSQLSuite'

To execute a single test within a single test suite, run

build/sbt spark/'testOnly *.OptimizeCompactionSQLSuite -- -z "optimize command: on partitioned table - all partitions"'

Refer to SBT docs for more commands.

Running python tests locally

Setup Environment

Install Conda (Skip if you already installed it)

Follow Conda Download to install Anaconda.

Create an environment from environment file

Follow Create Environment From Environment file to create a Conda environment from <repo-root>/python/environment.yml and activate the newly created delta_python_tests environment.

# Note the `--file` argument should be a fully qualified path. Using `~` in file
# path doesn't work. Example valid path: `/Users/macuser/delta/python/environment.yml`

conda env create --name delta_python_tests --file=<absolute_path_to_delta_repo>/python/environment.yml

JDK Setup

The build needs JDK 11. Make sure JAVA_HOME is set and points to JDK 11.

Running tests

conda activate delta_python_tests
python3 <delta-root>/python/run-tests.py

IntelliJ Setup

IntelliJ is the recommended IDE to use when developing Delta Lake. To import Delta Lake as a new project:

Clone Delta Lake into, for example, ~/delta.
In IntelliJ, select File > New Project > Project from Existing Sources... and select ~/delta.
Under Import project from external model select sbt. Click Next.
Under Project JDK specify a valid Java 11 JDK and opt to use SBT shell for project reload and builds.
Click Finish.
In your terminal, run build/sbt clean package. Make sure you use Java 11. The build will generate files that are necessary for IntelliJ to index the repository.

Setup Verification

After waiting for IntelliJ to index, verify your setup by running a test suite in IntelliJ.

Search for and open DeltaLogSuite
Next t
