Reladiff
High-performance diffing of large datasets across databases
<span style="font-size:1.3em">Reladiff</span> is a high-performance tool and library designed for diffing large datasets across databases. By executing the diff calculation within the database itself, Reladiff minimizes data transfer and achieves optimal performance.
This tool is specifically tailored for data professionals, DevOps engineers, and system administrators.
Reladiff is free, open-source, user-friendly, extensively tested, and delivers fast results, even at massive scale.
Key Features:
- Cross-Database Diff: Reladiff employs a divide-and-conquer algorithm, based on matching hashes, to efficiently identify modified segments and download only the necessary data for comparison. This approach ensures exceptional performance when differences are minimal.
  - ⇄ Diffs across over a dozen different databases (e.g. PostgreSQL -> Snowflake)!
  - 🧠 Gracefully handles reduced precision (e.g., timestamp(9) -> timestamp(3)) by rounding according to the database specification.
  - 🔥 Benchmarked to diff over 25M rows in under 10 seconds and over 1B rows in approximately 5 minutes, given no differences.
  - ♾️ Capable of handling tables with tens of billions of rows.
- Intra-Database Diff: When both tables reside in the same database, Reladiff compares them using a join operation, with additional optimizations for enhanced speed.
  - Supports materializing the diff into a local table.
  - Can collect various extra statistics about the tables.
- Threaded: Utilizes multiple threads to significantly boost performance during diffing operations.
- Configurable: Offers numerous options for power users to customize and optimize their usage.
- Automation-Friendly: Outputs both JSON and git-like diffs (with + and -), facilitating easy integration into CI/CD pipelines.
- Over a dozen databases supported: MySQL, Postgres, Snowflake, BigQuery, Oracle, ClickHouse, and more. See the documentation for the full list.
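The reduced-precision handling listed above can be illustrated with a small sketch. It assumes the simplest case, where both values are rounded to the lower of the two precisions before being compared. The `round_timestamp` helper and its half-up rounding rule are hypothetical and are not Reladiff's API; the actual rounding depends on each database's specification. Python's `datetime` stores at most six fractional digits, so the example goes from 6 to 3 digits rather than 9 to 3.

```python
from datetime import datetime, timedelta

def round_timestamp(ts: datetime, precision: int) -> datetime:
    """Round a datetime to `precision` fractional-second digits (0-6), half up."""
    if not 0 <= precision <= 6:
        raise ValueError("precision must be between 0 and 6")
    step = 10 ** (6 - precision)  # size of one unit at this precision, in microseconds
    rounded = (ts.microsecond + step // 2) // step * step
    # Drop the old fraction, then add the rounded one; timedelta carries over into seconds.
    return ts.replace(microsecond=0) + timedelta(microseconds=rounded)

high = datetime(2024, 10, 10, 12, 0, 0, 123456)  # value stored with 6 fractional digits
low = datetime(2024, 10, 10, 12, 0, 0, 123000)   # same value stored with 3 fractional digits

print(high == low)                                          # raw comparison
print(round_timestamp(high, 3) == round_timestamp(low, 3))  # comparison after rounding
```

Without normalization, the two values compare as different even though they represent the same source timestamp; after rounding both to three digits they match.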
Reladiff is a fork of an archived project called data-diff.
Get Started
🗎 Read the Documentation - our detailed documentation has everything you need to start diffing.
Quickstart
For the impatient ;)
Install
Reladiff is available on PyPI. You may install it by running:
    pip install reladiff
Requires Python 3.8+ with pip.
We advise installing it within a virtual environment.
How to Use
Once you've installed Reladiff, you can run it from the command-line:
    # Cross-DB diff, using hashes
    reladiff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
When both tables belong to the same database, a shorter syntax is available:
    # Same-DB diff, using outer join
    reladiff DB1_URI TABLE1_NAME TABLE2_NAME [OPTIONS]
Or, you can import and run it from Python:
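    from typing import Literal, Tuple

    from reladiff import connect_to_table, diff_tables

    # Arguments: database URI, table name, key column
    table1 = connect_to_table("postgresql:///", "table_name", "id")
    table2 = connect_to_table("mysql:///", "table_name", "id")

    # Each result is a sign ('+' or '-') and the row it applies to
    sign: Literal['+', '-']
    row: Tuple[str, ...]
    for sign, row in diff_tables(table1, table2):
        print(sign, row)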
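For automation, the `(sign, row)` pairs from a loop like the one above can be turned into JSON lines. The sketch below uses a hard-coded sample in place of a live `diff_tables` call so that it runs without a database. The `diff_to_jsonl` helper and the output field names are assumptions made for illustration and are not part of Reladiff.

```python
import json
from typing import Iterable, Iterator, Tuple

def diff_to_jsonl(diff: Iterable[Tuple[str, Tuple[str, ...]]]) -> Iterator[str]:
    """Yield one JSON document per diff entry."""
    for sign, row in diff:
        yield json.dumps({"sign": sign, "row": list(row)})

# Sample data shaped like the (sign, row) pairs of the Python example.
sample_diff = [
    ("-", ("1", "2024-10-09")),
    ("+", ("1", "2024-10-10")),
]

lines = list(diff_to_jsonl(sample_diff))
for line in lines:
    print(line)
```

Because the helper accepts any iterable, it consumes entries one at a time instead of loading the whole diff into memory.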
For detailed instructions, read the documentation.
"Real-world" example: Diff "events" table between Postgres and Snowflake
    # -k: identifier of event
    # -c: extra column to compare
    # -w: filter the rows on both dbs
    reladiff \
      postgresql:/// \
      events \
      "snowflake://<username>:<password>@<host>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
      events \
      -k event_id \
      -c event_data \
      -w "event_time < '2024-10-10'"
"Real-world" example: Diff "events" and "old_events" tables in the same Postgres DB
Materializes the results into a new table, containing the current timestamp in its name.
    reladiff \
      postgresql:/// events old_events \
      -k org_id \
      -c created_at -c is_internal \
      -w "org_id != 1 and org_id < 2000" \
      -m test_results_%t \
      --materialize-all-rows \
      --table-write-limit 10000
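The same-database diff is described above as an outer join. As a rough illustration of the idea only, the sketch below uses Python's built-in `sqlite3` module and emulates a full outer join on the key column with two left joins. The table contents, the query shape, and the sign convention (`-` for rows from the first table, `+` for rows from the second) are assumptions and do not reflect the SQL that Reladiff generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events     (org_id INTEGER PRIMARY KEY, is_internal INTEGER);
    CREATE TABLE old_events (org_id INTEGER PRIMARY KEY, is_internal INTEGER);
    INSERT INTO events     VALUES (2, 0), (3, 1), (4, 0);
    INSERT INTO old_events VALUES (2, 0), (3, 0), (5, 1);
""")

# A row is reported when its key is missing on the other side or when the
# compared column differs. `IS NOT` is SQLite's NULL-safe inequality.
QUERY = """
    SELECT '-' AS sign, a.org_id, a.is_internal
    FROM events a LEFT JOIN old_events b ON a.org_id = b.org_id
    WHERE b.org_id IS NULL OR a.is_internal IS NOT b.is_internal
    UNION ALL
    SELECT '+' AS sign, b.org_id, b.is_internal
    FROM old_events b LEFT JOIN events a ON a.org_id = b.org_id
    WHERE a.org_id IS NULL OR a.is_internal IS NOT b.is_internal
    ORDER BY 2, 1
"""

diff_rows = conn.execute(QUERY).fetchall()
for sign, org_id, is_internal in diff_rows:
    print(sign, org_id, is_internal)
```

The row with `org_id` 2 is identical in both tables and does not appear in the result. All of the comparison work happens inside the database, and only the differing rows are returned to the client.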
Technical Explanation
This technical explanation provides a quick overview of how our cross-database diffing works.
For an in-depth explanation, this blog post details all the major technical choices we made in our implementation.
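As a rough, in-memory sketch of the divide-and-conquer idea behind cross-database diffing: checksum a key range on both sides, stop if the checksums match, and otherwise split the range and recurse until it is small enough to compare row by row. Everything here is an assumption made for illustration, including the dict-based "tables", the `checksum` and `diff_segment` helpers, the choice of MD5, and the threshold. The real tool computes its checksums inside each database.

```python
import hashlib
from typing import Dict, Iterator, Tuple

Table = Dict[int, str]  # key -> row value; stands in for a database table

def checksum(table: Table, lo: int, hi: int) -> str:
    """Hash all rows whose key is in [lo, hi), in key order."""
    h = hashlib.md5()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(f"{key}:{table[key]};".encode())
    return h.hexdigest()

def diff_segment(t1: Table, t2: Table, lo: int, hi: int,
                 threshold: int = 4) -> Iterator[Tuple[str, Tuple[int, str]]]:
    """Yield ('-', row) for rows only in t1 and ('+', row) for rows only in t2."""
    if checksum(t1, lo, hi) == checksum(t2, lo, hi):
        return  # identical segment: nothing to download
    if hi - lo <= threshold:
        # Small segment: "download" both sides and compare row by row.
        rows1 = {(k, v) for k, v in t1.items() if lo <= k < hi}
        rows2 = {(k, v) for k, v in t2.items() if lo <= k < hi}
        for row in sorted(rows1 - rows2):
            yield ("-", row)
        for row in sorted(rows2 - rows1):
            yield ("+", row)
        return
    mid = (lo + hi) // 2
    yield from diff_segment(t1, t2, lo, mid, threshold)
    yield from diff_segment(t1, t2, mid, hi, threshold)

table1 = {i: f"value-{i}" for i in range(100)}
table2 = dict(table1)
table2[42] = "changed"  # modified row
del table2[7]           # row missing from the second table

result = list(diff_segment(table1, table2, 0, 100))
for sign, row in result:
    print(sign, row)
```

Segments whose checksums match are skipped without looking at individual rows, which is why this approach is cheapest when the tables differ in only a few places.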
We're here to help!
- Confused? Got a cool idea? Just want to share your thoughts? Let's discuss it in GitHub Discussions.
- Did you encounter a bug? Open an issue.
How to Contribute
- Please read the contributing guidelines to get started.
- Feel free to open a new issue or work on an existing one.
Big thanks to everyone who contributed so far:
<a href="https://github.com/erezsh/reladiff/graphs/contributors"> <img src="https://contributors-img.web.app/image?repo=erezsh/reladiff" /> </a>
License
This project is licensed under the terms of the MIT License.
