Note: This project is maintained by the community after Datafold sunset the project in May 2024.
data-diff is an open-source CLI and Python library for efficiently comparing data across 13+ database engines. It uses bisection and checksumming to find differing rows without transferring entire tables, making it fast even on tables with millions of rows.
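To make the bisection-and-checksum idea concrete, here is a minimal in-memory sketch. It is not data-diff's actual implementation: the function names (`checksum`, `diff_segment`), the `threshold` parameter, and the use of Python dicts in place of database tables are all assumptions made for illustration. Against a real database, the checksum would be computed inside the database so that only a digest per segment is transferred.

```python
import hashlib


def checksum(rows):
    """Fold a sorted list of (key, value) rows into a single digest."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()


def diff_segment(a, b, lo, hi, threshold=4):
    """Yield ('-', row) / ('+', row) for keys in [lo, hi) that differ.

    `a` and `b` are dicts mapping an integer key to a hashable value;
    they stand in for two tables keyed by an integer primary key.
    """
    rows_a = sorted((k, v) for k, v in a.items() if lo <= k < hi)
    rows_b = sorted((k, v) for k, v in b.items() if lo <= k < hi)

    # Matching checksums: treat the whole key range as identical and skip it.
    if checksum(rows_a) == checksum(rows_b):
        return

    # Small enough segment: compare the rows directly.
    if hi - lo <= threshold:
        set_a, set_b = set(rows_a), set(rows_b)
        for row in sorted(set_a - set_b):
            yield ("-", row)  # present only in the first table
        for row in sorted(set_b - set_a):
            yield ("+", row)  # present only in the second table
        return

    # Otherwise bisect the key range and recurse into each half.
    mid = (lo + hi) // 2
    yield from diff_segment(a, b, lo, mid, threshold)
    yield from diff_segment(a, b, mid, hi, threshold)
```

Calling `list(diff_segment(a, b, 0, max_key + 1))` on two mostly identical dicts descends only into key ranges whose checksums disagree, which is why the amount of row data compared grows with the number of differences rather than with table size.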
```bash
pip install data-diff
```

Install with database-specific extras:

```bash
pip install 'data-diff[postgresql,mysql]'
```

Compare two tables from the command line:

```bash
data-diff \
  postgresql://user:password@localhost/db1 table1 \
  postgresql://user:password@localhost/db2 table2 \
  --key-columns id \
  --columns name,email,updated_at
```

Compare two tables from Python:

```python
import data_diff

diff = data_diff.diff_tables(
    table1=data_diff.connect_to_table("postgresql://localhost/db1", "table1", "id"),
    table2=data_diff.connect_to_table("postgresql://localhost/db2", "table2", "id"),
)

for sign, row in diff:
    print(sign, row)  # '+' for added, '-' for removed
```

| Database | Status |
|---|---|
| PostgreSQL | Supported |
| MySQL | Supported |
| Snowflake | Supported |
| BigQuery | Supported |
| Databricks | Supported |
| Redshift | Supported |
| DuckDB | Supported |
| Presto | Supported |
| Trino | Supported |
| Oracle | Supported |
| MS SQL | Supported |
| ClickHouse | Supported |
| Vertica | Supported |

data-diff integrates with dbt to compare tables between development and production environments:

```bash
data-diff --dbt
```

Install with dbt support:

```bash
pip install 'data-diff[dbt]'
```

See the full documentation for configuration details.

The `pip install data-diff` command works natively on Windows, macOS, and Linux. Database-specific extras install the same way across all platforms.

For development, the Makefile and `docker compose` workflow assumes a Unix-like shell. On Windows, use WSL (recommended) or Git Bash to run `make test`, `make up`, and other development commands.

This project is licensed under the terms of the MIT License.