From 48c08961b34b2e892924cdeead7394dd00550eef Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Tue, 11 Nov 2025 20:55:10 +0100
Subject: [PATCH] Improve README by referring to canonical documentation

Content from `docs/cratedb.md` was re-organized.

- https://cratedb.com/docs/guide/integrate/dlt/
- https://cratedb.com/docs/guide/integrate/dlt/usage.html
---
 README.md       |  21 ++++----
 docs/cratedb.md | 127 ------------------------------------------------
 2 files changed, 12 insertions(+), 136 deletions(-)
 delete mode 100644 docs/cratedb.md

diff --git a/README.md b/README.md
index cc09d06..7467017 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,6 @@
 | [License]
 | [CrateDB]
 | [Community Forum]
-| [Bluesky]
 
 ## About
 
@@ -26,19 +25,21 @@ The [dlt-cratedb] package is temporary for shipping the code until
 
 ## Documentation
 
-Please refer to the [handbook].
+Please refer to the [overview] and the [usage guide].
 
 ## What's inside
 
 - The `cratedb` adapter is heavily based on the `postgres` adapter.
 - The `CrateDbSqlClient` deviates from the original `Psycopg2SqlClient` by
-  accounting for [CRATEDB-15161] per `SystemColumnWorkaround`.
-- A few more other patches.
+  accounting for [CRATEDB-15161] per `SystemColumnWorkaround`. This will be
+  resolved with [DLT-CRATEDB-30] when CrateDB 6.2 will be released around
+  January/February 2026.
+- A few more other patches to account for specifics of CrateDB.
 
 ## Backlog
 
-We are tracking corresponding [issues] and a few more [backlog] items
-to be resolved as we go.
+The project tracks corresponding [issues] and a few more [backlog] items
+to be resolved in its incubation phase.
 
 
 [backlog]: https://github.com/crate/dlt-cratedb/blob/main/docs/backlog.md
@@ -46,13 +47,15 @@ to be resolved as we go.
 [dlt]: https://github.com/dlt-hub/dlt
 [DLT-2733]: https://github.com/dlt-hub/dlt/pull/2733
 [dlt-cratedb]: https://pypi.org/project/dlt-cratedb
+[DLT-CRATEDB-30]: https://github.com/crate/dlt-cratedb/pull/30
 [issues]: https://github.com/crate/dlt-cratedb/issues
-[handbook]: https://github.com/crate/dlt-cratedb/blob/main/docs/cratedb.md
+[overview]: https://cratedb.com/docs/guide/integrate/dlt/
+[usage guide]: https://cratedb.com/docs/guide/integrate/dlt/usage.html
 
 [CrateDB]: https://cratedb.com/database
 [Bluesky]: https://bsky.app/search?q=cratedb
 [Community Forum]: https://community.cratedb.com/
-[Documentation]: https://github.com/crate/dlt-cratedb
+[Documentation]: https://cratedb.com/docs/guide/integrate/dlt/
 [Issues]: https://github.com/crate/dlt-cratedb/issues
 [License]: https://github.com/crate/dlt-cratedb/blob/main/LICENSE.txt
 [managed on GitHub]: https://github.com/crate/dlt-cratedb
@@ -71,6 +74,6 @@ to be resolved as we go.
 [project-ci]: https://github.com/crate/dlt-cratedb/actions/workflows/tests.yml
 [project-coverage]: https://app.codecov.io/gh/crate/dlt-cratedb
 [project-downloads]: https://pepy.tech/project/dlt-cratedb/
-[project-license]: https://github.com/crate/dlt-cratedb/blob/main/LICENSE
+[project-license]: https://github.com/crate/dlt-cratedb/blob/main/LICENSE.txt
 [project-pypi]: https://pypi.org/project/dlt-cratedb
 [project-release-notes]: https://github.com/crate/dlt-cratedb/releases
diff --git a/docs/cratedb.md b/docs/cratedb.md
deleted file mode 100644
index 0399a32..0000000
--- a/docs/cratedb.md
+++ /dev/null
@@ -1,127 +0,0 @@
----
-title: CrateDB
-description: CrateDB `dlt` destination
-keywords: [ cratedb, destination, data warehouse ]
----
-
-# CrateDB
-
-## Install dlt with CrateDB
-
-**To install the DLT library with CrateDB dependencies:**
-
-```sh
-uv pip install "dlt-cratedb"
-```
-
-## Setup guide
-
-### 1. Initialize the dlt project
-
-Let's start by initializing a new `dlt` project as follows:
-
-```sh
-dlt init chess cratedb
-```
-
-Because CrateDB currently only supports writing to its default `doc` schema with dlt,
-please replace `dataset_name="chess_players_games_data"` with `dataset_name="doc"`.
-
-The `dlt init` command will initialize your pipeline with `chess` as the source and
-`cratedb` as the destination.
-
-The above command generates several files and directories, including `.dlt/secrets.toml`.
-
-### 2. Configure credentials
-
-Next, set up the CrateDB credentials in the `.dlt/secrets.toml` file as shown below.
-CrateDB is compatible with PostgreSQL and uses the `psycopg2` driver, like the
-`postgres` destination.
-
-```toml
-[destination.cratedb.credentials]
-host = "localhost" # CrateDB server host.
-port = 5432 # CrateDB PostgreSQL TCP protocol port, default is 5432.
-username = "crate" # CrateDB username, default is usually "crate".
-password = "" # CrateDB password, if any.
-```
-
-Alternatively, You can pass a database connection string as shown below.
-```toml
-destination.cratedb.credentials="postgres://crate:@localhost:5432/"
-```
-Keep it at the top of your TOML file, before any section starts.
-Because CrateDB uses `psycopg2`, using `postgres://` is the right choice.
-
-Use Docker or Podman to run an instance of CrateDB for evaluation purposes.
-```shell
-docker run --rm -it --name=cratedb --publish=4200:4200 --publish=5432:5432 crate:latest -Cdiscovery.type=single-node
-```
-
-## Data loading
-
-Data is loaded into CrateDB using the most efficient method depending on the data source:
-
-- For local files, the `psycopg2` library is used to directly load files into
-  CrateDB tables using the `INSERT` command.
-- For files in remote storage like S3 or Azure Blob Storage,
-  CrateDB data loading functions are used to read the files and insert the data into tables.
-
-## Datasets
-
-CrateDB currently only supports working with its default schema `doc`.
-So, please use `dataset_name="doc"`.
-
-## Supported file formats
-
-- [INSERT](../file-formats/insert-format.md) is the preferred format for both direct loading and staging.
-
-The `cratedb` destination has a few specific deviations from the default SQL destinations:
-
-- CrateDB does not support the `time` datatype. Time will be loaded to a `text` column.
-- CrateDB does not support the `binary` datatype. Binary will be loaded to a `text` column.
-- CrateDB can produce rounding errors under certain conditions when using the `float/double` datatype.
-  Make sure to use the `decimal` datatype if you can’t afford to have rounding errors.
-
-## Supported column hints
-
-CrateDB supports the following [column hints](../../general-usage/schema#tables-and-columns):
-
-- `primary_key` - marks the column as part of the primary key. Multiple columns can have this hint to create a composite primary key.
-
-## Staging support
-
-CrateDB supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as file staging destinations.
-
-`dlt` will upload CSV or JSONL files to the staging location and use CrateDB data loading functions
-to load the data directly from the staged files.
-
-Please refer to the filesystem documentation to learn how to configure credentials for the staging destinations:
-
-- [Amazon S3](./filesystem.md#aws-s3)
-- [Azure Blob Storage](./filesystem.md#azure-blob-storage)
-
-To run a pipeline with staging enabled:
-
-```py
-pipeline = dlt.pipeline(
-    pipeline_name='chess_pipeline',
-    destination='cratedb',
-    staging='filesystem',  # add this to activate staging
-    dataset_name='chess_data'
-)
-```
-
-### dbt support
-
-Integration with [dbt](../transformations/dbt/dbt.md) is generally supported via [dbt-cratedb2]
-but not tested by us.
-
-### Syncing of `dlt` state
-
-This destination fully supports [dlt state sync](../../general-usage/state#syncing-state-with-destination).
-
-
-[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/
-
-
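
Note on the removed credentials section: the two configuration styles it describes (individual fields versus a single `postgres://` connection string) carry the same information. The sketch below illustrates that mapping using only the Python standard library. The helper name `cratedb_dsn` is hypothetical and is not part of `dlt` or `dlt-cratedb`; the defaults are the example values from the removed `docs/cratedb.md`.

```python
from urllib.parse import quote


def cratedb_dsn(host="localhost", port=5432, username="crate", password=""):
    """Build a postgres:// connection string from individual credential fields.

    Hypothetical helper for illustration only; not provided by dlt or dlt-cratedb.
    """
    # Percent-encode user and password so characters like "@" or ":" do not
    # break the URL structure.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    return f"postgres://{userinfo}@{host}:{port}/"


# With the example defaults, this yields the connection string shown in the
# removed documentation.
print(cratedb_dsn())
```

The resulting string is what the removed guide assigns to `destination.cratedb.credentials` at the top of `.dlt/secrets.toml`.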