|
13 | 13 | [dlt] (data load tool)—think ELT as Python code—is a popular, |
14 | 14 | production-ready Python library for moving data. It loads data from |
15 | 15 | various and often messy data sources into well-structured, live datasets. |
16 | | -dlt is used by {ref}`ingestr`. |
| 16 | + |
| 17 | +dlt can load data into [30+ databases supported by SQLAlchemy], |
| 18 | +and is also the workhorse behind the {ref}`ingestr` toolkit. |
17 | 19 |
|
18 | 20 | ::::{grid} |
19 | 21 |
|
@@ -75,32 +77,136 @@ pipeline = dlt.pipeline( |
75 | 77 | pipeline.run(source) |
76 | 78 | ``` |
77 | 79 |
|
78 | | -## Learn |
| 80 | +## Supported features |
| 81 | + |
| 82 | +### Data loading |
| 83 | + |
| 84 | +dlt loads data into CrateDB using the most efficient method available for the data source. |
| 85 | + |
| 86 | +- For local files, the `psycopg2` library is used to directly load files into |
| 87 | + CrateDB tables using the `INSERT` command. |
| 88 | +- For files in remote storage like S3 or Azure Blob Storage, |
| 89 | + CrateDB data loading functions are used to read the files and insert the data into tables. |
| 90 | + |
| 91 | +### Datasets |
| 92 | + |
| 93 | +Use `dataset_name="doc"` to address CrateDB's default schema `doc`. |
| 94 | +When addressing other schemas, make sure they contain at least one table. [^create-schema] |
| 95 | + |
| 96 | +### File formats |
| 97 | + |
| 98 | +- The [SQL INSERT file format] is the preferred format for both direct loading and staging. |
| 99 | + |
| 100 | +### Column types |
| 101 | + |
| 102 | +The `cratedb` destination has a few specific deviations from the default SQL destinations. |
| 103 | + |
| 104 | +- CrateDB does not support the `time` datatype. Time values are loaded into a `text` column. |
| 105 | +- CrateDB does not support the `binary` datatype. Binary values are loaded into a `text` column. |
| 106 | +- CrateDB can produce rounding errors under certain conditions when using the `float/double` datatype. |
| 107 | + Use the `decimal` datatype if you can’t afford rounding errors. |
| 108 | + |
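The rounding caveat comes from binary floating point in general, not from CrateDB alone. The following is a minimal sketch using only the Python standard library, independent of dlt, that shows why `decimal` is the safer choice for exact values such as monetary amounts.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum differs slightly from 0.3.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Decimal values built from strings keep their exact decimal digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")

print(float_is_exact, decimal_is_exact)
```

The float comparison fails while the decimal comparison holds, which is the kind of discrepancy the `decimal` datatype avoids.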
| 109 | +### Column hints |
| 110 | + |
| 111 | +CrateDB supports the following [column hints]. |
| 112 | + |
| 113 | +- `primary_key` - marks the column as part of the primary key. Multiple columns can have this hint to create a composite primary key. |
| 114 | + |
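As a rough sketch, hints of this kind are usually attached to a resource, either through the `primary_key` argument of `dlt.resource` or through a `columns` mapping. The mapping below is plain Python and runs without dlt installed. The column names (`player_id`, `game_id`, `rating`) and their types are made up for illustration.

```python
# Hypothetical column definitions for a table with a composite primary key.
# Column names and types are illustrative, not taken from a real dataset.
columns = {
    "player_id": {"data_type": "bigint", "primary_key": True},
    "game_id": {"data_type": "bigint", "primary_key": True},
    "rating": {"data_type": "decimal"},
}

# Columns carrying the hint together form the composite primary key.
primary_key_columns = [
    name for name, hints in columns.items() if hints.get("primary_key")
]
print(primary_key_columns)
```

With dlt installed, a mapping of this shape can be passed as `columns=columns` to the `dlt.resource` decorator.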
| 115 | +### File staging |
| 116 | + |
| 117 | +CrateDB supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as file staging destinations. |
| 118 | + |
| 119 | +`dlt` will upload CSV or JSONL files to the staging location and use CrateDB data loading functions |
| 120 | +to load the data directly from the staged files. |
| 121 | + |
| 122 | +Please refer to the filesystem documentation to learn how to configure credentials for the staging destinations. |
| 123 | + |
| 124 | +- [AWS S3] |
| 125 | +- [Azure Blob Storage] |
| 126 | +- [Google Storage] |
| 127 | + |
| 128 | +Create a pipeline with staging enabled: |
| 129 | + |
| 130 | +```python |
| 131 | +pipeline = dlt.pipeline( |
| 132 | + pipeline_name='chess_pipeline', |
| 133 | + destination='cratedb', |
| 134 | + staging='filesystem', # add this to activate staging |
| 135 | + dataset_name='chess_data' |
| 136 | +) |
| 137 | +``` |
| 138 | + |
| 139 | +### dbt support |
| 140 | + |
| 141 | +Integration with [dbt] is generally supported via [dbt-cratedb2] but not tested by us. |
| 142 | + |
| 143 | +### dlt state sync |
| 144 | + |
| 145 | +The CrateDB destination fully supports [dlt state sync]. |
| 146 | + |
| 147 | + |
| 148 | +## See also |
| 149 | + |
| 150 | +:::{rubric} Examples |
| 151 | +::: |
79 | 152 |
|
80 | 153 | ::::{grid} |
81 | 154 |
|
| 155 | +:::{grid-item-card} Usage guide: Load API data with dlt |
| 156 | +:link: dlt-usage |
| 157 | +:link-type: ref |
| 158 | +Exercise a canonical `dlt init` example with CrateDB. |
| 159 | +::: |
| 160 | + |
82 | 161 | :::{grid-item-card} Examples: Use dlt with CrateDB |
83 | 162 | :link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt |
84 | 163 | :link-type: url |
85 | | -Executable code examples that demonstrate how to use dlt with CrateDB. |
| 164 | +Executable code examples on GitHub that demonstrate how to use dlt with CrateDB. |
| 165 | +::: |
| 166 | + |
| 167 | +:::: |
| 168 | + |
| 169 | +:::{rubric} Resources |
86 | 170 | ::: |
87 | 171 |
|
88 | | -:::{grid-item-card} Adapter: The dlt destination adapter for CrateDB |
89 | | -:link: https://github.com/crate/dlt-cratedb |
| 172 | +::::{grid} |
| 173 | + |
| 174 | +:::{grid-item-card} Package: `dlt-cratedb` |
| 175 | +:link: https://pypi.org/project/dlt-cratedb/ |
90 | 176 | :link-type: url |
91 | | -Based on the dlt PostgreSQL adapter, the package enables you to work |
92 | | -with dlt and CrateDB. |
| 177 | +The dlt destination adapter for CrateDB is |
| 178 | +based on the dlt PostgreSQL adapter. |
93 | 179 | ::: |
94 | 180 |
|
95 | | -:::{grid-item-card} See also: ingestr |
| 181 | +:::{grid-item-card} Related: `ingestr` |
96 | 182 | :link: ingestr |
97 | 183 | :link-type: ref |
98 | | -The ingestr data import/export application uses dlt. |
| 184 | +The ingestr data import/export application uses dlt as a workhorse. |
99 | 185 | ::: |
100 | 186 |
|
101 | 187 | :::: |
102 | 188 |
|
103 | 189 |
|
| 190 | +:::{toctree} |
| 191 | +:maxdepth: 1 |
| 192 | +:hidden: |
| 193 | +Usage <usage> |
| 194 | +::: |
| 195 | + |
| 196 | + |
| 197 | +[^create-schema]: CrateDB does not support `CREATE SCHEMA` yet, see [CRATEDB-14601]. |
| 198 | + This means a schema appears not to exist unless it contains at least one table, |
| 199 | + and it can't be created explicitly either. Schemas are currently created |
| 200 | + implicitly when tables are created in them. |
104 | 201 |
|
105 | | -[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/ |
| 202 | +[30+ databases supported by SQLAlchemy]: https://dlthub.com/docs/dlt-ecosystem/destinations/sqlalchemy |
| 203 | +[AWS S3]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#aws-s3 |
| 204 | +[Azure Blob Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#azure-blob-storage |
| 205 | +[column hints]: https://dlthub.com/docs/general-usage/schema#column-hint-rules |
| 206 | +[CRATEDB-14601]: https://github.com/crate/crate/issues/14601 |
| 207 | +[dbt]: https://dlthub.com/docs/hub/features/transformations/dbt-transformations |
| 208 | +[dbt-cratedb2]: https://pypi.org/project/dbt-cratedb2/ |
106 | 209 | [dlt]: https://dlthub.com/ |
| 210 | +[dlt state sync]: https://dlthub.com/docs/general-usage/state#syncing-state-with-destination |
| 211 | +[Google Storage]: https://dlthub.com/docs/dlt-ecosystem/destinations/filesystem#google-storage |
| 212 | +[SQL INSERT file format]: https://dlthub.com/docs/dlt-ecosystem/file-formats/insert-format |