Skip to content

Conversation

@amotl
Copy link
Contributor

@amotl amotl commented Jun 8, 2025

About

A destination adapter for CrateDB, which is compatible with PostgreSQL, currently based on dlt-1.10.0.

Preview

uv pip install --upgrade 'dlt @ git+https://github.com/crate-workbench/dlt.git@cratedb'

Thoughts

📣 New destinations are unlikely to be merged due to high maintenance cost (but we are happy to improve SQLAlchemy destination to handle more dialects)

We hear you. Still, we wanted to submit this patch for inspection at your disposal. Because the adapter is effectively just wrapping the postgres adapter, adding a few adjustments, it doesn't introduce many different technologies or paradigms. Given the smaller footprint compared to other adapters, you may still consider merging it? 🙏

Software tests

pytest -vvv tests/load/cratedb/
9 passed

@netlify
Copy link

netlify bot commented Jun 8, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit aca877d
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/685891181d89700008d7dab4

@amotl amotl marked this pull request as ready for review June 8, 2025 23:51
Comment on lines 58 to 64
def info_schema_null_to_bool(v: str) -> bool:
"""Converts INFORMATION SCHEMA truth values to Python bool"""
if v in ("NO", "0"):
if v in ("NO", "0", False):
return False
elif v in ("YES", "1"):
elif v in ("YES", "1", True):
return True
raise ValueError(v)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@amotl amotl Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. The ticket above has been scheduled as a candidate for CrateDB 6.1, which is certainly in the future. Would it be acceptable for you to merge the patch including this little anomaly? 🙏

Otherwise, please let us know so we can look into adjusting the patch to work differently, when possible. 🤞
Please advise if you think the adjustment is unacceptable, and/or if you see any chances for a different kind of overwrite.

Copy link
Contributor Author

@amotl amotl Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rudolfix. Thank you very much for assigning yourself to the PR.

We are currently working on a spin-off to this patch at dlt-cratedb. After curating the adapter there, we will come back to loop the code into this patch again, subsequently toggling it away from draft mode.

We wanted to inform you about it so you don't get the impression this path will get stale. If you think it might be hibernating for too long, please 🙏 ping us so we can make progress again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The little anomaly discovered and outlined here will be resolved with CrateDB 6.1. Thank you @matriv.

Comment on lines 9 to 25
class SystemColumnWorkaround:
"""
CrateDB reserves `_`-prefixed column names for internal purposes.
When approaching CrateDB with such private columns which are common in
ETL frameworks, a corresponding error will be raised.
InvalidColumnNameException["_dlt_load_id" conflicts with system column pattern]
This class provides utility methods to work around the problem, brutally.
See also: https://github.com/crate/crate/issues/15161
FIXME: Please get rid of this code by resolving the issue in CrateDB.
"""

quirked_labels = ["__dlt_id", "__dlt_load_id"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When loading data from GitHub, we discovered another system column collision.

"_dlt_parent_id" conflicts with system column pattern

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick ag '_dlt.+_id' on the dlt code base reveals a few more: _dlt_root_id, _dlt_list_idx, _dlt_parent_id.

The docs in docs/website/docs/general-usage/schema.md say:

By default, the schema adopts hint rules from the json(relational) normalizer to support correct hinting of columns added by the normalizer:

settings:
  default_hints:
    row_key:
      - _dlt_id
    parent_key:
      - _dlt_parent_id
    not_null:
      - _dlt_id
      - _dlt_root_id
      - _dlt_parent_id
      - _dlt_list_idx
      - _dlt_load_id
    unique:
      - _dlt_id
    root_key:
      - _dlt_root_id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional system columns have been considered within the workaround currently needed for CrateDB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will be able to remove this workaround with CrateDB 6.2.

Comment on lines 54 to 55
sql = client._get_table_update_sql("event_test_table", TABLE_UPDATE, False)[0]
sqlfluff.parse(sql, dialect="postgres")
Copy link
Contributor Author

@amotl amotl Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and subsequent spots need CrateDB support for SQLFluff.

@amotl amotl marked this pull request as draft June 22, 2025 10:13
@rudolfix
Copy link
Collaborator

hey @amotl ! we do a cleanup of all community destinations that we are not merging in the core library. they'll get a docs page where we link to your repo. the proposal for the page is here:
#3326

pls tell me

  • if you still want us to link to your fork
  • if you want to change anything in there (you can also do a PR or request changes in comments)

we'll close this PR soon

@amotl
Copy link
Contributor Author

amotl commented Nov 17, 2025

Hi @rudolfix. Thank you very much for linking to our adapter with GH-3326. Let's close this PR right away and come back when CrateDB will provide support for SQLFluff to make it complete.

@amotl amotl closed this Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants