-
Notifications
You must be signed in to change notification settings - Fork 400
CrateDB: Add destination adapter #2733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for dlt-hub-docs canceled.
|
| def info_schema_null_to_bool(v: str) -> bool: | ||
| """Converts INFORMATION SCHEMA truth values to Python bool""" | ||
| if v in ("NO", "0"): | ||
| if v in ("NO", "0", False): | ||
| return False | ||
| elif v in ("YES", "1"): | ||
| elif v in ("YES", "1", True): | ||
| return True | ||
| raise ValueError(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already created a ticket about this little anomaly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi. The ticket above has been scheduled as a candidate for CrateDB 6.1, which is certainly in the future. Would it be acceptable for you to merge the patch including this little anomaly? 🙏
Otherwise, please let us know so we can look into adjusting the patch to work differently, when possible. 🤞
Please advise if you think the adjustment is unacceptable, and/or if you see any chances for a different kind of overwrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rudolfix. Thank you very much for assigning yourself to the PR.
We are currently working on a spin-off to this patch at dlt-cratedb. After curating the adapter there, we will come back to loop the code into this patch again, subsequently toggling it away from draft mode.
We wanted to inform you about it so you don't get the impression this path will get stale. If you think it might be hibernating for too long, please 🙏 ping us so we can make progress again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The little anomaly discovered and outlined here will be resolved with CrateDB 6.1. Thank you @matriv.
3068348 to
4df27ac
Compare
| class SystemColumnWorkaround: | ||
| """ | ||
| CrateDB reserves `_`-prefixed column names for internal purposes. | ||
| When approaching CrateDB with such private columns which are common in | ||
| ETL frameworks, a corresponding error will be raised. | ||
| InvalidColumnNameException["_dlt_load_id" conflicts with system column pattern] | ||
| This class provides utility methods to work around the problem, brutally. | ||
| See also: https://github.com/crate/crate/issues/15161 | ||
| FIXME: Please get rid of this code by resolving the issue in CrateDB. | ||
| """ | ||
|
|
||
| quirked_labels = ["__dlt_id", "__dlt_load_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When loading data from GitHub, we discovered another system column collision.
"_dlt_parent_id" conflicts with system column pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick ag '_dlt.+_id' on the dlt code base reveals a few more: _dlt_root_id, _dlt_list_idx, _dlt_parent_id.
The docs in docs/website/docs/general-usage/schema.md say:
By default, the schema adopts hint rules from the json(relational) normalizer to support correct hinting of columns added by the normalizer:
settings:
default_hints:
row_key:
- _dlt_id
parent_key:
- _dlt_parent_id
not_null:
- _dlt_id
- _dlt_root_id
- _dlt_parent_id
- _dlt_list_idx
- _dlt_load_id
unique:
- _dlt_id
root_key:
- _dlt_root_idThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional system columns have been considered within the workaround currently needed for CrateDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be able to remove this workaround with CrateDB 6.2.
| sql = client._get_table_update_sql("event_test_table", TABLE_UPDATE, False)[0] | ||
| sqlfluff.parse(sql, dialect="postgres") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and subsequent spots need CrateDB support for SQLFluff.
|
hey @amotl ! we do a cleanup of all community destinations that we are not merging in the core library. they'll get a docs page where we link to your repo. the proposal for the page is here: pls tell me
we'll close this PR soon |
|
Hi @rudolfix. Thank you very much for linking to our adapter with GH-3326. Let's close this PR right away and come back when CrateDB will provide support for SQLFluff to make it complete. |
About
A destination adapter for CrateDB, which is compatible with PostgreSQL, currently based on dlt-1.10.0.
Preview
uv pip install --upgrade 'dlt @ git+https://github.com/crate-workbench/dlt.git@cratedb'Thoughts
We hear you. Still, we wanted to submit this patch for inspection at your disposal. Because the adapter is effectively just wrapping the
postgresadapter, adding a few adjustments, it doesn't introduce many different technologies or paradigms. Given the smaller footprint compared to other adapters, you may still consider merging it? 🙏Software tests