-
Notifications
You must be signed in to change notification settings - Fork 163
Support Nullable(JSON) in ClickHouse #3626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Tested this change locally, it looks like with this change a regression is introduced: CDC is no longer replicating data correctly. It replicates empty object for JSON, or NULL for Nullable(JSON)). (also, looks like our e2e tests for MongoDB should have caught this regression but didn't, so thanks for surfacing this issue.) |
|
@jgao54, you are totally right, how did I miss that? I found that in the clone step, the avro files have a different schema from the CDC step. Here's an example:
{
"type" : "record",
"name" : "_peerdb_raw_aggregators_basculin_dwh",
"fields" : [ {
"name" : "_peerdb_uid",
"type" : "string"
}, {
"name" : "_peerdb_timestamp",
"type" : "long"
}, {
"name" : "_peerdb_destination_table_name",
"type" : "string"
}, {
"name" : "_peerdb_data",
"type" : "string"
}, {
"name" : "_peerdb_record_type",
"type" : [ "null", "long" ]
}, {
"name" : "_peerdb_match_data",
"type" : [ "null", "string" ]
}, {
"name" : "_peerdb_batch_id",
"type" : [ "null", "long" ]
}, {
"name" : "_peerdb_unchanged_toast_columns",
"type" : [ "null", "string" ]
} ]
}
{
"type" : "record",
"name" : "aggregators_basculin_activity_views",
"fields" : [ {
"name" : "user_id_0",
"type" : {
"type" : "string",
"logicalType" : "uuid"
}
}, {
"name" : "operation_id_1",
"type" : {
"type" : "string",
"logicalType" : "uuid"
}
}, {
"name" : "data_2",
"type" : [ "null", "string" ]
}, {
"name" : "created_at_3",
"type" : {
"type" : "long",
"logicalType" : "timestamp-micros"
}
}, {
"name" : "updated_at_4",
"type" : {
"type" : "long",
"logicalType" : "timestamp-micros"
}
} ]
}In other words, the CDC step is wrapping the record in an envelope, while the Clone step is dumping the record by itself. For the life of me, I could not get the tests to run on my machine, but I was able to figure out that in the CDC step, it needs to first extract the I basically had to split the handling of both |
|
Thanks for the contribution! I'll run some tests this week and if things look good will get this merged. |
| ) | ||
| } | ||
| case "JSON", "Nullable(JSON)": | ||
| case "JSON": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be combined into:
| case "JSON": | |
| case "JSON", "Nullable(JSON)": | |
| stringType := strings.Replace(clickHouseType, "JSON", "String", 1) | |
| fmt.Fprintf(&projection, | |
| "JSONExtract(_peerdb_data, %s, '%s')::%s AS %s,", | |
| peerdb_clickhouse.QuoteLiteral(colName), | |
| stringType, | |
| clickHouseType, | |
| peerdb_clickhouse.QuoteIdentifier(dstColName), | |
| ) | |
| if t.enablePrimaryUpdate { | |
| fmt.Fprintf(&projectionUpdate, | |
| "JSONExtract(_peerdb_match_data, %s, '%s')::%s AS %s,", | |
| peerdb_clickhouse.QuoteLiteral(colName), | |
| stringType, | |
| clickHouseType, | |
| peerdb_clickhouse.QuoteIdentifier(dstColName), | |
| ) | |
| } | |
This is my attempt at addressing issue #3546.
I managed to make it work on my setup, which pulls NULL JSONB columns in a Postgres db into Nullabl(JSON) columns in ClickHouse