feat(catalog_integration): add IcebergRestCatalogIntegration #16
usbrandon wants to merge 4 commits into
Conversation
Adds support for `CATALOG_SOURCE = ICEBERG_REST` catalog integrations,
which are the AWS-recommended path for accessing Amazon S3 Tables
federated catalogs from Snowflake (and any other Iceberg REST-compatible
catalog). Previously snowcap only modeled `GLUE` and `OBJECT_STORE`,
and the legacy `GLUE` source rejects the federated S3 Tables form
`'<account>:s3tablescatalog/<bucket>'` for `GLUE_CATALOG_ID`, leaving
S3 Tables-managed Iceberg unmodelable in YAML.
Resource shape:

```yaml
catalog_integrations:
  - name: ci_s3_tables_dev
    catalog_source: ICEBERG_REST
    catalog_namespace: my_namespace
    rest_config:
      catalog_uri: https://glue.us-east-1.amazonaws.com/iceberg
      catalog_api_type: AWS_GLUE
      catalog_name: '123456789012:s3tablescatalog/my_table_bucket'
      access_delegation_mode: VENDED_CREDENTIALS
    rest_authentication:
      type: SIGV4
      sigv4_iam_role: arn:aws:iam::123456789012:role/snowflake-s3-tables-read
      sigv4_signing_region: us-east-1
    enabled: true
```
`rest_config` and `rest_authentication` are nested PropSet blocks,
emitting `REST_CONFIG = (KEY = VAL ...)` / `REST_AUTHENTICATION = (...)`
in the CREATE DDL. Both fields are marked `fetchable: False` because
Snowflake auto-populates `WAREHOUSE` (echoes `CATALOG_NAME` when not
explicitly set) and `PREFIX` (null) — so YAML is the source of truth
for nested block contents (matches Stage.encryption precedent).
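The nested-block emission can be sketched roughly like this. This is a minimal illustration with hypothetical helper names; snowcap's actual `Props`/`PropSet` machinery is more involved:

```python
from enum import Enum


class CatalogApiType(Enum):
    AWS_GLUE = "AWS_GLUE"
    AWS_API_GATEWAY = "AWS_API_GATEWAY"
    PUBLIC = "PUBLIC"


def render_prop_set(keyword: str, props: dict) -> str:
    """Render a nested prop block as KEYWORD = (KEY = value ...).

    Enum members are emitted bare (CATALOG_API_TYPE = AWS_GLUE); plain
    strings are single-quoted. None values are skipped, treating absent
    keys as "let Snowflake default them".
    """
    parts = []
    for key, value in props.items():
        if value is None:
            continue
        rendered = value.value if isinstance(value, Enum) else f"'{value}'"
        parts.append(f"{key.upper()} = {rendered}")
    return f"{keyword.upper()} = ({' '.join(parts)})"


ddl = render_prop_set(
    "rest_config",
    {
        "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        "catalog_api_type": CatalogApiType.AWS_GLUE,
        "catalog_name": "123456789012:s3tablescatalog/my_table_bucket",
    },
)
print(ddl)
```

This fragment would slot into a larger `CREATE CATALOG INTEGRATION ...` statement alongside the `REST_AUTHENTICATION = (...)` block.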
Adds a `_parse_enum_map` helper in `data_provider.py` for the
`{KEY=VAL, KEY=VAL}` EnumMap response type that DESC CATALOG INTEGRATION
uses for nested config blocks.
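A minimal sketch of what such a parser might look like (the real `_parse_enum_map` in `data_provider.py` may differ in details):

```python
def parse_enum_map(raw: str) -> dict[str, str]:
    """Parse a DESC CATALOG INTEGRATION EnumMap value like
    '{CATALOG_URI=https://..., PREFIX=null}' into a dict,
    dropping null and empty entries.

    Naive split on ',' -- adequate for this response shape,
    where values do not contain commas.
    """
    inner = raw.strip().removeprefix("{").removesuffix("}")
    result: dict[str, str] = {}
    for pair in inner.split(","):
        # partition on the first '=' so values may themselves contain '='
        key, _, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        if not key or value in ("", "null"):
            continue
        result[key] = value
    return result


config = parse_enum_map("{CATALOG_URI=https://glue.us-east-1.amazonaws.com/iceberg, PREFIX=null}")
print(config)  # {'CATALOG_URI': 'https://glue.us-east-1.amazonaws.com/iceberg'}
```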
Adds `CatalogApiType` (AWS_GLUE, AWS_API_GATEWAY, PUBLIC),
`AccessDelegationMode` (VENDED_CREDENTIALS), and `RestAuthenticationType`
(SIGV4, OAUTH, BEARER, NONE) enums to validate nested block values.
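The validation these enums enable can be sketched as follows (the `coerce_auth_type` helper is illustrative, not snowcap's actual API):

```python
from enum import Enum


class RestAuthenticationType(Enum):
    SIGV4 = "SIGV4"
    OAUTH = "OAUTH"
    BEARER = "BEARER"
    NONE = "NONE"


def coerce_auth_type(value: str) -> RestAuthenticationType:
    """Normalize and validate a YAML-supplied auth type so typos fail
    at plan time with a readable error instead of a Snowflake DDL error."""
    try:
        return RestAuthenticationType(value.upper())
    except ValueError:
        allowed = ", ".join(m.value for m in RestAuthenticationType)
        raise ValueError(
            f"rest_authentication.type must be one of: {allowed} (got {value!r})"
        ) from None


coerce_auth_type("sigv4")  # RestAuthenticationType.SIGV4
```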
Validated end-to-end against a live Snowflake account with an existing
S3 Tables federated catalog: `snowcap plan` no longer flags the
existing integration (no false diff), and `create_sql()` emits a DDL
that Snowflake accepts.
Adds a fetch test for the new resource type alongside the existing
GlueCatalogIntegration test.
Refs: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-vended-credentials
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Review

Thanks for this contribution! The implementation looks solid and follows the existing patterns well.

Required Changes

1. Missing JSON fixture (test failure)

The test suite expects a JSON fixture for the new resource type (`tests/fixtures/json/iceberg_rest_catalog_integration.json`). Based on the existing fixtures, it should look something like:

```json
{
  "name": "ICEBERGRESTCATALOGINT",
  "catalog_source": "ICEBERG_REST",
  "catalog_namespace": "some_namespace",
  "table_format": "ICEBERG",
  "rest_config": {
    "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
    "catalog_api_type": "AWS_GLUE",
    "catalog_name": "123456789012"
  },
  "rest_authentication": {
    "type": "SIGV4",
    "sigv4_iam_role": "arn:aws:iam::123456789012:role/SnowflakeAccess",
    "sigv4_signing_region": "us-east-1"
  },
  "enabled": true,
  "owner": "ACCOUNTADMIN",
  "comment": "This is a test Iceberg REST catalog integration"
}
```

Optional Enhancement

2. Missing `oauth_allowed_scopes` prop

The docstring mentions `oauth_allowed_scopes`, but it is missing from the `rest_authentication` PropSet:

```python
rest_authentication=PropSet(
    "rest_authentication",
    Props(
        # ... existing props ...
        oauth_allowed_scopes=StringProp("oauth_allowed_scopes"),  # add this
    ),
),
```

Ruff linting passes ✅
Per @noel's review on datacoves#16:

1. Add tests/fixtures/json/iceberg_rest_catalog_integration.json so
   test_polymorphic_resources can resolve the new ICEBERG_REST subtype
   from a fixture (matches the reviewer's suggested template).
2. Add oauth_allowed_scopes to the rest_authentication PropSet; it was
   mentioned in the docstring but missing from the actual props. Now
   round-trips correctly: tested via create_sql() with type=OAUTH.

Verified locally: all 28 tests in test_polymorphic_resources pass,
including the new IcebergRestCatalogIntegration polymorphic resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
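The polymorphic resolution those tests exercise can be sketched roughly like this (the registry and helper names are illustrative, not snowcap's actual internals):

```python
# Illustrative stand-ins for the concrete resource classes.
class GlueCatalogIntegration: ...
class ObjectStoreCatalogIntegration: ...
class IcebergRestCatalogIntegration: ...


# catalog_source acts as the discriminator field in each JSON fixture.
CATALOG_SOURCE_MAP = {
    "GLUE": GlueCatalogIntegration,
    "OBJECT_STORE": ObjectStoreCatalogIntegration,
    "ICEBERG_REST": IcebergRestCatalogIntegration,
}


def resolve_catalog_integration(fixture: dict) -> type:
    """Resolve the concrete subtype from a fixture's catalog_source."""
    source = fixture.get("catalog_source")
    try:
        return CATALOG_SOURCE_MAP[source]
    except KeyError:
        raise ValueError(f"unknown catalog_source: {source!r}") from None


resolve_catalog_integration({"catalog_source": "ICEBERG_REST"})  # IcebergRestCatalogIntegration
```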
Thanks for the review @noel! Both items addressed in 13da5b6:
Existing SIGV4 path (the one we use in production for AWS S3 Tables federated catalogs) is untouched and still 0-diff against our live deployment.
Follow-up: One more fix needed

Tests are still failing: the JSON fixture needs to include the `refresh_interval_seconds` field that the resource serializes.

Fix: Add `"refresh_interval_seconds": null`:

```json
{
  "name": "ICEBERGRESTCATALOGINT",
  "catalog_source": "ICEBERG_REST",
  "catalog_namespace": "some_namespace",
  "table_format": "ICEBERG",
  "rest_config": {
    "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
    "catalog_api_type": "AWS_GLUE",
    "catalog_name": "123456789012"
  },
  "rest_authentication": {
    "type": "SIGV4",
    "sigv4_iam_role": "arn:aws:iam::123456789012:role/SnowflakeAccess",
    "sigv4_signing_region": "us-east-1"
  },
  "enabled": true,
  "refresh_interval_seconds": null,
  "owner": "ACCOUNTADMIN",
  "comment": "This is a test Iceberg REST catalog integration"
}
```

Once this is added, all 1480 tests should pass ✅
Documentation needed

The docs are manually maintained, so you'll need to add documentation for the new resource:

1. Create `docs/resources/iceberg_rest_catalog_integration.md`
2. Register the new page in `mkdocs.yml`
…re fix

Address @noel's second-round PR datacoves#16 feedback:

- tests/fixtures/json/iceberg_rest_catalog_integration.json: add
  "refresh_interval_seconds": null to match the resource serialization
  pattern used by other fixtures. Full unit suite (1480 tests) now passes.
- docs/resources/iceberg_rest_catalog_integration.md: new page following
  the glue_catalog_integration.md template. Covers YAML + Python examples
  for the AWS S3 Tables (SIGV4 + AWS_GLUE) production path and lists
  rest_authentication fields per auth type (SIGV4/OAUTH/BEARER).
- mkdocs.yml: register IcebergRestCatalogIntegration under
  Integrations > Catalog between Glue and ObjectStore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks @noel! Both follow-ups addressed in 0c39d93:

1. Fixture fix: added `"refresh_interval_seconds": null`; the full unit suite now passes. (The 14 warnings are the existing pre-3.14 warnings, unrelated to this change.)
2. Documentation: created `docs/resources/iceberg_rest_catalog_integration.md` and registered it in `mkdocs.yml`.

Ready for another look when you have a moment.
… example
Adds an end-to-end "Minimal example: AWS S3 Tables behind Lake Formation"
section to the IcebergRestCatalogIntegration doc, distilled from the BCS
production onboarding for our Epicor P21 → S3 Tables → Snowflake pipeline.
Covers the four AWS-side pieces (S3 Tables bucket, Lake Formation
integration, LF grants, cross-account IAM role) and the two Snowcap
resources that bind them to Snowflake (catalog_integrations +
storage_integrations together), plus the post-deploy CREATE ICEBERG TABLE
+ SYSTEM$VERIFY_CATALOG_INTEGRATION verification step.
Includes the gotchas we hit during onboarding:
- GlueCatalogIntegration/CATALOG_SOURCE=GLUE rejects the federated
`<account>:s3tablescatalog/<bucket>` form for GLUE_CATALOG_ID with
SQL compilation error 22023/1008 — this is the failure mode that
motivated adding IcebergRestCatalogIntegration in the first place.
- Lake Formation DESCRIBE/SELECT grants are easy to forget; DESC
CATALOG INTEGRATION succeeds without them but CREATE ICEBERG TABLE
fails with a vague 403.
- access_delegation_mode=VENDED_CREDENTIALS is required when the
writer (e.g., pyiceberg-rest on EC2) relies on Lake Formation to
vend temporary S3 credentials.
- sigv4_signing_region must match the S3 Tables bucket region.
Test suite (tests/test_polymorphic_resources.py) still passes (28/28).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One more doc addition in 8eb07b3. Felt worth including a complete, opinionated, end-to-end "minimal example" so the next person wiring up S3 Tables doesn't have to re-derive what's actually a 6-resource setup (4 AWS pieces + 2 Snowcap resources). The new section lives in `docs/resources/iceberg_rest_catalog_integration.md`.

The example uses the placeholder account ID `123456789012` throughout. Polymorphic test suite still green (28/28). Full unit suite was 1480 green before this commit; this commit only touches docs/, so no test impact.
Summary
Adds support for `CATALOG_SOURCE = ICEBERG_REST` catalog integrations, which are the AWS-recommended path for accessing Amazon S3 Tables federated catalogs from Snowflake (and any other Iceberg REST-compatible catalog).

Why
Today snowcap only models `GLUE` and `OBJECT_STORE` catalog sources. For S3 Tables federated catalogs (`<account>:s3tablescatalog/<bucket>`), the legacy `GLUE` source path doesn't work: Snowflake's `GLUE_CATALOG_ID` parameter rejects that form (verified: SQL compilation error 22023/1008). The supported path is `CATALOG_SOURCE = ICEBERG_REST` with `CATALOG_API_TYPE = AWS_GLUE` and `CATALOG_NAME` set to the federated form, plus `ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS` for Lake Formation credential vending.

The Snowflake docs for this path: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-vended-credentials
Resource shape
Implementation notes
- `rest_config` / `rest_authentication` are nested `PropSet` blocks, emitting `REST_CONFIG = (KEY = VAL ...)` / `REST_AUTHENTICATION = (...)` in the CREATE DDL.
- Both are `fetchable: False` because Snowflake auto-populates `WAREHOUSE` (echoes `CATALOG_NAME` when unset) and `PREFIX` (null), so YAML is the source of truth for nested block contents; same precedent as `Stage.encryption`.
- A `_parse_enum_map` helper in `data_provider.py` parses Snowflake's `{KEY=VAL, KEY=VAL}` EnumMap response type. Nulls and empty values are dropped.
- New enums: `CatalogApiType` (AWS_GLUE, AWS_API_GATEWAY, PUBLIC), `AccessDelegationMode` (VENDED_CREDENTIALS), `RestAuthenticationType` (SIGV4, OAUTH, BEARER, NONE).

Validation
Tested end-to-end against a live Snowflake account with an existing S3 Tables federated catalog (created via raw DDL pre-this-PR):

- `IcebergRestCatalogIntegration(...).create_sql()` emits a DDL Snowflake accepts.
- `snowcap plan` against the YAML matches existing state, with no false diff (went from "5 to update" to "3 to update" once this resource started handling the two existing S3 Tables integrations).

Test plan
- `create_sql()` returns valid DDL
- Fetch test added in `tests/integration/data_provider/test_fetch_resource_simple.py` alongside `GlueCatalogIntegration`

🤖 Generated with Claude Code