feat(catalog_integration): add IcebergRestCatalogIntegration #16

Open
usbrandon wants to merge 4 commits into datacoves:main from usbrandon:feat/iceberg-rest-catalog-integration

@usbrandon
Contributor

Summary

Adds support for CATALOG_SOURCE = ICEBERG_REST catalog integrations, which are the AWS-recommended path for accessing Amazon S3 Tables federated catalogs from Snowflake (and any other Iceberg REST-compatible catalog).

Why

Today snowcap only models GLUE and OBJECT_STORE catalog sources. For S3 Tables federated catalogs (<account>:s3tablescatalog/<bucket>), the legacy GLUE source path doesn't work — Snowflake's GLUE_CATALOG_ID parameter rejects that form (verified: SQL compilation error 22023/1008). The supported path is CATALOG_SOURCE = ICEBERG_REST with CATALOG_API_TYPE = AWS_GLUE and CATALOG_NAME set to the federated form, plus ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS for Lake Formation credential vending.

The Snowflake docs for this path: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-vended-credentials

Resource shape

catalog_integrations:
  - name: ci_s3_tables_dev
    catalog_source: ICEBERG_REST
    catalog_namespace: my_namespace
    rest_config:
      catalog_uri: https://glue.us-east-1.amazonaws.com/iceberg
      catalog_api_type: AWS_GLUE
      catalog_name: '123456789012:s3tablescatalog/my_table_bucket'
      access_delegation_mode: VENDED_CREDENTIALS
    rest_authentication:
      type: SIGV4
      sigv4_iam_role: arn:aws:iam::123456789012:role/snowflake-s3-tables-read
      sigv4_signing_region: us-east-1
    enabled: true
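
For orientation, here is a minimal sketch of how a resource of this shape could be turned into `CREATE CATALOG INTEGRATION` DDL with nested `REST_CONFIG` / `REST_AUTHENTICATION` blocks. It is not snowcap's PropSet implementation: `render_block`, `render_create`, and `BARE_KEYS` are hypothetical names, and the choice of which values are emitted unquoted is an assumption. Strings are single-quoted here for simplicity; the real renderer may quote differently.

```python
# Hypothetical sketch: not snowcap's PropSet implementation.
# Assumption: these keys hold enum-like values that are emitted unquoted.
BARE_KEYS = {"catalog_api_type", "access_delegation_mode", "type"}


def render_block(name: str, values: dict) -> str:
    """Render a nested block as NAME = (KEY = VAL ...), skipping None values."""
    parts = []
    for key, value in values.items():
        if value is None:
            continue
        if key in BARE_KEYS:
            rendered = str(value)
        else:
            # Single-quote strings and double any embedded quotes.
            rendered = "'" + str(value).replace("'", "''") + "'"
        parts.append(f"{key.upper()} = {rendered}")
    return f"{name.upper()} = ({' '.join(parts)})"


def render_create(name, catalog_namespace, rest_config, rest_authentication, enabled=True):
    """Assemble a CREATE CATALOG INTEGRATION statement for an ICEBERG_REST source."""
    clauses = [
        f"CREATE CATALOG INTEGRATION {name}",
        "CATALOG_SOURCE = ICEBERG_REST",
        "TABLE_FORMAT = ICEBERG",
        f"CATALOG_NAMESPACE = '{catalog_namespace}'",
        render_block("rest_config", rest_config),
        render_block("rest_authentication", rest_authentication),
        f"ENABLED = {'TRUE' if enabled else 'FALSE'}",
    ]
    return "\n  ".join(clauses)


ddl = render_create(
    "ci_s3_tables_dev",
    "my_namespace",
    {
        "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        "catalog_api_type": "AWS_GLUE",
        "catalog_name": "123456789012:s3tablescatalog/my_table_bucket",
        "access_delegation_mode": "VENDED_CREDENTIALS",
    },
    {
        "type": "SIGV4",
        "sigv4_iam_role": "arn:aws:iam::123456789012:role/snowflake-s3-tables-read",
        "sigv4_signing_region": "us-east-1",
    },
)
print(ddl)
```

Keeping each nested block as a plain dict means the YAML maps onto the DDL one key at a time, which is the property the resource shape above relies on.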

Implementation notes

  • rest_config / rest_authentication are nested PropSet blocks, emitting REST_CONFIG = (KEY = VAL ...) / REST_AUTHENTICATION = (...) in the CREATE DDL.
  • Both fields are marked fetchable: False because Snowflake auto-populates WAREHOUSE (echoes CATALOG_NAME when unset) and PREFIX (null), so YAML is the source of truth for nested block contents — same precedent as Stage.encryption.
  • New _parse_enum_map helper in data_provider.py parses Snowflake's {KEY=VAL, KEY=VAL} EnumMap response type. Nulls and empty values are dropped.
  • New enums: CatalogApiType (AWS_GLUE, AWS_API_GATEWAY, PUBLIC), AccessDelegationMode (VENDED_CREDENTIALS), RestAuthenticationType (SIGV4, OAUTH, BEARER, NONE).
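
The EnumMap parsing described above could look roughly like the sketch below. This is an illustration, not the `_parse_enum_map` code from the PR: the function name `parse_enum_map`, the lower-casing of keys, and the assumption that values contain no commas are all mine.

```python
def parse_enum_map(raw: str) -> dict:
    """Parse a '{KEY=VAL, KEY=VAL}' string into a dict.

    Assumptions (not confirmed by the PR): values contain no commas,
    keys are lower-cased, and the literal 'null' stands for a missing value.
    """
    text = raw.strip()
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1]
    result = {}
    for pair in text.split(","):
        if "=" not in pair:
            continue  # skips the empty string produced by '{}'
        key, value = pair.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop nulls and empty values, as the implementation notes describe.
        if not key or value == "" or value.lower() == "null":
            continue
        result[key.lower()] = value
    return result


sample = (
    "{CATALOG_URI=https://glue.us-east-1.amazonaws.com/iceberg, "
    "CATALOG_API_TYPE=AWS_GLUE, PREFIX=null, WAREHOUSE=}"
)
print(parse_enum_map(sample))
```

Splitting each pair on the first `=` only keeps values that themselves contain `=` intact.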

Validation

Tested end-to-end against a live Snowflake account with an existing S3 Tables federated catalog (created with raw DDL before this PR):

  • IcebergRestCatalogIntegration(...).create_sql() emits a DDL Snowflake accepts.
  • snowcap plan against the YAML matches existing state — no false diff (went from "5 to update" to "3 to update" once this resource started handling the two existing S3 Tables integrations).
  • ✅ Round-trip: YAML → CREATE → fetch → 0 diff.
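
To illustrate why marking the nested blocks as non-fetchable avoids a false diff, here is a small sketch of a comparison that skips such fields. The function `plan_diff` and the data shapes are hypothetical and do not reflect how snowcap computes its plan internally.

```python
def plan_diff(declared: dict, fetched: dict, non_fetchable: set) -> dict:
    """Return {field: (current, desired)} for fields that differ.

    Fields listed in non_fetchable are never compared, so values that
    Snowflake fills in on its own cannot show up as drift.
    """
    changes = {}
    for key, want in declared.items():
        if key in non_fetchable:
            continue
        have = fetched.get(key)
        if have != want:
            changes[key] = (have, want)
    return changes


declared = {
    "enabled": True,
    "rest_config": {"catalog_name": "123456789012:s3tablescatalog/my_table_bucket"},
}
# Hypothetical fetched state: WAREHOUSE echoes CATALOG_NAME even though YAML never set it.
fetched = {
    "enabled": True,
    "rest_config": {
        "catalog_name": "123456789012:s3tablescatalog/my_table_bucket",
        "warehouse": "123456789012:s3tablescatalog/my_table_bucket",
    },
}

print(plan_diff(declared, fetched, non_fetchable={"rest_config", "rest_authentication"}))
print(plan_diff(declared, fetched, non_fetchable=set()))
```

The trade-off is that real drift inside a skipped block also goes unnoticed, which is why the YAML has to be treated as the source of truth for those blocks.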

Test plan

  • Unit-style: import the resource, instantiate, create_sql() returns valid DDL
  • Integration: added the new resource type to tests/integration/data_provider/test_fetch_resource_simple.py alongside GlueCatalogIntegration
  • Live: 0-diff plan against a real S3 Tables federated catalog

🤖 Generated with Claude Code

Adds support for `CATALOG_SOURCE = ICEBERG_REST` catalog integrations,
which are the AWS-recommended path for accessing Amazon S3 Tables
federated catalogs from Snowflake (and any other Iceberg REST-compatible
catalog). Previously snowcap only modeled `GLUE` and `OBJECT_STORE`,
and the legacy GLUE source rejects the federated S3 Tables form
'<account>:s3tablescatalog/<bucket>' for `GLUE_CATALOG_ID`, leaving
S3 Tables-managed iceberg unmodelable in YAML.

Resource shape:

  catalog_integrations:
    - name: ci_s3_tables_dev
      catalog_source: ICEBERG_REST
      catalog_namespace: my_namespace
      rest_config:
        catalog_uri: https://glue.us-east-1.amazonaws.com/iceberg
        catalog_api_type: AWS_GLUE
        catalog_name: '123456789012:s3tablescatalog/my_table_bucket'
        access_delegation_mode: VENDED_CREDENTIALS
      rest_authentication:
        type: SIGV4
        sigv4_iam_role: arn:aws:iam::123456789012:role/snowflake-s3-tables-read
        sigv4_signing_region: us-east-1
      enabled: true

`rest_config` and `rest_authentication` are nested PropSet blocks,
emitting `REST_CONFIG = (KEY = VAL ...)` / `REST_AUTHENTICATION = (...)`
in the CREATE DDL. Both fields are marked `fetchable: False` because
Snowflake auto-populates `WAREHOUSE` (echoes `CATALOG_NAME` when not
explicitly set) and `PREFIX` (null) — so YAML is the source of truth
for nested block contents (matches Stage.encryption precedent).

Adds a `_parse_enum_map` helper in `data_provider.py` for the
`{KEY=VAL, KEY=VAL}` EnumMap response type that DESC CATALOG INTEGRATION
uses for nested config blocks.

Adds `CatalogApiType` (AWS_GLUE, AWS_API_GATEWAY, PUBLIC),
`AccessDelegationMode` (VENDED_CREDENTIALS), and `RestAuthenticationType`
(SIGV4, OAUTH, BEARER, NONE) enums to validate nested block values.

Validated end-to-end against a live Snowflake account with an existing
S3 Tables federated catalog: `snowcap plan` no longer flags the
existing integration (no false diff), and `create_sql()` emits a DDL
that Snowflake accepts.

Adds a fetch test for the new resource type alongside the existing
GlueCatalogIntegration test.

Refs: https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-catalog-integration-vended-credentials

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@noel
Contributor

noel commented May 9, 2026

PR Review

Thanks for this contribution! The implementation looks solid and follows the existing patterns well.

Required Changes

1. Missing JSON fixture (test failure)

The test test_polymorphic_resources is failing because it expects a fixture at tests/fixtures/json/iceberg_rest_catalog_integration.json.

Based on the existing fixtures, it should look something like:

{
    "name": "ICEBERGRESTCATALOGINT",
    "catalog_source": "ICEBERG_REST",
    "catalog_namespace": "some_namespace",
    "table_format": "ICEBERG",
    "rest_config": {
        "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        "catalog_api_type": "AWS_GLUE",
        "catalog_name": "123456789012"
    },
    "rest_authentication": {
        "type": "SIGV4",
        "sigv4_iam_role": "arn:aws:iam::123456789012:role/SnowflakeAccess",
        "sigv4_signing_region": "us-east-1"
    },
    "enabled": true,
    "owner": "ACCOUNTADMIN",
    "comment": "This is a test Iceberg REST catalog integration"
}

Optional Enhancement

2. Missing oauth_allowed_scopes prop

The docstring mentions oauth_allowed_scopes for OAUTH authentication, but it's not defined in the rest_authentication PropSet. Consider adding it for completeness:

rest_authentication=PropSet(
    "rest_authentication",
    Props(
        # ... existing props ...
        oauth_allowed_scopes=StringProp("oauth_allowed_scopes"),  # add this
    ),
),

Ruff linting passes ✅

Per @noel's review on datacoves#16:

1. Add tests/fixtures/json/iceberg_rest_catalog_integration.json so
   test_polymorphic_resources can resolve the new ICEBERG_REST subtype
   from a fixture (matches the reviewer's suggested template).

2. Add oauth_allowed_scopes to the rest_authentication PropSet — was
   mentioned in the docstring but missing from the actual props. Now
   round-trips correctly: tested via create_sql() with type=OAUTH.

Verified locally: all 28 tests in test_polymorphic_resources pass,
including the new IcebergRestCatalogIntegration polymorphic resolution.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@usbrandon
Contributor Author

Thanks for the review @noel! Both items addressed in 13da5b6:

  1. Fixture added at tests/fixtures/json/iceberg_rest_catalog_integration.json — used your suggested template verbatim. Verified locally: all 28 tests in tests/test_polymorphic_resources.py pass, including the new CATALOG INTEGRATION:IcebergRestCatalogIntegration polymorphic resolution.

  2. oauth_allowed_scopes added to the rest_authentication PropSet alongside the other oauth_* props. Verified the new prop renders in DDL via a quick create_sql() smoke test:

    rest_authentication = (... OAUTH_ALLOWED_SCOPES = $$catalog write$$ ...)
    

Existing SIGV4 path (the one we use in production for AWS S3 Tables federated catalogs) is untouched and still 0-diff against our live deployment.

@noel
Contributor

noel commented May 11, 2026

Follow-up: One more fix needed

Tests are still failing - the JSON fixture needs to include the refresh_interval_seconds field with a null value to match the resource serialization (other fixtures in the repo follow this pattern for optional fields).

Fix: Add "refresh_interval_seconds": null to tests/fixtures/json/iceberg_rest_catalog_integration.json:

{
    "name": "ICEBERGRESTCATALOGINT",
    "catalog_source": "ICEBERG_REST",
    "catalog_namespace": "some_namespace",
    "table_format": "ICEBERG",
    "rest_config": {
        "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        "catalog_api_type": "AWS_GLUE",
        "catalog_name": "123456789012"
    },
    "rest_authentication": {
        "type": "SIGV4",
        "sigv4_iam_role": "arn:aws:iam::123456789012:role/SnowflakeAccess",
        "sigv4_signing_region": "us-east-1"
    },
    "enabled": true,
    "refresh_interval_seconds": null,
    "owner": "ACCOUNTADMIN",
    "comment": "This is a test Iceberg REST catalog integration"
}

Once this is added, all 1480 tests should pass ✅
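
The reason an explicit null matters can be sketched in a few lines: a key that is absent from the fixture is not the same as a key present with a null value once both sides are compared as dicts. The helper `missing_fixture_keys` and the trimmed-down `serialized` dict below are hypothetical; the actual comparison in the test suite may differ.

```python
import json


def missing_fixture_keys(fixture_text: str, serialized: dict) -> list:
    """Return keys the serialized resource has but the fixture lacks."""
    fixture = json.loads(fixture_text)
    return sorted(set(serialized) - set(fixture))


# Hypothetical serialized form: optional fields appear with a None value.
serialized = {
    "name": "ICEBERGRESTCATALOGINT",
    "enabled": True,
    "refresh_interval_seconds": None,
}

before = '{"name": "ICEBERGRESTCATALOGINT", "enabled": true}'
after = (
    '{"name": "ICEBERGRESTCATALOGINT", "enabled": true, '
    '"refresh_interval_seconds": null}'
)

print(missing_fixture_keys(before, serialized))
print(missing_fixture_keys(after, serialized))
print(json.loads(after) == serialized)
```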

@noel
Contributor

noel commented May 11, 2026

Documentation needed

The docs are manually maintained, so you'll need to add documentation for the new resource:

1. Create docs/resources/iceberg_rest_catalog_integration.md

Follow the pattern of glue_catalog_integration.md. You can pull most of the content from your class docstring. Something like:

---
description: >-
  An Iceberg REST catalog integration in Snowflake.
---

# IcebergRestCatalogIntegration

[Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-rest) | Snowcap CLI label: `iceberg_rest_catalog_integration`

Manages an Apache Iceberg REST catalog integration in Snowflake. This is the right choice for AWS S3 Tables (federated catalogs reachable via the Glue Iceberg REST endpoint) and any other Iceberg REST-compatible catalog.

## Examples

### YAML

```yaml
catalog_integrations:
  - name: ci_s3_tables_dev
    catalog_source: ICEBERG_REST
    catalog_namespace: my_namespace
    rest_config:
      catalog_uri: https://glue.us-east-1.amazonaws.com/iceberg
      catalog_api_type: AWS_GLUE
      catalog_name: '123456789012:s3tablescatalog/my_table_bucket'
      access_delegation_mode: VENDED_CREDENTIALS
    rest_authentication:
      type: SIGV4
      sigv4_iam_role: arn:aws:iam::123456789012:role/snowflake-s3-tables-read
      sigv4_signing_region: us-east-1
    enabled: true
```

### Python

```python
catalog = IcebergRestCatalogIntegration(
    name="ci_s3_tables_dev",
    catalog_namespace="my_namespace",
    rest_config={
        "catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg",
        "catalog_api_type": "AWS_GLUE",
        "catalog_name": "123456789012:s3tablescatalog/my_table_bucket",
        "access_delegation_mode": "VENDED_CREDENTIALS",
    },
    rest_authentication={
        "type": "SIGV4",
        "sigv4_iam_role": "arn:aws:iam::123456789012:role/snowflake-s3-tables-read",
        "sigv4_signing_region": "us-east-1",
    },
    enabled=True,
)
```

## Fields

  • name (string, required) - The name of the catalog integration.
  • rest_config (dict, required) - Iceberg REST configuration. Required keys: catalog_uri. Optional: catalog_api_type, catalog_name, warehouse, prefix, access_delegation_mode.
  • rest_authentication (dict, required) - Authentication block. Required key: type (SIGV4, OAUTH, BEARER, NONE). See class docstring for auth-specific fields.
  • catalog_namespace (string) - Default namespace for tables referencing this catalog.
  • enabled (bool) - Whether the integration is enabled. Defaults to True.
  • refresh_interval_seconds (int) - Optional metadata refresh interval.
  • table_format (string) - Table format. Only ICEBERG is supported.
  • owner (string or Role) - Owner role. Defaults to "ACCOUNTADMIN".
  • comment (string) - Optional comment.

2. Update mkdocs.yml

Add an entry under Integrations > Catalog (around line 156):

      - Catalog:
        - GlueCatalogIntegration: resources/glue_catalog_integration.md
        - IcebergRestCatalogIntegration: resources/iceberg_rest_catalog_integration.md
        - ObjectStoreCatalogIntegration: resources/object_store_catalog_integration.md

…re fix

Address @noel's second-round PR datacoves#16 feedback:

- tests/fixtures/json/iceberg_rest_catalog_integration.json:
  add "refresh_interval_seconds": null to match resource serialization
  pattern used by other fixtures. Full unit suite (1480 tests) now passes.

- docs/resources/iceberg_rest_catalog_integration.md:
  new page following glue_catalog_integration.md template. Covers YAML +
  Python examples for the AWS S3 Tables (SIGV4 + AWS_GLUE) production path
  and lists rest_authentication fields per auth type (SIGV4/OAUTH/BEARER).

- mkdocs.yml:
  register IcebergRestCatalogIntegration under Integrations > Catalog
  between Glue and ObjectStore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@usbrandon
Contributor Author

Thanks @noel — both follow-ups addressed in 0c39d93:

1. Fixture fix — added "refresh_interval_seconds": null to tests/fixtures/json/iceberg_rest_catalog_integration.json. Full unit suite now green locally:

1480 passed, 14 warnings in 9.29s

(The 14 warnings are the existing pre-3.14 pathspec GitWildMatchPattern deprecation warnings — unrelated to this PR.)

2. Documentation — created docs/resources/iceberg_rest_catalog_integration.md following the glue_catalog_integration.md shape and registered it under Integrations > Catalog in mkdocs.yml, alphabetically between Glue and ObjectStore. The example uses the AWS S3 Tables + Glue Iceberg REST + SIGV4 path that we run in production at BCS, and the Fields section enumerates the per-auth-type keys for rest_authentication (SIGV4/OAUTH/BEARER).

Ready for another look when you have a moment.

… example

Adds an end-to-end "Minimal example: AWS S3 Tables behind Lake Formation"
section to the IcebergRestCatalogIntegration doc, distilled from the BCS
production onboarding for our Epicor P21 → S3 Tables → Snowflake pipeline.

Covers the four AWS-side pieces (S3 Tables bucket, Lake Formation
integration, LF grants, cross-account IAM role) and the two Snowcap
resources that bind them to Snowflake (catalog_integrations +
storage_integrations together), plus the post-deploy CREATE ICEBERG TABLE
+ SYSTEM$VERIFY_CATALOG_INTEGRATION verification step.

Includes the gotchas we hit during onboarding:

  - GlueCatalogIntegration/CATALOG_SOURCE=GLUE rejects the federated
    `<account>:s3tablescatalog/<bucket>` form for GLUE_CATALOG_ID with
    SQL compilation error 22023/1008 — this is the failure mode that
    motivated adding IcebergRestCatalogIntegration in the first place.
  - Lake Formation DESCRIBE/SELECT grants are easy to forget; DESC
    CATALOG INTEGRATION succeeds without them but CREATE ICEBERG TABLE
    fails with a vague 403.
  - access_delegation_mode=VENDED_CREDENTIALS is required when the
    writer (e.g., pyiceberg-rest on EC2) relies on Lake Formation to
    vend temporary S3 credentials.
  - sigv4_signing_region must match the S3 Tables bucket region.
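
The region gotcha lends itself to a pre-deploy check. The sketch below compares `sigv4_signing_region` with the region embedded in a Glue-style `catalog_uri`. It assumes the Glue Iceberg REST endpoint lives in the same region as the S3 Tables bucket and that the host has the form `glue.<region>.amazonaws.com`; both function names are hypothetical and nothing like this is part of the PR.

```python
from typing import Optional
from urllib.parse import urlparse


def glue_region_from_uri(catalog_uri: str) -> Optional[str]:
    """Extract <region> from https://glue.<region>.amazonaws.com/..., else None."""
    host = urlparse(catalog_uri).hostname or ""
    parts = host.split(".")
    if len(parts) == 4 and parts[0] == "glue" and parts[2:] == ["amazonaws", "com"]:
        return parts[1]
    return None


def signing_region_problem(rest_config: dict, rest_authentication: dict) -> Optional[str]:
    """Return a message if the two regions disagree, or None if they match or cannot be compared."""
    uri_region = glue_region_from_uri(rest_config.get("catalog_uri", ""))
    signing_region = rest_authentication.get("sigv4_signing_region")
    if uri_region is None or signing_region is None:
        return None  # nothing to compare
    if uri_region != signing_region:
        return (
            f"sigv4_signing_region {signing_region!r} does not match "
            f"catalog_uri region {uri_region!r}"
        )
    return None


config = {"catalog_uri": "https://glue.us-east-1.amazonaws.com/iceberg"}
print(signing_region_problem(config, {"sigv4_signing_region": "us-east-1"}))
print(signing_region_problem(config, {"sigv4_signing_region": "us-west-2"}))
```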

Test suite (tests/test_polymorphic_resources.py) still passes (28/28).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@usbrandon
Contributor Author

One more doc addition in 8eb07b3 — felt worth including a complete, opinionated, end-to-end "minimal example" so the next person wiring up S3 Tables doesn't have to re-derive what's actually a 6-resource setup (4 AWS pieces + 2 Snowcap resources).

The new section in docs/resources/iceberg_rest_catalog_integration.md:

  • lists the AWS-side prerequisites (S3 Tables bucket, Lake Formation S3 Tables integration, LF grants, cross-account IAM role with the specific Glue/LF/S3-Tables actions needed)
  • shows the paired Snowcap YAML — catalog_integrations: and storage_integrations: together, because either one alone is a misconfiguration
  • shows the post-deploy CREATE ICEBERG TABLE … CATALOG = … EXTERNAL_VOLUME = … and SYSTEM$VERIFY_CATALOG_INTEGRATION(...) so users can test the wiring before pointing real tables at it
  • documents the four onboarding gotchas we hit, including the legacy CATALOG_SOURCE = GLUE rejection of federated catalog IDs with SQL compilation error 22023/1008 — which is the failure mode that motivated this PR in the first place

The example uses placeholder account 123456789012 and the production names from our deployment (ci_p21_iceberg_prd, si_p21_raw_prd, bcs-iceberg-raw-prd, role snowflake-s3-tables-read). Mirrors the working DDL we've been running since 2026-05-08.

Polymorphic test suite still green (28/28). Full unit suite was 1480 green before this commit — this commit only touches docs/, so no test impact.
