Bug: Declarative component schema should validate inline stream schemas against JSON Schema spec #832

@aaronsteers

Current Behavior

The InlineSchemaLoader.schema field in declarative_component_schema.yaml is currently defined as:

schema:
  type: object
  additionalProperties: true

This allows any object, including invalid JSON schemas. For example, these invalid schemas currently pass validation:

Example 1: Invalid type value

schema:
  type: object
  properties:
    id:
      type: not_a_valid_type  # Should fail but doesn't

Example 2: Malformed structure

schema: "string instead of object"  # Causes Python error later

Example 3: Invalid property definitions

schema:
  type: object
  properties:
    field: "should be object with type"  # Invalid structure

Problem

Invalid schemas pass manifest validation and only fail later:

  1. Cryptic Python errors during stream reading: e.g. AttributeError: 'str' object has no attribute 'get' when the schema is a string
  2. Runtime failures instead of validation-time failures: errors only appear when records are read
  3. Poor developer experience: no clear feedback about what is wrong with the schema
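To illustrate the first point, a minimal sketch of how a string schema turns into an unhelpful error. `field_names` is a hypothetical stand-in for any code path that assumes the schema is a mapping; it is not actual CDK code.

```python
from typing import Any


def field_names(schema: Any) -> list[str]:
    """Hypothetical stand-in for CDK code that assumes the schema is a dict."""
    return list(schema.get("properties", {}))


# Example 2 from this issue: the schema is a string, not an object.
try:
    field_names("string instead of object")
except AttributeError as exc:
    # The message names a Python type and method, not the manifest field at fault.
    print(exc)  # 'str' object has no attribute 'get'
```

Nothing in the message points back to the stream or the InlineSchemaLoader that caused it, which is why catching this at manifest validation matters.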

Expected Behavior

The declarative component schema should validate that InlineSchemaLoader.schema is a valid JSON Schema conforming to the JSON Schema Draft 7 specification.

Invalid schemas should be caught during manifest validation with clear error messages like:

  • "Invalid JSON Schema type: 'not_a_valid_type' is not a valid type"
  • "Schema must be an object, got string"
  • "Properties must be objects with type definitions"
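Messages like these could come from a check along the following lines. This is a minimal, stdlib-only sketch: `validate_inline_schema` and `DRAFT7_TYPES` are hypothetical names, and it covers only the three failure modes in this issue (non-object schema, unknown type name, non-object property definition), not the full Draft 7 meta-schema. A production fix would more likely delegate to jsonschema's `Draft7Validator.check_schema`, which raises `jsonschema.exceptions.SchemaError` for a schema that violates the meta-schema.

```python
from typing import Any

# The seven primitive type names defined by JSON Schema Draft 7.
DRAFT7_TYPES = {"array", "boolean", "integer", "null", "number", "object", "string"}


def validate_inline_schema(schema: Any, path: str = "schema") -> list[str]:
    """Return readable problems found in an inline stream schema (hypothetical helper).

    Covers only the failure modes from this issue, not the full Draft 7 meta-schema.
    """
    if isinstance(schema, bool):
        return []  # true/false are valid schemas in Draft 7
    if not isinstance(schema, dict):
        return [f"{path}: expected an object, got {type(schema).__name__}"]

    errors: list[str] = []

    if "type" in schema:
        declared = schema["type"]
        # "type" may be one name or a list of names, e.g. ["null", "string"].
        names = declared if isinstance(declared, list) else [declared]
        for name in names:
            if not isinstance(name, str) or name not in DRAFT7_TYPES:
                errors.append(f"{path}.type: invalid JSON Schema type {name!r}")

    properties = schema.get("properties", {})
    if not isinstance(properties, dict):
        errors.append(f"{path}.properties: expected an object, got {type(properties).__name__}")
    else:
        # Every property value must itself be a schema, so check it recursively.
        for key, subschema in properties.items():
            errors.extend(validate_inline_schema(subschema, f"{path}.properties.{key}"))

    return errors


# Example 1 from this issue: an unknown type name nested under properties.
print(validate_inline_schema({"type": "object", "properties": {"id": {"type": "not_a_valid_type"}}}))
# → ["schema.properties.id.type: invalid JSON Schema type 'not_a_valid_type'"]
```

Reporting a dotted path to the offending node is what turns "something is wrong" into feedback a connector builder can act on.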

Proposed Solution

Update the InlineSchemaLoader definition in declarative_component_schema.yaml to validate against the JSON Schema meta-schema.

One approach would be to add a reference to the JSON Schema Draft 7 meta-schema:

InlineSchemaLoader:
  properties:
    schema:
      title: "Schema"
      description: "Describes a streams' schema..."
      type: object
      # Validate that this is a valid JSON Schema
      # (under Draft 7, keywords beside "$ref" are ignored, so the $ref alone carries the constraint)
      "$ref": "http://json-schema.org/draft-07/schema#"

Alternatively, add custom validation logic in the CDK to ensure the schema structure is valid before attempting to use it.
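For the alternative, a minimal sketch of running the check during manifest validation rather than at read time. It assumes streams are listed under a top-level streams key, each with a schema_loader mapping whose type is InlineSchemaLoader; `check_inline_schemas` and `shallow_check` are hypothetical names, not CDK APIs, and a real manifest may also define streams through $refs or definitions, which this sketch does not resolve.

```python
from typing import Any, Callable, Optional

# Hypothetical per-schema rule: returns an error message, or None if the schema is acceptable.
SchemaCheck = Callable[[Any], Optional[str]]


def shallow_check(schema: Any) -> Optional[str]:
    """Placeholder rule: a Draft 7 schema must be an object or a boolean."""
    if isinstance(schema, (dict, bool)):
        return None
    return f"expected an object, got {type(schema).__name__}"


def check_inline_schemas(manifest: dict, check: SchemaCheck = shallow_check) -> None:
    """Raise ValueError listing every stream whose inline schema fails `check`.

    Assumes streams live under manifest["streams"] and declare
    schema_loader: {type: InlineSchemaLoader, schema: ...}.
    """
    errors = []
    for index, stream in enumerate(manifest.get("streams", [])):
        loader = stream.get("schema_loader") or {}
        if loader.get("type") != "InlineSchemaLoader":
            continue  # other schema loaders are out of scope for this sketch
        problem = check(loader.get("schema"))
        if problem:
            name = stream.get("name", f"streams[{index}]")
            errors.append(f"stream {name!r}: {problem}")
    if errors:
        raise ValueError("Invalid inline schema(s):\n" + "\n".join(errors))


manifest = {
    "streams": [
        {"name": "users", "schema_loader": {"type": "InlineSchemaLoader", "schema": {"type": "object"}}},
        {"name": "orders", "schema_loader": {"type": "InlineSchemaLoader", "schema": "string instead of object"}},
    ]
}
try:
    check_inline_schemas(manifest)
except ValueError as exc:
    print(exc)  # names the failing stream before any records are read
```

Passing the rule in as `check` keeps the manifest traversal separate from the validation logic; a fuller rule could wrap `Draft7Validator.check_schema` and return the `SchemaError` message instead of None.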

Impact

This affects all connector builders using declarative YAML manifests with inline schemas. The fix would:

  • Catch schema errors earlier in the development process
  • Provide clearer error messages
  • Improve developer experience
  • Prevent runtime errors from malformed schemas

Workaround

We've added validation in our MCP server as a temporary workaround, but this should be fixed upstream in the CDK for all users.

Environment

  • CDK version: 6.61.6+ (latest)
  • Component: airbyte_cdk.sources.declarative.declarative_component_schema.yaml
