Declarative: Pattern-based partition routing for dynamic repository discovery #838

@devin-ai-integration

Problem

Some connectors need to dynamically discover resources based on wildcard patterns (e.g., airbytehq/* to sync all repositories in an organization). Currently, this requires custom Python code to expand patterns into actual resource lists, preventing connectors from being fully declarative (manifest-only).

Current Workaround

Connectors like source-github implement custom logic to expand wildcard patterns. See source-github/source.py:215-313, which implements _get_org_repositories() to:

  • Parse patterns like airbytehq/* or airbytehq/a*
  • Call GitHub API to list all repositories in the organization
  • Filter repositories by pattern matching
  • Return expanded list of repositories
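The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in source-github: expand_pattern and fake_list_org_repos are hypothetical names, and the GitHub API call is replaced by an injected callable so the example runs offline.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List


def expand_pattern(
    pattern: str, list_org_repos: Callable[[str], Iterable[str]]
) -> List[str]:
    """Expand 'org/glob' into a list of 'org/repo' names (hypothetical helper)."""
    # Step 1: parse the pattern into an organization and a repository glob.
    org, _, repo_glob = pattern.partition("/")
    if not org or not repo_glob:
        raise ValueError(f"expected 'org/repo' or 'org/glob', got {pattern!r}")
    if "*" not in repo_glob:
        # No wildcard: the pattern already names a single repository.
        return [pattern]
    # Steps 2-4: list the org's repositories, keep the matches, return full names.
    return [
        f"{org}/{name}"
        for name in list_org_repos(org)
        if fnmatchcase(name, repo_glob)
    ]


def fake_list_org_repos(org: str) -> List[str]:
    # Stand-in for the API call that lists repositories in an organization.
    return ["airbyte", "airbyte-platform", "docs"]


expanded = expand_pattern("airbytehq/a*", fake_list_org_repos)
single = expand_pattern("airbytehq/airbyte", fake_list_org_repos)
```

Here expanded holds the two repositories whose names start with "a", while single is returned unchanged without calling the listing function.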

Proposed Solution

Add a declarative PatternPartitionRouter component that supports:

  1. Pattern expansion: Accept wildcard patterns and expand them via API calls
  2. API-based discovery: Call an API endpoint to list available resources
  3. Pattern matching: Filter resources by glob/regex patterns
  4. Caching: Cache expanded patterns to avoid repeated API calls
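One way the caching requirement (item 4) could work is to cache the discovery result per organization, so that several patterns targeting the same organization trigger only one listing call. The sketch below assumes that design; CachedDiscovery and its methods are hypothetical and not part of the CDK.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List


class CachedDiscovery:
    """Caches discovered resource names per organization (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch  # stand-in for the discovery API call
        self._cache: Dict[str, List[str]] = {}
        self.calls = 0  # number of times the underlying fetch was invoked

    def resources(self, org: str) -> List[str]:
        # Only hit the API the first time an organization is seen.
        if org not in self._cache:
            self.calls += 1
            self._cache[org] = list(self._fetch(org))
        return self._cache[org]

    def match(self, org: str, glob: str) -> List[str]:
        # Filter the (possibly cached) listing by a glob pattern.
        return [name for name in self.resources(org) if fnmatchcase(name, glob)]


discovery = CachedDiscovery(lambda org: ["airbyte", "airbyte-platform", "docs"])
a_repos = discovery.match("airbytehq", "a*")
d_repos = discovery.match("airbytehq", "d*")
```

Both match calls share a single fetch for airbytehq. A real implementation would also need to decide how long a cached listing stays valid, which this sketch leaves open.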

Example Configuration

partition_router:
  type: PatternPartitionRouter
  pattern_field: "repositories"  # Config field containing patterns
  discovery_requester:
    type: HttpRequester
    url_base: "https://api.github.com"
    path: "/orgs/{{ pattern.org }}/repos"
    authenticator:
      type: BearerAuthenticator
      api_token: "{{ config.access_token }}"
  pattern_parser:
    type: GlobPattern
    separator: "/"
    wildcard: "*"
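  record_selector:
    type: RecordSelector
    extractor:
      type: DpathExtractor
      field_path: []  # Response body is already a list of repositories
  partition_fields:  # One partition per discovered item
    owner: "{{ item.owner.login }}"
    name: "{{ item.name }}"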
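To make the intended semantics of pattern_parser and partition_fields concrete, here is a rough Python sketch of how they might be interpreted. It is an assumption about the proposed behavior, not an existing CDK API: the GlobPattern class below only mirrors the separator and wildcard options from the configuration, and to_partition is a hypothetical equivalent of the two templated fields.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Tuple


@dataclass(frozen=True)
class GlobPattern:
    """Hypothetical parser mirroring the `separator` and `wildcard` options."""

    separator: str = "/"
    wildcard: str = "*"

    def parse(self, pattern: str) -> Tuple[str, re.Pattern]:
        # Split 'org/name_glob' on the first separator.
        org, sep, name_glob = pattern.partition(self.separator)
        if not sep or not org or not name_glob:
            raise ValueError(f"cannot parse pattern {pattern!r}")
        # Escape literal segments and let each wildcard match any substring.
        literal_parts = [re.escape(part) for part in name_glob.split(self.wildcard)]
        return org, re.compile(".*".join(literal_parts))


def to_partition(item: Mapping[str, Any]) -> Dict[str, str]:
    # Equivalent of rendering "{{ item.owner.login }}" and "{{ item.name }}".
    return {"owner": item["owner"]["login"], "name": item["name"]}


parser = GlobPattern()
org, name_re = parser.parse("airbytehq/a*")
items = [
    {"owner": {"login": "airbytehq"}, "name": "airbyte"},
    {"owner": {"login": "airbytehq"}, "name": "docs"},
]
partitions = [to_partition(i) for i in items if name_re.fullmatch(i["name"])]
```

The parsed org value is what a template such as {{ pattern.org }} in the discovery path would presumably resolve to; the items list stands in for the records returned by the discovery request.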

Example Use Cases

Input patterns:

  • airbytehq/* → Expands to all repos in airbytehq org
  • airbytehq/airbyte → Single specific repo
  • airbytehq/a* → All repos starting with 'a' in airbytehq org
  • org1/repo1 org2/* → Multiple patterns, each expanded independently

Expanded partitions (e.g., for airbytehq/*):

[
  {"owner": "airbytehq", "name": "airbyte"},
  {"owner": "airbytehq", "name": "airbyte-platform"},
  {"owner": "airbytehq", "name": "airbyte-python-cdk"},
  ...
]
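A sketch of how several input patterns might be combined into one partition list of this shape, assuming the patterns are whitespace-separated and that a repository matched by more than one pattern should appear only once. The function name and the in-memory repos_by_org mapping (standing in for discovery API results) are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Mapping, Sequence, Tuple


def expand_patterns(
    patterns: str, repos_by_org: Mapping[str, Sequence[str]]
) -> List[Dict[str, str]]:
    """Expand whitespace-separated patterns into de-duplicated partitions."""
    seen: Dict[Tuple[str, str], Dict[str, str]] = {}
    for pattern in patterns.split():
        owner, _, name_glob = pattern.partition("/")
        if "*" in name_glob:
            # Wildcard: filter the organization's known repositories.
            names = [n for n in repos_by_org.get(owner, []) if fnmatchcase(n, name_glob)]
        else:
            # Exact name: no discovery needed.
            names = [name_glob]
        for name in names:
            # dict preserves insertion order, so the first occurrence wins.
            seen.setdefault((owner, name), {"owner": owner, "name": name})
    return list(seen.values())


partitions = expand_patterns(
    "org1/repo1 org2/* org2/repo-a",
    {"org2": ["repo-a", "repo-b"]},
)
```

In this example org2/repo-a is matched both by the wildcard and by the explicit pattern, yet it yields a single partition.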

Impact

This would enable connectors like source-github, source-gitlab, and others to support wildcard patterns declaratively without custom Python code.

Related

Code References

Labels: manifest-only feature gaps (missing capabilities that prevent connectors from being fully declarative, i.e. manifest-only)