Open
Labels
manifest-only feature gaps: Missing capabilities that prevent connectors from being fully declarative (manifest-only)
Description
Problem
Some connectors need to dynamically discover resources based on wildcard patterns (e.g., airbytehq/* to sync all repositories in an organization). Currently, this requires custom Python code to expand patterns into actual resource lists, preventing connectors from being fully declarative (manifest-only).
Current Workaround
Connectors like source-github implement custom logic to expand wildcard patterns. See source-github/source.py:215-313 which implements _get_org_repositories() to:
- Parse patterns like airbytehq/* or airbytehq/a*
- Call the GitHub API to list all repositories in the organization
- Filter repositories by pattern matching
- Return expanded list of repositories
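The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual source-github code: `expand_repository_pattern`, `list_org_repos`, and `fake_list_org_repos` are hypothetical names, the API call is replaced by a stub, and glob matching is delegated to Python's `fnmatch` (which also treats `?` and `[...]` as special, something the real connector may not do).

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List


def expand_repository_pattern(
    pattern: str,
    list_org_repos: Callable[[str], Iterable[str]],
) -> List[str]:
    """Expand one "org/name-glob" pattern into full "org/repo" names."""
    # Step 1: parse the pattern into an organization and a name glob.
    org, _, name_glob = pattern.partition("/")
    if "*" not in name_glob:
        # No wildcard: the pattern already names a single repository.
        return [pattern]
    # Step 2: list_org_repos is a hypothetical stand-in for the GitHub API call.
    # Steps 3 and 4: keep only matching names and return the expanded list.
    return [
        f"{org}/{name}"
        for name in list_org_repos(org)
        if fnmatchcase(name, name_glob)
    ]


def fake_list_org_repos(org: str) -> List[str]:
    # Stubbed API response used only for this illustration.
    return ["airbyte", "airbyte-platform", "docs"]
```

Passing the listing function in as a parameter keeps the matching logic testable without network access, which is roughly the separation a declarative component would need between discovery and filtering.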
Proposed Solution
Add a declarative PatternPartitionRouter component that supports:
- Pattern expansion: Accept wildcard patterns and expand them via API calls
- API-based discovery: Call an API endpoint to list available resources
- Pattern matching: Filter resources by glob/regex patterns
- Caching: Cache expanded patterns to avoid repeated API calls
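One way the caching capability could behave is sketched below, assuming results are cached per organization for the lifetime of a sync. `CachedDiscovery` and its `fetch` callable are hypothetical and not part of any existing CDK API. The point is only that two patterns targeting the same organization should trigger a single discovery call.

```python
from typing import Callable, Dict, Iterable, List


class CachedDiscovery:
    """Memoize discovery results per organization (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], Iterable[str]]) -> None:
        self._fetch = fetch  # stand-in for the discovery API request
        self._cache: Dict[str, List[str]] = {}
        self.fetch_count = 0  # exposed only to make the caching observable

    def list_resources(self, org: str) -> List[str]:
        if org not in self._cache:
            # First request for this org: call the API and remember the result.
            self.fetch_count += 1
            self._cache[org] = list(self._fetch(org))
        return self._cache[org]


discovery = CachedDiscovery(lambda org: ["airbyte", "airbyte-platform"])
first = discovery.list_resources("airbytehq")
second = discovery.list_resources("airbytehq")  # served from the cache
```

Whether the cache should also expire or persist across syncs is an open design question that this sketch does not address.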
Example Configuration
partition_router:
  type: PatternPartitionRouter
  pattern_field: "repositories"  # Config field containing patterns
  discovery_requester:
    type: HttpRequester
    url_base: "https://api.github.com"
    path: "/orgs/{{ pattern.org }}/repos"
    authenticator:
      type: BearerAuthenticator
      api_token: "{{ config.access_token }}"
  pattern_parser:
    type: GlobPattern
    separator: "/"
    wildcard: "*"
  record_selector:
    extractor:
      field_path: []
  partition_fields:
    - owner: "{{ item.owner.login }}"
    - name: "{{ item.name }}"
Example Use Cases
Input patterns:
- airbytehq/* → Expands to all repos in the airbytehq org
- airbytehq/airbyte → Single specific repo
- airbytehq/a* → All repos starting with 'a' in the airbytehq org
- org1/repo1 org2/* → Multiple patterns
Expanded partitions:
[
  {"owner": "airbytehq", "name": "airbyte"},
  {"owner": "airbytehq", "name": "airbyte-platform"},
  {"owner": "airbytehq", "name": "airbyte-python-cdk"},
  ...
]
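To make the mapping from input patterns to partitions concrete, here is a small sketch. It assumes multiple patterns arrive as one whitespace-separated string (as the last input example suggests) and that a repository matched by two patterns should be emitted once. `expand_to_partitions`, `repos_by_org`, and `SAMPLE_REPOS` are hypothetical names, and the in-memory mapping stands in for the discovery API response.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Mapping, Sequence


def expand_to_partitions(
    patterns: str,
    repos_by_org: Mapping[str, Sequence[str]],
) -> List[Dict[str, str]]:
    """Turn a whitespace-separated pattern string into owner/name partitions."""
    partitions: List[Dict[str, str]] = []
    seen = set()
    for pattern in patterns.split():
        owner, _, name_glob = pattern.partition("/")
        if "*" in name_glob:
            # repos_by_org stands in for the discovery API response.
            names = [n for n in repos_by_org.get(owner, []) if fnmatchcase(n, name_glob)]
        else:
            # No wildcard: the pattern names exactly one repository.
            names = [name_glob]
        for name in names:
            if (owner, name) not in seen:  # skip repos matched by two patterns
                seen.add((owner, name))
                partitions.append({"owner": owner, "name": name})
    return partitions


SAMPLE_REPOS = {"org2": ["alpha", "beta"]}
PARTITIONS = expand_to_partitions("org1/repo1 org2/* org2/alpha", SAMPLE_REPOS)
```

Each resulting dictionary has the same `owner` and `name` keys as the `partition_fields` in the example configuration.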
Impact
This would enable connectors like source-github, source-gitlab, and others to support wildcard patterns declaratively without custom Python code.
Related
- Survey of Manifest-Only Connectors Using Custom Components (Feature Gaps) #714 - Survey showing 74.3% of certified manifest-only connectors require custom components
- Requested by @aaronsteers in https://airbytehq-team.slack.com/archives/C08BHPUMEPJ/p1762665961303909
Code References
- source-github/spec.json:80-96 - Repository pattern configuration
- source-github/source.py:215-313 - _get_org_repositories() pattern expansion logic