Skip to content

Fixes #23568: Implement Configurable DB Query Retry Mechanism#27733

Open
mohitjeswani01 wants to merge 5 commits intoopen-metadata:mainfrom
mohitjeswani01:feat/23568-db-retry-mechanism
Open

Fixes #23568: Implement Configurable DB Query Retry Mechanism#27733
mohitjeswani01 wants to merge 5 commits intoopen-metadata:mainfrom
mohitjeswani01:feat/23568-db-retry-mechanism

Conversation

@mohitjeswani01
Copy link
Copy Markdown

@mohitjeswani01 mohitjeswani01 commented Apr 25, 2026

Describe your changes:

Fixes #23568

This contribution is part of the WeMakeDevs X OpenMetadata Hackathon.

I worked on implementing a centralized, configurable database query retry mechanism because transient DBA-enforced statement timeouts (QueryCanceled / SQLSTATE 57014) were causing the ingestion framework to swallow OperationalError exceptions, resulting in incomplete metadata capture in Postgres and Greenplum environments.

The Implementation:

  1. Schema Update: Added queryRetryConfig to databaseServiceMetadataPipeline.json allowing operators to configure backoff parameters (default: enabled=False to prevent breaking existing deployments).
  2. Reliability Engine: Created @db_retry in metadata.utils.db_retry featuring exponential backoff with jitter and a 3-layer transient error detection system (SQLSTATE, Exception Names, DBAPI connection state).
  3. Framework Integration: Surgically applied the decorator strictly to idempotent metadata SELECT queries across base classes (CommonDbSourceService, SqlColumnHandlerMixin) and dialect-specific overrides (Postgres, Greenplum).

How did you test your changes?

  • Wrote and executed 50 new unit tests in test_db_retry.py verifying SQLSTATE detection priorities, backoff math boundaries, config extraction, and decorator metadata preservation. All 50 tests are passing.
  • Verified framework compliance with make py_format and achieved a 10.00/10 pylint score on all modified files.

Screenshots:
Here is the successful test run output proving the 50/50 green sweep across the retry utility tests:
image

Also ran make py_format and make lint on the code
image

Type of change:

  • Improvement
  • New feature

Checklist:

  • I have read the CONTRIBUTING document.

  • My PR title is Fixes #23568: Implement Configurable DB Query Retry Mechanism

  • I have commented on my code, particularly in hard-to-understand areas.

  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed. (Note: Schema additive only, no migration required).

  • I have added tests around the new logic.

  • For connector/ingestion changes: I updated the documentation.

  • The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
    or decision-making process is reflected in the issue.

  • I have updated the documentation.

  • I have added tests around the new logic.


Summary by Gitar

  • Reliability Improvements:
    • Added generator support via inspect.isgeneratorfunction and mid-iteration materialization to ensure robustness during retries.
    • Added logging to _normalize_config to track when configuration values are clamped to safe ranges.
  • Security:
    • Implemented _sanitize_exc to prevent potential SQL or parameter data leaks in logs.

This will update automatically on new commits.

…ilures (open-metadata#23568)

Added @db_retry decorator with configurable exponential backoff and SQLSTATE detection. Applied to base classes (CommonDbSource, SqlColumnHandlerMixin) and dialect-specific overrides (Postgres, Greenplum).
Copilot AI review requested due to automatic review settings April 25, 2026 13:28
@mohitjeswani01 mohitjeswani01 requested a review from a team as a code owner April 25, 2026 13:28
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/utils/db_retry.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Implements a centralized, configurable retry mechanism for transient database query failures during ingestion to reduce incomplete metadata capture due to statement timeouts and similar transient DB errors.

Changes:

  • Added queryRetryConfig to the database ingestion pipeline JSON schema.
  • Introduced metadata.utils.db_retry with transient error detection + exponential backoff w/ jitter.
  • Applied @db_retry to selected idempotent query methods across common and Postgres/Greenplum sources, and added unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json Adds queryRetryConfig schema definition and exposes it on the pipeline config.
ingestion/src/metadata/utils/db_retry.py New retry decorator implementation (transient detection + backoff/jitter + logging).
ingestion/src/metadata/ingestion/source/database/sql_column_handler.py Wraps column fetch/sample query helpers with @db_retry.
ingestion/src/metadata/ingestion/source/database/common_db_source.py Applies @db_retry to key schema/table metadata queries.
ingestion/src/metadata/ingestion/source/database/multi_db_source.py Adds retry to the raw DB-name query executor.
ingestion/src/metadata/ingestion/source/database/postgres/metadata.py Adds retry to Postgres-specific metadata SELECT queries.
ingestion/src/metadata/ingestion/source/database/greenplum/metadata.py Adds retry to Greenplum-specific metadata SELECT queries.
ingestion/tests/unit/utils/test_db_retry.py Adds unit tests covering transient detection, config extraction, and backoff behavior.

Comment thread ingestion/src/metadata/utils/db_retry.py Outdated
Comment thread ingestion/src/metadata/utils/db_retry.py Outdated
Comment thread ingestion/src/metadata/utils/db_retry.py Outdated
Comment thread ingestion/src/metadata/ingestion/source/database/common_db_source.py Outdated
Comment thread ingestion/src/metadata/utils/db_retry.py Outdated
…onfig clamping

- Add inspect.isgeneratorfunction() to handle generator methods (gitar-bot)
- Materialize generators via list() inside retry loop to catch mid-iteration errors
- Add _sanitize_exc() to prevent SQL/parameter leaks in logs (Copilot)
- Add _normalize_config() to clamp invalid retry values (Copilot)
- Swap decorator order on get_schema_definition (Copilot)
- Remove @db_retry from non-idempotent _get_stored_procedures_internal (gitar-bot)
- Fix log wording from 'attempt' to 'retry' (Copilot)
- Commit generated Pydantic models from make generate (gitar-bot)
- 50 unit tests passing (up from 34)
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment thread ingestion/src/metadata/utils/db_retry.py
Copilot AI review requested due to automatic review settings April 25, 2026 14:42
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

- Emit logger.warning when queryRetryConfig values are clamped to safe ranges (gitar-bot)
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI review requested due to automatic review settings April 25, 2026 15:25
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 25, 2026

Code Review 🚫 Blocked 3 resolved / 4 findings

Implement configurable DB query retry mechanism with fixes for decorator generator compatibility, non-idempotent procedure handling, and silent configuration clamping. The feature remains inoperable because the generated Python models in databaseServiceMetadataPipeline are not updated.

🚨 Bug: Generated Python models not updated — feature is inoperable

📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:44-58 📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:222-226 📄 ingestion/src/metadata/utils/db_retry.py:77-91

The JSON schema in databaseServiceMetadataPipeline.json was updated with the queryRetryConfig definition, but make generate was not run (or the generated output was not committed). The queryRetryConfig attribute does not exist on any generated Pydantic model in the codebase.

Since _get_retry_config uses getattr(source_config, 'queryRetryConfig', None), it will always return None in production, meaning the retry mechanism can never be enabled regardless of user configuration. The entire feature is dead code.

Per the project's schema-first development guidelines: changes must originate in openmetadata-spec/ JSON schemas, followed by make generate.

✅ 3 resolved
Bug: db_retry decorator is broken for generator functions

📄 ingestion/src/metadata/utils/db_retry.py:107-121 📄 ingestion/src/metadata/ingestion/source/database/common_db_source.py:401-415 📄 ingestion/src/metadata/ingestion/source/database/multi_db_source.py:37-44 📄 ingestion/src/metadata/ingestion/source/database/postgres/metadata.py:292-306 📄 ingestion/tests/unit/utils/test_db_retry.py:145-159
The db_retry wrapper uses return func(self, *args, **kwargs) inside a try/except block. For generator functions, calling func() merely creates a generator object — no code executes until the caller iterates it. This means the try/except in the decorator never catches exceptions from generators, and the retry logic is completely non-functional.

Three of the seven decorated methods are generators:

  • get_tables_name_and_type (yields table names)
  • _execute_database_query (yields database names)
  • _get_stored_procedures_internal (yields stored procedures)

All DB errors from these methods will bypass the retry mechanism entirely and propagate uncaught to the caller, giving a false sense of reliability.

Bug: @db_retry applied to non-idempotent stored procedure query

📄 ingestion/src/metadata/ingestion/source/database/postgres/metadata.py:292-306
_get_stored_procedures_internal is decorated with @db_retry, but the PR description states the decorator is "strictly applied to idempotent metadata SELECT queries." While listing stored procedures is likely a read-only operation, this method's internal error handling (catching exceptions per-row and calling self.status.failed()) means a retry after partial iteration could report duplicate failures. Verify this is intentional.

Quality: _normalize_config silently clamps invalid values without logging

📄 ingestion/src/metadata/utils/db_retry.py:116-123
_normalize_config clamps out-of-range config values (e.g. maxRetries=01, negative backoff → 0.1) without emitting any log message. An operator who sets maxRetries: 0 expecting "execute once, no retries" will silently get 1 retry (2 total attempts) with no diagnostic feedback. Adding a logger.warning when values are clamped would aid debugging misconfigured pipelines.

🤖 Prompt for agents
Code Review: Implement configurable DB query retry mechanism with fixes for decorator generator compatibility, non-idempotent procedure handling, and silent configuration clamping. The feature remains inoperable because the generated Python models in databaseServiceMetadataPipeline are not updated.

1. 🚨 Bug: Generated Python models not updated — feature is inoperable
   Files: openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:44-58, openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:222-226, ingestion/src/metadata/utils/db_retry.py:77-91

   The JSON schema in `databaseServiceMetadataPipeline.json` was updated with the `queryRetryConfig` definition, but `make generate` was not run (or the generated output was not committed). The `queryRetryConfig` attribute does not exist on any generated Pydantic model in the codebase.
   
   Since `_get_retry_config` uses `getattr(source_config, 'queryRetryConfig', None)`, it will **always return None** in production, meaning the retry mechanism can never be enabled regardless of user configuration. The entire feature is dead code.
   
   Per the project's schema-first development guidelines: changes must originate in `openmetadata-spec/` JSON schemas, **followed by `make generate`**.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@mohitjeswani01
Copy link
Copy Markdown
Author

Reviewing your code

Code Review 🚫 Blocked 3 resolved / 4 findings
Implement configurable DB query retry mechanism with fixes for decorator generator compatibility, non-idempotent procedure handling, and silent configuration clamping. The feature remains inoperable because the generated Python models in databaseServiceMetadataPipeline are not updated.

🚨 Bug: Generated Python models not updated — feature is inoperable
📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:44-58 📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:222-226 📄 ingestion/src/metadata/utils/db_retry.py:77-91

The JSON schema in databaseServiceMetadataPipeline.json was updated with the queryRetryConfig definition, but make generate was not run (or the generated output was not committed). The queryRetryConfig attribute does not exist on any generated Pydantic model in the codebase.

Since _get_retry_config uses getattr(source_config, 'queryRetryConfig', None), it will always return None in production, meaning the retry mechanism can never be enabled regardless of user configuration. The entire feature is dead code.

Per the project's schema-first development guidelines: changes must originate in openmetadata-spec/ JSON schemas, followed by make generate.

✅ 3 resolved
✅ Bug: db_retry decorator is broken for generator functions

📄 ingestion/src/metadata/utils/db_retry.py:107-121 📄 ingestion/src/metadata/ingestion/source/database/common_db_source.py:401-415 📄 ingestion/src/metadata/ingestion/source/database/multi_db_source.py:37-44 📄 ingestion/src/metadata/ingestion/source/database/postgres/metadata.py:292-306 📄 ingestion/tests/unit/utils/test_db_retry.py:145-159
The db_retry wrapper uses return func(self, *args, **kwargs) inside a try/except block. For generator functions, calling func() merely creates a generator object — no code executes until the caller iterates it. This means the try/except in the decorator never catches exceptions from generators, and the retry logic is completely non-functional.
Three of the seven decorated methods are generators:

  • get_tables_name_and_type (yields table names)
  • _execute_database_query (yields database names)
  • _get_stored_procedures_internal (yields stored procedures)

All DB errors from these methods will bypass the retry mechanism entirely and propagate uncaught to the caller, giving a false sense of reliability.

✅ Bug: @db_retry applied to non-idempotent stored procedure query

📄 ingestion/src/metadata/ingestion/source/database/postgres/metadata.py:292-306
_get_stored_procedures_internal is decorated with @db_retry, but the PR description states the decorator is "strictly applied to idempotent metadata SELECT queries." While listing stored procedures is likely a read-only operation, this method's internal error handling (catching exceptions per-row and calling self.status.failed()) means a retry after partial iteration could report duplicate failures. Verify this is intentional.

✅ Quality: _normalize_config silently clamps invalid values without logging

📄 ingestion/src/metadata/utils/db_retry.py:116-123
_normalize_config clamps out-of-range config values (e.g. maxRetries=01, negative backoff → 0.1) without emitting any log message. An operator who sets maxRetries: 0 expecting "execute once, no retries" will silently get 1 retry (2 total attempts) with no diagnostic feedback. Adding a logger.warning when values are clamped would aid debugging misconfigured pipelines.

🤖 Prompt for agents

Code Review: Implement configurable DB query retry mechanism with fixes for decorator generator compatibility, non-idempotent procedure handling, and silent configuration clamping. The feature remains inoperable because the generated Python models in databaseServiceMetadataPipeline are not updated.

1. 🚨 Bug: Generated Python models not updated — feature is inoperable
   Files: openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:44-58, openmetadata-spec/src/main/resources/json/schema/metadataIngestion/databaseServiceMetadataPipeline.json:222-226, ingestion/src/metadata/utils/db_retry.py:77-91

   The JSON schema in `databaseServiceMetadataPipeline.json` was updated with the `queryRetryConfig` definition, but `make generate` was not run (or the generated output was not committed). The `queryRetryConfig` attribute does not exist on any generated Pydantic model in the codebase.
   
   Since `_get_retry_config` uses `getattr(source_config, 'queryRetryConfig', None)`, it will **always return None** in production, meaning the retry mechanism can never be enabled regardless of user configuration. The entire feature is dead code.
   
   Per the project's schema-first development guidelines: changes must originate in `openmetadata-spec/` JSON schemas, **followed by `make generate`**.

Options
Display: compact → Showing less information.

Comment with these commands to change:

Compact

gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

bro i ran make generate twice locally
image

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Retry transient DB query failures (e.g. canceling statement due to user request) in ingestion DAGs / connectors

2 participants