
Fixes #27158: ingestion slowdown from tag_usage seq-scan on Postgres#27745

Open
sonika-shah wants to merge 4 commits into main from fix-27158-tag-usage-postgres-index


@sonika-shah

@sonika-shah sonika-shah commented Apr 26, 2026

Fixes #27158

Summary

On Postgres, getTagsInternalByPrefix triggers a parallel sequential scan of tag_usage, causing RDS CPU spikes during ingestion.

Cause: the 1.11.0 migration (#23054) added four partial indexes on tag_usage, each filtered WHERE state = 1. #24063 then dropped the matching state = 1 filter from the query (Suggested-state rows are valid for both classification and glossary derivation), so none of the partial indexes can be used any longer. MySQL was unaffected: its 1.11.0 indexes were never partial, since MySQL has no partial-index syntax.
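A minimal way to see the planner rule behind this, using Python's bundled sqlite3 as a stand-in for Postgres. This is an assumption-laden sketch: SQLite's planner is far simpler than Postgres's, but it applies the same rule that a partial index is usable only when the query's WHERE clause implies the index's WHERE clause. The table is a pared-down, hypothetical tag_usage, and the index names are made up for illustration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tag_usage: only the columns this example needs.
conn.execute(
    "CREATE TABLE tag_usage (tagfqn_lower TEXT, targetfqnhash_lower TEXT, state INTEGER)"
)
# Partial index in the 1.11.0 style: only state = 1 rows are indexed.
conn.execute(
    "CREATE INDEX idx_target_partial ON tag_usage (targetfqnhash_lower) WHERE state = 1"
)

WITH_STATE = (
    "SELECT tagfqn_lower FROM tag_usage "
    "WHERE targetfqnhash_lower = 'abc' AND state = 1"
)
# Once the predicate is dropped, the query no longer implies `state = 1`.
WITHOUT_STATE = "SELECT tagfqn_lower FROM tag_usage WHERE targetfqnhash_lower = 'abc'"

plan_with_state = query_plan(conn, WITH_STATE)               # uses idx_target_partial
plan_without_state_before = query_plan(conn, WITHOUT_STATE)  # full table scan

# A non-partial index is usable whatever the query says about `state`.
conn.execute("CREATE INDEX idx_target_full ON tag_usage (targetfqnhash_lower)")
plan_without_state_after = query_plan(conn, WITHOUT_STATE)   # uses idx_target_full

for label, plan in [
    ("partial, query has state = 1", plan_with_state),
    ("partial, query lacks state = 1", plan_without_state_before),
    ("non-partial, query lacks state = 1", plan_without_state_after),
]:
    print(f"{label}: {plan}")
```

The middle case is the regression in miniature: the index still exists, but the planner is not allowed to use it.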

Fix

bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql:

  1. Add non-partial single-column btrees on targetfqnhash_lower and tagfqn_lower, mirroring MySQL's idx_targetfqnhash_lower / idx_tagfqn_lower from 1.11.0.
  2. Rebuild the four 1.11.0 partial indexes as non-partial: same shape and same INCLUDE columns, with only WHERE state = 1 removed so that future predicate changes can't silently invalidate them.

All DDL uses CONCURRENTLY and is idempotent. No Java or query changes; no MySQL changes are needed.
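The DROP-then-CREATE shape can be sketched as a small generator for the Postgres statements. This is an illustration, not the contents of schemaChanges.sql: the helper name rebuild_index_sql and any column lists other than the one shown are hypothetical; only idx_tag_usage_targetfqnhash_lower_pattern, targetfqnhash_lower, and text_pattern_ops come from this PR.

```python
from __future__ import annotations


def rebuild_index_sql(
    name: str,
    table: str,
    key_columns: list[str],
    include: list[str] | None = None,
) -> list[str]:
    """Return DROP-then-CREATE statements for a non-partial Postgres index.

    Both statements use CONCURRENTLY, so a migration runner must execute them
    outside a transaction block. IF [NOT] EXISTS makes a re-run harmless.
    """
    create = (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {name} "
        f"ON {table} ({', '.join(key_columns)})"
    )
    if include:
        # Covering (INCLUDE) columns, supported since Postgres 11.
        create += f" INCLUDE ({', '.join(include)})"
    # Deliberately no WHERE clause: the index must not depend on query predicates.
    return [f"DROP INDEX CONCURRENTLY IF EXISTS {name};", create + ";"]


for stmt in rebuild_index_sql(
    "idx_tag_usage_targetfqnhash_lower_pattern",
    "tag_usage",
    ["targetfqnhash_lower text_pattern_ops"],
):
    print(stmt)
```

Dropping first means a half-built INVALID index from an earlier failed run is discarded rather than kept; the IF NOT EXISTS on the CREATE is then only a safety net.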

Verification

Tested against 50k synthetic rows in a local Postgres:

| Case | Plan | Buffers |
|---|---|---|
| Before | Seq Scan (Rows Removed by Filter: 49010) | 2274 |
| After | Bitmap Index Scan on idx_tag_usage_targetfqnhash_lower_pattern | 1024 |
| Prepared / generic plan | Index still picked; LIKE LOWER($1) → range scan | 24 (index only) |
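The range-scan row relies on a left-anchored pattern being equivalent to a half-open range under plain character-by-character ordering, which is what text_pattern_ops provides (Postgres derives ~>=~ / ~<~ bounds from the prefix). A minimal sketch of that equivalence, assuming the input is already lower-cased and contains no LIKE wildcards; Python compares str by code point, standing in for C-collation byte order. The function name and the sample hash prefix are hypothetical.

```python
from __future__ import annotations

_MAX_CODE_POINT = 0x10FFFF


def prefix_to_range(prefix: str) -> tuple[str, str | None]:
    """Return (lower, upper) such that s.startswith(prefix) == (lower <= s < upper).

    `upper` is None when no finite upper bound exists (empty or all-max prefix).
    Ordering is plain code-point comparison, like text_pattern_ops.
    """
    if any(ch in prefix for ch in "%_\\"):
        raise ValueError("prefix must not contain LIKE wildcards or escape characters")
    chars = list(prefix)
    while chars:
        last = ord(chars[-1])
        if last < _MAX_CODE_POINT:
            # Bumping the last character yields the exclusive upper bound.
            chars[-1] = chr(last + 1)
            return prefix, "".join(chars)
        chars.pop()  # The maximum code point can't be incremented; carry left.
    return prefix, None


lower, upper = prefix_to_range("3f2a.9b1c")
print(lower, upper)  # 3f2a.9b1c 3f2a.9b1d
```

Under a locale-aware collation the range and the prefix test can disagree, which is why a default-opclass btree can't serve this LIKE unless the database uses the C collation.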

Test plan

  • Reproduced seq scan on a representative dataset
  • Verified bitmap index scan after fix (inline + prepared statement)
  • Verified rebuilt composite still serves source-filtered queries
  • Verified migration idempotency
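One way to phrase the idempotency check as an automated test, sketched with Python's sqlite3 as a lightweight stand-in. This is an assumption: SQLite has no CONCURRENTLY, INCLUDE, or text_pattern_ops, so only the DROP IF EXISTS / CREATE IF NOT EXISTS structure carries over. The test starts from the old partial index, applies the migration twice, and requires an identical, predicate-free index set after each run. Index names are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Simplified, hypothetical migration: replace a partial index with a non-partial
# one and add a second single-column index. Safe to run any number of times.
MIGRATION = """
DROP INDEX IF EXISTS idx_tag_usage_target;
CREATE INDEX IF NOT EXISTS idx_tag_usage_target ON tag_usage (targetfqnhash_lower);
DROP INDEX IF EXISTS idx_tag_usage_tagfqn;
CREATE INDEX IF NOT EXISTS idx_tag_usage_tagfqn ON tag_usage (tagfqn_lower);
"""


def index_snapshot(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """List (name, CREATE statement) for every explicit index on tag_usage."""
    return conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'tag_usage' AND sql IS NOT NULL "
        "ORDER BY name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tag_usage (tagfqn_lower TEXT, targetfqnhash_lower TEXT, state INTEGER)"
)
# Pre-migration state: the old partial index.
conn.execute(
    "CREATE INDEX idx_tag_usage_target ON tag_usage (targetfqnhash_lower) WHERE state = 1"
)

conn.executescript(MIGRATION)
after_first_run = index_snapshot(conn)
conn.executescript(MIGRATION)
after_second_run = index_snapshot(conn)

print(after_first_run == after_second_run)  # True
```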


Copilot AI review requested due to automatic review settings April 26, 2026 19:56
github-actions bot added the "backend" and "safe to test" labels Apr 26, 2026
sonika-shah force-pushed the fix-27158-tag-usage-postgres-index branch from 6243f22 to 5e01f10 April 26, 2026 19:57

Copilot AI left a comment


Pull request overview

Restores efficient Postgres execution for tag_usage prefix-LIKE lookups by reintroducing a usable index for the current query shape and removing the brittle coupling between query predicates and partial index predicates.

Changes:

  • Add a non-partial btree index on tag_usage.targetfqnhash_lower using text_pattern_ops to serve prefix LIKE queries.
  • Rebuild the existing tag_usage partial indexes (previously WHERE state = 1) as non-partial indexes to avoid future predicate-coupling regressions.
  • Rebuild the existing gin_tag_usage_targetfqn_trgm index without the partial predicate.

Comment thread bootstrap/sql/migrations/native/2.0.1/postgres/schemaChanges.sql Outdated
The 1.11.0 perf migration (#23054) added four `WHERE state = 1` partial
indexes on tag_usage; #24063 dropped the matching `state = 1` predicate
from getTagsInternalByPrefix (Suggested-state rows are valid for both
classification and glossary derivation), leaving every partial index
inapplicable. Postgres fell back to a parallel seq scan; MySQL was
unaffected because its 1.11.0 indexes were never partial.

Adds a non-partial single-col btree on targetfqnhash_lower (mirrors
MySQL's idx_targetfqnhash_lower) and rebuilds the four partials as
non-partial -- same shape, same INCLUDE columns, predicate coupling
removed so future query changes can't silently invalidate them.

Verified end-to-end against a local Postgres with 50k rows: seq scan
reproduced before the fix (matches reporter's EXPLAIN), bitmap index
scan after, both for inline and prepared-statement paths.
sonika-shah force-pushed the fix-27158-tag-usage-postgres-index branch from 9520cc4 to bc73c29 April 26, 2026 20:08
Copilot AI review requested due to automatic review settings April 26, 2026 20:08
sonika-shah changed the title from "Fixes #27158: restore tag_usage prefix-LIKE index on Postgres" to "Fixes #27158: ingestion slowdown from tag_usage seq-scan on Postgres" Apr 26, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread bootstrap/sql/migrations/native/1.12.8/postgres/schemaChanges.sql Outdated
@github-actions

github-actions Bot commented Apr 26, 2026

🟡 Playwright Results — all passed (14 flaky)

✅ 3984 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 86 skipped

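| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🟡 Shard 1 | 297 | 0 | 2 | 4 |
| 🟡 Shard 2 | 749 | 0 | 5 | 8 |
| 🟡 Shard 3 | 745 | 0 | 1 | 7 |
| 🟡 Shard 4 | 774 | 0 | 1 | 18 |
| 🟡 Shard 5 | 686 | 0 | 1 | 41 |
| 🟡 Shard 6 | 733 | 0 | 4 | 8 |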
🟡 14 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Messaging Service entity item action after rules disabled (shard 1, 1 retry)
  • Features/TeamsHierarchy.spec.ts › Delete Parent Team (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when owner is added (shard 2, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/Entity.spec.ts › Tier Add, Update and Remove (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should verify deleted user not visible in owner selection for table (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Async Delete - WebSocket failure triggers recovery (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ODCSImportExport.spec.ts › Multi-object ODCS contract - object selector shows all schema objects (shard 6, 1 retry)


How to debug locally
```shell
# Download the playwright-test-results-<shard> artifact and unzip it
npx playwright show-trace path/to/trace.zip    # view the trace
```

Ship the fix in the 1.12.7 release line so customers on 1.12.x get it
without waiting for 2.0.x. Also closes the second structural gap with
MySQL: a non-partial single-col btree on tagfqn_lower mirroring MySQL's
idx_tagfqn_lower (1.11.0). 1.12.8 directory removed; 1.12.7's existing
schemaChanges.sql now carries the tag_usage indexes alongside the
entity_extension migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql
Comment thread bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql
If CREATE INDEX CONCURRENTLY fails partway (lock timeout, OOM,
connection drop on a busy multi-GB tag_usage), Postgres leaves the
index in an INVALID state. A subsequent CREATE ... IF NOT EXISTS sees
the catalog row and silently skips, leaving the index permanently
broken while the migration reports success.

The four composite indexes already use the DROP-then-CREATE pattern;
applying the same to the two new single-col indexes for symmetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
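The failure mode described in this commit can be made concrete with a toy model of the catalog. This is not Postgres code: ToyCatalog and its methods are hypothetical, and they model only two documented behaviors, namely that a failed CREATE INDEX CONCURRENTLY leaves an INVALID index behind, and that IF NOT EXISTS checks the name, not validity.

```python
from __future__ import annotations


class ToyCatalog:
    """Hypothetical stand-in for pg_index: index name -> indisvalid flag."""

    def __init__(self) -> None:
        self.indexes: dict[str, bool] = {}

    def create_concurrently(
        self, name: str, *, if_not_exists: bool, fail_midway: bool = False
    ) -> str:
        if name in self.indexes:
            if if_not_exists:
                # Only the name is checked; an INVALID leftover still "exists".
                return "skipped"
            raise RuntimeError(f'relation "{name}" already exists')
        # CONCURRENTLY registers the index (marked invalid) before building it.
        self.indexes[name] = False
        if fail_midway:
            raise RuntimeError("build interrupted; index left INVALID")
        self.indexes[name] = True
        return "created"

    def drop_if_exists(self, name: str) -> None:
        self.indexes.pop(name, None)


def create_only(cat: ToyCatalog, name: str) -> str:
    return cat.create_concurrently(name, if_not_exists=True)


def drop_then_create(cat: ToyCatalog, name: str) -> str:
    cat.drop_if_exists(name)
    return cat.create_concurrently(name, if_not_exists=True)


NAME = "idx_example"  # hypothetical index name
cat = ToyCatalog()
try:  # first migration run dies partway (lock timeout, OOM, dropped connection)
    cat.create_concurrently(NAME, if_not_exists=True, fail_midway=True)
except RuntimeError:
    pass

print(create_only(cat, NAME), cat.indexes[NAME])       # skipped False
print(drop_then_create(cat, NAME), cat.indexes[NAME])  # created True
```

On a real database, such leftovers can be listed with `SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;`.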
Copilot AI review requested due to automatic review settings April 30, 2026 04:39

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

sonika-shah added a commit that referenced this pull request Apr 30, 2026
* Fixes #27158: restore tag_usage prefix-LIKE index on Postgres

The 1.11.0 perf migration (#23054) added four `WHERE state = 1` partial
indexes on tag_usage; #24063 dropped the matching `state = 1` predicate
from getTagsInternalByPrefix (Suggested-state rows are valid for both
classification and glossary derivation), leaving every partial index
inapplicable. Postgres fell back to a parallel seq scan; MySQL was
unaffected because its 1.11.0 indexes were never partial.

Adds non-partial single-col btrees on targetfqnhash_lower and
tagfqn_lower (mirror MySQL's idx_targetfqnhash_lower / idx_tagfqn_lower)
and rebuilds the four partials as non-partial -- same shape, same
INCLUDE columns, predicate coupling removed so future query changes
can't silently invalidate them.

Backport of #27745 (main) onto the 1.12.7 release line so customers on
1.12.x get the fix without waiting for 2.0.x.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Drop new indexes before CREATE to defuse failed-CONCURRENTLY edge case

If CREATE INDEX CONCURRENTLY fails partway (lock timeout, OOM,
connection drop on a busy multi-GB tag_usage), Postgres leaves the
index in an INVALID state. A subsequent CREATE ... IF NOT EXISTS sees
the catalog row and silently skips, leaving the index permanently
broken while the migration reports success.

The four composite indexes already use the DROP-then-CREATE pattern;
applying the same to the two new single-col indexes for symmetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gitar-bot

gitar-bot Bot commented Apr 30, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Adds a GIN trigram index on the tag_usage table to resolve ingestion slowdowns caused by sequential scans. This change addresses previously identified index targeting and privilege concerns.

✅ 2 resolved
Performance: GIN trigram index targets wrong column for the slow query

📄 bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql:39-42
The gin_tag_usage_targetfqn_trgm index (lines 41-42) is built on targetFQNHash (the original-case column), but getTagsInternalByPrefix filters on targetfqnhash_lower with a prefix LIKE, so this index won't serve the problematic query. Additionally, prefix LIKE is already served by the new text_pattern_ops btree index at lines 18-19, making this GIN trigram index redundant for the stated fix.

If there is a future use-case for infix LIKE '%…%' searches on targetFQNHash, the trigram index makes sense — but that should be noted in the migration comment. As-is, it adds write overhead (GIN maintenance on every insert/update) with no query benefit for the fix in this PR.

Edge Case: CREATE EXTENSION pg_trgm may fail without superuser privileges

📄 bootstrap/sql/migrations/native/1.12.7/postgres/schemaChanges.sql:39
CREATE EXTENSION IF NOT EXISTS pg_trgm (line 39) requires the CREATE privilege on the database and, depending on PostgreSQL configuration, may need superuser or rds_superuser role on managed services like RDS. If the migration user lacks this privilege, the entire migration will fail — blocking the critical btree index creation that precedes it.

Consider moving the extension creation + GIN index to the end of the file (or a separate optional script) so the core fix (the text_pattern_ops indexes) isn't gated on extension availability.



Labels

backend, safe to test


Development

Successfully merging this pull request may close these issues.

Hive ingestion slowdown after upgrade to 1.12.3

3 participants