feat(hive): add is_partition_column flag and get_table_partition_details by Reet24-del · Pull Request #27731 · open-metadata/OpenMetadata

Reet24-del · 2026-04-25T09:59:51Z

Summary

The Hive connector mixed partition columns (folder-path segments like year, country) with regular data columns in the ingested schema, making it impossible to distinguish partition keys from data columns. This made it hard to write efficient queries and understand table organization. The two existing PRs (#27029 and #27278) addressed this but have not landed — this is a clean reimplementation.

Changes

`ingestion/src/metadata/ingestion/source/database/hive/utils.py`

- Added `_get_partition_column_names(rows)`: a strict state-machine parser that reads the `# Partition Information` section of `DESCRIBE FORMATTED` output and returns the set of partition column names. It exits cleanly on the next `#`-prefixed section header (e.g. `# Detailed Table Information`) to avoid false positives from `Owner:`/`Location:` rows.
- `get_columns` now calls `_get_partition_column_names` before the main loop and attaches `is_partition_column=True` to every column that appears in the partition section. Non-partitioned tables get `is_partition_column=False` on all columns, so existing behaviour does not regress.
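The parsing rule described above can be sketched as follows. This is a minimal illustration rather than the code in the PR: the function name `parse_partition_column_names`, the row shape (tuples whose first cell is the column name), and the sample rows are assumptions made for the example.

```python
from typing import Iterable, Optional, Sequence, Set


def parse_partition_column_names(
    rows: Iterable[Sequence[Optional[str]]],
) -> Set[str]:
    """Collect the column names listed under '# Partition Information'.

    Hypothetical stand-in for the PR's _get_partition_column_names; the
    real row shape returned by the Hive driver is assumed, not verified.
    """
    partition_columns: Set[str] = set()
    in_partition_section = False
    for row in rows:
        first_cell = (row[0] or "").strip() if row else ""
        if not in_partition_section:
            # State 1: scan until the partition section header appears.
            if first_cell == "# Partition Information":
                in_partition_section = True
            continue
        # State 2: inside the partition section.
        if first_cell == "# col_name" or not first_cell:
            continue  # skip the repeated column header and blank rows
        if first_cell.startswith("#"):
            break  # next section header: stop before Owner:/Location: rows
        partition_columns.add(first_cell)
    return partition_columns


# Illustrative rows only, loosely shaped like DESCRIBE FORMATTED output.
SAMPLE_ROWS = [
    ("# col_name", "data_type", "comment"),
    ("id", "int", ""),
    ("year", "int", ""),
    ("country", "string", ""),
    ("", None, None),
    ("# Partition Information", None, None),
    ("# col_name", "data_type", "comment"),
    ("", None, None),
    ("year", "int", ""),
    ("country", "string", ""),
    ("", None, None),
    ("# Detailed Table Information", None, None),
    ("Owner:", "hive", None),
    ("Location:", "hdfs://example/warehouse/t", None),
]
```

Breaking on the first `#`-prefixed row after the section starts is what keeps `Owner:` and `Location:` out of the result; a table without a partition section never leaves the first state and yields an empty set.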

`ingestion/src/metadata/ingestion/source/database/hive/metadata.py`

- Added `get_table_partition_details(table_name, schema_name, inspector)` to `HiveSource`. It reads the `is_partition_column` flag produced by `get_columns` and builds a `TablePartition` with `PartitionColumnDetails` for each partition key, using `PartitionIntervalTypes.COLUMN_VALUE` (Hive partitions are value-based folder segments, not time/integer intervals).
- Returns `(False, None)` gracefully for non-partitioned tables and on any unexpected error, so ingestion never breaks for users without partitioned tables.
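The control flow described above might look roughly like this. To keep the sketch runnable without OpenMetadata installed, `IntervalType`, `ColumnDetails`, `Partition`, and `FakeInspector` are hypothetical stand-ins for `PartitionIntervalTypes`, `PartitionColumnDetails`, `TablePartition`, and the SQLAlchemy inspector; their fields and the enum value are assumptions, not the generated schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple


class IntervalType(Enum):
    """Stand-in for PartitionIntervalTypes; the value is illustrative."""
    COLUMN_VALUE = "COLUMN_VALUE"


@dataclass
class ColumnDetails:
    """Stand-in for PartitionColumnDetails."""
    column_name: str
    interval_type: IntervalType


@dataclass
class Partition:
    """Stand-in for TablePartition."""
    columns: List[ColumnDetails] = field(default_factory=list)


def partition_details(
    table_name: str, schema_name: str, inspector: Any
) -> Tuple[bool, Optional[Partition]]:
    """Build partition details from the is_partition_column flag."""
    try:
        columns = inspector.get_columns(table_name, schema_name)
        keys = [c["name"] for c in columns if c.get("is_partition_column")]
        if not keys:
            return False, None  # non-partitioned table
        details = [ColumnDetails(k, IntervalType.COLUMN_VALUE) for k in keys]
        return True, Partition(columns=details)
    except Exception:
        # Any unexpected error degrades to "not partitioned" so that
        # ingestion keeps going, as the PR description states.
        return False, None


class FakeInspector:
    """Hypothetical inspector returning canned column dicts."""

    def __init__(self, columns: Optional[List[Dict[str, Any]]]):
        self._columns = columns

    def get_columns(self, table_name: str, schema_name: str):
        if self._columns is None:
            raise RuntimeError("simulated driver failure")
        return self._columns
```

Swallowing every exception is a deliberate trade-off in this sketch: it matches the stated goal of never breaking ingestion, at the cost of hiding genuine failures unless they are logged.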

`ingestion/tests/unit/topology/database/test_hive.py`

Added TestHivePartitionKeyFlag with 8 unit tests covering:

- `_get_partition_column_names`: no-partition table, single key, multiple keys, correct exit on the next section header.
- `get_columns`: single partition key flagged, non-partitioned table, multiple partition keys, all with correct `is_partition_column` values.
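The `get_columns` cases listed above could be exercised along these lines. This is a sketch under assumptions: `flag_partition_columns` is a hypothetical helper that mimics the flagging step on plain dicts, and the test names are invented here; the tests in `TestHivePartitionKeyFlag` may be structured differently.

```python
from typing import Any, Dict, List, Set


def flag_partition_columns(
    columns: List[Dict[str, Any]], partition_names: Set[str]
) -> List[Dict[str, Any]]:
    """Return copies of the column dicts with is_partition_column set by name."""
    return [
        {**column, "is_partition_column": column["name"] in partition_names}
        for column in columns
    ]


# Illustrative column list; real get_columns output carries more keys.
COLUMNS = [{"name": "id"}, {"name": "year"}, {"name": "country"}]


def test_single_partition_key_flagged():
    flagged = flag_partition_columns(COLUMNS, {"year"})
    assert [c["is_partition_column"] for c in flagged] == [False, True, False]


def test_non_partitioned_table_has_no_flags_set():
    flagged = flag_partition_columns(COLUMNS, set())
    assert not any(c["is_partition_column"] for c in flagged)


def test_multiple_partition_keys_flagged():
    flagged = flag_partition_columns(COLUMNS, {"year", "country"})
    names = [c["name"] for c in flagged if c["is_partition_column"]]
    assert names == ["year", "country"]
```

Written as plain functions with bare asserts, these run under pytest without extra fixtures.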

How to test

cd ingestion
python -m pytest tests/unit/topology/database/test_hive.py::TestHivePartitionKeyFlag -v

All 8 new tests pass alongside the existing Hive test suite.

Resolves open-metadata#26712

github-actions · 2026-04-25T10:00:44Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

gitar-bot · 2026-04-25T10:02:13Z


+# Sentinel header names that appear in DESCRIBE FORMATTED output; these rows
+# are metadata rows, not real columns.
+_DESCRIBE_SECTION_HEADERS = {"# Partition Information", "# col_name"}


💡 Quality: Unused module-level constant _DESCRIBE_SECTION_HEADERS

The constant _DESCRIBE_SECTION_HEADERS defined at line 27 is never referenced anywhere in the codebase. It appears to be a leftover from an earlier design where section header detection was centralised. Since _get_partition_column_names and get_columns both use inline string literals instead, this constant is dead code.

Suggested fix:

Remove the unused constant: -_DESCRIBE_SECTION_HEADERS = {"# Partition Information", "# col_name"}

@Reet24-del can you check this comment

gitar-bot · 2026-04-25T10:02:14Z

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Implements is_partition_column and get_table_partition_details to enhance Hive metadata handling. Remove the unused module-level constant _DESCRIBE_SECTION_HEADERS to clean up the implementation.

💡 Quality: Unused module-level constant _DESCRIBE_SECTION_HEADERS

📄 ingestion/src/metadata/ingestion/source/database/hive/utils.py:27

The constant _DESCRIBE_SECTION_HEADERS defined at line 27 is never referenced anywhere in the codebase. It appears to be a leftover from an earlier design where section header detection was centralised. Since _get_partition_column_names and get_columns both use inline string literals instead, this constant is dead code.

Suggested fix

Remove the unused constant:

-_DESCRIBE_SECTION_HEADERS = {"# Partition Information", "# col_name"}

Reet24-del requested a review from a team as a code owner April 25, 2026 09:59

Reet24-del mentioned this pull request Apr 25, 2026

feat: Add metadata flag to identify Hive partition keys #26712

Open

gitar-bot Bot reviewed Apr 25, 2026

Merge branch 'main' into feat/hive-partition-key-flag

51b1c5d

Reet24-del wants to merge 2 commits into open-metadata:main from Reet24-del:feat/hive-partition-key-flag
