[GLUTEN-11683][VL] Add Parquet type widening support #11719
Draft
baibaichen wants to merge 6 commits into apache:main
Conversation
Run Gluten Clickhouse CI on x86
abbc057 to c2d50e1 (force-pushed)
…olumns

When Gluten creates HiveTableHandle, it was passing all columns (including partition columns) as dataColumns. This caused Velox's convertType() to validate partition column types against the Parquet file's physical types, failing when they differ (e.g., LongType in the file vs. IntegerType from partition inference).

Fix: build dataColumns excluding partition columns (ColumnType::kPartitionKey). Partition column values come from the partition path, not from the file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
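The idea behind the fix can be sketched in a few lines. This is a minimal illustration with simplified stand-in types; the real change manipulates Velox's column handle list inside SubstraitToVeloxPlan.cc, and the names below (ColumnHandle, makeDataColumns) are hypothetical:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Simplified stand-ins for Velox's column handle and its ColumnType tag.
enum class ColumnType { kRegular, kPartitionKey };

struct ColumnHandle {
  std::string name;
  ColumnType type;
};

// Build dataColumns excluding partition columns: partition values come from
// the partition path, so their types must not be validated against the
// Parquet file's physical types.
std::vector<ColumnHandle> makeDataColumns(const std::vector<ColumnHandle>& all) {
  std::vector<ColumnHandle> data;
  std::copy_if(
      all.begin(), all.end(), std::back_inserter(data),
      [](const ColumnHandle& c) { return c.type != ColumnType::kPartitionKey; });
  return data;
}
```

With this filtering, a file whose physical column type differs from the inferred partition type never reaches the type-validation path, because the partition column is simply absent from dataColumns.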
With the OAP INT narrowing commit replaced by upstream Velox PR #15173:
- Remove 2 excludes now passing: LongType -> IntegerType, LongType -> DateType
- Add 2 excludes for new failures: IntegerType -> ShortType (OAP removed)

Excludes: 63 (net unchanged: -2, +2). Test results: 21 pass / 63 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With Velox PR3 type widening (INT -> Decimal, INT -> Double, Float -> Double):
- Remove 15 excludes for widening tests now passing

Remaining 48 excludes. Test results: 36 pass / 48 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
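The widening directions named in this commit can be illustrated with a small compatibility check. This is only a sketch under simplified type tags; Velox's actual logic lives in its Parquet reader and covers more cases (and the canWiden name is hypothetical):

```cpp
// Simplified physical/logical type tags for illustration only.
enum class Kind { Int32, Int64, Float, Double, Decimal };

// Sketch of a read-schema widening check for the conversions named above:
// integer upcast, INT -> Double, INT/LONG -> Decimal, Float -> Double.
// Narrowing (e.g., Double -> Float, Int64 -> Int32) is rejected.
bool canWiden(Kind from, Kind to) {
  if (from == to) {
    return true;
  }
  switch (from) {
    case Kind::Int32:
      return to == Kind::Int64 || to == Kind::Double || to == Kind::Decimal;
    case Kind::Int64:
      return to == Kind::Decimal;
    case Kind::Float:
      return to == Kind::Double;
    default:
      return false;
  }
}
```

The asymmetry is the point: every widening pair is one-directional, which is why the narrowing tests in the suite must expect an error rather than a value.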
This suite tests the READ path only. Disable the native writer so Spark's writer produces correct V2 encodings (DELTA_BINARY_PACKED/DELTA_BYTE_ARRAY).
- Remove 10 excludes for decimal widening tests now passing

Remaining 38 excludes:
- 34: Velox native reader rejects incompatible decimal conversions regardless of reader config (no parquet-mr fallback)
- 4: Velox does not support DELTA_BYTE_ARRAY encoding

Test results: 46 pass / 38 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c2d50e1 to f926c56 (force-pushed)
The Velox native reader always behaves like Spark's vectorized reader, so tests that rely on parquet-mr behavior (vectorized=false) fail. Instead of just excluding these 33 tests, add testGluten overrides with expectError=true to verify Velox correctly rejects incompatible conversions:
- 16 unsupported INT -> Decimal conversions
- 6 decimal precision narrowing cases
- 11 decimal precision+scale narrowing/mixed cases

VeloxTestSettings: 38 excludes (parent tests) + 33 testGluten overrides. Test results: 79 pass / 38 ignored (33 excluded parent + 5 truly excluded).
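The expectError=true pattern used by these overrides can be sketched generically. The real overrides are Scala testGluten cases; the C++ below is an illustration only, and runConversionTest is a hypothetical name:

```cpp
#include <functional>
#include <stdexcept>

// Sketch of the expectError pattern: when a conversion is expected to be
// rejected, the test passes only if the read throws; when no error is
// expected, the test passes only if the read succeeds.
bool runConversionTest(const std::function<void()>& readWithConversion,
                       bool expectError) {
  try {
    readWithConversion();
    return !expectError;  // read succeeded: pass only if no error expected
  } catch (const std::runtime_error&) {
    return expectError;   // read threw: pass only if an error was expected
  }
}
```

Flipping the expectation this way turns "Velox rejects what parquet-mr allows" from a skipped test into a positive assertion about the native reader's behavior.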
f926c56 to b376d6d (force-pushed)
What changes were proposed in this pull request?

Fix SPARK-18108, parquet-thrift compatibility, and add Parquet type widening support to Velox. Enables 79 of 84 tests in GlutenParquetTypeWideningSuite.

Changes

1. Fix SPARK-18108 (SubstraitToVeloxPlan.cc): Exclude partition columns from HiveTableHandle.dataColumns() to prevent type validation failures when partition column types differ from file column types.
2. Point Velox to the type widening branch (get-velox.sh): Use the baibaichen/pr3/parquet-type-widening Velox branch, which includes:
3. Update VeloxTestSettings (spark40 + spark41): Remove excludes for widening tests now passing with the updated Velox branch.
4. Disable the native writer (GlutenParquetTypeWideningSuite.scala): This suite tests the READ path only. Disable the native writer so Spark's writer produces correct V2 encodings (DELTA_BINARY_PACKED/DELTA_BYTE_ARRAY).
5. Override 33 tests with testGluten (GlutenParquetTypeWideningSuite.scala, spark40/41): The Velox native reader always behaves like Spark's vectorized reader (no parquet-mr fallback). The parent tests use withAllParquetReaders, which includes vectorized=false, where parquet-mr allows conversions that the vectorized reader rejects. We override 33 parent tests with testGluten and expectError=true to verify that Velox correctly rejects these incompatible conversions. The copied private methods (checkAllParquetReaders, readParquetFiles, writeParquetFiles, etc.) mirror the parent's structure but remove the withAllParquetReaders wrapper.

Test Results
38 ignored = 33 parent tests excluded (overridden by testGluten) + 5 truly excluded:
Additionally fixed (not in TypeWideningSuite):
Fixes #11683
How was this patch tested?
Local tests: TypeWideningSuite 79 pass / 38 ignored (spark40 and spark41).
Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with GitHub Copilot.