Skip to content

Conversation

@pitrou
Copy link
Member

@pitrou pitrou commented Jan 15, 2026

Rationale for this change

Currently, when fuzzing a Parquet file, proper errors are ignored since the file is most of the time simply invalid: it's ok to get an error.

However, if reading a Parquet row group was successful, the Parquet reader should have ensured that the resulting Table is structurally sound: calling Table::Validate should succeed, and if it fails then it should report a bug.

What changes are included in this PR?

Call Table::Validate on a successful read from the Parquet fuzz target, and abort if validation fails.

Table::ValidateFull, however, is allowed to fail, as for example a Parquet UTF8 column could contain invalid UTF8 bytes, and we don't detect that when decoding.

Are these changes tested?

By existing tests and fuzz regression files.

Actual running this on OSS-Fuzz will tell us if this is really a good idea.

Are there any user-facing changes?

No.

@pitrou pitrou force-pushed the gh48560-fuzz-table-validation branch from 7854761 to 1f1d34a Compare January 15, 2026 10:53
@pitrou
Copy link
Member Author

pitrou commented Jan 15, 2026

@github-actions crossbow submit fuzz

@pitrou pitrou marked this pull request as ready for review January 15, 2026 10:53
@pitrou pitrou requested a review from wgtmac as a code owner January 15, 2026 10:54
@github-actions
Copy link

Revision: 1f1d34a

Submitted crossbow builds: ursacomputing/crossbow @ actions-ccbec654ec

Task Status
test-build-cpp-fuzz GitHub Actions

@pitrou pitrou requested a review from adamreeve January 15, 2026 13:56
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 16, 2026
@pitrou pitrou merged commit 34045db into apache:main Jan 19, 2026
49 of 50 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jan 19, 2026
@pitrou pitrou deleted the gh48560-fuzz-table-validation branch January 19, 2026 09:25
@pitrou
Copy link
Member Author

pitrou commented Jan 19, 2026

Thanks for the reviews. I've merged this, let's "brace for impact" as the saying goes.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 34045db.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants