Js/with dyn filter serialization dedupe #346
Draft
jayshrivastava wants to merge 6 commits into main
…tion in distributed execution

- Use DeduplicatingProtoConverter in do_get.rs and stage.rs to preserve Arc-sharing
- Update distributed_codec.rs to use DefaultPhysicalProtoConverter for API compatibility
- Add compare_dynamic_filter test to verify that dynamic filter pruning works
- Update join test configuration to force CollectLeft mode

This enables dynamic filter pushdown to work correctly in distributed query execution by preserving the shared Arc<RwLock<Inner>> state across serialization boundaries. Dynamic filters now successfully prune files on workers when using CollectLeft join mode.

Note: Partitioned join mode still has issues because SharedBuildAccumulator waits for all partitions globally rather than per worker.
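Why Arc-sharing matters can be shown with a minimal, std-only sketch. It assumes a simplified wire format in which each plan node refers to its dynamic filter by an integer id; Inner, WireNode, serialize and deserialize are hypothetical names, not the API of DeduplicatingProtoConverter or of DataFusion's proto crate. Without deduplication, the join (which publishes the filter) and the scan (which reads it) each get their own copy after deserialization, so the scan never sees the predicate.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the shared filter state; `None` = no predicate published yet.
#[derive(Debug, Default)]
struct Inner {
    predicate: Option<String>,
}

type SharedFilter = Arc<RwLock<Inner>>;

/// Hypothetical wire form: a node name plus the id of the filter it references.
#[derive(Debug, Clone, PartialEq)]
struct WireNode {
    name: String,
    filter_id: u64,
}

/// Assign one id per distinct Arc allocation, so shared filters share an id.
fn serialize(nodes: &[(&str, SharedFilter)]) -> Vec<WireNode> {
    let mut ids: HashMap<*const RwLock<Inner>, u64> = HashMap::new();
    nodes
        .iter()
        .map(|(name, filter)| {
            let next_id = ids.len() as u64;
            // Arc::as_ptr identifies the allocation behind the handle.
            let filter_id = *ids.entry(Arc::as_ptr(filter)).or_insert(next_id);
            WireNode { name: name.to_string(), filter_id }
        })
        .collect()
}

/// With `dedupe`, nodes carrying the same id get clones of one Arc;
/// without it, every reference becomes an independent allocation.
fn deserialize(wire: &[WireNode], dedupe: bool) -> Vec<(String, SharedFilter)> {
    let mut seen: HashMap<u64, SharedFilter> = HashMap::new();
    wire.iter()
        .map(|node| {
            let filter = if dedupe {
                seen.entry(node.filter_id)
                    .or_insert_with(|| Arc::new(RwLock::new(Inner::default())))
                    .clone()
            } else {
                Arc::new(RwLock::new(Inner::default()))
            };
            (node.name.clone(), filter)
        })
        .collect()
}

fn main() {
    // One filter shared by the join (producer) and the scan (consumer).
    let filter: SharedFilter = Arc::new(RwLock::new(Inner::default()));
    let plan = vec![("HashJoinExec", filter.clone()), ("DataSourceExec", filter)];
    let wire = serialize(&plan);

    for dedupe in [false, true] {
        let nodes = deserialize(&wire, dedupe);
        // The join publishes its build-side predicate through its own handle...
        nodes[0].1.write().unwrap().predicate = Some("k >= 1 AND k <= 2".to_string());
        // ...and the scan reads through its handle, which may or may not be the same Arc.
        let seen = nodes[1].1.read().unwrap().predicate.clone();
        println!("dedupe={dedupe}: scan sees {seen:?}");
    }
}
```

Running it prints `None` for `dedupe=false` and `Some("k >= 1 AND k <= 2")` for `dedupe=true`: only the deduplicating path lets the scan observe what the join writes.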
…umentation

- Use enable_join_dynamic_filter_pushdown instead of the master flag
- Add a comprehensive comment showing the expected metrics for both scenarios
- Document the 48% reduction in bytes scanned and the 100% probe hit rate with filtering
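The reduction in bytes scanned comes from skipping probe-side files that cannot match the build side. The sketch below is a simplified, std-only model of bounds-based pruning. It assumes the filter is a min/max range over the build-side join keys and that each probe-side file carries min/max statistics for that key; FileStats, build_bounds and prune are hypothetical names, and the filter DataFusion actually builds may carry more than these bounds.

```rust
/// Hypothetical per-file statistics for the probe-side join key.
#[derive(Debug, Clone, Copy)]
struct FileStats {
    min: i64,
    max: i64,
    bytes: u64,
}

/// Min/max of the build-side join keys; `None` when the build side is empty.
fn build_bounds(keys: &[i64]) -> Option<(i64, i64)> {
    let min = *keys.iter().min()?;
    let max = *keys.iter().max()?;
    Some((min, max))
}

/// Keep only files whose [min, max] key range overlaps the build-side bounds.
fn prune(files: &[FileStats], bounds: Option<(i64, i64)>) -> Vec<FileStats> {
    match bounds {
        // In an inner join, an empty build side cannot match any probe row.
        None => Vec::new(),
        Some((lo, hi)) => files
            .iter()
            .copied()
            .filter(|f| f.max >= lo && f.min <= hi)
            .collect(),
    }
}

fn main() {
    let build_keys = [3, 5, 4];
    let files = [
        FileStats { min: 0, max: 2, bytes: 100 },
        FileStats { min: 3, max: 6, bytes: 100 },
        FileStats { min: 7, max: 9, bytes: 100 },
    ];
    let kept = prune(&files, build_bounds(&build_keys));
    let scanned: u64 = kept.iter().map(|f| f.bytes).sum();
    let total: u64 = files.iter().map(|f| f.bytes).sum();
    println!("kept {} of {} files, {scanned} of {total} bytes", kept.len(), files.len());
}
```

With these made-up sample values it prints `kept 1 of 3 files, 100 of 300 bytes`; the numbers illustrate the mechanism only and are not the metrics from the test.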
Add test_join_hive_dynamic_filter_comparison, which demonstrates a known correctness bug with dynamic filtering in distributed Partitioned hash join mode.

When dynamic filtering is enabled with mode=Partitioned:
- DataSourceExec incorrectly prunes ALL rows (0 matched instead of 2)
- The query returns 0 rows instead of the expected 14 rows
- This is a correctness bug tracked at apache/datafusion#20175

Notes:
- The dynamic filter display shows "empty" because the plan is displayed on the coordinator, which doesn't have the dynamic filter state
- On the workers, the filter exists but incorrectly prunes all data
- Without dynamic filtering, the same query correctly returns 14 rows

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
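The coordinator showing "empty" follows from the fact that sending a plan to another process copies the filter's value instead of sharing a pointer. A minimal std-only sketch, assuming the filter state is shipped to the worker as a snapshot; DynamicFilter, send_to_worker, update and display are hypothetical names. It models only why the coordinator's display is stale, not the incorrect pruning itself.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical dynamic-filter handle; `None` means no predicate has been published yet.
#[derive(Debug, Default)]
struct DynamicFilter {
    state: Arc<RwLock<Option<String>>>,
}

impl DynamicFilter {
    /// Called by the join once its build side is complete.
    fn update(&self, predicate: &str) {
        *self.state.write().unwrap() = Some(predicate.to_string());
    }

    /// What a plan display would show for this filter.
    fn display(&self) -> String {
        match &*self.state.read().unwrap() {
            Some(p) => p.clone(),
            None => "empty".to_string(),
        }
    }

    /// Crossing a process boundary copies the current value into a new allocation,
    /// so later writes on one side are invisible to the other.
    fn send_to_worker(&self) -> DynamicFilter {
        let snapshot = self.state.read().unwrap().clone();
        DynamicFilter { state: Arc::new(RwLock::new(snapshot)) }
    }
}

fn main() {
    let coordinator_filter = DynamicFilter::default();
    let worker_filter = coordinator_filter.send_to_worker();

    // The hash join on the worker finishes its build side and publishes a predicate.
    worker_filter.update("k >= 1 AND k <= 2");

    println!("worker:      {}", worker_filter.display());
    println!("coordinator: {}", coordinator_filter.display());
}
```

The worker line shows the predicate while the coordinator line shows `empty`, which matches the note that the coordinator-side display cannot reflect the workers' filter state.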
- Remove tests/compare_dynamic_filter.rs (superseded by the test in join.rs)
- Update comment to clarify that dynamic filtering works but results are incorrect

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Switch from path dependencies to git dependencies referencing the fork at commit f3b6568af, which includes dynamic filter serialization fixes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Update lockfile to reflect the git dependency change.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
No description provided.