Js/with dyn filter serialization dedupe #346

Draft
jayshrivastava wants to merge 6 commits into main from js/with-dyn-filter-serialization-dedupe

@jayshrivastava
Collaborator

No description provided.

jayshrivastava and others added 6 commits February 14, 2026 19:16
…tion in distributed execution

- Use DeduplicatingProtoConverter in do_get.rs, stage.rs to preserve Arc-sharing
- Update distributed_codec.rs to use DefaultPhysicalProtoConverter for API compatibility
- Add compare_dynamic_filter test to verify dynamic filter pruning works
- Update join test configuration to force CollectLeft mode

This enables dynamic filter pushdown to work correctly in distributed query
execution by preserving the shared Arc<RwLock<Inner>> state across serialization
boundaries. Dynamic filters now successfully prune files on workers when using
CollectLeft join mode.

Note: Partitioned join mode still has issues due to SharedBuildAccumulator
waiting for all partitions globally rather than per-worker.
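
To illustrate why the expected partition count in the note above matters, here is a hedged sketch of a count-based accumulator. `BoundsAccumulator` and `finalized_bounds` are hypothetical names, and this is not how `SharedBuildAccumulator` is implemented; it only assumes that the filter is finalized once a fixed number of partitions have reported. It does not attempt to reproduce the over-pruning behavior seen in Partitioned mode.

```rust
/// Hypothetical count-based accumulator for build-side min/max bounds.
struct BoundsAccumulator {
    expected_partitions: usize,
    reported: usize,
    min: i64,
    max: i64,
}

impl BoundsAccumulator {
    fn new(expected_partitions: usize) -> Self {
        Self { expected_partitions, reported: 0, min: i64::MAX, max: i64::MIN }
    }

    /// Record one partition's bounds. The merged bounds are returned only
    /// when the number of reports reaches the expected count.
    fn report(&mut self, min: i64, max: i64) -> Option<(i64, i64)> {
        self.reported += 1;
        self.min = self.min.min(min);
        self.max = self.max.max(max);
        (self.reported == self.expected_partitions).then(|| (self.min, self.max))
    }
}

/// Feed the partitions that run on one worker into an accumulator that
/// expects `expected` reports, and return whatever the last report yields.
fn finalized_bounds(expected: usize, local: &[(i64, i64)]) -> Option<(i64, i64)> {
    let mut acc = BoundsAccumulator::new(expected);
    let mut result = None;
    for &(lo, hi) in local {
        result = acc.report(lo, hi);
    }
    result
}
```

If a worker hosts two of four global partitions, `finalized_bounds(4, ..)` never yields bounds on that worker, while `finalized_bounds(2, ..)` does. That is the difference between counting globally and counting per worker.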

…umentation

- Use enable_join_dynamic_filter_pushdown instead of master flag
- Add comprehensive comment showing expected metrics for both scenarios
- Document a 48% reduction in bytes scanned and a 100% probe hit rate with filtering

Add test_join_hive_dynamic_filter_comparison, which demonstrates a known
correctness bug with dynamic filtering in distributed Partitioned hash join mode.

When dynamic filtering is enabled with mode=Partitioned:
- DataSourceExec incorrectly prunes ALL rows (0 matched instead of 2)
- Query returns 0 rows instead of the expected 14 rows
- This is a correctness bug tracked at apache/datafusion#20175

Notes:
- The dynamic filter display shows "empty" because the plan is displayed on
  the coordinator, which doesn't have the dynamic filter state
- On the workers, the filter exists but incorrectly prunes all data
- Without dynamic filtering, the same query correctly returns 14 rows

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Remove tests/compare_dynamic_filter.rs (superseded by test in join.rs)
- Update comment to clarify that dynamic filtering works but results are incorrect

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Switch from path dependencies to git dependencies referencing the fork
at commit f3b6568af which includes dynamic filter serialization fixes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update lockfile to reflect git dependency change.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>