-
Notifications
You must be signed in to change notification settings - Fork 114
scan: make LazyScanStream initialization non-blocking #5906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
scan: make LazyScanStream initialization non-blocking #5906
Conversation
Codecov Report❌ Patch coverage is
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks for the PR! I'm going to run our benchmarks on this branch so there will be a bunch of large comments made here by a bot that gives a rough idea where there are improvements/regressions, this is just a heads up! Edit: huh something went wrong with that but at least they ran... Edit again: There seems to be some sort of race condition in our CI. Because we have to approve the workflows on outside contributors' PRs, the actions bot doesn't take the |
Signed-off-by: godnight10061 <[email protected]>
2f445e6 to
9bf6ce2
Compare
Signed-off-by: godnight10061 <[email protected]>
Signed-off-by: godnight10061 <[email protected]>
Signed-off-by: godnight10061 <[email protected]>
|
Re the benchmark that is the expected behaviour. We cannot give a fork action write permission. We use the run summary to print out the result |
|
@connortsui20 @joseph-isaacs The two failing checks don't appear to be caused by this PR. Happy to provide additional context if needed. |
|
@godnight10061 look there is not a merge conflict I also see some regressions in clickbench: https://github.com/vortex-data/vortex/actions/runs/20914237836?pr=5906 and fineweb any ideas?? |
May you tell me which exact target line you're referring to? |
|
clickbench: clickbench_{q24,q26}/duckdb:vortex-file-compressed, I have just kicked off another run so we can see if these repeat (new: https://github.com/vortex-data/vortex/actions/runs/20988924098 previous: https://github.com/vortex-data/vortex/actions/runs/20914237836) |
|
clickbench_q24/duckdb:vortex-file-compressed 58105110 4.38643e+07 1.32465 ns 🚨 Interestingly doesn't affect datafusion |
Summary
ScanBuilder::prepare()forScanBuilder::into_stream()onspawn_blockinginstead of insideStream::poll_next.Motivation
This addresses the ClickBench regression reported in #5899.
prepare()can be expensive, and running it synchronously insidepoll_nextcan cause head-of-line blocking when many scan streams are polled concurrently.Testing
cargo +nightly fmt --all --checkcargo test -p vortex-scan --libcargo clippy --locked --all-features --all-targets -p vortex-scan -- -D warningscargo hack clippy -p vortex-scan --no-default-features -- -D warningscargo semver-checks --workspace(ran in a temporary worktree after setting workspace version to0.58.0)PYO3_PYTHON=C:\Python312\python.exe cargo test --locked --workspace --all-features --exclude vortex-duckdb --exclude duckdb-benchBench
Local DataFusion ClickBench (partitioned, q23, 5 iters):
datafusion:vortex-file-compressed: 39.405s → 27.661sdatafusion:vortex-compact: 57.573s → 35.945s