feat: remove full path from partition locations#1527
feat: remove full path from partition locations#1527milenkovicm merged 16 commits intoapache:mainfrom
Conversation
f4ae4e9 to
15839b9
Compare
15839b9 to
3903863
Compare
|
@sqlbenchmark tpch --iterations 3 --scale-factor 1 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location Query Comparison
Total: Main=4041.30ms, PR=3885.90ms (-3.8%) Automated benchmark run by dfbench |
|
@sqlbenchmark tpch --iterations 3 --scale-factor 10 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location Query Comparison
Total: Main=39984.00ms, PR=39226.30ms (-1.9%) Automated benchmark run by dfbench |
3903863 to
0edddcb
Compare
|
@sqlbenchmark tpch --iterations 5 --scale-factor 1 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location Query Comparison
Total: Main=4045.40ms, PR=4118.20ms (+1.8%) Automated benchmark run by dfbench |
|
@sqlbenchmark tpch --iterations 5 --scale-factor 10 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location Query Comparison
Total: Main=41212.10ms, PR=40436.90ms (-1.9%) Automated benchmark run by dfbench |
|
@sqlbenchmark help |
@sqlbenchmark usageThis bot is whitelisted for DataFusion committers. CommandsExamplesAutomated by dfbench |
|
@sqlbenchmark tpch --iterations 3 --scale-factor 1 --baseline branch-51 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location ResultsCould not parse structured results. Raw output: Main branch: PR branch: Automated benchmark run by dfbench |
|
@sqlbenchmark tpch --iterations 3 --scale-factor 1 --baseline branch-52 |
Ballista TPC-H Benchmark ResultsPR: #1527 - remove partition from partition location Query Comparison
Total: Main=3979.40ms, PR=3859.60ms (-3.0%) Automated benchmark run by dfbench |
|
hey @martin-g any chance you can have a look whenever you get time |
|
I'll try to do it tomorrow! |
| // transfers the entire file, which contains all partitions. | ||
| // Use flight transport (do_get) for sort-based shuffle. | ||
| if is_sort_shuffle_output(data_path) { | ||
| if is_sort_shuffle_output(&path) { |
There was a problem hiding this comment.
shouldn't this use is_sort_shuffle now ?
| partition, | ||
| metrics: ExecutionPlanMetricsSet::new(), | ||
| properties, | ||
| work_dir: "".to_string(), // to be updated at the executor side |
There was a problem hiding this comment.
IMO it would be safer to make it an Option<String> with None as default.
This way if it is not updated later to Some you can fail early.
Currently there is a chance silently to use the current working directory if it is not updated for any reason.
There was a problem hiding this comment.
It copy of shuffle writer approach, but i agree it makes sense
| // replace this check with open, and check for error | ||
| // | ||
| // Check if this is a sort-based shuffle output (has index file) | ||
| if sort_shuffle_enabled && is_sort_shuffle_output(data_path) { |
There was a problem hiding this comment.
Should this use location.is_sort_shuffle ?
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
there are going to be follow up changes after this one, adding subdiroctory to store shuffle files and session id between work dir and job
Are there any user-facing changes?