Adds load balanced kernel for point-point, point-linestring and linestring-linestring #1144
base: branch-23.08
Conversation
…ti-geometry based
…y multigeometry test case.
…to fix/segment_iterator
…into improvement/load_balanced_distance_kernel
…m:isVoid/cuspatial into improvement/load_balanced_distance_kernel
    auto aggregate = BlockReduce(temp_storage).Reduce(partial, cub::Min());

    // atmomic with leading thread
    if (cooperative_groups::this_thread_block().thread_rank() == 0)
Is there any advantage to this over if (threadIdx.x == 0)? What code does this produce? It's impossible for threadIdx.x == 0 to have exited before any other threads in its block in this code.
I can't think of any advantage compared to threadIdx.x == 0. I had originally hoped to write a modularized kernel with cooperative groups (cg), but later went back on that. This is probably a remnant from prototyping.
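For readers following the exchange above, here is a minimal host-side C++ sketch (not code from this PR) of how a thread's rank within its block is linearized from its 3D index, using the thread-ID formula x + y * Dx + z * Dx * Dy from the CUDA programming guide. It assumes thread_rank() on a thread block follows that same linearization. `Dim3`, `linear_thread_rank`, `is_leader_by_rank` and `is_leader_by_x` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3 / uint3, so the sketch compiles on the host.
struct Dim3 {
  std::size_t x, y, z;
};

// Linear rank of a thread inside its block: x + y * Dx + z * Dx * Dy.
constexpr std::size_t linear_thread_rank(Dim3 thread_idx, Dim3 block_dim)
{
  return thread_idx.x + thread_idx.y * block_dim.x +
         thread_idx.z * block_dim.x * block_dim.y;
}

// Leader test on the linear rank (models the thread_rank() == 0 check).
constexpr bool is_leader_by_rank(Dim3 thread_idx, Dim3 block_dim)
{
  return linear_thread_rank(thread_idx, block_dim) == 0;
}

// Leader test on the x index only (models the threadIdx.x == 0 check).
constexpr bool is_leader_by_x(Dim3 thread_idx) { return thread_idx.x == 0; }
```

Under that assumption, the two checks pick the same single thread in a 1D block such as the 256-thread blocks used here. They differ only in 2D or 3D blocks, where threadIdx.x == 0 matches one thread per row rather than one per block.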
    std::size_t constexpr threads_per_block = 256;
    std::size_t const num_blocks =
      (multilinestrings1.num_points() + threads_per_block - 1) / threads_per_block;
    std::size_t num_threads = 1e8;
Rather than hard coding, this may be a job for the occupancy API...
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__OCCUPANCY.html
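As a rough illustration of the suggestion, the host-side C++ sketch below sizes a launch from occupancy figures instead of a hard-coded thread count. It is not code from this PR and assumes the kernel uses a grid-stride loop, so it stays correct with fewer threads than work items. In real CUDA code `blocks_per_sm` would come from `cudaOccupancyMaxActiveBlocksPerMultiprocessor` and `num_sms` from `cudaDeviceGetAttribute` with `cudaDevAttrMultiProcessorCount`. Here both are plain parameters so the arithmetic can be checked without a GPU. `ceil_div` and `occupancy_grid_size` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cstddef>

// Number of blocks needed to give every work item its own thread.
constexpr std::size_t ceil_div(std::size_t n, std::size_t d) { return (n + d - 1) / d; }

// Grid size for a grid-stride kernel (an assumption of this sketch):
// enough blocks to cover the input, but no more than the device can keep
// resident at once. Always returns at least one block.
inline std::size_t occupancy_grid_size(std::size_t num_items,
                                       std::size_t threads_per_block,
                                       std::size_t blocks_per_sm,
                                       std::size_t num_sms)
{
  std::size_t const blocks_to_cover = ceil_div(num_items, threads_per_block);
  std::size_t const resident_blocks = blocks_per_sm * num_sms;
  return std::max<std::size_t>(1, std::min(blocks_to_cover, resident_blocks));
}
```

With 256 threads per block, 8 resident blocks per SM and 80 SMs (made-up figures), a 1000-item input gets 4 blocks, while a 100,000,000-item input is capped at 640 blocks and the grid-stride loop covers the rest.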
Description
Placeholder for PRs to close during the burndown period.
Closes #1061
Checklist