Skip to content

refactor: extract FlatRTree to shared header + add KNN search primitive#796

Open
pierre-warnier wants to merge 2 commits intoduckdb:v1.5-variegatafrom
pierre-warnier:pr/knn-1-flatrtree
Open

refactor: extract FlatRTree to shared header + add KNN search primitive#796
pierre-warnier wants to merge 2 commits intoduckdb:v1.5-variegatafrom
pierre-warnier:pr/knn-1-flatrtree

Conversation

@pierre-warnier
Copy link
Copy Markdown

Preparation for the upcoming KNN spatial join operator. This PR is pure plumbing — no new user-visible functionality, no behavior change to the existing SPATIAL_JOIN.

Depends on: #795 (GCC 11+ build fix). Merging PR #795 first is recommended so this branch can rebase cleanly onto v1.5-variegata.

Changes

1. Extract FlatRTree from anonymous namespace

FlatRTree (and helpers typed_view, FlatRTreeScanState) were defined inline in spatial_join_physical.cpp inside an anonymous namespace. They move to a new public header src/spatial/operators/flat_rtree.hpp so the upcoming KNN operator can reuse them.

  • spatial_join_physical.cpp loses 361 lines, gains 1 #include
  • flat_rtree.hpp is new (418 lines)
  • No logic changes — this is a cut-paste refactor

2. Add KNN search primitive

FlatRTree gains a KNNSearch() method implementing Hjaltason-Samet best-first traversal (priority queue on squared bbox distance). The new FlatRTreeKNNState holds the scan state (heap, result vector).

This is not wired into any user-facing function yet — it's the building block for the next PR in the stack.

3. Add Box2D::MinDistanceSquared helpers

Two overloads on Box2D<T> in src/spatial/geometry/bbox.hpp:

  • MinDistanceSquared(const V &point) — point to box
  • MinDistanceSquared(const Box &other) — box to box

Both return 0 for inside/overlapping. Used by the KNN priority queue's lower-bound distance. Zero existing callers, so no risk of affecting SPATIAL_JOIN.

Verification

  • SPATIAL_JOIN tests unchanged and green: test/sql/join/* — 133 assertions pass
  • Built on GCC 11.4.0, no warnings
  • git diff --stat:
    src/spatial/geometry/bbox.hpp                   |  34 ++
    src/spatial/operators/flat_rtree.hpp            | 418 ++++++++++++++++++++++++
    src/spatial/operators/spatial_join_physical.cpp | 363 +-------------------
    3 files changed, 454 insertions(+), 361 deletions(-)
    

What this PR does NOT do

  • No new user-facing functions
  • No new SQL syntax
  • No optimizer changes
  • No tests added (the primitive is exercised by the operator PR's tests)

Why split it out

Reviewing a 1600+ line KNN operator PR is painful. This PR lets reviewers focus on the code movement and the new primitive in isolation. The operator PR then only adds the integration.

The Copy() method returns a unique_ptr<BindData> which must be
converted to unique_ptr<FunctionData>. Under GCC 11+ this
conversion requires explicit std::move() — newer compilers
reject the implicit conversion as a copy-initialization.

Fixes: error: could not convert 'copy' from
'unique_ptr<ST_Distance_Sphere::BindData>' to 'unique_ptr<FunctionData>'
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the spatial join implementation by extracting the internal FlatRTree implementation into a reusable header, and adds foundational primitives needed for a future KNN spatial join operator (plus a small GCC11-related unique_ptr return fix).

Changes:

  • Moved FlatRTree, typed_view, and scan state types out of spatial_join_physical.cpp into a new shared header flat_rtree.hpp.
  • Added a best-first (Hjaltason–Samet) FlatRTree::KNNSearch() primitive with a corresponding FlatRTreeKNNState.
  • Added Box2D::MinDistanceSquared helpers and applied the GCC 11+ unique_ptr return fix in ST_Distance_Sphere.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
src/spatial/operators/spatial_join_physical.cpp Replaces the inlined FlatRTree implementation with an include of the extracted shared header.
src/spatial/operators/flat_rtree.hpp New shared header containing FlatRTree and new KNN traversal primitive.
src/spatial/geometry/bbox.hpp Adds min squared distance helpers for point→box and box→box, used by KNN priority ordering.
src/spatial/modules/main/spatial_functions_scalar.cpp Fixes GCC11+ compilation by explicitly std::move-returning a derived unique_ptr.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/spatial/operators/flat_rtree.hpp Outdated
Comment thread src/spatial/operators/flat_rtree.hpp
Comment thread src/spatial/operators/flat_rtree.hpp
Comment thread src/spatial/operators/flat_rtree.hpp Outdated
Comment thread src/spatial/operators/spatial_join_physical.cpp
Move FlatRTree from anonymous namespace in spatial_join_physical.cpp
to a shared header (flat_rtree.hpp) so it can be reused by the
upcoming KNN join operator.

Changes:
- Extract FlatRTree class to src/spatial/operators/flat_rtree.hpp
- Add KNNSearch method using Hjaltason-Samet priority queue traversal
- Add FlatRTreeKNNState for KNN search state management
- Add Box2D::MinDistanceSquared for KNN distance lower bound
- Use Allocator instead of BufferManager (in-memory only)
- No functional changes to existing SPATIAL_JOIN behavior
@pierre-warnier
Copy link
Copy Markdown
Author

Addressed the 5 review items in the amended commit (707ea8e):

  1. Stale comments on GetBounds() / Count() — rewrote to match current behavior ("Returns the bounding box" / "Returns the number of leaf items").
  2. Build() divergence from upstream — reverted the dx/dy > 0 guards. The refactor is now a faithful cut-paste of the original inlined version.
  3. Missing <limits> and <functional> headers — added.
  4. Empty public: section — removed.
  5. spatial/util/math.hpp transitive include — re-added the direct include to spatial_join_physical.cpp.

All 133 existing SPATIAL_JOIN assertions still pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants