@jiayuasu jiayuasu commented Feb 11, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Added StructuredAdapter.repartitionBySpatialKey(), a one-step API for spatially partitioning DataFrames using a KDB-Tree (or another grid type). It collapses the current 5-step workflow (DataFrame → SpatialRDD → analyze() → spatialPartitioningWithoutDuplicates → back to DataFrame) into a single method call.

Changes:

  • Scala: 3 overloads of repartitionBySpatialKey in StructuredAdapter.scala (full params, auto-detect geometry, auto-detect plus default partitions)
  • Python: Matching repartitionBySpatialKey classmethod in structured_adapter.py
  • Scala tests: 2 new tests in structuredAdapterTestScala.scala
  • Python tests: 2 new tests in test_structured_adapter.py
  • Docs: New Spatial Partitioning for GeoParquet section in GeoParquet tutorial with Python, Scala, Java examples

How was this patch tested?

  • Scala tests: all 10 tests pass in structuredAdapterTestScala
  • Python tests added following existing patterns
  • Spotless formatting verified
  • Pre-commit hooks pass

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number in v1.9.0 format.
  • Yes, I have updated the documentation. Added a new Spatial Partitioning for GeoParquet section to the GeoParquet tutorial with tabbed Python, Scala, Java examples.

…atial partitioning

Add a convenience method that wraps the multi-step process of:
1. Converting DataFrame to SpatialRDD
2. Calling analyze()
3. Applying spatialPartitioningWithoutDuplicates
4. Converting back to DataFrame

into a single call:

  StructuredAdapter.repartitionBySpatialKey(df, GridType.KDBTREE, 10)

This simplifies the workflow for generating spatially partitioned
GeoParquet files. Added to both Scala and Python APIs, with tests
and updated documentation.
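
For readers unfamiliar with what "spatial partitioning without duplicates" means, the idea can be sketched in plain Python. This is a toy illustration only, not Sedona's KDB-Tree implementation: the function name `kd_split`, the sample points, and the fixed-depth median split are all assumptions made for the example. It alternates between splitting on x and y so that nearby points land in the same partition, and each point is assigned to exactly one partition.

```python
# Toy sketch of KD-style spatial partitioning (NOT Sedona's implementation).
# `kd_split` is a hypothetical helper introduced only for illustration.

def kd_split(points, depth, max_depth):
    """Recursively split points at the median, alternating x (axis 0) and y (axis 1).

    Returns a list of leaves; every input point appears in exactly one leaf,
    which mirrors the "without duplicates" guarantee.
    """
    if depth == max_depth or len(points) <= 1:
        return [points]
    axis = depth % 2  # even depth: split on x, odd depth: split on y
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left = kd_split(ordered[:mid], depth + 1, max_depth)
    right = kd_split(ordered[mid:], depth + 1, max_depth)
    return left + right


# Hypothetical sample data: (x, y) coordinates.
points = [(0, 0), (1, 5), (2, 1), (3, 7), (6, 2), (7, 8), (8, 3), (9, 9)]

# Two levels of splitting yield up to 2**2 = 4 partitions.
partitions = kd_split(points, 0, 2)

for partition_id, members in enumerate(partitions):
    print(partition_id, members)
```

In the real API the partition count is a request rather than a fixed tree depth, and the grid is built by Sedona from the analyzed dataset; the sketch only shows why rows that are close in space end up stored together.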

Closes #2639
@jiayuasu jiayuasu marked this pull request as draft February 11, 2026 10:42
@jiayuasu jiayuasu added this to the sedona-1.9.0 milestone Feb 11, 2026
@jiayuasu jiayuasu changed the title from "[SEDONA-2639] Add StructuredAdapter.repartitionBySpatialKey for simplified spatial partitioning" to "[GH-2639] Add StructuredAdapter.repartitionBySpatialKey for simplified spatial partitioning" Feb 11, 2026
@jiayuasu jiayuasu closed this Feb 12, 2026

Development

Successfully merging this pull request may close these issues.

We should simplify the process to use KDB-Tree partitioner to generate geoparquet files
