@jiayuasu commented Feb 12, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This PR integrates proj4sedona (v0.0.3) as the default CRS transformation engine for vector geometries in Apache Sedona. proj4sedona is a pure Java implementation with no LGPL dependencies, replacing GeoTools for vector CRS transformations.

New CRS format support

| Format | Example |
| --- | --- |
| EPSG code | EPSG:4326, EPSG:3857 (also ESRI, IAU, SR-ORG authorities) |
| WKT1 (OGC) | PROJCS[...], GEOGCS[...] |
| WKT2 (ISO 19162:2019) | PROJCRS[...], GEOGCRS[...] |
| PROJ string | +proj=longlat +datum=WGS84 +no_defs |
| PROJJSON | {"type": "ProjectedCRS", ...} |

CRS database is sourced from spatialreference.org, supporting EPSG, ESRI, IAU, and SR-ORG authorities.
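To illustrate how these inputs differ, here is a minimal Python sketch that classifies a CRS string by its prefix or its leading WKT keyword. The helper `detect_crs_format` and its keyword lists are hypothetical; they are not part of Sedona or proj4sedona, and the keyword lists cover common root keywords rather than the full WKT grammars.

```python
import json
import re

# Hypothetical keyword lists: common WKT2 (ISO 19162) and WKT1 (OGC) root keywords.
WKT2_KEYWORDS = {"PROJCRS", "PROJECTEDCRS", "GEOGCRS", "GEOGRAPHICCRS",
                 "GEODCRS", "GEODETICCRS", "BOUNDCRS", "COMPOUNDCRS", "VERTCRS"}
WKT1_KEYWORDS = {"PROJCS", "GEOGCS", "GEOCCS", "COMPD_CS", "VERT_CS"}

# AUTHORITY:CODE, e.g. EPSG:4326, ESRI:102100, SR-ORG:6864.
AUTHORITY_CODE = re.compile(r"^[A-Za-z][A-Za-z0-9_-]*:\d+$")


def detect_crs_format(crs: str) -> str:
    """Guess the format of a CRS definition from its leading characters."""
    s = crs.strip()
    if AUTHORITY_CODE.match(s):
        return "authority_code"
    if s.startswith("+"):
        return "proj_string"
    if s.startswith("{"):
        obj = json.loads(s)  # raises ValueError on malformed JSON
        if isinstance(obj, dict) and "type" in obj:
            return "projjson"
        raise ValueError("JSON input has no 'type' member")
    # WKT: compare the whole root keyword, so PROJCRS and PROJCS never collide.
    keyword = s.split("[", 1)[0].strip().upper()
    if keyword in WKT2_KEYWORDS:
        return "wkt2"
    if keyword in WKT1_KEYWORDS:
        return "wkt1"
    raise ValueError(f"unrecognized CRS definition: {crs[:40]!r}")
```

Matching the full root keyword rather than a prefix matters because WKT1 and WKT2 differ by only a letter or two (PROJCS vs PROJCRS).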

Grid file support

Grid files (.gsb, .tif) passed via +nadgrids enable high-accuracy datum transformations (e.g., NAD27 to NAD83, OSGB36 to ETRS89). They can be loaded from:

  • Local file path: +nadgrids=/path/to/grid.gsb
  • PROJ CDN: +nadgrids=@us_noaa_conus.tif
  • HTTPS URL: +nadgrids=https://cdn.proj.org/us_noaa_conus.tif
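The three source types above can be told apart from the +nadgrids value alone. The sketch below is a hypothetical Python helper (`parse_nadgrids` and `GridRef` are illustrative names, not Sedona or proj4sedona APIs). It assumes comma-separated grid lists as in PROJ, and it treats a leading `@` as a PROJ CDN name, following the list above; note that in PROJ itself a leading `@` also marks a grid as optional.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GridRef:
    kind: str      # "local", "proj_cdn", or "https"
    location: str  # file path, CDN grid name, or full URL


def parse_nadgrids(proj_string: str) -> List[GridRef]:
    """Extract +nadgrids entries from a PROJ string (hypothetical helper)."""
    refs: List[GridRef] = []
    # Naive whitespace tokenization: paths containing spaces are not handled.
    for token in proj_string.split():
        if not token.startswith("+nadgrids="):
            continue
        value = token[len("+nadgrids="):]
        # Assumption: several grids may be listed, separated by commas.
        for item in value.split(","):
            if not item:
                continue
            if item.startswith("https://"):
                refs.append(GridRef("https", item))
            elif item.startswith("@"):
                # Per the list above, "@name" refers to a grid on the PROJ CDN.
                refs.append(GridRef("proj_cdn", item[1:]))
            else:
                refs.append(GridRef("local", item))
    return refs
```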

Configuration

New Spark config spark.sedona.crs.geotools controls the CRS engine:

| Value | Behavior |
| --- | --- |
| none | proj4sedona for all transformations |
| raster (default) | proj4sedona for vector, GeoTools for raster |
| all | GeoTools for everything (legacy behavior) |
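The decision logic in this table fits in a small function. This is a Python sketch with hypothetical names (`CrsTransformMode`, `use_geotools`); the PR's real implementation is SedonaConf.CRSTransformMode in Java and may differ, for example in how it treats letter case or invalid values.

```python
from enum import Enum
from typing import Optional


class CrsTransformMode(Enum):
    NONE = "none"      # proj4sedona for all transformations
    RASTER = "raster"  # proj4sedona for vector, GeoTools for raster (default)
    ALL = "all"        # GeoTools for everything (legacy behavior)

    @classmethod
    def parse(cls, value: Optional[str]) -> "CrsTransformMode":
        if value is None:
            return cls.RASTER  # documented default
        try:
            # Assumption: values are matched case-insensitively.
            return cls(value.strip().lower())
        except ValueError:
            raise ValueError(
                f"invalid spark.sedona.crs.geotools value: {value!r}"
            ) from None


def use_geotools(mode: CrsTransformMode, is_raster: bool) -> bool:
    """Return True if GeoTools should handle this transformation."""
    if mode is CrsTransformMode.ALL:
        return True
    if mode is CrsTransformMode.RASTER:
        return is_raster
    return False  # CrsTransformMode.NONE
```

Like any Spark config, the value can be supplied at submit time, e.g. `--conf spark.sedona.crs.geotools=all`.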

Key changes

  • New FunctionsProj4 class in common, flink, and snowflake modules with cached projections for performance
  • ST_Transform API unchanged — 2-arg, 3-arg, and 4-arg overloads preserved. The lenient parameter is kept for API compatibility (ignored by proj4sedona)
  • Snowflake fully migrated — GeoToolsWrapper removed; Snowflake uses FunctionsProj4 directly
  • Driver-side config resolution — useGeoTools boolean resolved on the Spark driver and serialized to executors, avoiding executor-side SparkSession access
  • Added proj4sedona (v0.0.3) dependency to root, common, flink, and snowflake POMs
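Two of the points above, cached projections and driver-side config resolution, can be sketched together in Python. All names here (`ProjectionPair`, `get_projection_pair`, `TransformTask`) are hypothetical and only illustrate the pattern: an LRU cache keyed by the (source, target) CRS strings, and a task object that carries a plain boolean resolved once on the driver instead of touching a SparkSession on executors.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectionPair:
    """Hypothetical placeholder for parsed source/target projections."""
    source_crs: str
    target_crs: str


@functools.lru_cache(maxsize=256)
def get_projection_pair(source_crs: str, target_crs: str) -> ProjectionPair:
    # In a real engine this is the expensive step: resolving authority codes,
    # parsing WKT/PROJJSON, loading grid files. The cache runs it once per pair.
    return ProjectionPair(source_crs, target_crs)


@dataclass(frozen=True)
class TransformTask:
    """Holds only plain values, so it serializes cleanly to executors."""
    use_geotools: bool  # resolved once on the driver from spark.sedona.crs.geotools

    def projections_for(self, source_crs: str, target_crs: str) -> ProjectionPair:
        if self.use_geotools:
            raise NotImplementedError("GeoTools path is not modeled in this sketch")
        return get_projection_pair(source_crs, target_crs)
```

The cache key is the exact input string, so "EPSG:4326" and "epsg:4326" would occupy separate entries unless inputs are normalized first.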

How was this patch tested?

  • Unit tests (FunctionsProj4Test.java): 35 tests covering all CRS formats, all geometry types, grid files (local/remote/optional/mandatory), edge cases (null, empty, same CRS, missing SRID, invalid CRS), SRID/UserData preservation, round-trip transformations
  • Performance benchmarks (FunctionsProj4PerformanceTest.java): 7 benchmarks comparing proj4sedona vs GeoTools, including cache effect tests
  • Spark integration tests (CRSTransformProj4Test.scala): ~38 tests covering SQL and DataFrame API for all CRS formats, config switching, all geometry types, and 40 official OSTN15 test points (UK Ordnance Survey reference data for OSGB36/ETRS89 accuracy validation)
  • Updated existing tests: Flink, Snowflake, Spark, and Python test suites updated for compatibility
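The round-trip tests follow a simple pattern: transform forward, transform back, and bound the error. The Python sketch below shows that pattern for EPSG:4326 to EPSG:3857 using the closed-form spherical Web Mercator equations rather than proj4sedona, so it runs without any dependency. The function names and the tolerance are illustrative choices; the OSTN15 reference-point comparisons in the PR are not modeled here.

```python
import math

# Sphere radius used by EPSG:3857: the WGS 84 semi-major axis, in metres.
R = 6378137.0


def lonlat_to_webmercator(lon: float, lat: float):
    """Forward spherical Web Mercator: degrees -> metres."""
    if not -90.0 < lat < 90.0:
        raise ValueError("Web Mercator is undefined at the poles")
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def webmercator_to_lonlat(x: float, y: float):
    """Inverse spherical Web Mercator: metres -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def round_trip_error(lon: float, lat: float) -> float:
    """Largest absolute error, in degrees, after 4326 -> 3857 -> 4326."""
    lon2, lat2 = webmercator_to_lonlat(*lonlat_to_webmercator(lon, lat))
    return max(abs(lon2 - lon), abs(lat2 - lat))


TOLERANCE_DEG = 1e-9  # illustrative; real tests choose per CRS pair
```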

Did this PR include necessary documentation updates?

  • Yes, I am adding a new API. I am using the current SNAPSHOT version number in v1.9.0 format.
  • New dedicated documentation page: docs/api/sql/CRS-Transformation.md covering all CRS formats, grid files, coordinate order, and SRID usage with SQL examples
  • Updated ST_Transform entries in Spark, Flink, and Snowflake function docs
  • Added spark.sedona.crs.geotools parameter to docs/api/sql/Parameter.md
  • Added navigation entry in mkdocs.yml

jiayuasu and others added 3 commits February 12, 2026 01:16
…583)

This pull request introduces a new performance test suite for CRS
(Coordinate Reference System) transformation using proj4sedona, and
updates the documentation for the `ST_Transform` function in both the
Flink and Snowflake APIs. The main focus is on clarifying supported CRS
formats and grid file usage, and on simplifying the documentation to
reflect Sedona's latest capabilities.

**Testing enhancements:**

- **New Proj4 Performance Test Suite:**
  - Added `FunctionsProj4PerformanceTest.java`, which benchmarks
    proj4sedona CRS transformation performance and cache effects, and
    compares it with GeoTools. It covers built-in and remote EPSG
    codes, PROJ/WKT strings, and grid file usage (both local and remote).

**Documentation improvements:**

- **Flink API Documentation (`Function.md`):**
  - Updated to clarify that Sedona now supports multiple CRS formats
    (EPSG, WKT1/2, PROJ strings, PROJJSON) and grid files for high-accuracy
    transformations.
  - Removed outdated explanations about lon/lat order handling, deprecated
    optional parameters, and lengthy WKT examples.
  - Added a tip directing users to the Spark SQL documentation for
    comprehensive CRS transformation details.
  - Simplified the function signature and removed the deprecated
    `lenientMode` parameter from examples.

- **Snowflake API Documentation (`Function.md`):**
  - Updated to reflect support for multiple CRS formats and grid files,
    and to direct users to the Spark SQL documentation for more details.
  - Simplified the function signature and removed references to deprecated
    parameters and redundant examples.

Together, these changes bring the Flink and Snowflake documentation in
line with Sedona's current CRS transformation features and add
benchmarks for the new transformation engine.

Copilot AI left a comment

Pull request overview

Integrates proj4sedona as the default CRS transformation engine (replacing GeoTools for vector ST_Transform), adds a new Spark config to switch backends, and updates tests/docs accordingly.

Changes:

  • Added proj4sedona dependency and introduced proj4-backed ST_Transform across Spark/Flink/Snowflake.
  • Added spark.sedona.crs.geotools configuration via SedonaConf.CRSTransformMode and driver-side resolution in Spark expressions.
  • Added extensive proj4 CRS transformation tests and new CRS Transformation documentation page.

Reviewed changes

Copilot reviewed 31 out of 37 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala | Switch ST_Transform expression to proj4 backend with config-controlled GeoTools fallback |
| spark/common/src/main/java/org/apache/sedona/core/utils/SedonaConf.java | Add CRS transform mode enum + config parsing |
| common/src/main/java/org/apache/sedona/common/FunctionsProj4.java | New proj4sedona-based CRS transform implementation |
| common/src/main/java/org/apache/sedona/common/Constructors.java | Add factory-aware envelope polygon creation + SRID-aware envelope construction |
| common/src/main/java/org/apache/sedona/common/Functions.java | Adjust expand envelope creation (now uses provided GeometryFactory) |
| flink/src/main/java/org/apache/sedona/flink/expressions/FunctionsProj4.java | New Flink scalar function wrapper for proj4-based ST_Transform |
| flink/src/main/java/org/apache/sedona/flink/Catalog.java | Register proj4-based transform UDF in Flink catalog |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/UDFs.java | Use proj4 for Snowflake ST_Transform implementation |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/UDFsV2.java | Use proj4 for Snowflake ST_Transform implementation (V2) |
| docs/api/sql/CRS-Transformation.md | New documentation page for CRS formats + grid files |
| docs/api/sql/Function.md | Update ST_Transform docs to reflect proj4sedona defaults |
| spark/common/src/test/scala/org/apache/sedona/sql/CRSTransformProj4Test.scala | Add Spark-side proj4 CRS transformation tests (incl. grid + remote cases) |
| common/src/test/java/org/apache/sedona/common/FunctionsProj4Test.java | Add unit tests for CRS formats + grid behaviors |
| common/src/test/java/org/apache/sedona/common/FunctionsProj4PerformanceTest.java | Add performance-focused tests/benchmarks for proj4 vs GeoTools |
| pom.xml / common/pom.xml / flink/pom.xml / snowflake/pom.xml | Add proj4sedona dependency/version management |

@jiayuasu merged commit 7c6c768 into master Feb 12, 2026
51 checks passed

Successfully merging this pull request may close these issues.

Introducing new CRS format support