[GH-2610] Integrate proj4sedona for CRS transformation #2647
…583)

This pull request introduces a new performance test suite for CRS (Coordinate Reference System) transformation using Proj4sedona, and updates the documentation for the `ST_Transform` function in both Flink and Snowflake APIs. The main focus is on clarifying supported CRS formats, grid file usage, and simplifying documentation to reflect the latest capabilities of Sedona.

**Testing enhancements:**

- **New Proj4 Performance Test Suite:**
  - Added `FunctionsProj4PerformanceTest.java`, which benchmarks Proj4sedona CRS transformation performance, cache effects, and compares it with GeoTools. It covers scenarios like built-in and remote EPSG codes, PROJ/WKT strings, and grid file usage (both local and remote).

**Documentation improvements:**

- **Flink API Documentation (`Function.md`):**
  - Updated to clarify that Sedona now supports multiple CRS formats (EPSG, WKT1/2, PROJ strings, PROJJSON) and grid files for high-accuracy transformations.
  - Removed outdated explanations about lon/lat order handling, deprecated optional parameters, and lengthy WKT examples.
  - Added a tip directing users to the Spark SQL documentation for comprehensive CRS transformation details.
  - Simplified the function signature and removed the deprecated `lenientMode` parameter from examples.
- **Snowflake API Documentation (`Function.md`):**
  - Updated to reflect support for multiple CRS formats and grid files, and direct users to the Spark SQL documentation for more details.
  - Simplified the function signature and removed references to deprecated parameters and redundant examples.

These changes ensure that both the codebase and the documentation are up-to-date with Sedona’s latest CRS transformation features and best practices, making it easier for users and developers to understand and utilize these capabilities.
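The cache effect that the benchmark measures can be illustrated with a small memoizing lookup. This is only a sketch under stated assumptions: `ProjectionCacheSketch` and its `parser` function are hypothetical names standing in for an expensive CRS parse, and nothing here is taken from Sedona or proj4sedona.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical memoizing cache; not Sedona's or proj4sedona's actual implementation. */
class ProjectionCacheSketch<P> {
    private final Map<String, P> cache = new ConcurrentHashMap<>();
    // Stands in for an expensive CRS parse (assumption for illustration).
    private final Function<String, P> parser;
    private final AtomicInteger parseCount = new AtomicInteger();

    ProjectionCacheSketch(Function<String, P> parser) {
        this.parser = parser;
    }

    /** Returns the cached value for a CRS definition, parsing it only on first use. */
    P get(String crsDefinition) {
        return cache.computeIfAbsent(crsDefinition, key -> {
            parseCount.incrementAndGet();
            return parser.apply(key);
        });
    }

    /** Number of times the parser actually ran, i.e. the number of cache misses. */
    int parseCount() {
        return parseCount.get();
    }
}
```

Repeated calls with the same definition string hit the map instead of the parser, which is the kind of difference a cold-versus-warm benchmark would be expected to show.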
…oj4sedona transform (#585)
Pull request overview
Integrates proj4sedona as the default CRS transformation engine (replacing GeoTools for vector ST_Transform), adds a new Spark config to switch backends, and updates tests/docs accordingly.
Changes:
- Added proj4sedona dependency and introduced proj4-backed `ST_Transform` across Spark/Flink/Snowflake.
- Added `spark.sedona.crs.geotools` configuration via `SedonaConf.CRSTransformMode` and driver-side resolution in Spark expressions.
- Added extensive proj4 CRS transformation tests and new CRS Transformation documentation page.
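A config-controlled backend switch of this kind could be modeled roughly as below. The three values (`none`, `raster`, `all`) and the `raster` default come from the PR description; what each value does is an assumption on my part, and `CrsTransformModeSketch` is a hypothetical name, not the real `SedonaConf.CRSTransformMode`.

```java
import java.util.Locale;

/** Hypothetical model of the three spark.sedona.crs.geotools values; not Sedona's code. */
enum CrsTransformModeSketch {
    NONE, RASTER, ALL;

    /** Parses a config string; null or blank falls back to RASTER, the default named in the PR. */
    static CrsTransformModeSketch parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return RASTER;
        }
        // valueOf throws IllegalArgumentException for anything other than the three names.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Assumed semantics: only ALL routes vector ST_Transform through GeoTools. */
    boolean useGeoToolsForVector() {
        return this == ALL;
    }

    /** Assumed semantics: RASTER and ALL keep GeoTools for raster transforms. */
    boolean useGeoToolsForRaster() {
        return this != NONE;
    }
}
```

Under this reading, driver-side resolution would amount to evaluating `useGeoToolsForVector()` once on the driver and passing the resulting boolean along with the expression, so executors never need to read the configuration themselves.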
Reviewed changes
Copilot reviewed 31 out of 37 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala | Switch ST_Transform expression to proj4 backend with config-controlled GeoTools fallback |
| spark/common/src/main/java/org/apache/sedona/core/utils/SedonaConf.java | Add CRS transform mode enum + config parsing |
| common/src/main/java/org/apache/sedona/common/FunctionsProj4.java | New proj4sedona-based CRS transform implementation |
| common/src/main/java/org/apache/sedona/common/Constructors.java | Add factory-aware envelope polygon creation + SRID-aware envelope construction |
| common/src/main/java/org/apache/sedona/common/Functions.java | Adjust expand envelope creation (now uses provided GeometryFactory) |
| flink/src/main/java/org/apache/sedona/flink/expressions/FunctionsProj4.java | New Flink scalar function wrapper for proj4-based ST_Transform |
| flink/src/main/java/org/apache/sedona/flink/Catalog.java | Register proj4-based transform UDF in Flink catalog |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/UDFs.java | Use proj4 for Snowflake ST_Transform implementation |
| snowflake/src/main/java/org/apache/sedona/snowflake/snowsql/UDFsV2.java | Use proj4 for Snowflake ST_Transform implementation (V2) |
| docs/api/sql/CRS-Transformation.md | New documentation page for CRS formats + grid files |
| docs/api/sql/Function.md | Update ST_Transform docs to reflect proj4sedona defaults |
| spark/common/src/test/scala/org/apache/sedona/sql/CRSTransformProj4Test.scala | Add Spark-side proj4 CRS transformation tests (incl. grid + remote cases) |
| common/src/test/java/org/apache/sedona/common/FunctionsProj4Test.java | Add unit tests for CRS formats + grid behaviors |
| common/src/test/java/org/apache/sedona/common/FunctionsProj4PerformanceTest.java | Add performance-focused tests/benchmarks for proj4 vs GeoTools |
| pom.xml / common/pom.xml / flink/pom.xml / snowflake/pom.xml | Add proj4sedona dependency/version management |
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes #2610 (Introducing new CRS format support)
What changes were proposed in this PR?
This PR integrates proj4sedona (v0.0.3) as the default CRS transformation engine for vector geometries in Apache Sedona. proj4sedona is a pure Java implementation with no LGPL dependencies, replacing GeoTools for vector CRS transformations.
New CRS format support
- EPSG code: `EPSG:4326`, `EPSG:3857` (also ESRI, IAU, SR-ORG authorities)
- WKT1: `PROJCS[...]`, `GEOGCS[...]`
- WKT2: `PROJCRS[...]`, `GEOGCRS[...]`
- PROJ string: `+proj=longlat +datum=WGS84 +no_defs`
- PROJJSON: `{"type": "ProjectedCRS", ...}`

CRS database is sourced from spatialreference.org, supporting EPSG, ESRI, IAU, and SR-ORG authorities.
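To make the differences between these input formats concrete, here is a minimal sketch that classifies a CRS string by its leading characters. It is an illustration only: `CrsFormatSketch` is a hypothetical helper, the prefix rules are my own heuristic, and proj4sedona's real detection logic may work differently.

```java
import java.util.Locale;

/** Hypothetical classifier for the CRS input formats listed above; not proj4sedona's parser. */
class CrsFormatSketch {
    enum Format { AUTHORITY_CODE, WKT1, WKT2, PROJ_STRING, PROJJSON, UNKNOWN }

    static Format classify(String crs) {
        if (crs == null) {
            return Format.UNKNOWN;
        }
        String s = crs.trim();
        String upper = s.toUpperCase(Locale.ROOT);
        if (s.startsWith("{")) {
            return Format.PROJJSON;        // a JSON object such as {"type": "ProjectedCRS", ...}
        }
        if (s.startsWith("+")) {
            return Format.PROJ_STRING;     // e.g. +proj=longlat +datum=WGS84 +no_defs
        }
        if (upper.startsWith("PROJCRS[") || upper.startsWith("GEOGCRS[")) {
            return Format.WKT2;            // WKT2 root keywords
        }
        if (upper.startsWith("PROJCS[") || upper.startsWith("GEOGCS[")) {
            return Format.WKT1;            // WKT1 root keywords
        }
        if (upper.matches("[A-Z][A-Z0-9-]*:[0-9]+")) {
            return Format.AUTHORITY_CODE;  // AUTHORITY:CODE, e.g. EPSG:4326
        }
        return Format.UNKNOWN;
    }
}
```

For example, `CrsFormatSketch.classify("EPSG:3857")` falls through to the authority-code branch, while a string starting with `GEOGCRS[` is treated as WKT2. Only the two root keywords per WKT version shown in the list are recognized here.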
Grid file support
NAD grid files (`.gsb`, `.tif`) for high-accuracy datum transformations (e.g., NAD27 to NAD83, OSGB36 to ETRS89) via:
- Local file: `+nadgrids=/path/to/grid.gsb`
- Optional grid (`@` prefix): `+nadgrids=@us_noaa_conus.tif`
- Remote URL: `+nadgrids=https://cdn.proj.org/us_noaa_conus.tif`

Configuration
New Spark config `spark.sedona.crs.geotools` controls the CRS engine:
- `none`
- `raster` (default)
- `all`

Key changes
- `FunctionsProj4` class in `common`, `flink`, and `snowflake` modules with cached projections for performance
- `ST_Transform` API unchanged: 2-arg, 3-arg, and 4-arg overloads preserved. The `lenient` parameter is kept for API compatibility (ignored by proj4sedona)
- `GeoToolsWrapper` removed; Snowflake uses `FunctionsProj4` directly
- `useGeoTools` boolean resolved on the Spark driver and serialized to executors, avoiding executor-side SparkSession access
- `proj4sedona` (v0.0.3) dependency to root, common, flink, and snowflake POMs

How was this patch tested?
- Unit tests (`FunctionsProj4Test.java`): 35 tests covering all CRS formats, all geometry types, grid files (local/remote/optional/mandatory), edge cases (null, empty, same CRS, missing SRID, invalid CRS), SRID/UserData preservation, round-trip transformations
- Performance tests (`FunctionsProj4PerformanceTest.java`): 7 benchmarks comparing proj4sedona vs GeoTools, including cache effect tests
- Spark tests (`CRSTransformProj4Test.scala`): ~38 tests covering SQL and DataFrame API for all CRS formats, config switching, all geometry types, and 40 official OSTN15 test points (UK Ordnance Survey reference data for OSGB36/ETRS89 accuracy validation)

Did this PR include necessary documentation updates?
- `v1.9.0` format.
- `docs/api/sql/CRS-Transformation.md` covering all CRS formats, grid files, coordinate order, and SRID usage with SQL examples
- `ST_Transform` entries in Spark, Flink, and Snowflake function docs
- `spark.sedona.crs.geotools` parameter to `docs/api/sql/Parameter.md`
- `mkdocs.yml`
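
The `+nadgrids` forms listed under Grid file support (plain path, `@`-prefixed, and URL) can be told apart mechanically. The sketch below assumes standard PROJ string conventions, where `+nadgrids` takes a comma-separated list and a leading `@` marks a grid as optional. `NadgridsSketch` and `GridRef` are hypothetical names, and this is not how proj4sedona loads grids.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that inspects +nadgrids entries; not proj4sedona's grid loader. */
class NadgridsSketch {
    static final class GridRef {
        final String location;
        final boolean optional; // '@' prefix: the grid may be skipped if unavailable
        final boolean remote;   // http(s) URL rather than a local name or path

        GridRef(String location, boolean optional, boolean remote) {
            this.location = location;
            this.optional = optional;
            this.remote = remote;
        }
    }

    /** Extracts grid references from the +nadgrids parameter of a PROJ string. */
    static List<GridRef> parse(String projString) {
        List<GridRef> grids = new ArrayList<>();
        // Splitting on whitespace is a simplification: paths containing spaces are not handled.
        for (String token : projString.trim().split("\\s+")) {
            if (!token.startsWith("+nadgrids=")) {
                continue;
            }
            String value = token.substring("+nadgrids=".length());
            for (String entry : value.split(",")) {
                if (entry.isEmpty()) {
                    continue;
                }
                boolean optional = entry.startsWith("@");
                String location = optional ? entry.substring(1) : entry;
                boolean remote = location.startsWith("http://") || location.startsWith("https://");
                grids.add(new GridRef(location, optional, remote));
            }
        }
        return grids;
    }
}
```

Calling `NadgridsSketch.parse("+proj=longlat +nadgrids=@us_noaa_conus.tif +no_defs")` yields a single entry flagged as optional and not remote.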