- Add sedona-spark-4.1 Maven profile (Spark 4.1.0, Scala 2.13.17, Hadoop 3.4.1)
- Create spark/spark-4.1 module based on spark-4.0
- Fix Geometry import ambiguity (Spark 4.1 adds o.a.s.sql.types.Geometry)
- Fix WritableColumnVector.setAllNull() removal (replaced by setMissing() in 4.1)
- Add sessionUUID parameter to ArrowPythonWithNamedArgumentRunner (new in 4.1)
- Update docs (maven-coordinates, platform, publish)
- Update CI workflows (java, example, python, docker-build)
…patibility table, refine CI matrices
Force-pushed from 6b14539 to 3c621e5
Spark 4.1's RowEncoder calls udt.getClass directly, which for case objects returns the Scala module class (e.g. GeometryUDT$) with a private constructor, causing EXPRESSION_DECODING_FAILED errors (see the illustration after this commit message).

Fix: Add apply() methods to the GeometryUDT, GeographyUDT, and RasterUDT case objects that return new class instances, and use UDT() instead of the bare singleton throughout schema construction code. This ensures getClass returns the public class with an accessible constructor.

Also:
- Revert docker-build.yml (no Spark 4.1 in Docker builds)
- Bump pyspark upper bound from <4.1.0 to <4.2.0
- Bump Spark 4.1.0 to 4.1.1 in CI and POM
- Fix Scala 2.13.12 vs 2.13.17 mismatch in the scala2.13 profile
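For illustration, a tiny standalone Scala sketch (hypothetical ExampleUDT, not Sedona code) of the getClass behavior described above:

```scala
// Minimal sketch, not Sedona code: ExampleUDT stands in for a case-object UDT
// such as GeometryUDT.
case object ExampleUDT

object GetClassDemo extends App {
  // Prints "ExampleUDT$" -- the compiler-generated module class.
  println(ExampleUDT.getClass.getName)

  // The module class only has a private constructor, so reflective
  // instantiation (what Spark 4.1's RowEncoder effectively needs) fails:
  // ExampleUDT.getClass.getDeclaredConstructor().newInstance()  // IllegalAccessException
}
```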
…failure on Python <3.10
…disable geospatial in tests
…ion, overwrite Spark native
Spark 4.1 no longer provides commons-collections 3.x transitively. Replace FilterIterator with Java 8 stream filtering in DuplicatesFilter, and IteratorUtils.toList with StreamSupport in the test.
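A rough Scala sketch of the replacement pattern (hypothetical names and predicate; the actual DuplicatesFilter code is Java and its filtering logic differs):

```scala
import java.util.{Iterator => JIterator, List => JList, Spliterators}
import java.util.stream.{Collectors, StreamSupport}

object StreamFilterSketch {
  // Hypothetical predicate; the real duplicate-filtering logic in
  // DuplicatesFilter is different.
  private def shouldKeep(value: String): Boolean = value.nonEmpty

  // Instead of commons-collections' new FilterIterator(iterator, predicate)
  // plus IteratorUtils.toList, wrap the iterator in a stream, filter, collect.
  def filterToList(iterator: JIterator[String]): JList[String] =
    StreamSupport
      .stream(Spliterators.spliteratorUnknownSize(iterator, 0), false)
      .filter(v => shouldKeep(v))
      .collect(Collectors.toList[String]())
}
```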
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes Support Spark 4.1 #2609

What changes were proposed in this PR?
This PR adds support for Apache Spark 4.1 in Sedona.
Build scaffolding
- sedona-spark-4.1 Maven profile in root pom.xml (Spark 4.1.1, Scala 2.13.17, Hadoop 3.4.1)
- spark-4.1 module entry in spark/pom.xml (enable-all-submodules profile)
- sedona-spark-4.1 profile in spark/common/pom.xml with spark-sql-api dependency
- spark/spark-4.1/ module (copied from spark/spark-4.0/, updated artifactId)
- scala2.13 and sedona-spark-4.0 profiles updated to use Scala 2.13.17

Spark 4.1 API compatibility fixes
- setAllNull() replaced by a reflection-based markAllNull() that works with both setAllNull (Spark < 4.1) and setMissing (Spark 4.1+); see the sketch after this list
- org.locationtech.jts.geom.Geometry imported explicitly to resolve ambiguity with the new org.apache.spark.sql.functions.Geometry in Spark 4.1
- sessionUUID parameter required by Spark 4.1's ArrowEvalPythonExec
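A minimal sketch of the reflection-based fallback mentioned in the first bullet; the helper name markAllNull comes from that bullet, but the body below is illustrative rather than Sedona's exact implementation:

```scala
import org.apache.spark.sql.execution.vectorized.WritableColumnVector

object ColumnVectorCompat {
  // Illustrative only: call setAllNull() where it exists (Spark < 4.1),
  // otherwise fall back to setMissing() (Spark 4.1+), resolved via reflection
  // so the same jar runs against both Spark versions.
  def markAllNull(vector: WritableColumnVector): Unit = {
    val method =
      try vector.getClass.getMethod("setAllNull")
      catch {
        case _: NoSuchMethodException => vector.getClass.getMethod("setMissing")
      }
    method.invoke(vector)
  }
}
```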
SPARK-52671 UDT workaround

Spark 4.1 changed RowEncoder.encoderForDataType to call udt.getClass directly instead of looking it up via UDTRegistration. For Scala case object UDTs, getClass returns the module class (e.g., GeometryUDT$), which has a private constructor, causing ScalaReflectionException.

Fix: Added apply() factory methods to all three UDT case objects (GeometryUDT, GeographyUDT, RasterUDT) and replaced bare singleton references with UDT() calls across source files so that schema construction uses proper class instances.
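Roughly, the shape of that fix looks like the following sketch (hypothetical ExampleGeometryUDT names, UDT members elided; not Sedona's exact code):

```scala
// Public class with a public constructor; the real Sedona UDTs extend
// UserDefinedType and implement serialization, which is elided here.
class ExampleGeometryUDT {
  // ... UDT members elided ...
}

case object ExampleGeometryUDT extends ExampleGeometryUDT {
  // apply() hands out instances of the public class, so instance.getClass is
  // ExampleGeometryUDT (accessible constructor), not ExampleGeometryUDT$.
  def apply(): ExampleGeometryUDT = new ExampleGeometryUDT
}

object SchemaUsage {
  // Schema construction now calls the factory instead of the bare singleton,
  // e.g. StructField("geom", ExampleGeometryUDT()) rather than
  // StructField("geom", ExampleGeometryUDT).
  val udtInstance: ExampleGeometryUDT = ExampleGeometryUDT()
}
```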
Python support

- pyspark upper bound bumped from <4.1.0 to <4.2.0 in python/pyproject.toml (3 locations)

CI workflows
- Spark 4.1 added to the CI matrices (compile and unit-test jobs)

Documentation
- docs/setup/maven-coordinates.md with Spark 4.1 artifact coordinates
- docs/setup/platform.md compatibility table (Spark 4.1 requires Scala 2.13 and Python 3.10+)
- docs/community/publish.md release checklist

How was this PR tested?
Key files changed