[GH-2609] Support Spark 4.1 #2649

Draft
jiayuasu wants to merge 11 commits into master from support-spark-4.1

Conversation


@jiayuasu (Member) commented Feb 12, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

Yes, it closes GH-2609 (Support Spark 4.1).

What changes were proposed in this PR?

This PR adds support for Apache Spark 4.1 in Sedona.

Build scaffolding

  • Added sedona-spark-4.1 Maven profile in root pom.xml (Spark 4.1.1, Scala 2.13.17, Hadoop 3.4.1)
  • Added spark-4.1 module entry in spark/pom.xml (enable-all-submodules profile)
  • Added sedona-spark-4.1 profile in spark/common/pom.xml with spark-sql-api dependency
  • Created spark/spark-4.1/ module (copied from spark/spark-4.0/, updated artifactId)
  • Fixed Scala version mismatch: updated scala2.13 and sedona-spark-4.0 profiles to use Scala 2.13.17

Spark 4.1 API compatibility fixes

  • ParquetColumnVector.java: Replaced the setAllNull() call with a reflection-based markAllNull() helper that works with both setAllNull (Spark < 4.1) and setMissing (Spark 4.1+); see the sketch after this list
  • Functions.scala: Added explicit org.locationtech.jts.geom.Geometry import to resolve ambiguity with new org.apache.spark.sql.functions.Geometry in Spark 4.1
  • SedonaArrowEvalPythonExec.scala (spark-4.1 only): Added sessionUUID parameter required by Spark 4.1 ArrowEvalPythonExec
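
Roughly how the reflection helper works, as a minimal Scala sketch (the real fix is written in Java inside ParquetColumnVector.java; only the setAllNull/setMissing method names and the markAllNull helper name come from this PR):

```scala
import org.apache.spark.sql.execution.vectorized.WritableColumnVector

object ColumnVectorCompat {
  // Resolve whichever method this Spark version provides, once per class load:
  // setAllNull on Spark < 4.1, setMissing on Spark 4.1+.
  private val markAllNullMethod =
    try classOf[WritableColumnVector].getMethod("setAllNull")
    catch {
      case _: NoSuchMethodException =>
        classOf[WritableColumnVector].getMethod("setMissing")
    }

  // Version-agnostic replacement for vector.setAllNull().
  def markAllNull(vector: WritableColumnVector): Unit = {
    markAllNullMethod.invoke(vector)
    ()
  }
}
```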

SPARK-52671 UDT workaround

Spark 4.1 changed RowEncoder.encoderForDataType to call udt.getClass directly instead of looking up via UDTRegistration. For Scala case object UDTs, getClass returns the module class (e.g., GeometryUDT$) which has a private constructor, causing ScalaReflectionException.

Fix: Added apply() factory methods to all three UDT case objects (GeometryUDT, GeographyUDT, RasterUDT) and replaced bare singleton references with UDT() calls across source files so that schema construction uses proper class instances.
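
A simplified Scala sketch of the pattern, with WKB serialization as a stand-in (Sedona's real GeometryUDT has its own serializer; only the class/case-object split and the apply() factory come from this PR):

```scala
import org.apache.spark.sql.types.{BinaryType, DataType, UserDefinedType}
import org.locationtech.jts.geom.Geometry
import org.locationtech.jts.io.{WKBReader, WKBWriter}

// Plain public class: `new GeometryUDT().getClass` yields GeometryUDT,
// which has an accessible no-arg constructor that Spark 4.1's RowEncoder
// can instantiate.
class GeometryUDT extends UserDefinedType[Geometry] {
  override def sqlType: DataType = BinaryType
  override def serialize(obj: Geometry): Array[Byte] = new WKBWriter().write(obj)
  override def deserialize(datum: Any): Geometry =
    new WKBReader().read(datum.asInstanceOf[Array[Byte]])
  override def userClass: Class[Geometry] = classOf[Geometry]
}

// Singleton kept for source compatibility; its getClass is the module
// class GeometryUDT$, whose constructor is private.
case object GeometryUDT extends GeometryUDT {
  // Factory used by schema-construction code instead of the bare singleton.
  def apply(): GeometryUDT = new GeometryUDT
}
```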

Python support

  • Bumped pyspark upper bound from <4.1.0 to <4.2.0 in python/pyproject.toml (3 locations)

CI workflows

  • java.yml: Added Spark 4.1.1 + Scala 2.13.17 + JDK 17 matrix entries (both compile and unit-test jobs)
  • python.yml: Added Spark 4.1.1 matrix entry
  • example.yml: Added Spark 4.1 matrix entry

Documentation

  • Updated docs/setup/maven-coordinates.md with Spark 4.1 artifact coordinates
  • Updated docs/setup/platform.md compatibility table (Spark 4.1 requires Scala 2.13 and Python 3.10+)
  • Updated docs/community/publish.md release checklist

How was this PR tested?

  • All 6 Spark/Scala/JDK combinations compile successfully:
    • Spark 3.4 + Scala 2.12 + JDK 11
    • Spark 3.4 + Scala 2.13 + JDK 11
    • Spark 3.5 + Scala 2.12 + JDK 11
    • Spark 3.5 + Scala 2.13 + JDK 11
    • Spark 4.0 + Scala 2.13 + JDK 17
    • Spark 4.1 + Scala 2.13 + JDK 17
  • Unit tests for Spark 4.1: 1610 passed, 44 failed; all failures are GEOSPATIAL_DISABLED errors because Spark 4.1's built-in geospatial functions shadow Sedona's (tracked as a separate issue; see the repro sketch below)
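
A hedged repro sketch of the shadowing (SedonaContext is Sedona's standard entry point; the behavior of Spark 4.1's built-in ST_Point is as reported by the failing tests, not verified here):

```scala
import org.apache.sedona.spark.SedonaContext

val spark = SedonaContext.create(
  SedonaContext.builder().master("local[*]").getOrCreate())

// On Spark <= 4.0 this resolves to Sedona's ST_Point. On Spark 4.1 a
// built-in ST_Point shadows it and the query fails with GEOSPATIAL_DISABLED.
spark.sql("SELECT ST_Point(1.0, 2.0)").show()
```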

Key files changed

  • Build: pom.xml, spark/pom.xml, spark/common/pom.xml, spark/spark-4.1/pom.xml
  • Spark 4.1 module: spark/spark-4.1/src/ (all files)
  • UDT workaround: GeometryUDT.scala, GeographyUDT.scala, RasterUDT.scala, plus ~50 files with schema changes
  • API compat: ParquetColumnVector.java, Functions.scala
  • Python: python/pyproject.toml
  • CI: .github/workflows/java.yml, python.yml, example.yml
  • Docs: docs/setup/maven-coordinates.md, docs/setup/platform.md, docs/community/publish.md

- Add sedona-spark-4.1 Maven profile (Spark 4.1.0, Scala 2.13.17, Hadoop 3.4.1)
- Create spark/spark-4.1 module based on spark-4.0
- Fix Geometry import ambiguity (Spark 4.1 adds o.a.s.sql.types.Geometry); see the sketch after this list
- Fix WritableColumnVector.setAllNull() removal (replaced by setMissing() in 4.1)
- Add sessionUUID parameter to ArrowPythonWithNamedArgumentRunner (new in 4.1)
- Update docs (maven-coordinates, platform, publish)
- Update CI workflows (java, example, python, docker-build)
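
A minimal sketch of the import fix (the function body is illustrative; only the explicit JTS import and the clash with a new Spark Geometry name come from this PR):

```scala
// Spark 4.1 introduces its own Geometry name, so the JTS type is imported
// explicitly instead of being picked up through wildcard imports.
import org.locationtech.jts.geom.Geometry

object FunctionsSketch {
  // With the explicit import, `Geometry` resolves unambiguously to JTS.
  def centroid(geom: Geometry): Geometry = geom.getCentroid
}
```
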
Spark 4.1's RowEncoder calls udt.getClass directly, which returns the
Scala module class (e.g. GeometryUDT$) with a private constructor for
case objects, causing EXPRESSION_DECODING_FAILED errors.

Fix: Add apply() method to GeometryUDT, GeographyUDT, and RasterUDT
case objects that return new class instances, and use UDT() instead of
the bare singleton throughout schema construction code. This ensures
getClass returns the public class with an accessible constructor.
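
An illustrative before/after for the schema-construction change, reusing the GeometryUDT sketch above (the field name is hypothetical):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

// Before: fails on Spark 4.1 because udt.getClass is the module class
// GeometryUDT$ with a private constructor.
// val schema = StructType(Seq(StructField("geom", GeometryUDT)))

// After: apply() returns a plain class instance with a public constructor.
val schema = StructType(Seq(StructField("geom", GeometryUDT())))
```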

Also:
- Revert docker-build.yml (no Spark 4.1 in Docker builds)
- Bump pyspark upper bound from <4.1.0 to <4.2.0
- Bump Spark 4.1.0 to 4.1.1 in CI and POM
- Fix Scala 2.13.12 vs 2.13.17 mismatch in scala2.13 profile