Skip to content

Conversation

@coderfender
Copy link
Contributor

@coderfender coderfender commented Dec 1, 2025

Which issue does this PR close?

Closes #326

Rationale for this change

Comet surrently supports some basic string -> non int casts and this PR essentially extends on that to support all possible casts from string to non-int numeric (float, double and decimal) types

What changes are included in this PR?

Updates to cast logic to support exhaustive casts from string to numeric (non-int) types ( float and decimal inputs)

Float inputs (which should include DoubleType)

  1. Added logic to support all float types along with scientific notation handling
  2. Added unit tests to support double and float casts

Decimal Types
(This is the tricky one)

  1. I added two functions (is_valid_decimal and parse_string_to_decimal). 'is_valid_decimal' to parse the input string and return True if the input is a valid decimal. This follows similar pattern of Spark's parsing logic which indeed relies on BigDecimal implementation
    2.'parse_string_to_decimal' does the heavy lifting of parsing the input string into a valid 'i128' which is essentially how the value is stored (along with scale and precision information). This method also checks if the scale is too high / low and fails early.
  2. Once the above method returns a valid i128 integer, we use the decimal builder and return the decimal array

How are these changes tested?

Unit tests for specific edge cases like infinity, scientific notations etc (covering the edge cases mentioned in the linked issue) and fuzz tests to make sure all valid/invalid inputs are verified
2. Also added additional tests to verify higher precision for decimal casts as well

cc : @andygrove , @martin-g

@codecov-commenter
Copy link

codecov-commenter commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 54.65%. Comparing base (f09f8af) to head (0a60b94).
⚠️ Report is 753 commits behind head on main.

Files with missing lines Patch % Lines
...scala/org/apache/comet/expressions/CometCast.scala 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2835      +/-   ##
============================================
- Coverage     56.12%   54.65%   -1.48%     
- Complexity      976     1330     +354     
============================================
  Files           119      167      +48     
  Lines         11743    15290    +3547     
  Branches       2251     2532     +281     
============================================
+ Hits           6591     8356    +1765     
- Misses         4012     5714    +1702     
- Partials       1140     1220      +80     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

) -> SparkResult<ArrayRef> {
let string_array = array
.as_any()
.downcast_ref::<StringArray>()
Copy link
Member

@martin-g martin-g Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle both Utf8 and LargeUtf8:

    use datafusion::common::cast::as_generic_string_array;

    if let Ok(string_array) = as_generic_string_array::<i32>(array) {
        return cast_string_to_decimal128_impl_generic(string_array, eval_mode, precision, scale);
    }
    if let Ok(string_array) = as_generic_string_array::<i64>(array) {
        return cast_string_to_decimal128_impl_generic(string_array, eval_mode, precision, scale);
    }
    Err(SparkError::Internal("Expected string array".to_string()))

@coderfender coderfender marked this pull request as draft December 2, 2025 16:00
@coderfender coderfender changed the title feat: Support string non int numeric types feat: Support casting string non int numeric types Dec 6, 2025
@coderfender coderfender force-pushed the support_string_non_int_numeric_types branch from e7a1144 to 79d0ea9 Compare December 7, 2025 19:36
@coderfender coderfender marked this pull request as ready for review December 7, 2025 19:53
@coderfender
Copy link
Contributor Author

Fixed issues with formatting

@coderfender
Copy link
Contributor Author

Fixed failing tests and added some more . Also modularized to replicate spark's behavior

@coderfender
Copy link
Contributor Author

Reasons for the tests to be failing :

  1. Test name mismatch (Comet is checking for cast {dt} to {dt} while the actual test name has ANSI support cast StringType to FloatType .
  2. Mismatches between Spark and Comet outputs for a low precision Decimals

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Spark-compatible CAST from String to Floating Point

3 participants