feat: Support casting string non int numeric types #2835
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2835 +/- ##
============================================
- Coverage 56.12% 54.65% -1.48%
- Complexity 976 1330 +354
============================================
Files 119 167 +48
Lines 11743 15290 +3547
Branches 2251 2532 +281
============================================
+ Hits 6591 8356 +1765
- Misses 4012 5714 +1702
- Partials 1140 1220 +80
) -> SparkResult<ArrayRef> {
    let string_array = array
        .as_any()
        .downcast_ref::<StringArray>()
Handle both Utf8 and LargeUtf8:
use datafusion::common::cast::as_generic_string_array;
if let Ok(string_array) = as_generic_string_array::<i32>(array) {
    return cast_string_to_decimal128_impl_generic(string_array, eval_mode, precision, scale);
}
if let Ok(string_array) = as_generic_string_array::<i64>(array) {
    return cast_string_to_decimal128_impl_generic(string_array, eval_mode, precision, scale);
}
Err(SparkError::Internal("Expected string array".to_string()))
e7a1144 to 79d0ea9
Fixed issues with formatting.
Fixed failing tests and added some more. Also modularized the code to replicate Spark's behavior.
Reasons for the tests to be failing:
Which issue does this PR close?
Closes #326
Rationale for this change
Comet currently supports some basic string -> non-int casts. This PR extends that support to all possible casts from string to non-int numeric types (float, double and decimal).
What changes are included in this PR?
Updates to the cast logic to support exhaustive casts from string to non-int numeric types (float and decimal).
Float inputs (which should include DoubleType)
Decimal types (this is the tricky one)
2. 'parse_string_to_decimal' does the heavy lifting of parsing the input string into a valid 'i128', which is essentially how the value is stored (along with scale and precision information). This method also checks whether the scale is too high or too low and fails early.
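To illustrate the idea of turning a string into an unscaled 'i128' for a given precision and scale, here is a minimal, standard-library-only sketch. It is not the PR's 'parse_string_to_decimal'; the name 'parse_decimal_sketch', its signature, the whitespace trimming, and the half-up rounding are all assumptions made for illustration, and the real implementation may treat edge cases differently.

```rust
/// Hypothetical helper, NOT the PR's `parse_string_to_decimal`.
/// Returns the unscaled i128 for a decimal(precision, scale), or None when
/// the input is malformed or does not fit the requested precision.
fn parse_decimal_sketch(input: &str, precision: u32, scale: i32) -> Option<i128> {
    let s = input.trim(); // assumption: surrounding whitespace is ignored

    // Split off an optional exponent, e.g. "1.5e3".
    let (mantissa, exponent) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(pos) => (&s[..pos], s[pos + 1..].parse::<i32>().ok()?),
        None => (s, 0),
    };

    // Optional leading sign.
    let (negative, digits) = if let Some(rest) = mantissa.strip_prefix('-') {
        (true, rest)
    } else {
        (false, mantissa.strip_prefix('+').unwrap_or(mantissa))
    };

    // Integral and fractional digit runs.
    let (int_part, frac_part) = match digits.find('.') {
        Some(pos) => (&digits[..pos], &digits[pos + 1..]),
        None => (digits, ""),
    };
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    if !int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Accumulate every digit into one integer magnitude.
    let mut unscaled: i128 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        unscaled = unscaled.checked_mul(10)?.checked_add((b - b'0') as i128)?;
    }
    if unscaled == 0 {
        return Some(0);
    }

    // The magnitude currently carries `current_scale` decimal places.
    let current_scale = frac_part.len() as i64 - exponent as i64;
    let shift = scale as i64 - current_scale;
    if shift >= 0 {
        // Pad with zeros to reach the target scale.
        let factor = 10i128.checked_pow(u32::try_from(shift).ok()?)?;
        unscaled = unscaled.checked_mul(factor)?;
    } else {
        // Drop extra fractional digits, rounding half up (assumption).
        let pow = u32::try_from(-shift).ok().and_then(|p| 10i128.checked_pow(p));
        unscaled = match pow {
            Some(divisor) => {
                let (q, r) = (unscaled / divisor, unscaled % divisor);
                if r >= divisor - r { q + 1 } else { q }
            }
            // Divisor exceeds the i128 range, so the value rounds to zero.
            None => 0,
        };
    }

    // Reject values that need more digits than `precision` allows.
    if unscaled >= 10i128.checked_pow(precision)? {
        return None;
    }
    Some(if negative { -unscaled } else { unscaled })
}
```

In this sketch, a None result stands in for both outcomes a cast can have on bad input (a null or an error, depending on the eval mode); mapping it to one or the other is left to the caller.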
How are these changes tested?
1. Unit tests for specific edge cases such as infinity and scientific notation (covering the edge cases mentioned in the linked issue), plus fuzz tests to make sure all valid and invalid inputs are verified
2. Additional tests to verify higher precision for decimal casts
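As a rough illustration of the float edge cases mentioned above (infinity, NaN, scientific notation), the following standard-library-only sketch shows one way such inputs could be parsed. The function name 'parse_float_sketch' is hypothetical, and the accepted spellings of the special literals and the tolerated trailing 'd'/'f' suffix are assumptions about the intended Spark-compatible behavior, not a description of the code in this PR.

```rust
/// Hypothetical helper, not taken from the PR: parses a string into f64,
/// returning None for input that is not a valid number.
fn parse_float_sketch(input: &str) -> Option<f64> {
    let lower = input.trim().to_ascii_lowercase();

    // Special literals, matched case-insensitively (assumed spellings).
    match lower.as_str() {
        "inf" | "+inf" | "infinity" | "+infinity" => return Some(f64::INFINITY),
        "-inf" | "-infinity" => return Some(f64::NEG_INFINITY),
        "nan" => return Some(f64::NAN),
        _ => {}
    }

    // Assumption: a single trailing type suffix such as "1.5d" or "1.5f" is allowed.
    let body = lower
        .strip_suffix('d')
        .or_else(|| lower.strip_suffix('f'))
        .unwrap_or(lower.as_str());

    // Only digits, signs, a decimal point and an exponent marker may remain,
    // so words like "nand" are rejected instead of being parsed as NaN.
    if !body
        .chars()
        .all(|c| c.is_ascii_digit() || matches!(c, '+' | '-' | '.' | 'e'))
    {
        return None;
    }
    body.parse::<f64>().ok()
}
```

Unit tests in this style would assert on each edge case individually, while a fuzz test would compare the results for randomly generated strings against Spark's output.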
cc: @andygrove, @martin-g