
DataFusion support for TimestampWithOffset #21116

@LiaCastaneda

Is your feature request related to a problem or challenge?

DataFusion has no support for arrow.timestamp_with_offset, an Arrow extension type introduced in arrow-rs 58.0.0 (apache/arrow-rs#8743). This type represents SQL TIMESTAMP WITH TIME ZONE. It stores a UTC timestamp alongside a per-row offset in minutes, as Struct { timestamp: Timestamp(unit, "UTC"), offset_minutes: Int16 }.
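Here is a minimal sketch that makes the layout concrete. It uses plain Rust with no arrow-rs dependency. Each row is modeled as a UTC instant plus its own offset, and the sketch derives the local wall-clock hour from them. It assumes date_part should report fields in each row's own local time. It also assumes seconds as the timestamp unit. The struct and method names are hypothetical and are not part of the arrow-rs API.

```rust
/// Hypothetical plain-Rust stand-in for one row of
/// Struct { timestamp: Timestamp(Second, "UTC"), offset_minutes: Int16 }.
/// This is NOT the arrow-rs API; it only models the logical layout.
#[derive(Debug, Clone, Copy)]
struct TimestampWithOffsetRow {
    /// Seconds since the Unix epoch, always in UTC.
    utc_seconds: i64,
    /// This row's offset from UTC in minutes (e.g. +330 for UTC+05:30).
    offset_minutes: i16,
}

impl TimestampWithOffsetRow {
    /// Local wall-clock time as seconds since the epoch:
    /// the UTC instant shifted by this row's offset.
    fn local_seconds(&self) -> i64 {
        self.utc_seconds + i64::from(self.offset_minutes) * 60
    }

    /// Local hour of day (0..=23). This is an assumption about what
    /// date_part('hour', ...) should return for this type.
    fn local_hour(&self) -> i64 {
        self.local_seconds().rem_euclid(86_400) / 3_600
    }
}

fn main() {
    // 2024-01-01T00:00:00Z is 1_704_067_200 seconds after the epoch.
    let rows = [
        TimestampWithOffsetRow { utc_seconds: 1_704_067_200, offset_minutes: 0 },
        TimestampWithOffsetRow { utc_seconds: 1_704_067_200, offset_minutes: 330 },
        TimestampWithOffsetRow { utc_seconds: 1_704_067_200, offset_minutes: -300 },
    ];
    for r in &rows {
        // Prints local hours 0, 5 and 19 for the three offsets.
        println!("offset {:+} min -> local hour {}", r.offset_minutes, r.local_hour());
    }
}
```

All three rows describe the same instant, yet they produce three different local hours. This is exactly why a single array-wide timezone is not enough for this type.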

Describe the solution you'd like

Support for this type in the existing datetime functions: date_part, date_trunc, to_char, and to_unixtime.

Describe alternatives you've considered

My main concern with implementing this is efficiency. TimestampWithOffset stores a per-row offset. Existing datetime functions require a single timezone for the entire array, so they cannot be applied to the array as a whole.
One approach is to loop row by row and apply each function individually. This is simple, but it does not take advantage of Arrow's vectorized kernels. A more efficient alternative is to group rows by their offset_minutes value, which is expected to have low cardinality. Each group is then processed as a sub-array using the existing kernels, and the results are reassembled in the original row order. However, this grouping approach adds implementation complexity.
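The grouping strategy can be sketched with plain slices instead of Arrow arrays. The sketch works in four steps:

1. Bucket row indices by offset.
2. Gather each bucket into a contiguous sub-array.
3. Call a uniform-offset "kernel" once per bucket.
4. Scatter the results back to their original positions.

The function names are hypothetical. The kernel parameter stands in for an existing DataFusion/arrow kernel that assumes one timezone for the whole input. In an Arrow-based implementation, the gather step would presumably map to a kernel such as take.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of the "group by offset, run kernel per group,
/// reassemble" strategy. `kernel` must return one output per input value.
fn apply_grouped_by_offset<T: Copy, F>(utc: &[i64], offsets: &[i16], kernel: F) -> Vec<T>
where
    F: Fn(&[i64], i16) -> Vec<T>,
{
    assert_eq!(utc.len(), offsets.len());

    // 1. Bucket row indices by offset (expected to be low cardinality).
    let mut groups: HashMap<i16, Vec<usize>> = HashMap::new();
    for (row, &off) in offsets.iter().enumerate() {
        groups.entry(off).or_default().push(row);
    }

    let mut out: Vec<Option<T>> = vec![None; utc.len()];
    for (off, rows) in groups {
        // 2. Gather this group's timestamps into a contiguous sub-array.
        let sub: Vec<i64> = rows.iter().map(|&r| utc[r]).collect();
        // 3. Run the uniform-offset kernel once for the whole group.
        let results = kernel(&sub, off);
        // 4. Scatter each result back to its original row position.
        for (&row, value) in rows.iter().zip(results) {
            out[row] = Some(value);
        }
    }
    out.into_iter()
        .map(|v| v.expect("every row belongs to exactly one group"))
        .collect()
}

/// Hypothetical uniform-offset kernel: local hour of day for every
/// timestamp (seconds, UTC) when all rows share the same offset.
fn local_hour_kernel(utc: &[i64], offset_minutes: i16) -> Vec<i64> {
    let shift = i64::from(offset_minutes) * 60;
    utc.iter().map(|&t| (t + shift).rem_euclid(86_400) / 3_600).collect()
}

fn main() {
    // 2024-01-01T00:00Z, 00:00Z, 01:00Z, 00:00Z with mixed offsets.
    let utc: [i64; 4] = [1_704_067_200, 1_704_067_200, 1_704_070_800, 1_704_067_200];
    let offsets: [i16; 4] = [0, 330, 330, -300];
    let hours = apply_grouped_by_offset(&utc, &offsets, local_hour_kernel);
    println!("{:?}", hours); // [0, 5, 6, 19]
}
```

The kernel is called once per distinct offset rather than once per row. The cost is an extra gather and scatter pass, so this trade-off only pays off when the number of distinct offsets is small relative to the batch size.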

Additional context

For Substrait queries, we will also need Substrait support for specifying the offset and a timezone other than UTC (substrait-io/substrait#841).

Labels

enhancement (New feature or request)