-
Notifications
You must be signed in to change notification settings - Fork 57
Description
There are a handful of fields in OTLP protobuf that are repeated primitive types. HistograpDataPoint bucket_counts and explicit_bounds are examples.
In protobuf encoding, these can be either "packed" or "expanded". See the documentation here: https://protobuf.dev/programming-guides/encoding/#repeated
Prost, correctly, uses the packed encoding for these fields (which is the default in proto3). However, our direct OTAP -> OTLP bytes conversion does not use a packed encoding:
otel-arrow/rust/otap-dataflow/crates/pdata/src/otlp/metrics/data_points/histogram.rs
Lines 193 to 221 in 99deb08
| if let Some(bucket_counts) = &hist_dp_arrays.histogram_bucket_counts { | |
| if bucket_counts.list.is_valid(index) { | |
| let value_offsets = bucket_counts.list.value_offsets(); | |
| let start = value_offsets[index] as usize; | |
| let end = value_offsets[index + 1] as usize; | |
| for i in start..end { | |
| if bucket_counts.value.is_valid(i) { | |
| let val = bucket_counts.value.value(i); | |
| result_buf.encode_field_tag(HISTOGRAM_DP_BUCKET_COUNTS, wire_types::FIXED64); | |
| result_buf.extend_from_slice(&val.to_le_bytes()); | |
| } | |
| } | |
| } | |
| } | |
| if let Some(explicit_bounds) = &hist_dp_arrays.histogram_explicit_bounds { | |
| if explicit_bounds.list.is_valid(index) { | |
| let value_offsets = explicit_bounds.list.value_offsets(); | |
| let start = value_offsets[index] as usize; | |
| let end = value_offsets[index + 1] as usize; | |
| for i in start..end { | |
| if explicit_bounds.value.is_valid(i) { | |
| let val = explicit_bounds.value.value(i); | |
| result_buf.encode_field_tag(HISTOGRAM_DP_EXPLICIT_BOUNDS, wire_types::FIXED64); | |
| result_buf.extend_from_slice(&val.to_le_bytes()); | |
| } | |
| } | |
| } | |
| } |
However in our OTLP bytes -> OTAP conversion, we support packed only (as of #1398). Also, the interpretation of packed values does not handle a particular edge case: where there are multiple packed values in the same message. The docs says this needs to be handled:
Note that although there’s usually no reason to encode more than one key-value pair for a packed repeated field, parsers must be prepared to accept multiple key-value pairs. In this case, the payloads should be concatenated. Each pair must contain a whole number of elements. The following is a valid encoding of the same message above that parsers must accept:
6: {3 270}
6: {86942}
The docs also states:
Protocol buffer parsers must be able to parse repeated fields that were compiled as packed as if they were not packed, and vice versa. This permits adding [packed=true] to existing fields in a forward- and backward-compatible way.
Clearly we need to correct this. To fix:
- [Most critical] OTAP -> OTLP bytes should be made to use the packed encoding.
- if there's only one value, we could optionally use the expanded form to save 1 byte per field (by not encoding the length).
- this is the most critical thing to fix b/c currently our proto decoder will not correctly read what is produced by the proto encoder
- OTLP bytes -> OTAP should support:
- decoding the expanded encoding
- If multiple sequences of packed values are present.
The docs are not clear on whether proto encoders can mix packed & expanded in the same message, so our proto decoder probably does not need to handle this.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status