Skip to content

Correct support for packed/expanded protobuf fields #1399

@albertlockett

Description

@albertlockett

There are a handful of fields in OTLP protobuf that are repeated primitive types. HistograpDataPoint bucket_counts and explicit_bounds are examples.

In protobuf encoding, these can be either "packed" or "expanded". See the documentation here: https://protobuf.dev/programming-guides/encoding/#repeated

Prost, correctly, uses the packed encoding for these fields (which is the default in proto3). However, our direct OTAP -> OTLP bytes conversion does not use a packed encoding:

if let Some(bucket_counts) = &hist_dp_arrays.histogram_bucket_counts {
if bucket_counts.list.is_valid(index) {
let value_offsets = bucket_counts.list.value_offsets();
let start = value_offsets[index] as usize;
let end = value_offsets[index + 1] as usize;
for i in start..end {
if bucket_counts.value.is_valid(i) {
let val = bucket_counts.value.value(i);
result_buf.encode_field_tag(HISTOGRAM_DP_BUCKET_COUNTS, wire_types::FIXED64);
result_buf.extend_from_slice(&val.to_le_bytes());
}
}
}
}
if let Some(explicit_bounds) = &hist_dp_arrays.histogram_explicit_bounds {
if explicit_bounds.list.is_valid(index) {
let value_offsets = explicit_bounds.list.value_offsets();
let start = value_offsets[index] as usize;
let end = value_offsets[index + 1] as usize;
for i in start..end {
if explicit_bounds.value.is_valid(i) {
let val = explicit_bounds.value.value(i);
result_buf.encode_field_tag(HISTOGRAM_DP_EXPLICIT_BOUNDS, wire_types::FIXED64);
result_buf.extend_from_slice(&val.to_le_bytes());
}
}
}
}

However in our OTLP bytes -> OTAP conversion, we support packed only (as of #1398). Also, the interpretation of packed values does not handle a particular edge case: where there are multiple packed values in the same message. The docs says this needs to be handled:

Note that although there’s usually no reason to encode more than one key-value pair for a packed repeated field, parsers must be prepared to accept multiple key-value pairs. In this case, the payloads should be concatenated. Each pair must contain a whole number of elements. The following is a valid encoding of the same message above that parsers must accept:

6: {3 270}
6: {86942}

The docs also states:

Protocol buffer parsers must be able to parse repeated fields that were compiled as packed as if they were not packed, and vice versa. This permits adding [packed=true] to existing fields in a forward- and backward-compatible way.

Clearly we need to correct this. To fix:

  1. [Most critical] OTAP -> OTLP bytes should be made to use the packed encoding.
    • if there's only one value, we could optionally use the expanded form to save 1 byte per field (by not encoding the length).
    • this is the most critical thing to fix b/c currently our proto decoder will not correctly read what is produced by the proto encoder
  2. OTLP bytes -> OTAP should support:
    • decoding the expanded encoding
    • If multiple sequences of packed values are present.

The docs are not clear on whether proto encoders can mix packed & expanded in the same message, so our proto decoder probably does not need to handle this.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions