Correct support for packed/expanded protobuf fields

There are a handful of fields in OTLP protobuf that are repeated primitive types. HistograpDataPoint `bucket_counts` and `explicit_bounds` are examples.

In protobuf encoding, these can be either "packed" or "expanded". See the documentation here: https://protobuf.dev/programming-guides/encoding/#repeated

Prost, correctly, uses the packed encoding for these fields (which is the default in proto3). However, our direct OTAP -> OTLP bytes conversion does not use a packed encoding:
https://github.com/open-telemetry/otel-arrow/blob/99deb083032f33365657140d51358363e1b3bc83/rust/otap-dataflow/crates/pdata/src/otlp/metrics/data_points/histogram.rs#L193-L221

However in our OTLP bytes -> OTAP conversion, we support packed only (as of https://github.com/open-telemetry/otel-arrow/pull/1398). Also, the interpretation of packed values does not handle a particular edge case: where there are multiple packed values in the same message. The docs says this needs to be handled:
> Note that although there’s usually no reason to encode more than one key-value pair for a packed repeated field, parsers must be prepared to accept multiple key-value pairs. In this case, the payloads should be concatenated. Each pair must contain a whole number of elements. The following is a valid encoding of the same message above that parsers must accept:
```
6: {3 270}
6: {86942}
```

The docs also states:
> Protocol buffer parsers must be able to parse repeated fields that were compiled as packed as if they were not packed, and vice versa. This permits adding [packed=true] to existing fields in a forward- and backward-compatible way.


Clearly we need to correct this. To fix:
1. **[Most critical]** OTAP -> OTLP bytes should be made to use the packed encoding.
   - if there's only one value, we _could_ optionally use the expanded form to save 1 byte per field (by not encoding the length).
   - this is the most critical thing to fix b/c currently our proto decoder will not correctly read what is produced by the proto encoder
2. OTLP bytes -> OTAP should support:
   - decoding the expanded encoding
   - If multiple sequences of packed values are present.

The docs are not clear on whether proto encoders can mix packed & expanded in the same message, so our proto decoder probably does not need to handle this.

	if let Some(bucket_counts) = &hist_dp_arrays.histogram_bucket_counts {
	if bucket_counts.list.is_valid(index) {
	let value_offsets = bucket_counts.list.value_offsets();
	let start = value_offsets[index] as usize;
	let end = value_offsets[index + 1] as usize;
	for i in start..end {
	if bucket_counts.value.is_valid(i) {
	let val = bucket_counts.value.value(i);
	result_buf.encode_field_tag(HISTOGRAM_DP_BUCKET_COUNTS, wire_types::FIXED64);
	result_buf.extend_from_slice(&val.to_le_bytes());
	}
	}
	}
	}

	if let Some(explicit_bounds) = &hist_dp_arrays.histogram_explicit_bounds {
	if explicit_bounds.list.is_valid(index) {
	let value_offsets = explicit_bounds.list.value_offsets();
	let start = value_offsets[index] as usize;
	let end = value_offsets[index + 1] as usize;
	for i in start..end {
	if explicit_bounds.value.is_valid(i) {
	let val = explicit_bounds.value.value(i);
	result_buf.encode_field_tag(HISTOGRAM_DP_EXPLICIT_BOUNDS, wire_types::FIXED64);
	result_buf.extend_from_slice(&val.to_le_bytes());
	}
	}
	}
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Correct support for packed/expanded protobuf fields #1399

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Correct support for packed/expanded protobuf fields #1399

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions