
Batch processor sorting logic #1376

@jmacd

I have studied the current state of our batch processing logic. The original author of the code might find a different solution, but I could not pick it up without starting over. I found that it has some undesirable behavior, particularly around single payloads that exceed the maximum size, and IMO organizing batching into two discrete phases, split() and concatenate(), is not a great approach.

I have started rewriting the batch processor, beginning with these first principles, each of which breaks with the current version:

  • Focus entirely on Logs to start
  • Provide an option to sort the main payload by one or more dimensions; for logs, particularly useful ones are timestamp, trace ID, and event name
  • The maximum size is not optional, since we have u16 limits anyway
  • Produce batches of exactly the maximum size, with at most one smaller residual batch
  • Deduplicate Resource and Scope attribute values
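A minimal sketch of how the sorting, fixed-size batching, and deduplication principles above could fit together. It assumes a simplified row-oriented LogRecord instead of the Arrow record batches the processor actually handles, and LogRecord, Attributes, sort_and_batch and dedup_resources are hypothetical names, not existing code. The sort key order (timestamp, then trace ID, then event name) is just one choice among the dimensions listed above.

```rust
use std::collections::HashMap;

/// Hypothetical attribute set: (key, value) pairs, assumed to already be in a
/// canonical order so that equal sets compare equal.
type Attributes = Vec<(String, String)>;

/// Hypothetical, simplified log record; the real processor works on Arrow data.
#[derive(Clone, Debug)]
struct LogRecord {
    timestamp_ns: u64,
    trace_id: [u8; 16],
    event_name: String,
}

/// Sort by (timestamp, trace ID, event name), then cut into batches of exactly
/// `max_size` records; only the final batch may be smaller. Taking `max_size`
/// as u16 mirrors the u16 limits mentioned above. An input larger than
/// `max_size` is simply split across several full batches.
fn sort_and_batch(mut records: Vec<LogRecord>, max_size: u16) -> Vec<Vec<LogRecord>> {
    assert!(max_size > 0, "max_size must be non-zero");
    // sort_by is stable, so records with equal keys keep their input order.
    records.sort_by(|a, b| {
        (a.timestamp_ns, a.trace_id, &a.event_name).cmp(&(b.timestamp_ns, b.trace_id, &b.event_name))
    });
    records
        .chunks(max_size as usize)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Store each distinct attribute set once and return, for every input, its
/// u16 index into the deduplicated table.
fn dedup_resources(resources: &[Attributes]) -> (Vec<Attributes>, Vec<u16>) {
    let mut table: Vec<Attributes> = Vec::new();
    let mut index: HashMap<Attributes, u16> = HashMap::new();
    let mut ids = Vec::with_capacity(resources.len());
    for attrs in resources {
        let id = *index.entry(attrs.clone()).or_insert_with(|| {
            table.push(attrs.clone());
            u16::try_from(table.len() - 1).expect("more than u16::MAX + 1 distinct attribute sets")
        });
        ids.push(id);
    }
    (table, ids)
}

fn main() {
    // Seven records arriving in reverse timestamp order.
    let records: Vec<LogRecord> = (0..7u64)
        .rev()
        .map(|ts| LogRecord { timestamp_ns: ts, trace_id: [0; 16], event_name: "evt".to_string() })
        .collect();
    let batches = sort_and_batch(records, 3);
    let sizes: Vec<usize> = batches.iter().map(Vec::len).collect();
    println!("{:?}", sizes); // [3, 3, 1]

    let resources: Vec<Attributes> = vec![
        vec![("service.name".to_string(), "checkout".to_string())],
        vec![("service.name".to_string(), "cart".to_string())],
        vec![("service.name".to_string(), "checkout".to_string())],
    ];
    let (table, ids) = dedup_resources(&resources);
    println!("{} distinct, ids {:?}", table.len(), ids); // 2 distinct, ids [0, 1, 0]
}
```

Because sorting happens before cutting, every batch except the last is full, so there is never more than one residual batch regardless of how the input was split on arrival. The same interning approach would apply to Scope attributes.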

These are the aspects of the current code that I will keep:

  • basic data type Vec<[Option; N]>
  • select() helper function for iterating over the basic data type
  • unify() logic for schema unification

See also
#969
#347