Open
Description
I have studied the current state of our batch processing logic. The original author of the code might find a different solution, but I couldn't pick this up without starting over. I found some undesirable behavior, particularly around single payloads that exceed the maximum size. The organization of batching into two discrete phases with split() and concatenate() is, IMO, not a great approach.
I have started a rewrite of the batch processor, beginning from first principles that break with the current version:
- Focus entirely on Logs to start
- Provide an option to sort the main payload by some dimension(s), particularly useful ones (in logs) are timestamp, trace ID, event name
- Maximum size is not optional, since we have u16 limits anyway
- Produce batches of the maximum size with max 1 residual small batch
- Deduplicate Resource and Scope attribute values
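To make the sizing and sorting principles concrete, here is a minimal sketch, not the proposed implementation. `LogRecord`, `SortKey`, and `rebatch` are hypothetical names invented for illustration, and records are modeled as plain structs rather than the real columnar payload. It shows a mandatory non-zero `u16` maximum, an optional sort dimension, and output consisting of full batches followed by at most one smaller residual batch.

```rust
use std::num::NonZeroU16;

/// Hypothetical log record; only the fields this sketch sorts on.
#[derive(Debug, Clone, PartialEq)]
struct LogRecord {
    timestamp: u64,
    trace_id: u128,
    event_name: String,
}

/// Hypothetical sort dimension, mirroring the ones named in the list above.
#[derive(Debug, Clone, Copy)]
enum SortKey {
    Timestamp,
    TraceId,
    EventName,
}

/// Optionally sort the records, then cut them into batches of exactly
/// `max_size` records, with at most one smaller residual batch at the end.
/// The maximum is a required, non-zero u16 (assumption: one record counts
/// as one unit of size).
fn rebatch(
    mut records: Vec<LogRecord>,
    max_size: NonZeroU16,
    sort_by: Option<SortKey>,
) -> Vec<Vec<LogRecord>> {
    match sort_by {
        Some(SortKey::Timestamp) => records.sort_by_key(|r| r.timestamp),
        Some(SortKey::TraceId) => records.sort_by_key(|r| r.trace_id),
        Some(SortKey::EventName) => {
            records.sort_by(|a, b| a.event_name.cmp(&b.event_name))
        }
        None => {}
    }
    // `chunks` yields full slices of `max_size` items; only the final
    // slice may be shorter, which is the single residual batch.
    records
        .chunks(usize::from(max_size.get()))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With seven records and a maximum of three, this yields batches of sizes 3, 3, and 1. Because the split happens in a single pass over an already-combined input, there is no separate concatenate phase in this sketch.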
There are aspects of the current code that I will keep:
- basic data type Vec<[Option; N]>
- select() helper function for iterating basic data type
- unify() logic for schema unifications
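As a rough illustration of the basic data type, the sketch below iterates over one slot of a `Vec<[Option<T>; N]>`. The real select() signature is not shown in this issue, so `select_slot` is a hypothetical stand-in, and the element type is left generic because the issue text does not specify it.

```rust
/// Hypothetical helper: iterate over the values present in position `slot`
/// of every row, skipping rows where that position is `None`.
/// An out-of-range `slot` yields an empty iterator instead of panicking.
fn select_slot<T, const N: usize>(
    rows: &[[Option<T>; N]],
    slot: usize,
) -> impl Iterator<Item = &T> + '_ {
    rows.iter()
        .filter_map(move |row| row.get(slot)?.as_ref())
}
```

The fixed-size array keeps each row's slots addressable by position, while `Option` lets a row omit a slot without changing the row's shape.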