Skip to content

workflows: start tracing action and match sampling#412

Open
mattklein123 wants to merge 1 commit intomainfrom
tracing
Open

workflows: start tracing action and match sampling#412
mattklein123 wants to merge 1 commit intomainfrom
tracing

Conversation

@mattklein123
Copy link
Contributor

@mattklein123 mattklein123 commented Mar 4, 2026

Summary

  • implement SampledMatch in bd-log-matcher with deterministic RNG abstraction and tests
  • implement StartTracing lifecycle in workflows/logger, including persisted active-state behavior and streaming carryover handling
  • expose tracing on/off via is_tracing_active and remove transition-only start_trace_sampling signaling
  • add comprehensive workflow and logger integration coverage for tracing lifecycle semantics

Validation

  • cargo clippy -p bd-workflows --tests -- --no-deps
  • cargo nextest run -p bd-workflows
  • cargo nextest run -p bd-logger
  • cargo nextest run --workspace

Fixes BIT-7464

Signed-off-by: Matt Klein <mklein@bitdrift.io>
return true;
}

rng.random_u32(SAMPLE_RATE_DENOMINATOR) < sample_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this would make this a bit more complicated but should we do what envoy does and roll the dice once per workflow execution to avoid different sampling decisions to be reached by different workflows evaluating the same workflow? With multiple workflows using a sample node you would roll the dice N times and since we dedupe counters it would then appear that we are counting more than the sample rate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not sure it's worth it we can discuss. We would have to persist the random value. Realistically I can't see this being used more than once per workflow but dunno.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't thinking about doing it once per workflow but once per log such that multiple workflows would make a consistent sampling decision, like in the A->B(sampled) workflow and A->B(sampled)->C case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sure that's easy. Can do.

self.state.active_run_tracing_count = 0;
self.state.active_streaming_tracing_count = 0;
// TODO(mattklein123): Switch pending flush actions to a map.
self.state.pending_flush_actions = std::mem::take(&mut self.state.pending_flush_actions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand, if tracing is its own action why does it seem so tied to the flush actions?

0
};

let traced_run_ended =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what we're saying is that tracing extends to any streaming logs that may be emitted while we're tracing? Maybe add a comment explaining the interaction somewhere unless I missed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly we decided that tracing would extend through the post flush stream. I will add more comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants