-
Notifications
You must be signed in to change notification settings - Fork 158
tide: add more metrics #539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for k8s-prow ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Prucek The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test pull-prow-unit-test-race-detector-nonblocking |
pkg/tide/tide.go
Outdated
| retestsByAction: prometheus.NewCounterVec(prometheus.CounterOpts{ | ||
| Name: "tide_retests_by_action_total", | ||
| Help: "Total number of retests by action type (TRIGGER for serial, TRIGGER_BATCH for batch). Helps identify whether batch or serial testing is causing more retests.", | ||
| }, []string{ | ||
| "org", | ||
| "repo", | ||
| "branch", | ||
| "action", | ||
| }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we need both tide_retests_by_action_total and tide_retests_total, if you need the latter you can get it from the former by aggregating. We should only have tide_retests_total but let it have action label.
pkg/tide/tide.go
Outdated
| if len(batchPending) == 0 && len(missings) > 0 && len(pendings) == 0 { | ||
| if act == Trigger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why this indicates a batch failure? I think we can be in this state simply because the HEAD moved and now a PR needs to be (maybe individually, not necessarily in a batch) retested?
pkg/tide/tide.go
Outdated
| batchFailures: prometheus.NewCounterVec(prometheus.CounterOpts{ | ||
| Name: "tide_batch_failures_total", | ||
| Help: "Total number of times a batch test completes but PRs move back to missing state (indicating batch failure). This is expensive as all PRs in the batch need retesting.", | ||
| }, []string{ | ||
| "org", | ||
| "repo", | ||
| "branch", | ||
| }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to be a Tide metric? Batch jobs are a separate type of a ProwJob, I guess we already have that number from crier (or plank maybe) in a metric that tracks job results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you are right: pkg/kube/metrics.go
ec69c65 to
8f0cb89
Compare
This PR adds more metrics to see how much time tide spends retesting.
🤖 Assisted by Claude.