Skip to content

Conversation

@Prucek
Copy link
Member

@Prucek Prucek commented Oct 30, 2025

This PR adds more metrics to see how much time tide spends retesting.

🤖 Assisted by Claude.

@netlify
Copy link

netlify bot commented Oct 30, 2025

Deploy Preview for k8s-prow ready!

Name Link
🔨 Latest commit 8f0cb89
🔍 Latest deploy log https://app.netlify.com/projects/k8s-prow/deploys/69048cd83e6f770008e24316
😎 Deploy Preview https://deploy-preview-539--k8s-prow.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 30, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Prucek
Once this PR has been reviewed and has the lgtm label, please assign cjwagner for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the area/tide Issues or PRs related to prow's tide component label Oct 30, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 30, 2025
@Prucek
Copy link
Member Author

Prucek commented Oct 30, 2025

/test pull-prow-unit-test-race-detector-nonblocking

pkg/tide/tide.go Outdated
Comment on lines 300 to 308
retestsByAction: prometheus.NewCounterVec(prometheus.CounterOpts{
Name: "tide_retests_by_action_total",
Help: "Total number of retests by action type (TRIGGER for serial, TRIGGER_BATCH for batch). Helps identify whether batch or serial testing is causing more retests.",
}, []string{
"org",
"repo",
"branch",
"action",
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need both tide_retests_by_action_total and tide_retests_total, if you need the latter you can get it from the former by aggregating. We should only have tide_retests_total but let it have action label.

pkg/tide/tide.go Outdated
Comment on lines 1792 to 1793
if len(batchPending) == 0 && len(missings) > 0 && len(pendings) == 0 {
if act == Trigger {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why this indicates a batch failure? I think we can be in this state simply because the HEAD moved and now a PR needs to be (maybe individually, not necessarily in a batch) retested?

pkg/tide/tide.go Outdated
Comment on lines 309 to 316
batchFailures: prometheus.NewCounterVec(prometheus.CounterOpts{
Name: "tide_batch_failures_total",
Help: "Total number of times a batch test completes but PRs move back to missing state (indicating batch failure). This is expensive as all PRs in the batch need retesting.",
}, []string{
"org",
"repo",
"branch",
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really need to be a Tide metric? Batch jobs are a separate type of a ProwJob, I guess we already have that number from crier (or plank maybe) in a metric that tracks job results?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right: pkg/kube/metrics.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/tide Issues or PRs related to prow's tide component cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants