-
Notifications
You must be signed in to change notification settings - Fork 158
tide: fallback to individual PR merge in batch failures #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Prucek The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for k8s-prow ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
/cc |
pkg/tide/github.go
Outdated
| failed = append(failed, pr.Number) | ||
| errs = append(errs, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks very suspicious to me. This condition handles an actual failure to merge something that should be mergeable in the first place, otherwise we would not be here (btw, this is the place where Tide hits the annoying errors where it attempts to merge something blocked by incompatible branch protection settings, like #269). That's why the (now removed) "these are user errors" comment was here. Maybe we need to treat this case differently when we are trying to merge a batch, but for non-batch merges I don't see why this behavior would need to change.
I'll need to doublecheck.
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the conditions to only if len(merged) > 0
| enableScheduling: true, | ||
| }, | ||
| { | ||
| name: "batch merge fails, falls back to individual PR merge (issue #474)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this what the "batch merge errors but continues if a PR is unmergeable" test on L1899 is supposed to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a situation where something was merged (merged: 2), but the bug I am trying to solve is that nothing got merged
6233070 to
ed28fed
Compare
| // If the entire batch failed to merge (e.g., due to conflicts), fall through | ||
| // to attempt merging individual PRs to break the tie. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#474 is really weird to me and I'm really not sure if the change proposed in this PR actually helps with anything.
Have you managed to reproduce or at least really understand what was actually happening in #474 (Tide logs would really help with that, it's a pity we haven't captured them when the problem was happening).
My thinking is: if we are here, then we have at least one successful batch job of a presubmit that is required for merge. That implies that all pull requests involved must be git-mergeable (not necessarilty GH-mergeable) without conflicts, otherwise the batch job would fail right at the beginning, when clonerefs merges all involved pulls into the base ref. If that happened, we would not have a successful batch job and we would not get here to try merge PRs based on that fact.
Additionally, if at least one PR from the batch is GH-mergeable (and therefore, if the fallback single merge on L1448 has any chance of succeeding) , it would merge already. The PRs are attempted to be merged sequentially and the code tracks keepTrying for merge failures that should only affect that single PR. It only returns "stop trying" on "we do not have permissions to merge" errors and "merge commits are forbidden" errors. Otherwise it continues and tries to merge other PRs. If none of the PRs merged it means that all of them failed to merge individually (and fallback individual merge has no chance of merging) or Tide hit one of the hard errors (and fallback merge has no chance of merging).
This is all really strange to me, because I remember that we actually saw #474 when it was happening. I just do not understand what is the actual failure case from which the user can recover through individual merges.
I think the idea that if we once have a passing batch job that fails to merge all its PRs then Tide gets stuck on this and does not let any other PR to merge (because it will always prefer to act on the passing Batch job it sees) is sound, I just do not understand the conditions that lead to that.
Fixes: #474
This PR adds a fallback to merging an individual PR, when a batch fails to merge.
Before, when a Batch failed to merge, it didn't take into account that a failure could occur, the action in
takeActionwas alwaysMergeBatchregardless of the error. This sometimes led to /hold-ing one of the PRs from the batch, and then it would merge.Now, it will take one of the PRs and it will try to merge it.
🤖 Assisted by Claude.