Koji builds cancellation#3000
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces functionality to cancel previous Koji builds when a new one is triggered. This is a useful feature to save resources by avoiding redundant builds. The implementation adds methods to find running builds and logic in the handlers to cancel them, controlled by the CANCEL_RUNNING_JOBS environment variable.
My review focuses on code duplication and consistency. I've identified an opportunity to refactor duplicated build cancellation logic into a shared method. I also suggest a minor change to improve code consistency.
Please note that the PR description indicates that tests for this new functionality are not yet included. Adding tests would be crucial to ensure the feature works as expected and to prevent future regressions.
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 46s |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 54s |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 52s |
|
@nforro I've cleaned the implementation and resolved the comments. One last issue remains: should we also implement cancellation for downstream production builds? I'm not sure it makes sense, since production builds are triggered by distgit MR merges. |
I don't understand this reasoning 🙂 Why should that matter? First, it's not just PR merges but pushes in general (depending on configuration, you are correct that by default it's only Packit-created PR merges), and second, even if in most cases a next push won't happen before a build is finished, there are packages that build for hours and for them cancelling obsolete builds brings the biggest benefit (unlocking resources for the current build). |
|
Oh, I see. I was mentally locked into a "one MR" situation (i.e., a user needs to restart an already failed Koji build because the failure wasn't caused by the MR itself, but rather by an infrastructure error or a missing dependency), and didn't consider builds obsoleted by a subsequent MR merge. So for production builds, basically all Koji builds with older NVRs should be cancelled, right? |
How is this relevant here? I thought this was about cancelling running builds, not retriggering failed ones.
It's the exact same situation as for upstream Koji builds with |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 47s |
|
@nforro OK, I've pushed (just another) final version :) Getting currently running builds is simplified, only one method is used which filters according to project event type+event_id. |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 51s |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s |
Nice, thanks! Could you also add integration tests, ideally for all cases? |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 52s |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s |
|
Added some integration tests for all three handlers. They are quite complicated, I needed to mock a lot of infrastructure. Inspiration came from |
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 47s |
| self.report_status_to_build_for_chroot( | ||
| state=BaseCommitStatus.canceled, | ||
| description="Build was canceled.", | ||
| url=get_koji_build_info_url(build.id), | ||
| chroot=build.target, | ||
| ) |
There was a problem hiding this comment.
I'm afraid this won't work for Fedora CI, base check_name deduced from chroot will be different and target_branch will be None when passed to set_status():
packit-service/packit_service/worker/reporting/reporters/pagure.py
Lines 27 to 36 in c1b2dfb
There was a problem hiding this comment.
I see, good catch! It's even more complicated than I originally thought. Upstream koji builds and downstream scratch builds use completely different reporting mechanisms. The currently used self.report_status_to_build_for_chroot() is only usable for upstream builds, while downstream builds rely on the FedoraCIHelper.report() method with different check name construction (they use a hardwired string, not chroot based one).
I don't see any other option for implementing reporting than adding a new callback parameter (e.g., report_canceled) to the cancel_running_builds() method containing a callable that would be used for cancellation reporting. Since reporting only matters for downstream scratch builds (there is no reporting for downstream production builds, and upstream builds statuses are automagically hidden by github/gitlab since they are bound to a commit sha), I'd only set the callback there (for now).
There was a problem hiding this comment.
upstream builds statuses are automagically hidden by github/gitlab since they are bound to a commit sha
Are we 100% sure this is always the case? I think reporting canceled status wouldn't hurt. OTOH on GitHub this would count towards rate limits, so I suppose it makes sense not to implement it. However, I would explicitly mention that in a comment.
There was a problem hiding this comment.
Well, I have been 100% sure multiple times in this PR... But a comment is reasonable.
| @pytest.fixture | ||
| def mock_pr_functionality(request): | ||
| packit_yaml = "{'specfile_path': 'the-specfile.spec', 'jobs':" + str(request.param) + "}" | ||
| flexmock( | ||
| GithubProject, | ||
| full_repo_name="packit/hello-world", | ||
| get_file_content=lambda path, ref, headers: packit_yaml, | ||
| get_files=lambda ref, filter_regex: ["the-specfile.spec"], | ||
| get_web_url=lambda: "https://github.com/the-namespace/the-repo", | ||
| get_pr=lambda pr_id: flexmock(head_commit="12345"), | ||
| ) | ||
| flexmock(Github, get_repo=lambda full_name_or_id: None) | ||
|
|
||
| pr_model = ( | ||
| flexmock(PullRequestModel(pr_id=123)) | ||
| .should_receive("get_project_event_models") | ||
| .and_return([flexmock(commit_sha="12345")]) | ||
| .mock() | ||
| ) | ||
| project_event = ( | ||
| flexmock(ProjectEventModel(type=JobConfigTriggerType.pull_request, id=123456)) | ||
| .should_receive("get_project_event_object") | ||
| .and_return(pr_model) | ||
| .mock() | ||
| .should_receive("set_packages_config") | ||
| .mock() | ||
| ) | ||
| flexmock(LocalProject, refresh_the_arguments=lambda: None) | ||
| flexmock(ProjectEventModel).should_receive("get_or_create").and_return( | ||
| project_event, | ||
| ) | ||
| flexmock(ProjectEventModel).should_receive("get_by_id").with_args( | ||
| 123456, | ||
| ).and_return(project_event) | ||
| flexmock(PullRequestModel).should_receive("get_or_create").with_args( | ||
| pr_id=123, | ||
| namespace="packit", | ||
| repo_name="hello-world", | ||
| project_url="https://github.com/packit/hello-world", | ||
| ).and_return(pr_model) | ||
| flexmock(PullRequestModel).should_receive("get_by_id").with_args(123).and_return( | ||
| pr_model, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
I don't think it makes sense to move only one fixture of three closely related ones.
There was a problem hiding this comment.
The reason is to re-use this fixture in multiple test files. But if that's a problem, I guess I could rearrange the tests. I created a new test_koji_build_cancel.py file to keep all three handlers tests together, but there is only small advantage of such layout (if any). I can move the tests to test_koji_build.py / test_dg_pr.py.
There was a problem hiding this comment.
That's not my point. There are 3 correlated fixtures that are now defined in different places.
There was a problem hiding this comment.
Yes, sure. And your point is valid. I'm just trying to figure out what approach would be the best:
-
move all three correlated fixtures to conftest - I don't like it much because two of these fixtures would be used only by one file.
-
move the upstream cancellation test to test_check_rerun.py - not great, cancellation is kind of out of place there. Downstream tests could go to test_koji_build.py which is fine.
-
keep all three of them in test_koji_build_cancel.py, and duplicate the needed setup in a local fixture there (or inline it in the test itself). This is solution I'm currently leaning towards.
There was a problem hiding this comment.
- move all three correlated fixtures to conftest - I don't like it much because two of these fixtures would be used only by one file.
Why is that a problem? In the future some test might require one of those fixtures and it will be already available.
There was a problem hiding this comment.
It's not a problem - I just didn't like it. But I guess it's cleaner then splitting those correlated fixtures or introducing duplicated code.
This change makes KojiBuildJobHelper instance available also for downstream Koji builds handlers. To avoid MRO conflicts, the base class of GetKojiBuildJobHelperMixin was changed from ConfigFromEventMixin to Config (the common Protocol). ConfigFromEventMixin was then added explicitly to KojiBuildHandler, which previously received it via GetKojiBuildJobHelperMixin. Signed-off-by: Marek Blaha <mblaha@redhat.com>
The implementation does not rely on `commit_sha_before`, which is not set for some events (such as PR comments, check reruns, Pagure events, etc.). This patch uses project event type + event_id to identify all obsolete builds instead. Otherwise it is similar to what's already present for Copr builds. For: packit#2989 Signed-off-by: Marek Blaha <mblaha@redhat.com>
|
Build succeeded. ✔️ pre-commit SUCCESS in 1m 51s |
nforro
left a comment
There was a problem hiding this comment.
Thanks! Let's test this on stage.
|
Build succeeded (gate pipeline). ✔️ pre-commit SUCCESS in 1m 49s |
TODO:
Fixes #2989
RELEASE NOTES BEGIN
Packit now cancels obsoleted Koji builds before starting new ones.
RELEASE NOTES END