-
Notifications
You must be signed in to change notification settings - Fork 255
chore: ensure we patch RGD status for any Reconcile errors #702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore: ensure we patch RGD status for any Reconcile errors #702
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: michaelhtm The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hi @michaelhtm. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| var resourcesInformation []v1alpha1.ResourceInformation | ||
| defer func() { | ||
| if err := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); err != nil { | ||
| reconcileErr = errors.Join(reconcileErr, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did your deferlinter catch this @jakobmoellerdev ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, https://github.com/jakobmoellerdev/deferrlint spotted this:
deferrlint ./...
.../kro/pkg/controller/instance/controller_reconcile.go:120:6: deferred function assigns to error-typed variable `err`, but it is not a named return value
.../kro/pkg/controller/resourcegraphdefinition/controller.go:166:6: deferred function assigns to error-typed variable `err`, but it is not a named return value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say this is a strange one and the linter doesnt detect this correctly though, I believe reconcileErr should be fine here because it is reassigned
|
/ok-to-test |
8a13805 to
1f5e594
Compare
44318ed to
3929110
Compare
| if !apierrors.IsNotFound(err) { | ||
| reconcileErr = errors.Join(reconcileErr, err) | ||
| } else { | ||
| reconcileErr = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would you set reconcileErr to nil explicitly here? because youre using err there is no conflict anyhow right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I hate to be a nit picker here, but would it be possible to write a test to make sure the RGD status is patched on error case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add tests for patch behavior, but currently there's no package that mocks finalize failures.
why would you set reconcileErr to nil explicitly here?
The main reason was to avoid triggering another reconciliation if the resource no longer exists..
During RGD reconcile, if there are any issues when setting RGD CR as managed or unmanaged, any issues that may occur are not reflected on the RGD CR status. With these changes, we will ensure any issues are patched to the RGD status
a80ed36 to
32114c6
Compare
| defer func() { | ||
| // if the updateStatus error is a k8s NotFound error we want to skip it. | ||
| // If not, we will merge it to the returned error | ||
| if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this updateStatus is happening in a deferred func, now we ALWAYS will run this function. This does not match the behavior that exists below. For example, if !o.DeletionTimestamp.IsZero(), we run r.cleanupResourceGraphDefinition instead and then return.
With this change, we would run r.cleanupResourceGraphDefinition, try to return, and then try to run r.updateStatus afterwards which definitely isn't correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is mainly addressing issues when either cleanup or finalizer add/remove fails. In those cases, we want kro to report this information in the RGD status. This would allow users to see any issues in the resource status instead of looking for them in the logs..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't this now run even if the cleanup succeeds where it wasn't before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does. The information of the successful cleanup may be useful to users. Especially if the RGD has other finalizers that are preventing it from being removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so it's explicit/intentional that we are changing from only updating this sometime to always updating it. If that's the case then this makes sense
| const ( | ||
| // ResourceGraphDefinitionConditionTypeGraphVerified indicates the state of the directed | ||
| // acyclic graph (DAG) that kro uses to manage the resources in a | ||
| // ResourceGraphDefinition. | ||
| ResourceGraphDefinitionConditionTypeGraphVerified ConditionType = "GraphVerified" | ||
| // ResourceGraphDefinitionConditionTypeCustomResourceDefinitionSynced indicates the state of the | ||
| // CustomResourceDefinition (CRD) that kro uses to manage the resources in a | ||
| // ResourceGraphDefinition. | ||
| ResourceGraphDefinitionConditionTypeCustomResourceDefinitionSynced ConditionType = "CustomResourceDefinitionSynced" | ||
| // ResourceGraphDefinitionConditionTypeReconcilerReady indicates the state of the reconciler. | ||
| // Whenever an ResourceGraphDefinition resource is created, kro will spin up a | ||
| // reconciler for that resource. This condition indicates the state of the | ||
| // reconciler. | ||
| ResourceGraphDefinitionConditionTypeReconcilerReady ConditionType = "ReconcilerReady" | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not used anywhere in the code base
| if apierrors.IsNotFound(patchErr) { | ||
| // we don't want to return an error and requeue if | ||
| // rgd is not found (deleted) | ||
| err = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignore the error if resource no longer exists
| defer func() { | ||
| // if the updateStatus error is a k8s NotFound error we want to skip it. | ||
| // If not, we will merge it to the returned error | ||
| if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is mainly addressing issues when either cleanup or finalizer add/remove fails. In those cases, we want kro to report this information in the RGD status. This would allow users to see any issues in the resource status instead of looking for them in the logs..
32114c6 to
41ad875
Compare
| mark.ResourceGraphUnfinalized(err.Error()) | ||
| return ctrl.Result{}, err | ||
| } | ||
| mark.ResourceGraphFinalized("added") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, ResourceGraphFinalized feels a little off here, semantically. Still rather a nit though, because the message describes it reasonably well.
| // if the updateStatus error is a k8s NotFound error we want to skip it. | ||
| // If not, we will merge it to the returned error | ||
| if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil { | ||
| if apierrors.IsNotFound(patchErr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an additional check for err == nil here, to prevent suppressing potential other errors? Or maybe at least log err if err != nil?
| if apierrors.IsNotFound(patchErr) { | |
| if apierrors.IsNotFound(patchErr) && err == nil { |
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Description:
During RGD reconcile, if there are any issues when setting RGD CR as
managed or unmanaged, any issues that may occur are not reflected on the
RGD CR status.
As part of these changes, I'm introducing two new conditions.
FinalizedCleanedUpFinalizedwill be marked to True if the controller was successfully able to add/remove the kro finalizer from the RGD.If not, it will set it to false with the reason for error.
CleanedUpwill mainly be used during RGD deletions. kro will set this condition to False if it was not able to do a clean up on RGD deletion. If the cleanup is successful, kro will set the condition to True