Skip to content

Conversation

@michaelhtm
Copy link
Contributor

@michaelhtm michaelhtm commented Sep 29, 2025

Description:
During RGD reconcile, if there are any issues when setting RGD CR as
managed or unmanaged, any issues that may occur are not reflected on the
RGD CR status.

As part of these changes, I'm introducing two new conditions.

  • Finalized
  • CleanedUp

Finalized will be marked to True if the controller was successfully able to add/remove the kro finalizer from the RGD.
If not, it will set it to false with the reason for error.

CleanedUp will mainly be used during RGD deletions. kro will set this condition to False if it was not able to do a clean up on RGD deletion. If the cleanup is successful, kro will set the condition to True

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Sep 29, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: michaelhtm / name: Michael Tewoldemedhin (41ad875, 57b20ba)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: michaelhtm
Once this PR has been reviewed and has the lgtm label, please assign matthchr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 29, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @michaelhtm. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Sep 29, 2025
var resourcesInformation []v1alpha1.ResourceInformation
defer func() {
if err := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); err != nil {
reconcileErr = errors.Join(reconcileErr, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did your deferlinter catch this @jakobmoellerdev ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, https://github.com/jakobmoellerdev/deferrlint spotted this:

deferrlint ./...
.../kro/pkg/controller/instance/controller_reconcile.go:120:6: deferred function assigns to error-typed variable `err`, but it is not a named return value
.../kro/pkg/controller/resourcegraphdefinition/controller.go:166:6: deferred function assigns to error-typed variable `err`, but it is not a named return value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say this is a strange one and the linter doesnt detect this correctly though, I believe reconcileErr should be fine here because it is reassigned

@barney-s
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 29, 2025
@michaelhtm michaelhtm force-pushed the chore/patchrgdstatusforanyerror branch from 8a13805 to 1f5e594 Compare September 29, 2025 22:49
@michaelhtm michaelhtm force-pushed the chore/patchrgdstatusforanyerror branch 2 times, most recently from 44318ed to 3929110 Compare September 30, 2025 01:14
if !apierrors.IsNotFound(err) {
reconcileErr = errors.Join(reconcileErr, err)
} else {
reconcileErr = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would you set reconcileErr to nil explicitly here? because youre using err there is no conflict anyhow right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I hate to be a nit picker here, but would it be possible to write a test to make sure the RGD status is patched on error case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add tests for patch behavior, but currently there's no package that mocks finalize failures.

why would you set reconcileErr to nil explicitly here?

The main reason was to avoid triggering another reconciliation if the resource no longer exists..

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 2, 2025
During RGD reconcile, if there are any issues when setting RGD CR as
managed or unmanaged, any issues that may occur are not reflected on the
RGD CR status.

With these changes, we will ensure any issues are patched to the RGD
status
@michaelhtm michaelhtm force-pushed the chore/patchrgdstatusforanyerror branch from a80ed36 to 32114c6 Compare October 2, 2025 17:17
defer func() {
// if the updateStatus error is a k8s NotFound error we want to skip it.
// If not, we will merge it to the returned error
if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this updateStatus is happening in a deferred func, now we ALWAYS will run this function. This does not match the behavior that exists below. For example, if !o.DeletionTimestamp.IsZero(), we run r.cleanupResourceGraphDefinition instead and then return.

With this change, we would run r.cleanupResourceGraphDefinition, try to return, and then try to run r.updateStatus afterwards which definitely isn't correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is mainly addressing issues when either cleanup or finalizer add/remove fails. In those cases, we want kro to report this information in the RGD status. This would allow users to see any issues in the resource status instead of looking for them in the logs..

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't this now run even if the cleanup succeeds where it wasn't before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. The information of the successful cleanup may be useful to users. Especially if the RGD has other finalizers that are preventing it from being removed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so it's explicit/intentional that we are changing from only updating this sometime to always updating it. If that's the case then this makes sense

Comment on lines -27 to -42
const (
// ResourceGraphDefinitionConditionTypeGraphVerified indicates the state of the directed
// acyclic graph (DAG) that kro uses to manage the resources in a
// ResourceGraphDefinition.
ResourceGraphDefinitionConditionTypeGraphVerified ConditionType = "GraphVerified"
// ResourceGraphDefinitionConditionTypeCustomResourceDefinitionSynced indicates the state of the
// CustomResourceDefinition (CRD) that kro uses to manage the resources in a
// ResourceGraphDefinition.
ResourceGraphDefinitionConditionTypeCustomResourceDefinitionSynced ConditionType = "CustomResourceDefinitionSynced"
// ResourceGraphDefinitionConditionTypeReconcilerReady indicates the state of the reconciler.
// Whenever an ResourceGraphDefinition resource is created, kro will spin up a
// reconciler for that resource. This condition indicates the state of the
// reconciler.
ResourceGraphDefinitionConditionTypeReconcilerReady ConditionType = "ReconcilerReady"
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not used anywhere in the code base

if apierrors.IsNotFound(patchErr) {
// we don't want to return an error and requeue if
// rgd is not found (deleted)
err = nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore the error if resource no longer exists

defer func() {
// if the updateStatus error is a k8s NotFound error we want to skip it.
// If not, we will merge it to the returned error
if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is mainly addressing issues when either cleanup or finalizer add/remove fails. In those cases, we want kro to report this information in the RGD status. This would allow users to see any issues in the resource status instead of looking for them in the logs..

@michaelhtm michaelhtm force-pushed the chore/patchrgdstatusforanyerror branch from 32114c6 to 41ad875 Compare October 8, 2025 23:10
Comment on lines +192 to +195
mark.ResourceGraphUnfinalized(err.Error())
return ctrl.Result{}, err
}
mark.ResourceGraphFinalized("added")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, ResourceGraphFinalized feels a little off here, semantically. Still rather a nit though, because the message describes it reasonably well.

// if the updateStatus error is a k8s NotFound error we want to skip it.
// If not, we will merge it to the returned error
if patchErr := r.updateStatus(ctx, o, topologicalOrder, resourcesInformation); patchErr != nil {
if apierrors.IsNotFound(patchErr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an additional check for err == nil here, to prevent suppressing potential other errors? Or maybe at least log err if err != nil?

Suggested change
if apierrors.IsNotFound(patchErr) {
if apierrors.IsNotFound(patchErr) && err == nil {

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 16, 2025
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants