Commit cea31d3
fix(resources): prevent indefinite blocking on cloud resource cleanup during deletion
When ensureCloudResourcesDestroyed() attempts to clean up guest cluster
resources, it queries the guest cluster's KubeAPIServer. If the KubeAPIServer
has already been removed as part of cluster deletion, these queries fail with
connection errors, so the CloudResourcesDestroyed condition never becomes
True and cluster deletion is blocked indefinitely.
This fix implements two safety mechanisms to handle KubeAPIServer unavailability:
1. Early KAS check: Verify the kube-apiserver deployment exists in the control
plane namespace before attempting cleanup. If not found, skip cleanup
immediately as the guest cluster is already gone.
2. Connection error tracking: Track consecutive connection failures in memory
and skip cleanup after 5 failed attempts or 5 minutes, whichever comes first.
This prevents infinite retry loops when the KubeAPIServer is unreachable.
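The two mechanisms above can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: the names failureTracker, recordFailure, recordSuccess, and skipCleanup are hypothetical, the early KAS check is reduced to a boolean, and resetting the counter on a successful attempt is an assumption drawn from the word "consecutive". Only the thresholds (5 attempts, 5 minutes) come from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Thresholds as stated in the commit message.
const (
	maxConnectionFailures = 5
	maxFailureDuration    = 5 * time.Minute
)

// failureRecord is a hypothetical per-cluster record of consecutive failures.
type failureRecord struct {
	count        int
	firstFailure time.Time
}

// failureTracker (hypothetical) keeps records in memory only, so nothing is
// persisted through the API and a process restart clears all state.
type failureTracker struct {
	mu      sync.Mutex
	now     func() time.Time
	records map[string]*failureRecord
}

func newFailureTracker() *failureTracker {
	return &failureTracker{now: time.Now, records: map[string]*failureRecord{}}
}

// recordFailure counts one more connection failure for key and reports whether
// cleanup should be skipped. The record is deliberately not cleared once a
// threshold is reached, so later reconciliations keep returning true.
func (t *failureTracker) recordFailure(key string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	rec, ok := t.records[key]
	if !ok {
		rec = &failureRecord{firstFailure: t.now()}
		t.records[key] = rec
	}
	rec.count++
	return rec.count >= maxConnectionFailures ||
		t.now().Sub(rec.firstFailure) >= maxFailureDuration
}

// recordSuccess clears the record after a successful attempt (an assumption).
func (t *failureTracker) recordSuccess(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.records, key)
}

// skipCleanup combines both mechanisms. kasExists stands in for the early
// deployment check; connFailed says the last attempt hit a connection error.
func skipCleanup(t *failureTracker, key string, kasExists, connFailed bool) bool {
	if !kasExists {
		return true // guest cluster API is already gone
	}
	if connFailed {
		return t.recordFailure(key)
	}
	t.recordSuccess(key)
	return false
}

func main() {
	tracker := newFailureTracker()
	for attempt := 1; attempt <= 6; attempt++ {
		skip := skipCleanup(tracker, "clusters/example", true, true)
		fmt.Printf("attempt %d: skip=%v\n", attempt, skip)
	}
}
```

Keeping the clock behind the now field lets a unit test advance time without sleeping, which is one way to exercise the 5-minute limit.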
Key implementation details:
- Added isKubeAPIServerAvailable() to check KAS deployment existence using
the control plane client
- Added isConnectionError() using proper K8s API errors (IsTimeout,
IsServerTimeout, IsServiceUnavailable) and Go's net.Error interface instead
of string matching
- Implemented in-memory failure tracking with cleanupFailureTracker to avoid
persisting state and potential API errors
- Failure tracker is NOT reset when skipping due to max failures/timeout to
prevent condition flip-flopping on subsequent reconciliations
- Added comprehensive unit tests covering KAS unavailability, connection error
detection, and failure tracking
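As an illustration of classifying errors by type rather than by string matching, the sketch below covers only the net.Error part, using the standard library. The function name isNetworkError and the sample helpers are hypothetical; the Kubernetes API error checks named above (IsTimeout, IsServerTimeout, IsServiceUnavailable) are left out so the example has no external dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isNetworkError (hypothetical name) reports whether err, or any error it
// wraps, implements net.Error. errors.As walks the wrap chain, so no message
// text is inspected.
func isNetworkError(err error) bool {
	if err == nil {
		return false
	}
	var netErr net.Error
	return errors.As(err, &netErr)
}

// sampleDialError builds a wrapped *net.OpError, similar in shape to what a
// client returns when the API server endpoint refuses connections.
func sampleDialError() error {
	opErr := &net.OpError{Op: "dial", Net: "tcp", Err: errors.New("connection refused")}
	return fmt.Errorf("listing guest cluster resources: %w", opErr)
}

// samplePlainError builds an error that is not network related.
func samplePlainError() error {
	return errors.New("forbidden")
}

func main() {
	fmt.Println("dial error:", isNetworkError(sampleDialError()))
	fmt.Println("plain error:", isNetworkError(samplePlainError()))
}
```

A string-based check would break if the message wording changed or if the error were wrapped with extra context; the type-based check is unaffected by either.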
The implementation ensures stable CloudResourcesDestroyed condition status,
allowing cluster deletion to proceed even when the guest cluster API is
unavailable.
Signed-off-by: Mulham Raee <[email protected]>
Assisted-by: Claude 4.5 Sonnet (via Cursor)
1 parent a78c79e commit cea31d3
File tree
7 files changed (+790, -32)
- api/hypershift/v1beta1
- control-plane-operator
- controllers/hostedcontrolplane
- hostedclusterconfigoperator/controllers/resources
- support/util
- vendor/github.com/openshift/hypershift/api/hypershift/v1beta1
Lines changed: 7 additions & 0 deletions