Skip to content

Conversation

@samuel-esp
Copy link
Collaborator

Motivation

When the list of resources to be processed is large (e.g., 2000+ resources), the KubeDownscaler algorithm may take up to half an hour to complete its run. This can lead to the following scenario:

  1. All resources are initially retrieved.
  2. As KubeDownscaler processes each resource, a resource that was previously retrieved may be modified by another entity (such as an HPA, Keda, or through manual intervention).
  3. When KubeDownscaler attempts to process this modified resource, it encounters the following error: "the object has been modified; please apply your changes to the latest version and try again".

Even if KubeDownscaler will process the resource again during the next iteration, this could be a blocking behavior when --once argument is enabled on a large cluster. It could lead to some resources not being correctly processed

Changes

Added a new argument --max-retries-on-conflict that allows the user to specify how many times the resource update should be retried before moving on to the next resource

Tests done

  • Unit Tests
  • Built custom image and tested on a test cluster

TODO

  • I've assigned myself to this PR

@samuel-esp samuel-esp added the enhancement New feature or request label Nov 16, 2024
@samuel-esp samuel-esp self-assigned this Nov 16, 2024
@samuel-esp samuel-esp linked an issue Nov 16, 2024 that may be closed by this pull request
@samuel-esp samuel-esp marked this pull request as ready for review November 17, 2024 15:56
@samuel-esp samuel-esp requested a review from JTaeuber November 20, 2024 10:48
Copy link
Member

@JTaeuber JTaeuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change for consistency. Other than that LGTM

@samuel-esp samuel-esp requested a review from JTaeuber November 22, 2024 17:42
Copy link
Member

@JTaeuber JTaeuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuel-esp samuel-esp merged commit 92416ee into caas-team:main Nov 27, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resource Processing Failed Due to Concurrent Update

2 participants