@norbertcyran norbertcyran commented Oct 17, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR is part of the granular resource limits initiative (#8703). It implements the foundation for the new resource quotas system. The legacy system supports only cluster-wide resource limits coming from the cloud provider. This PR introduces the possibility of providing multiple quotas, each of which can apply to a different subset of nodes.

For now, the new package is not integrated with the rest of the codebase. This is done on purpose to safely ship the new system in smaller chunks. Therefore, this PR does not introduce any user-facing changes.
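
To make the idea of quotas that apply to different subsets of nodes more concrete, here is a minimal sketch. It is not the code from this PR: the `Node` and `Quota` types, the label-based `Selector`, and the `matchingQuotas` helper are all hypothetical names chosen for illustration, and the real package may select nodes differently.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for corev1.Node.
type Node struct {
	Name   string
	Labels map[string]string
}

// Quota is a hypothetical quota: a set of resource limits that applies
// only to nodes whose labels contain every key/value pair in Selector.
// An empty Selector matches every node, i.e. a cluster-wide quota.
type Quota struct {
	ID       string
	Selector map[string]string
	Limits   map[string]int64 // resource name -> maximum total quantity
}

// AppliesTo reports whether the quota covers the given node.
func (q Quota) AppliesTo(n Node) bool {
	for k, v := range q.Selector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// matchingQuotas returns the IDs of all quotas that cover the node,
// in the order the quotas were given.
func matchingQuotas(quotas []Quota, n Node) []string {
	var ids []string
	for _, q := range quotas {
		if q.AppliesTo(n) {
			ids = append(ids, q.ID)
		}
	}
	return ids
}

func main() {
	quotas := []Quota{
		{ID: "cluster-wide", Limits: map[string]int64{"cpu": 64}},
		{ID: "gpu-pool", Selector: map[string]string{"pool": "gpu"}, Limits: map[string]int64{"cpu": 16}},
	}
	gpuNode := Node{Name: "n1", Labels: map[string]string{"pool": "gpu"}}
	generalNode := Node{Name: "n2", Labels: map[string]string{"pool": "general"}}

	fmt.Println(matchingQuotas(quotas, gpuNode))     // [cluster-wide gpu-pool]
	fmt.Println(matchingQuotas(quotas, generalNode)) // [cluster-wide]
}
```

In this sketch a single node can be covered by several quotas at once, so a scale-up would have to fit within every quota that matches the new nodes.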

Which issue(s) this PR fixes:

Part of #8703.

Special notes for your reviewer:

This PR ended up larger than I expected. Caching of node deltas, support for storage and ephemeral storage, and integration with scale up and scale down will be implemented in the next PRs. See the proposal #8702 for more details.

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, do-not-merge/release-note-label-needed, cncf-cla: yes, and area/cluster-autoscaler labels Oct 17, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: norbertcyran
Once this PR has been reviewed and has the lgtm label, please assign feiskyer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test label Oct 17, 2025
@k8s-ci-robot
Contributor

Hi @norbertcyran. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL label Oct 17, 2025
@norbertcyran norbertcyran changed the title from "New resource limits" to "Granular resource limits" Oct 17, 2025
@norbertcyran norbertcyran changed the title from "Granular resource limits" to "[WI{Granular resource limits" Oct 17, 2025
@norbertcyran norbertcyran changed the title from "[WI{Granular resource limits" to "[WIP] Granular resource limits" Oct 17, 2025
@norbertcyran
Contributor Author

FYI: not ready for review yet

@k8s-ci-robot k8s-ci-robot added the release-note-none label and removed the do-not-merge/release-note-label-needed label Oct 30, 2025
@norbertcyran norbertcyran changed the title from "[WIP] Granular resource limits" to "[Granular resource limits] Add support for granular resource quotas" Oct 30, 2025
@norbertcyran norbertcyran marked this pull request as ready for review October 30, 2025 15:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Oct 30, 2025
@k8s-ci-robot k8s-ci-robot requested a review from elmiko October 30, 2025 15:12
@norbertcyran
Contributor Author

> FYI: not ready for review yet

Ready now

Contributor

@elmiko elmiko left a comment

the code is looking good to me, i have a couple questions. i like the tests too.

continue
}

if limitsLeft < resourceDelta*int64(nodeDelta) {
Contributor

i'm not following the math here, could you explain what resourceDelta*int64(nodeDelta) is calculating?

i might be confused about nodeDelta

Contributor Author

nodeDelta is the number of nodes (of the same shape) to be added to the cluster, and resourceDelta is the quantity of a specific resource in a node of that shape. For instance, if we want to add 3 nodes with 4 CPUs each, resourceDelta*int64(nodeDelta) will evaluate to 12. This condition basically checks whether adding 12 CPUs to the cluster would exceed the limit.

Perhaps it would be cleaner to call these nodesToBeAdded and resourcesToBeAdded or something similar. However, I'm thinking about adding support for negative deltas later on to remove duplication in the scale down logic (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/planner/planner.go#L164, https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/resource/limits.go).

I can add some comments to clarify what deltas mean, unless you have other suggestions?
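
As a rough illustration of the check described above, the following sketch multiplies the per-node quantity of each resource by the number of nodes to add and compares the result with what is left under the limit. It is not the code from this PR: `exceededResources`, its parameters, and the use of plain maps are hypothetical and chosen only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// exceededResources is a hypothetical helper. limitsLeft maps a resource
// name to the quantity that can still be added before the limit is hit.
// nodeResources maps a resource name to the quantity one node of the given
// shape provides (the "resourceDelta"). nodeDelta is the number of such
// nodes to add. It returns the sorted names of resources whose limit
// would be exceeded.
func exceededResources(limitsLeft, nodeResources map[string]int64, nodeDelta int) []string {
	var exceeded []string
	for name, resourceDelta := range nodeResources {
		left, limited := limitsLeft[name]
		if !limited {
			continue // assumption: a resource without an entry is unlimited
		}
		// Total quantity added = per-node quantity * number of nodes.
		if left < resourceDelta*int64(nodeDelta) {
			exceeded = append(exceeded, name)
		}
	}
	sort.Strings(exceeded) // map iteration order is random, so sort for stable output
	return exceeded
}

func main() {
	limitsLeft := map[string]int64{"cpu": 10, "memory": 64}
	nodeShape := map[string]int64{"cpu": 4, "memory": 16}

	// 3 nodes * 4 CPUs = 12 > 10 left, so cpu is exceeded; 3 * 16 = 48 <= 64.
	fmt.Println(exceededResources(limitsLeft, nodeShape, 3)) // [cpu]
	// 2 nodes * 4 CPUs = 8 <= 10 left, so nothing is exceeded.
	fmt.Println(exceededResources(limitsLeft, nodeShape, 2)) // []
}
```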

Contributor

this is great, thank you for the explanation. it makes sense to me now.

> Perhaps it would be cleaner to call these nodesToBeAdded and resourcesToBeAdded or something similar.

i like this, perhaps names that are more descriptive with what is planned next, but this would definitely help with readability.

> I can add some comments to clarify what deltas mean, unless you have other suggestions?

i think changing the variable names would help, and i also like having more comments here. i think even something as brief as what you described here would be helpful.

// NewQuotasTracker calculates resources used by the nodes for every
// quota returned by the Provider. Then, based on usages and limits it calculates
// how many resources can be still added to the cluster. Returns a Tracker object.
func (f *TrackerFactory) NewQuotasTracker(ctx *context.AutoscalingContext, nodes []*corev1.Node) (*Tracker, error) {
Contributor

just a question of curiosity, is the intention that a new Tracker will be created on each scan interval of the core?

Contributor Author

yes, it will probably be created here, replacing the legacy logic: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaleup/orchestrator/orchestrator.go#L124

Performance-wise it's not ideal, but it's not very different from the current logic, except that the loop over nodes will be repeated for every quota. Still, the cost will be negligible compared to scheduling simulations and bin-packing. Ideally we'd have a goroutine updating the tracker state in the background, but that seems like a lot of effort and would bring edge cases related to consistency. At this point, I would say it would be a premature optimization, but we might want to improve it in the future.
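
To show the shape of the computation being discussed (one pass over the nodes per quota), here is a minimal sketch. It is not the PR's implementation: the `Node` and `Quota` types, the label-based matching, the `limitsLeft` function, and the clamping of negative remainders to zero are all assumptions made for illustration.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for corev1.Node.
type Node struct {
	Labels    map[string]string
	Resources map[string]int64 // resource name -> quantity provided by the node
}

// Quota is a hypothetical quota applying to nodes matching Selector.
type Quota struct {
	ID       string
	Selector map[string]string
	Limits   map[string]int64
}

// appliesTo reports whether every selector label is present on the node.
func (q Quota) appliesTo(n Node) bool {
	for k, v := range q.Selector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

// limitsLeft computes, for every quota, how much of each limited resource
// can still be added. The node loop runs once per quota, so the cost is
// O(len(quotas) * len(nodes)).
func limitsLeft(quotas []Quota, nodes []Node) map[string]map[string]int64 {
	left := make(map[string]map[string]int64, len(quotas))
	for _, q := range quotas {
		// Sum the resources of all nodes covered by this quota.
		usage := make(map[string]int64)
		for _, n := range nodes {
			if !q.appliesTo(n) {
				continue
			}
			for name, qty := range n.Resources {
				usage[name] += qty
			}
		}
		// Remaining capacity = limit - usage, clamped at zero (an assumption).
		remaining := make(map[string]int64, len(q.Limits))
		for name, limit := range q.Limits {
			r := limit - usage[name]
			if r < 0 {
				r = 0
			}
			remaining[name] = r
		}
		left[q.ID] = remaining
	}
	return left
}

func main() {
	quotas := []Quota{
		{ID: "cluster-wide", Limits: map[string]int64{"cpu": 64}},
		{ID: "gpu-pool", Selector: map[string]string{"pool": "gpu"}, Limits: map[string]int64{"cpu": 16}},
	}
	nodes := []Node{
		{Labels: map[string]string{"pool": "gpu"}, Resources: map[string]int64{"cpu": 8}},
		{Labels: map[string]string{"pool": "general"}, Resources: map[string]int64{"cpu": 4}},
		{Labels: map[string]string{"pool": "general"}, Resources: map[string]int64{"cpu": 4}},
	}
	left := limitsLeft(quotas, nodes)
	fmt.Println(left["cluster-wide"]["cpu"], left["gpu-pool"]["cpu"]) // 48 8
}
```

Rebuilding this state on every loop iteration keeps it trivially consistent with the current node list, which is the trade-off against a background goroutine mentioned above.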

Contributor

got it, thank you for the explanation =)

Labels

area/cluster-autoscaler, cncf-cla: yes, needs-ok-to-test, release-note-none, size/XXL

3 participants