OCPBUGS-75869: kubelet: Less aggressive low memory reservation #5716
sdodson merged 1 commit into openshift:main
Conversation
Out of the box, a standard OpenShift worker has about 3000 Mi of unevictable workload. Thus, when we reserve 2GiB on an 8GiB instance, that node will not autoscale down because it never drops below the 50% usage threshold. Therefore, let's reduce the system reservation on the lowest end. The assumption here is that nodes this small are less likely to run the full 250 pods and actually consume the full set of resources. We should make sure this aligns with our understanding of the problem we're trying to solve by enabling dynamic resource reservation in the first place, which I believe is that massive nodes were only getting 1GiB of reserved memory despite running hundreds of pods. Here's the difference in memory reservation (GiB) at common sizes:

| Total | Old Reserved | New Reserved |
| ----- | ------------ | ------------ |
| 8 | 2 | 1 |
| 16 | 3 | 1.48 |
| 32 | 4 | 2.44 |
| 64 | 5 | 4.36 |
| 128 | 9 | 8.2 |
| 256 | 12 | 10.44 |
| 512 | 17 | 15.56 |
| 1024 | 27 | 25.8 |
| 2048 | 48 | 46.28 |
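The scale-down arithmetic above can be sketched as follows. This is a rough illustration only: it assumes the cluster autoscaler's default 50% scale-down utilization threshold and ignores eviction-hard thresholds and kube-reserved, which shave a bit more off allocatable.

```python
GIB = 1024  # MiB per GiB

def utilization(total_gib, reserved_gib, unevictable_mib=3000):
    """Fraction of allocatable memory used by the baseline unevictable
    workload: requests / (capacity - system reserved)."""
    allocatable_mib = (total_gib - reserved_gib) * GIB
    return unevictable_mib / allocatable_mib

old = utilization(8, 2)  # ~0.49: pinned right around the 50% threshold
new = utilization(8, 1)  # ~0.42: comfortably below, so scale-down can occur
```

With 2GiB reserved, the ~3000 Mi baseline keeps utilization hovering at the threshold; with 1GiB reserved, the same baseline sits well under it.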
Skipping CI for Draft Pull Request.
@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is invalid:
/jira refresh
@sdodson: This pull request references Jira Issue OCPBUGS-75869, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
/retest-required
The configuration of the auto-node sizing will be covered as part of a long-running test:
These are additional tasks that can be taken up after this merge:
/payload-job periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-master-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3
Running additional tests that were attempted during the auto-node sizing.
@ngopalak-redhat: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-techpreview-serial-3of3
@ngopalak-redhat: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a74baa70-1698-11f1-8d41-df7d51334589-0
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips 10
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1614df30-1699-11f1-9639-0fa330f5b6dd-0
/test e2e-aws-mco-disruptive
/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of2
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d3bf6d80-16ac-11f1-86d3-185bd84fe8af-0
/payload-aggregate periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-upgrade-fips 1
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4b9bfa70-16ae-11f1-9ae2-ecfe0ac2f325-0
/retest
A follow-up: we should document the history of this and, in the future, investigate deriving this value from a relationship between max pods and memory reservation (or some other variable).
/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-upgrade-ovn-single-node periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ovn-two-node-arbiter-upgrade
@eggfoobar: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/13b709a0-1c9a-11f1-80c1-e025692eaaad-0
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-single-node-workers
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3a7a2220-1c9a-11f1-96e0-925662212127-0
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull request has been approved by: haircommander, sdodson. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/verified by CI
@sdodson: This PR has been marked as verified by CI.
/cherry-pick release-4.21
@sdodson: once the present PR merges, I will cherry-pick it on top of release-4.21.
@sdodson: Jira Issue Verification Checks: Jira Issue OCPBUGS-75869 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
@sdodson: The following tests failed, say
Full PR test history. Your PR dashboard.
@sdodson: new pull request created: #5756
Fix included in accepted release 4.22.0-0.nightly-2026-03-13-065313 |
Fixes OCPBUGS-75869
- What I did
Amended the dynamic system reservation scripts to reserve only 1GiB for the first 8GiB of memory. All other memory reservation logic is left in place; see the table above.
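As a sanity check, the reservation table from the description can be encoded directly. This is illustrative only; the actual tiered logic lives in the node-sizing scripts and is not reproduced here.

```python
# (total GiB, old reserved GiB, new reserved GiB), copied from the PR table
ROWS = [
    (8, 2, 1), (16, 3, 1.48), (32, 4, 2.44), (64, 5, 4.36),
    (128, 9, 8.2), (256, 12, 10.44), (512, 17, 15.56),
    (1024, 27, 25.8), (2048, 48, 46.28),
]

savings = {total: round(old - new, 2) for total, old, new in ROWS}
# Every node size reserves less than before, and the reduction is under 2GiB.
assert all(0 < s < 2 for s in savings.values())
```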
- How to verify it
Launch a cluster with an 8GiB node and review its allocatable memory: it should be roughly 7GiB rather than 6GiB.
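Back-of-the-envelope version of this check (a sketch: real allocatable also subtracts the kubelet eviction-hard threshold, typically around 100Mi, so the observed value will be slightly under these figures):

```python
GIB = 1024  # MiB per GiB

def allocatable_mib(total_gib, reserved_gib):
    # allocatable ~= capacity - system reserved (eviction threshold ignored)
    return (total_gib - reserved_gib) * GIB

before = allocatable_mib(8, 2)  # 6144 MiB, i.e. the old ~6GiB
after = allocatable_mib(8, 1)   # 7168 MiB, i.e. the expected ~7GiB
```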
- Description for the changelog
Reduced the dynamic memory reservation (on by default for workers in clusters installed on 4.21 or newer): the first 8GiB of memory now reserves a static 1GiB, which mirrors the old non-dynamic reservation. This slightly reduces all reservations, in each case by less than 2GiB.