Commit 6a04a91

Merge pull request #671 from tiraboschi/memory_metrics
Introduce a combined metric for CPU and memory
2 parents 3689dc5 + ccf0c16 commit 6a04a91

13 files changed: +102 -22 lines changed

README.md

Lines changed: 4 additions & 3 deletions
@@ -210,7 +210,7 @@ The profile exposes the following customization:
 - `devActualUtilizationProfile`: Enable load-aware descheduling.
 - `devDeviationThresholds`: Have the thresholds be based on the average utilization.

-By default, this profile will enable load-aware descheduling based on the `PrometheusCPUCombined` Prometheus query.
+By default, this profile will enable load-aware descheduling based on the `PrometheusCPUMemoryCombinedProfile` Prometheus query. That query is based on a recording rule combining the impact of CPU and memory utilization and PSI pressure.
 By default, the thresholds will be dynamic (based on the distance from the average utilization) and asymmetric (all the nodes below the average will be considered as underutilized to help rebalancing overutilized outliers) tolerating low deviations (10%).

 By default, this profile configures the descheduler to restrict the maximum number of overall parallel evictions to 5 and
@@ -255,6 +255,7 @@ The operator provides the following profiles:
 - `PrometheusMemoryPSIPressure`: `rate(node_pressure_memory_waiting_seconds_total[1m])` (`node_pressure_memory_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument)
 - `PrometheusIOPSIPressure`: `rate(node_pressure_io_waiting_seconds_total[1m])` (`node_pressure_io_waiting_seconds_total` is reported in OpenShift only for nodes configured with psi=1 kernel argument)
 - `PrometheusCPUCombined`: `descheduler:combined_utilization_and_pressure:avg1m` (`descheduler:combined_utilization_and_pressure:avg1m` uses a combination of CPU utilization and CPU PSI pressure based on a recording rule; CPU PSI pressure is reported in OpenShift only for nodes configured with psi=1 kernel argument)
+- `PrometheusCPUMemoryCombinedProfile`: `descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m` (`descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m` uses a multidimensional combination of CPU (utilization and pressure) and memory (utilization and pressure) based on a recording rule; PSI pressure is reported in OpenShift only for nodes configured with psi=1 kernel argument)

 ```yaml
 apiVersion: operator.openshift.io/v1
@@ -266,9 +267,9 @@ spec:
   managementState: Managed
   deschedulingIntervalSeconds: 3600
   profiles:
-    - LongLifecycle
+    - KubeVirtRelieveAndMigrate
   profileCustomizations:
-    devActualUtilizationProfile: PrometheusCPUUsage
+    devActualUtilizationProfile: PrometheusCPUMemoryCombinedProfile
 ```

 ## Descheduling modes

bindata/assets/kube-descheduler/prometheusrule.yaml

Lines changed: 84 additions & 9 deletions
@@ -7,33 +7,108 @@ metadata:
 spec:
   groups:
     - name: recordingRules.rules
+      interval: 30s
       rules:
+        # Base metrics (CPU and Memory utilization)
         - record: descheduler:nodeutilization:cpu:avg1m
           expr: avg by (instance) (1 - rate(node_cpu_seconds_total{mode='idle'}[1m]))

         - record: descheduler:averageworkersutilization:cpu:avg1m
           expr: avg(descheduler:nodeutilization:cpu:avg1m * on(instance) group_left(node) label_replace(kube_node_role{role="worker"}, 'instance', "$1", 'node', '(.+)'))

+        - record: descheduler:nodeutilization:memory:avg1m
+          expr: |-
+            (
+              1 - avg_over_time(node_memory_MemAvailable_bytes[1m]) /
+              on(instance) label_replace(kube_node_status_allocatable{resource="memory"}, 'instance', "$1", 'node', '(.+)')
+            ) and on(instance)
+            label_replace(kube_node_status_allocatable{resource="memory"}, 'instance', "$1", 'node', '(.+)') > 0
+
+        - record: descheduler:averageworkersutilization:memory:avg1m
+          expr: avg(descheduler:nodeutilization:memory:avg1m * on(instance) group_left(node) label_replace(kube_node_role{role="worker"}, 'instance', "$1", 'node', '(.+)'))
+
+        # Pressure metrics
         - record: descheduler:nodepressure:cpu:avg1m
           # return the cpu pressure if the cpu usage is over 70% otherwise
           # return cpu pressure as zero to (partially) filter out false
           # positives pressure spikes due to CPU limited pods.
           # See: https://github.com/kubernetes/enhancements/issues/5062
           expr: |-
-            avg by (instance) (
-              rate(node_pressure_cpu_waiting_seconds_total[1m])
-            ) and (
-              1 - avg by (instance) (
-                rate(node_cpu_seconds_total{mode='idle'}[1m])
-              )
-            ) > 0.7
+            (
+              avg by (instance) (rate(node_pressure_cpu_waiting_seconds_total[1m]))
+              and
+              (1 - avg by (instance) (rate(node_cpu_seconds_total{mode='idle'}[1m]))) > 0.7
+            )
             or
+            (avg by (instance) (rate(node_pressure_cpu_waiting_seconds_total[1m])) * 0)
+
+        - record: descheduler:averageworkerspressure:cpu:avg1m
+          expr: avg(descheduler:nodepressure:cpu:avg1m * on(instance) group_left(node) label_replace(kube_node_role{role="worker"}, 'instance', "$1", 'node', '(.+)'))
+
+        - record: descheduler:nodepressure:memory:avg1m
+          expr: |-
             avg by (instance) (
-              rate(node_pressure_cpu_waiting_seconds_total[1m])
-            ) * 0
+              rate(node_pressure_memory_waiting_seconds_total[1m])
+            )
+
+        - record: descheduler:averageworkerspressure:memory:avg1m
+          expr: avg(descheduler:nodepressure:memory:avg1m * on(instance) group_left(node) label_replace(kube_node_role{role="worker"}, 'instance', "$1", 'node', '(.+)'))

         - record: descheduler:combined_utilization_and_pressure:avg1m
           expr: |-
             (descheduler:nodeutilization:cpu:avg1m and on() descheduler:averageworkersutilization:cpu:avg1m < 0.8)
             or
             (descheduler:nodepressure:cpu:avg1m)
+
+        - record: descheduler:averageworkersutilization:memory:avg1m
+          expr: avg(descheduler:nodeutilization:memory:avg1m * on(instance) group_left(node) label_replace(kube_node_role{role="worker"}, 'instance', "$1", 'node', '(.+)'))
+
+        - record: descheduler:nodeutilization:memory:avg1m:positivedeviation
+          expr: |-
+            descheduler:nodeutilization:memory:avg1m - on() group_left() descheduler:averageworkersutilization:memory:avg1m
+            and
+            descheduler:nodeutilization:memory:avg1m - on() group_left() descheduler:averageworkersutilization:memory:avg1m >= 0
+            or
+            descheduler:nodeutilization:memory:avg1m * 0
+
+        - record: descheduler:nodeutilization:cpu:avg1m:positivedeviation
+          expr: |-
+            descheduler:nodeutilization:cpu:avg1m - on() group_left() descheduler:averageworkersutilization:cpu:avg1m
+            and
+            descheduler:nodeutilization:cpu:avg1m - on() group_left() descheduler:averageworkersutilization:cpu:avg1m >= 0
+            or
+            descheduler:nodeutilization:cpu:avg1m * 0
+
+        - record: descheduler:nodepressure:cpu:avg1m:positivedeviation
+          expr: |-
+            descheduler:nodepressure:cpu:avg1m - on() group_left() descheduler:averageworkerspressure:cpu:avg1m
+            and
+            descheduler:nodepressure:cpu:avg1m - on() group_left() descheduler:averageworkerspressure:cpu:avg1m >= 0
+            or
+            descheduler:nodepressure:cpu:avg1m * 0
+
+        - record: descheduler:nodepressure:memory:avg1m:positivedeviation
+          expr: |-
+            descheduler:nodepressure:memory:avg1m - on() group_left() descheduler:averageworkerspressure:memory:avg1m
+            and
+            descheduler:nodepressure:memory:avg1m - on() group_left() descheduler:averageworkerspressure:memory:avg1m >= 0
+            or
+            descheduler:nodepressure:memory:avg1m * 0
+
+        # Ideal Point Positive Distance (Euclidean distance from ideal using positive deviations)
+        - record: descheduler:node:ideal_point_positive_distance:avg1m
+          expr: |-
+            sqrt(
+              descheduler:nodeutilization:cpu:avg1m:positivedeviation ^ 2 +
+              descheduler:nodepressure:cpu:avg1m:positivedeviation ^ 2 +
+              descheduler:nodeutilization:memory:avg1m:positivedeviation ^ 2 +
+              descheduler:nodepressure:memory:avg1m:positivedeviation ^ 2
+            )
+
+        # Linear Amplified Ideal Point Positive Distance (k=3.0) - Amplified by 3x, clamped to [0,1]
+        - record: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
+          expr: |-
+            clamp_max(
+              3 * descheduler:node:ideal_point_positive_distance:avg1m,
+              1.0
+            )

pkg/apis/descheduler/v1/types_descheduler.go

Lines changed: 2 additions & 0 deletions
@@ -186,6 +186,8 @@ const (
 	PrometheusIOPSIPressureProfile ActualUtilizationProfile = "PrometheusIOPSIPressure"
 	// PrometheusCPUCombinedProfile uses a combination of CPU utilization and CPU pressure based on a recording rule
 	PrometheusCPUCombinedProfile ActualUtilizationProfile = "PrometheusCPUCombined"
+	// PrometheusCPUMemoryCombinedProfile uses a multidimensional combination of CPU (utilization and pressure) and memory (utilization and pressure) based on a recording rule
+	PrometheusCPUMemoryCombinedProfile ActualUtilizationProfile = "PrometheusCPUMemoryCombinedProfile"
 )

 // Namespaces overrides included and excluded namespaces while keeping

pkg/operator/target_config_reconciler.go

Lines changed: 3 additions & 1 deletion
@@ -837,6 +837,8 @@ func utilizationProfileToPrometheusQuery(profile deschedulerv1.ActualUtilization
 		return "rate(node_pressure_io_waiting_seconds_total[1m])", nil
 	case deschedulerv1.PrometheusCPUCombinedProfile:
 		return "descheduler:combined_utilization_and_pressure:avg1m", nil
+	case deschedulerv1.PrometheusCPUMemoryCombinedProfile:
+		return "descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m", nil
 	default:
 		if !strings.HasPrefix(string(profile), "query:") {
 			return "", fmt.Errorf("unknown prometheus profile: %v", profile)
@@ -1092,7 +1094,7 @@ func kubeVirtRelieveAndMigrateProfile(profileCustomizations *deschedulerv1.Profi
 	args := profile.PluginConfigs[0].Args.Object.(*nodeutilization.LowNodeUtilizationArgs)

 	// profile defaults
-	const defaultActualUtilizationProfile = deschedulerv1.PrometheusCPUCombinedProfile
+	const defaultActualUtilizationProfile = deschedulerv1.PrometheusCPUMemoryCombinedProfile
 	args.UseDeviationThresholds = true
 	query, err := utilizationProfileToPrometheusQuery(defaultActualUtilizationProfile)
 	if err != nil {
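
To see the lookup in isolation, here is a self-contained sketch of the profile-to-query mapping that the first hunk extends. It copies only the two combined profiles visible in this diff. The local `ActualUtilizationProfile` type stands in for `deschedulerv1.ActualUtilizationProfile`, the `profileToQuery` name is hypothetical, and the handling of the `query:` prefix (returning the text after the prefix) is an assumption, because that part of the real function lies outside the hunk.

```go
package main

import (
	"fmt"
	"strings"
)

// ActualUtilizationProfile is a local stand-in for the operator's API type.
type ActualUtilizationProfile string

const (
	PrometheusCPUCombinedProfile       ActualUtilizationProfile = "PrometheusCPUCombined"
	PrometheusCPUMemoryCombinedProfile ActualUtilizationProfile = "PrometheusCPUMemoryCombinedProfile"
)

// profileToQuery maps a profile name to the Prometheus query it stands for.
func profileToQuery(profile ActualUtilizationProfile) (string, error) {
	switch profile {
	case PrometheusCPUCombinedProfile:
		return "descheduler:combined_utilization_and_pressure:avg1m", nil
	case PrometheusCPUMemoryCombinedProfile:
		return "descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m", nil
	default:
		if !strings.HasPrefix(string(profile), "query:") {
			return "", fmt.Errorf("unknown prometheus profile: %v", profile)
		}
		// Assumption: a custom query is whatever follows the "query:" prefix.
		return strings.TrimPrefix(string(profile), "query:"), nil
	}
}

func main() {
	query, err := profileToQuery(PrometheusCPUMemoryCombinedProfile)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(query)
}
```

Note that the new constant's string value is `PrometheusCPUMemoryCombinedProfile`, with a `Profile` suffix that sibling values such as `PrometheusCPUCombined` omit, so a `KubeDescheduler` resource has to use that exact spelling.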

pkg/operator/testdata/assets/relieveAndMigrateDefaults.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 10

pkg/operator/testdata/assets/relieveAndMigrateDynamicThresholdsHigh.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 30

pkg/operator/testdata/assets/relieveAndMigrateDynamicThresholdsLow.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 10

pkg/operator/testdata/assets/relieveAndMigrateDynamicThresholdsMedium.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 20

pkg/operator/testdata/assets/relieveAndMigrateEvictionLimits.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 10

pkg/operator/testdata/assets/relieveAndMigrateHighConfig.yaml

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ profiles:
         - openshift-kube-scheduler
       metricsUtilization:
         prometheus:
-          query: descheduler:combined_utilization_and_pressure:avg1m
+          query: descheduler:node:linear_amplified_ideal_point_positive_distance:k3:avg1m
         source: Prometheus
       targetThresholds:
         MetricResource: 70
