Description
What version of descheduler are you using?
descheduler version:
helm chart 0.33.0
Does this issue reproduce with the latest release?
yes
Which descheduler CLI options are you using?
cmdOptions:
  logging-format: json
  v: 5
Please provide a copy of your descheduler policy config file
apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
gracePeriodSeconds: 600
metricsProviders:
  - source: KubernetesMetrics
profiles:
  - name: default
    pluginConfig:
      - args:
          minReplicas: 2
          nodeFit: false
        name: DefaultEvictor
      - name: RemoveFailedPods
      - args:
          maxPodLifeTimeSeconds: 43200
          states:
            - Unknown
        name: PodLifeTime
      - args:
          includingInitContainers: true
          podRestartThreshold: 20
        name: RemovePodsHavingTooManyRestarts
      - name: RemovePodsViolatingNodeTaints
      - args:
          nodeAffinityType:
            - requiredDuringSchedulingIgnoredDuringExecution
            - preferredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      - name: RemovePodsViolatingInterPodAntiAffinity
      - args:
          constraints:
            - DoNotSchedule
            - ScheduleAnyway
          topologyBalanceNodeFit: true
        name: RemovePodsViolatingTopologySpreadConstraint
      - args:
          evictableNamespaces:
            exclude:
              - kube-public
              - kube-system
              - kube-node-lease
              - gke-managed-dpv2-observability
              - gke-managed-system
              - gke-managed-volumepopulator
          metricsUtilization:
            source: KubernetesMetrics
          targetThresholds:
            cpu: 65
            memory: 65
          thresholds:
            cpu: 55
            memory: 55
        name: LowNodeUtilization
    plugins:
      balance:
        enabled:
          - RemovePodsViolatingTopologySpreadConstraint
          - LowNodeUtilization
      deschedule:
        enabled:
          - RemoveFailedPods
          - RemovePodsHavingTooManyRestarts
          - RemovePodsViolatingNodeTaints
          - RemovePodsViolatingNodeAffinity
          - RemovePodsViolatingInterPodAntiAffinity
          - PodLifeTime
What k8s version are you using (kubectl version)?
kubectl version Output
$ kubectl version
Client Version: v1.33.3
Kustomize Version: v5.6.0
Server Version: v1.32.6-gke.1025000
What did you do?
- Enable a policy similar to the one above, with nodeFit disabled and LowNodeUtilization in use
- Have a cluster with 3 nodes:
  a. NodeA is classified as overutilized and contains the majority of pods
  b. NodeB and NodeC are classified as underutilized, but no pods are scheduled there - I have noticed that the GKE cluster autoscaler puts a DeletionCandidateOfClusterAutoscaler:1755242398=PreferNoSchedule taint on the two nodes, which prevents the descheduler from evicting pods from NodeA even though nodeFit is disabled
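The taints can be confirmed directly on the nodes; a quick check (node name taken from the descheduler log entry below, substitute your own) would be:
$ kubectl get node foobar-v2-1d75c737-zg71 -o jsonpath='{.spec.taints}'   # lists the node's taints, including the DeletionCandidateOfClusterAutoscaler one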
{"ts":1755596562126.3423,"caller":"utils/pod.go:236","msg":"Pod doesn't tolerate node taint","v":5,"pod":{"name":"foo-54657668f-kkpxq","namespace":"redacted"},"nodeName":"foobar-v2-1d75c737-zg71","taint":"DeletionCandidateOfClusterAutoscaler:1755242398=PreferNoSchedule"}What did you expect to see?
If I understand correctly, the nodeFit option should control whether the descheduler looks at node taints and skips pods that cannot be scheduled anywhere else. In our case this doesn't seem to be working, or there is a bug in my config that I do not see.
What did you see instead?
nodeFit: false would imply that the taints created by the CA are ignored (why the nodes were tainted is covered in a separate issue: kubernetes/autoscaler#7964), but instead we see the descheduler backing off and not evicting pods because of the taints on those nodes.
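For completeness, this is how the skipped evictions can be spotted in the descheduler logs (the deployment name and namespace depend on your helm release, adjust as needed):
$ kubectl logs -n kube-system deployment/descheduler | grep "doesn't tolerate node taint"   # matches the "Pod doesn't tolerate node taint" messages shown above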