What version of descheduler are you using?
descheduler version: Helm chart version 0.31.0
Does this issue reproduce with the latest release?
Yes
Which descheduler CLI options are you using?
--policy-config-file=/policy-dir/policy.yaml --descheduling-interval=60m --v=3
Please provide a copy of your descheduler policy config file
```yaml
policy.yaml: |
  apiVersion: "descheduler/v1alpha2"
  kind: "DeschedulerPolicy"
  profiles:
    - name: default
      pluginConfig:
        - args:
            evictLocalStoragePods: true
            ignorePvcPods: true
          name: DefaultEvictor
        - args:
            nodeAffinityType:
              - preferredDuringSchedulingIgnoredDuringExecution
          name: RemovePodsViolatingNodeAffinity
      plugins:
        deschedule:
          enabled:
            - RemovePodsViolatingNodeAffinity
```
What k8s version are you using (kubectl version)?
```
$ kubectl version
Server Version: v1.31.8-gke.1045000
```
What did you do?
We are using the RemovePodsViolatingNodeAffinity strategy to evict pods from on-demand nodes so they can be rescheduled onto spot nodes.
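For reference, the workload pods carry a preferred node affinity along these lines (a minimal sketch, not our exact manifest; `cloud.google.com/gke-spot` is the standard label GKE sets on spot nodes, but the weight and terms here are illustrative):

```yaml
# Illustrative pod spec fragment: prefer spot nodes, but fall back to
# on-demand nodes when no spot capacity is available. The label
# cloud.google.com/gke-spot=true is what GKE applies to spot nodes;
# the weight value is made up for illustration.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: cloud.google.com/gke-spot
              operator: In
              values:
                - "true"
```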
Scenario:
- Some pods were initially scheduled on on-demand nodes because spot nodes were temporarily unavailable.
- The descheduler evicted those pods due to their preferred node affinity.
- Replacement pods were created and went Pending.
- Cluster Autoscaler saw that the pending pods prefer spot nodes and began scaling up the corresponding spot node pools.
- However, the Kubernetes scheduler, not seeing the yet-to-be-created spot nodes, scheduled the pending pods back onto the available on-demand nodes (the affinity is only preferred, so on-demand nodes still satisfy it).
- As a result, the pods ended up on on-demand nodes again, and the spot nodes that eventually came up remained unused.
- Cluster Autoscaler then detected that the spot nodes were unused and scaled them down.
- This cycle repeats endlessly, causing unnecessary evictions and scaling churn.
What did you expect to see?
I expected the descheduler to somehow avoid creating such a tight eviction loop, potentially by:
- Pre-scaling target node pools: the descheduler could have an option to trigger pre-scaling of node pools that match the preferred node affinity (e.g., spot pools), similar to what estafette-gke-node-pool-shifter does. This would ensure that suitable nodes are already provisioning before evictions occur (see the sketch after this list).

Or perhaps smarter coordination between the descheduler and Cluster Autoscaler.
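As a rough illustration of the pre-scaling idea on GKE (a hypothetical step, not an existing descheduler feature; the cluster name, pool name, node count, and zone below are placeholders):

```sh
# Hypothetical pre-scaling step run before a descheduling pass:
# bump the spot pool so capacity is already provisioning before
# evictions start. gcloud container clusters resize is a real command;
# the names and sizing policy here are made up for illustration.
gcloud container clusters resize my-cluster \
  --node-pool spot-pool \
  --num-nodes 3 \
  --zone us-central1-a \
  --quiet
```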
Environment Details:
- Cloud Provider: GCP
- Nodes: mix of on-demand and spot
- Cluster Autoscaler: enabled