
Descheduler triggers infinite eviction/reschedule loop with Cluster Autoscaler and spot nodes #1713

@AlexGurtoff

Description


What version of descheduler are you using?

descheduler version: Helm chart 0.31.0

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

--policy-config-file=/policy-dir/policy.yaml --descheduling-interval=60m --v=3

Please provide a copy of your descheduler policy config file

  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
    - name: default
      pluginConfig:
      - args:
          evictLocalStoragePods: true
          ignorePvcPods: true
        name: DefaultEvictor
      - args:
          nodeAffinityType:
          - preferredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      plugins:
        deschedule:
          enabled:
          - RemovePodsViolatingNodeAffinity

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Server Version: v1.31.8-gke.1045000

What did you do?

We are using the RemovePodsViolatingNodeAffinity strategy to evict pods from on-demand nodes so they can be rescheduled onto spot nodes.
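
For reference, the affected pods express a soft preference for spot capacity roughly like this (minimal sketch; the spot label is assumed to be GKE's cloud.google.com/gke-spot and the weight is illustrative):

    # Sketch of the pod-level preference that RemovePodsViolatingNodeAffinity acts on.
    # Assumption: the spot label is GKE's cloud.google.com/gke-spot; adjust for other setups.
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-workload
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9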

Scenario:

  1. Some pods were initially scheduled on on-demand nodes because spot nodes were temporarily unavailable.

  2. The descheduler evicted those pods due to node affinity preferences.

  3. New pods were created and marked as pending.

  4. Cluster Autoscaler saw that pending pods require spot nodes and began scaling up the corresponding spot node pools.

  5. However, the Kubernetes scheduler, not yet seeing the spot nodes that were still being created, scheduled the pending pods back onto the available on-demand nodes (the affinity is only preferred, so nothing prevents this; see the contrast sketch after this list).

  6. As a result, the pods ended up on on-demand nodes again, and the spot nodes that eventually came up remained unused.

  7. Cluster Autoscaler then detected that the spot nodes were unused and scaled them down.

  8. This cycle repeats endlessly, causing unnecessary evictions and scaling churn.
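
For context on step 5: because the affinity is only a preference, nothing stops the scheduler from falling back to on-demand nodes. Expressed as a hard requirement instead (contrast sketch below, same assumed label as above), the evicted pods would stay Pending until a spot node registers, which avoids the fallback in step 5 but means they could never land on on-demand capacity at all:

    # Contrast sketch: the same spot constraint as a hard requirement instead of a preference.
    # Pods with this stay Pending until a matching spot node exists (no on-demand fallback),
    # which avoids the loop above but risks unschedulable pods if spot capacity never arrives.
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: cloud.google.com/gke-spot
              operator: In
              values:
              - "true"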

What did you expect to see?

I expected the descheduler to somehow avoid creating such a tight eviction loop, potentially by:

  • Pre-scaling of target node pools: Descheduler could have an option to trigger pre-scaling of node pools that match the preferred node affinity (e.g., spot pools), similar to how it’s done in estafette-gke-node-pool-shifter. This would ensure that suitable nodes are already provisioning before evictions occur.

Or perhaps smarter coordination between the descheduler and Cluster Autoscaler.
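
For completeness, DefaultEvictor does have a nodeFit argument that makes the descheduler check, before evicting, that some other node could host the pod. As far as I can tell that check covers general schedulability (resources, taints, hard node affinity) but not whether the preferred spot affinity would actually be satisfied, so it probably would not break this loop; shown here only as a sketch of the kind of check that would need to become affinity-aware:

    # Sketch: the DefaultEvictor block from the policy above with nodeFit enabled.
    # Assumption: nodeFit only verifies that another node can host the pod, not that the
    # *preferred* spot affinity would be honoured, so the evict/reschedule loop likely remains.
    - args:
        evictLocalStoragePods: true
        ignorePvcPods: true
        nodeFit: true
      name: DefaultEvictor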

Environment Details:
Cloud Provider: GCP
Nodes: mix of on-demand and spot
Cluster Autoscaler: enabled

Metadata

Labels: kind/bug, lifecycle/stale
