
Descheduler triggers infinite eviction/reschedule loop with Cluster Autoscaler and spot nodes #1713

@AlexGurtoff

Description


What version of descheduler are you using?

descheduler version: Helm chart 0.31.0

Does this issue reproduce with the latest release?

Yes

Which descheduler CLI options are you using?

--policy-config-file=/policy-dir/policy.yaml --descheduling-interval=60m --v=3

Please provide a copy of your descheduler policy config file

  policy.yaml: |
    apiVersion: "descheduler/v1alpha2"
    kind: "DeschedulerPolicy"
    profiles:
    - name: default
      pluginConfig:
      - args:
          evictLocalStoragePods: true
          ignorePvcPods: true
        name: DefaultEvictor
      - args:
          nodeAffinityType:
          - preferredDuringSchedulingIgnoredDuringExecution
        name: RemovePodsViolatingNodeAffinity
      plugins:
        deschedule:
          enabled:
          - RemovePodsViolatingNodeAffinity

What k8s version are you using (kubectl version)?

kubectl version Output
$ kubectl version
Server Version: v1.31.8-gke.1045000

What did you do?

We are using the RemovePodsViolatingNodeAffinity strategy to evict pods from on-demand nodes so they can be rescheduled onto spot nodes.
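
For reference, the affected pods express a soft preference for spot capacity roughly like this (minimal sketch; the spot label is assumed to be GKE's cloud.google.com/gke-spot and the weight is illustrative):

    # Sketch of the pod-level preference that RemovePodsViolatingNodeAffinity acts on.
    # Assumption: the spot label is GKE's cloud.google.com/gke-spot; adjust for other setups.
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-workload
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9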

Scenario:

  1. Some pods were initially scheduled on on-demand nodes because spot nodes were temporarily unavailable.

  2. The descheduler evicted those pods due to node affinity preferences.

  3. New pods were created and marked as pending.

  4. Cluster Autoscaler saw that pending pods require spot nodes and began scaling up the corresponding spot node pools.

  5. However, the Kubernetes scheduler, not yet seeing the spot nodes that were still being created, scheduled the pending pods back onto the available on-demand nodes (the affinity is only preferred, so nothing prevents this; see the contrast sketch after this list).

  6. As a result, the pods ended up on on-demand nodes again, and the spot nodes that eventually came up remained unused.

  7. Cluster Autoscaler then detected that the spot nodes were unused and scaled them down.

  8. This cycle repeats endlessly, causing unnecessary evictions and scaling churn.
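
For context on step 5: because the affinity is only a preference, nothing stops the scheduler from falling back to on-demand nodes. Expressed as a hard requirement instead (contrast sketch below, same assumed label as above), the evicted pods would stay Pending until a spot node registers, which avoids the fallback in step 5 but means they could never land on on-demand capacity at all:

    # Contrast sketch: the same spot constraint as a hard requirement instead of a preference.
    # Pods with this stay Pending until a matching spot node exists (no on-demand fallback),
    # which avoids the loop above but risks unschedulable pods if spot capacity never arrives.
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: cloud.google.com/gke-spot
              operator: In
              values:
              - "true"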

What did you expect to see?

I expected the descheduler to somehow avoid creating such a tight eviction loop, potentially by:

  • Pre-scaling of target node pools: Descheduler could have an option to trigger pre-scaling of node pools that match the preferred node affinity (e.g., spot pools), similar to how it’s done in estafette-gke-node-pool-shifter. This would ensure that suitable nodes are already provisioning before evictions occur.

Or perhaps smarter coordination between the descheduler and Cluster Autoscaler.
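
For completeness, DefaultEvictor does have a nodeFit argument that makes the descheduler check, before evicting, that some other node could host the pod. As far as I can tell that check covers general schedulability (resources, taints, hard node affinity) but not whether the preferred spot affinity would actually be satisfied, so it probably would not break this loop; shown here only as a sketch of the kind of check that would need to become affinity-aware:

    # Sketch: the DefaultEvictor block from the policy above with nodeFit enabled.
    # Assumption: nodeFit only verifies that another node can host the pod, not that the
    # *preferred* spot affinity would be honoured, so the evict/reschedule loop likely remains.
    - args:
        evictLocalStoragePods: true
        ignorePvcPods: true
        nodeFit: true
      name: DefaultEvictor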

Environment Details:
Cloud Provider: GCP
Nodes: mix of on-demand and spot
Cluster Autoscaler: enabled

Metadata

Labels: kind/bug, lifecycle/stale
