HasHighlyAvailableControlPlane incorrectly uses filtered instance groups

# Issue: HasHighlyAvailableControlPlane incorrectly uses filtered instance groups

/kind bug

## 1. What `kops` version are you running?

```
kops version 1.33.0
```

## 2. What Kubernetes version are you running?

```
Kubernetes v1.33.0
```

## 3. What cloud provider are you using?

AWS

## 4. What commands did you run? What is the simplest way to reproduce this issue?

On a cluster with multiple control plane nodes (HA setup), run:

```bash
kops update cluster <cluster-name> --instance-group nodes --yes
```

Where `nodes` is a worker instance group (not a control plane instance group).

## 5. What happened after the commands executed?

When a specific non-control-plane instance group is updated with the `--instance-group` or `--instance-group-roles` filter, the replica count of cluster-wide addons is incorrectly reduced from 2 to 1, even though the cluster still has multiple control plane nodes. Affected addons include:
- `aws-load-balancer-controller`
- `node-termination-handler`
- `cluster-autoscaler`
- `aws-ebs-csi-driver`

## 6. What did you expect to happen?

The replica count for cluster-wide controllers should remain at 2 (or the correct value based on the actual number of control plane nodes in the cluster), regardless of which instance group is being updated via filters.

## 7. Root Cause Analysis

The `HasHighlyAvailableControlPlane()` function in `upup/pkg/fi/cloudup/template_functions.go` uses `tf.InstanceGroups`, which is a **filtered** list based on `--instance-group` or `--instance-group-roles` flags.

When only worker nodes are updated, `tf.InstanceGroups` contains no control plane instance groups, so the function incorrectly returns `false` even though the cluster has multiple control plane nodes.

This function is used by `ControlPlaneControllerReplicas()`, which determines the replica count for controllers that run at cluster level rather than at the level of the filtered instance groups.

**Code Location:**
- Bug: `upup/pkg/fi/cloudup/template_functions.go:503`
- Context: `pkg/model/context.go:52-59` (definition of `InstanceGroups` vs `AllInstanceGroups`)
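
To make the failure mode concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not the kOps source: the `InstanceGroup` type, the `filterByName` helper, and the rule "HA means more than one control plane instance group" are simplified assumptions for illustration only.

```go
package main

import "fmt"

// Role and InstanceGroup are simplified stand-ins for the kOps API types
// (assumption: the real types carry many more fields).
type Role string

const (
	RoleControlPlane Role = "ControlPlane"
	RoleNode         Role = "Node"
)

type InstanceGroup struct {
	Name string
	Role Role
}

// hasHAControlPlane models the HA check: it reports true when the given
// list contains more than one control plane instance group.
func hasHAControlPlane(igs []InstanceGroup) bool {
	count := 0
	for _, ig := range igs {
		if ig.Role == RoleControlPlane {
			count++
		}
	}
	return count > 1
}

// replicas models the replica decision from the issue: 2 when HA, else 1.
func replicas(igs []InstanceGroup) int {
	if hasHAControlPlane(igs) {
		return 2
	}
	return 1
}

// filterByName is a hypothetical helper that mimics the effect of
// `--instance-group <name>` on the instance group list.
func filterByName(all []InstanceGroup, name string) []InstanceGroup {
	var out []InstanceGroup
	for _, ig := range all {
		if ig.Name == name {
			out = append(out, ig)
		}
	}
	return out
}

func main() {
	all := []InstanceGroup{
		{Name: "control-plane-a", Role: RoleControlPlane},
		{Name: "control-plane-b", Role: RoleControlPlane},
		{Name: "control-plane-c", Role: RoleControlPlane},
		{Name: "nodes", Role: RoleNode},
	}
	filtered := filterByName(all, "nodes")

	// The filtered list has no control plane groups, so the check fails.
	fmt.Println("replicas from filtered list:", replicas(filtered)) // 1
	fmt.Println("replicas from full list:", replicas(all))          // 2
}
```

The same cluster yields two different answers depending only on which list is passed in, which matches the 2 to 1 replica drop reported in section 5.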

## 8. Proposed Solution

Change `HasHighlyAvailableControlPlane()` to use `tf.AllInstanceGroups` instead of `tf.InstanceGroups`.

This is appropriate because:
1. HA status is a **cluster-wide property**, not specific to filtered instance groups
2. Other cluster-wide operations already use `AllInstanceGroups` (e.g., IAM configuration on line 720)
3. The comment in `context.go` explicitly states: "we sometimes need the full list for example when configuring cluster-wide IAM"

The fix includes:
- Code change to use `AllInstanceGroups`
- Comprehensive test coverage including a regression test for this specific scenario
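
A rough sketch of what the change and a regression check could look like is shown below. The `templateFunctions` struct is a hypothetical stand-in that only mirrors the two field names mentioned in this issue (`InstanceGroups` and `AllInstanceGroups`), and the HA rule is again simplified to "more than one control plane instance group". A real test in the kOps tree would presumably live in a `_test.go` file and use the `testing` package; a plain `main` is used here so the example runs on its own.

```go
package main

import "fmt"

type igRole string

const (
	roleControlPlane igRole = "ControlPlane"
	roleNode         igRole = "Node"
)

type instanceGroup struct {
	name string
	role igRole
}

// templateFunctions is a hypothetical stand-in for the real struct.
// InstanceGroups is the filtered list, AllInstanceGroups the full one.
type templateFunctions struct {
	InstanceGroups    []instanceGroup
	AllInstanceGroups []instanceGroup
}

// HasHighlyAvailableControlPlane reads the full list, so the result does
// not depend on the --instance-group or --instance-group-roles filters.
func (tf *templateFunctions) HasHighlyAvailableControlPlane() bool {
	controlPlanes := 0
	for _, ig := range tf.AllInstanceGroups {
		if ig.role == roleControlPlane {
			controlPlanes++
		}
	}
	return controlPlanes > 1
}

type regressionCase struct {
	name     string
	filtered []instanceGroup
	all      []instanceGroup
	want     bool
}

// regressionCases covers the unfiltered case, the scenario from this
// issue, and a genuinely non-HA cluster.
func regressionCases() []regressionCase {
	cpA := instanceGroup{name: "control-plane-a", role: roleControlPlane}
	cpB := instanceGroup{name: "control-plane-b", role: roleControlPlane}
	nodes := instanceGroup{name: "nodes", role: roleNode}
	return []regressionCase{
		{name: "HA cluster, no filter", filtered: []instanceGroup{cpA, cpB, nodes}, all: []instanceGroup{cpA, cpB, nodes}, want: true},
		{name: "HA cluster, filtered to workers", filtered: []instanceGroup{nodes}, all: []instanceGroup{cpA, cpB, nodes}, want: true},
		{name: "single control plane, filtered to workers", filtered: []instanceGroup{nodes}, all: []instanceGroup{cpA, nodes}, want: false},
	}
}

// runRegressionCases returns one message per failing case.
func runRegressionCases() []string {
	var failures []string
	for _, tc := range regressionCases() {
		tf := &templateFunctions{InstanceGroups: tc.filtered, AllInstanceGroups: tc.all}
		if got := tf.HasHighlyAvailableControlPlane(); got != tc.want {
			failures = append(failures, fmt.Sprintf("%s: got %v, want %v", tc.name, got, tc.want))
		}
	}
	return failures
}

func main() {
	failures := runRegressionCases()
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	if len(failures) == 0 {
		fmt.Println("all cases passed")
	}
}
```

The second case is the one that matters for this issue: with the filtered list reduced to the `nodes` group, the check still reports an HA control plane because it never looks at the filtered list.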

## 9. Impact

This bug can cause:
- Reduced availability of critical controllers during instance group updates
- Unexpected downscaling of cluster-wide services
- Potential service disruptions if the single remaining replica experiences issues

The issue manifests only when `kops update cluster` is run with the `--instance-group` or `--instance-group-roles` filter.


HasHighlyAvailableControlPlane incorrectly uses filtered instance groups #17739
