doc: lb admin state design doc #9634
base: documentation
Conversation
✅ Deploy Preview for kubernetes-sigs-cloud-provide-azure ready!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: nilo19. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
### `PreemptScheduled` Event

1. The events informer surfaces a `PreemptScheduled` warning for node `N`.
Is the event read from IMDS or API?
The Kubernetes API.
Are you talking about events on Kubernetes nodes? Is there an existing mechanism to turn IMDS preempt notifications into node events? https://learn.microsoft.com/en-us/azure/virtual-machines/windows/scheduled-events
Why do we want to use IMDS instead of watching Kubernetes events? cloud-node-manager is a lightweight component for node initialization, designed to replace the node controller in CCM; maybe we don't want to put too many features into it.
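For reference, a minimal sketch of watching for the `PreemptScheduled` signal through the Kubernetes API rather than IMDS, as discussed above. The event reason and the Node kind come from the doc and this thread; the polling loop, namespace handling, and function name are illustrative assumptions (a real controller would use a shared informer instead of polling):

```go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchPreemptScheduled periodically lists core/v1 Events whose reason is
// "PreemptScheduled" and whose involved object is a Node, then reports the
// affected node names. This only sketches the API shape; a production
// controller would register an informer event handler instead.
func watchPreemptScheduled(ctx context.Context, cs kubernetes.Interface, handle func(nodeName string)) error {
	for {
		evs, err := cs.CoreV1().Events(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
			FieldSelector: "reason=PreemptScheduled,involvedObject.kind=Node",
		})
		if err != nil {
			return err
		}
		for _, ev := range evs.Items {
			handle(ev.InvolvedObject.Name)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(30 * time.Second):
		}
	}
}
```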
### High-Level Flow

1. Kubernetes adds the `node.kubernetes.io/out-of-service` taint to a node or emits a `PreemptScheduled` event indicating an imminent Spot eviction.
2. For Spot events, cloud-provider-azure patches the node with `cloudprovider.azure.microsoft.com/draining=spot-eviction` (one-time) so the signal persists beyond controller restarts; manual application of this taint is treated the same way.
"cloudprovider.azure.microsoft.com/draining=spot-eviction"
is it a label or an annotation? why not asking the user to patch node with this directly instead of patching add taint or emit event?
A label. The user can patch this manually, but if we detect a spot eviction event, we patch the node proactively and trigger the logic. We could trigger the logic directly without patching the node, but that is not robust: the controller manager could restart in the middle and the signal would be lost.
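To make the persistence step concrete, a minimal sketch of patching the node with the label mentioned above, assuming client-go. The label key and value come from this thread; the merge-patch approach and the function name are illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// labelNodeDraining persists the spot-eviction signal on the node so it
// survives a controller-manager restart. A JSON merge patch is just one way
// to apply the label; the key/value are taken from this discussion.
func labelNodeDraining(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	patch := []byte(`{"metadata":{"labels":{"cloudprovider.azure.microsoft.com/draining":"spot-eviction"}}}`)
	_, err := cs.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```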
### High-Level Flow

1. Kubernetes adds the `node.kubernetes.io/out-of-service` taint to a node or emits a `PreemptScheduled` event indicating an imminent Spot eviction.
It looks like `node.kubernetes.io/out-of-service` is a well-known taint:
https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service
However, it's not added by Kubernetes native controllers but manually by the user. What effect do you propose here?
Do you expect the user/caller to drain the pods on the node right after tainting it? Since it requires calling the SLB control plane to set `AdminState=Down`, it might fail or take a long time. I think it's better to have the controller signal back on the node indicating that `AdminState=Down` has been set, so that the caller can proceed to drain the node.
The user can manually taint the node. Besides, we taint the node when it's shut down and remove the taint when it comes back.
I am a bit worried about a node that quickly bounces between the shutdown and ready states; in that case we may generate a lot of AdminState=Down requests.
Let me reconsider the logic here. How do you think we can prevent this case? @robbiezhang
Also, for the callback signal you mentioned, we can patch the node again after setting AdminState to Down and clear this patch after setting it back to None. Let me see how to refine the document.
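A rough sketch of that callback-signal idea, assuming an annotation is used to report the applied admin state back on the node. The annotation key and function are hypothetical and only illustrate the patch/clear cycle described in the comment above:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation key, used only to illustrate the callback signal;
// the real key (if any) would be defined in the design doc.
const adminStateAnnotation = "cloudprovider.azure.microsoft.com/lb-admin-state"

// markAdminState records the applied admin state on the node so the caller
// (e.g. an upgrader) can wait for it before draining. Passing an empty state
// clears the annotation once the backend address is restored to None; a null
// value in a JSON merge patch removes the key.
func markAdminState(ctx context.Context, cs kubernetes.Interface, nodeName, state string) error {
	var patch []byte
	if state == "" {
		patch = []byte(`{"metadata":{"annotations":{"` + adminStateAnnotation + `":null}}}`)
	} else {
		patch = []byte(`{"metadata":{"annotations":{"` + adminStateAnnotation + `":"` + state + `"}}}`)
	}
	_, err := cs.CoreV1().Nodes().Patch(ctx, nodeName, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```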
- Introduce an `AdminStateManager` interface responsible for:
  - Translating a Kubernetes node into all relevant backend pool IDs (leveraging existing helpers such as `reconcileBackendPoolHosts`, `getBackendPoolIDsForService`, and the multiple-SLB bookkeeping).
  - Calling the Azure SDK (`LoadBalancerBackendAddressPoolsClient`) to update the specific backend address with `AdminState`.
What's the throttling limit for this API? Does it allow concurrent updates on different backend addresses? Think of the scenario where the upgrader taints multiple nodes at the same time: what latency do we expect for setting the AdminState on all nodes? Is it sequential, in parallel, or in batch?
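For discussion, a rough Go sketch of what the `AdminStateManager` interface quoted above might look like. Only the names taken from the doc (`AdminStateManager`, `reconcileBackendPoolHosts`, `getBackendPoolIDsForService`, `LoadBalancerBackendAddressPoolsClient`, the `Down`/`None` states) come from the source; the method set and signatures are assumptions:

```go
package main

import "context"

// AdminState mirrors the LB backend address admin state values mentioned in
// the doc: "Down" while a node is draining, "None" in steady state.
type AdminState string

const (
	AdminStateDown AdminState = "Down"
	AdminStateNone AdminState = "None"
)

// AdminStateManager is a sketch of the interface described above. It hides the
// pool-mapping helpers (reconcileBackendPoolHosts, getBackendPoolIDsForService,
// multi-SLB bookkeeping) and the LoadBalancerBackendAddressPoolsClient calls
// behind two operations keyed by node name.
type AdminStateManager interface {
	// BackendPoolIDsForNode resolves every backend address pool the node's
	// addresses participate in, across all SLBs.
	BackendPoolIDsForNode(ctx context.Context, nodeName string) ([]string, error)

	// SetAdminState updates the node's backend addresses in the given pools
	// to the desired admin state via the Azure SDK.
	SetAdminState(ctx context.Context, nodeName string, poolIDs []string, state AdminState) error
}
```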
1. Fetch the node object from the informer cache and compute desired admin state (`Down` when tainted, `None` when cleared).
2. Resolve backend entries using existing pool-mapping helpers.
3. Issue Azure SDK calls immediately (within the same event) so the time between taint observation and traffic cutover is bounded only by ARM latency.
4. Record success/failure metrics and retry via the queue if needed.
The controller should emit an event on failure so that we/cx can observe it.
The controller should also record latency as a metric so that we can evaluate the effectiveness. This feature is useless if the latency is longer than 10s.
Definitely.
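Building on the `AdminStateManager` sketch above, a hedged sketch of steps 1-4 with the failure event and latency metric requested in this thread. The metric name, buckets, and event reason are illustrative, and the use of the Prometheus client here is an assumption, not the project's actual metrics wiring:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// adminStateLatency is a hypothetical histogram tracking how long it takes
// from observing the signal to finishing the ARM call; the name and buckets
// are illustrative, not from the doc.
var adminStateLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "lb_admin_state_update_duration_seconds",
	Buckets: prometheus.ExponentialBuckets(0.5, 2, 8),
})

// reconcileNode sketches steps 1-4 quoted above, reusing the AdminStateManager
// sketch from earlier in this thread: resolve backend pools, apply the desired
// admin state, record latency, and emit a warning event on failure so the
// failure is observable on the node.
func reconcileNode(ctx context.Context, mgr AdminStateManager, rec record.EventRecorder, node *corev1.Node, desired AdminState) error {
	start := time.Now()

	poolIDs, err := mgr.BackendPoolIDsForNode(ctx, node.Name)
	if err == nil {
		err = mgr.SetAdminState(ctx, node.Name, poolIDs, desired)
	}

	adminStateLatency.Observe(time.Since(start).Seconds())
	if err != nil {
		rec.Eventf(node, corev1.EventTypeWarning, "AdminStateUpdateFailed",
			"failed to set LB admin state %s: %v", desired, err)
		return fmt.Errorf("admin state update for node %s: %w", node.Name, err)
	}
	return nil
}
```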
- Restore admin state to `None` when a node returns to service so steady-state behavior is unaffected.
- Avoid additional user-facing APIs, annotations, or CLI switches.

## Non-Goals
It looks like a non-goal is optimizing LB backend pool membership when an underlying node VM goes offline (hard down)? I assume the existing Azure LB backend pool polling (LB regularly polls backend pool IP addresses) is sufficient to automatically evict downed VMs from the backend pool?
Do you mean the LB health probe by "regularly polls"?
What type of PR is this?
/kind design
What this PR does / why we need it:
Add design doc for lb admin state control.
Which issue(s) this PR fixes:
Fixes #
Related: #9633
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: