@sonianuj287

Related issue - #5275

Summary

The issue was caused by missing timeout configurations in MicroK8s that control how Kubernetes monitors and reports node health. This led to:

Nodes remaining "Ready" even when offline
Healthy nodes being incorrectly marked "NotReady"
Inconsistent cluster behavior

Root Cause Analysis
Missing critical parameters in the Kubernetes control plane components:

--node-monitor-grace-period in kube-controller-manager
--pod-eviction-timeout in kube-controller-manager
--node-status-update-frequency in kubelet
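
A quick way to confirm these flags are indeed absent on an affected node (assuming the standard MicroK8s snap layout, where each component reads its flags from a per-component args file):

```shell
# No matching output means the flag is unset and the compiled-in
# Kubernetes default applies.
grep -E 'node-monitor-grace-period|pod-eviction-timeout' /var/snap/microk8s/current/args/kube-controller-manager
grep 'node-status-update-frequency' /var/snap/microk8s/current/args/kubelet
```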

Changes

  1. Updated the kube-controller-manager configuration (--node-monitor-grace-period, --pod-eviction-timeout).

  2. Updated the kubelet configuration (--node-status-update-frequency).
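
The concrete values are not included in this description; a plausible sketch, assuming MicroK8s's per-component args files and the commonly recommended settings (all values are hypothetical until checked against the actual diff):

```shell
# Hypothetical values -- the real ones come from the PR diff.
# MicroK8s keeps each component's flags in an args file under the snap.

# kube-controller-manager: how long a node may miss status updates
# before being marked NotReady, and how long pods stay on a NotReady
# node before eviction. (Note: --pod-eviction-timeout is deprecated in
# recent Kubernetes releases in favor of taint-based eviction.)
echo '--node-monitor-grace-period=40s' | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager
echo '--pod-eviction-timeout=5m'       | sudo tee -a /var/snap/microk8s/current/args/kube-controller-manager

# kubelet: how often each node reports its status to the API server.
echo '--node-status-update-frequency=10s' | sudo tee -a /var/snap/microk8s/current/args/kubelet

# Restart MicroK8s so the new flags take effect.
microk8s stop && microk8s start
```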

Expected Behavior After Fix
Healthy nodes remain "Ready" at all times
Failed nodes are marked "NotReady" within ~40-50 seconds
Only actually failed nodes are marked "NotReady"
Consistent behavior across different cluster configurations

Testing
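
This section was left empty; a manual verification sketch based on the reproduction steps described later in the thread (host and interface names are illustrative):

```shell
# 1. With a multi-node cluster formed (microk8s add-node / join), list the nodes.
microk8s kubectl get nodes -o wide

# 2. Simulate a node failure by dropping its network link, e.g. on host1:
#      sudo ip link set eth0 down    # interface name varies per host

# 3. From a surviving node, watch the transition; only the failed node
#    should go NotReady, within roughly 40-50 seconds.
microk8s kubectl get nodes -w

# 4. Bring the link back up and confirm the node returns to Ready.
#      sudo ip link set eth0 up
```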

Possible Regressions

Checklist

  • Read the contributions page.
  • Submitted the CLA form, if you are a first-time contributor.
  • The introduced changes are covered by unit and/or integration tests.

Notes

@sonianuj287 (Author)

Hi @lazzarello @akaihola @xnox @timgreen, please review this PR and suggest any changes. Thanks :)

@xnox (Contributor) commented Oct 23, 2025

This is nice! Lots of Pro customers are hitting this issue, and it always seemed baffling, but this is likely the root cause.

Especially since microk8s without these settings behaves unlike other k8s deployments.

@sonianuj287 (Author)

> This is nice! Lots of Pro customers are hitting this issue, and it seemed always wild, but this is likely the root cause for it.
>
> Especially since microk8s without these settings behaves unlike other k8s deployments.

Thanks @xnox, how can we run the workflow to verify the changes? If possible, I would like it merged before the end of October so it counts for Hacktoberfest.

@ktsakalozos (Member)

Hi @sonianuj287 @xnox, apologies for the late response. We will try to reproduce this issue and verify the fix. In the meantime, could you please sign the CLA and address the lint errors?

@kcarson77

Any updates on this? It is easy to reproduce: install a 3-host cluster and take the link down on host1, for example. Sometimes one node becomes NotReady, sometimes two nodes do, and sometimes no nodes do (less common). We are seeing this across different systems and it is having a critical impact.
