
Conversation

Utkarsh9571 commented Nov 3, 2025

This patch ensures the k8sobjectsreceiver recovers from silent watch stream drops. Previously, when the channel closed unexpectedly (!ok), the receiver exited without retrying. This change:

  • Logs the unexpected closure
  • Returns false to trigger a restart in startWatch
  • Adds an Info log to signal recovery intent
  • Adds a 2s backoff before retrying

This improves resilience in watch mode and aligns with the retry behavior for 410 Gone.

Related to: #43928
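
For reviewers, here is a minimal sketch of the intended control flow. Only startWatch, the !ok check, the Info log, and the 2s sleep come from this patch; the type, field, and helper names below are illustrative rather than the receiver's actual identifiers, and the sketch assumes the standard watch.Interface from k8s.io/apimachinery:

package k8sobjectsreceiver // illustrative placement only, not the actual file layout

import (
    "context"
    "time"

    "go.uber.org/zap"
    "k8s.io/apimachinery/pkg/watch"
)

// objectsWatcher stands in for the real receiver type; only the fields this
// sketch needs are shown.
type objectsWatcher struct {
    logger *zap.Logger
}

// startWatch keeps the watch alive: whenever doneWatching reports an
// unexpected stop (false), it backs off briefly and starts a new watch
// instead of exiting silently.
func (o *objectsWatcher) startWatch(ctx context.Context, newWatcher func() (watch.Interface, error)) {
    for {
        w, err := newWatcher()
        if err != nil {
            o.logger.Error("could not start watch", zap.Error(err))
            return
        }
        if done := o.doneWatching(ctx, w); done {
            return
        }
        // Backoff added by this patch: avoid a tight restart loop when the
        // event channel closes silently.
        time.Sleep(2 * time.Second)
    }
}

// doneWatching consumes watch events until the context is cancelled (true)
// or the result channel closes unexpectedly (false, meaning restart).
func (o *objectsWatcher) doneWatching(ctx context.Context, w watch.Interface) bool {
    defer w.Stop()
    for {
        select {
        case <-ctx.Done():
            return true
        case event, ok := <-w.ResultChan():
            if !ok {
                o.logger.Info("watch channel closed unexpectedly, restarting watch")
                return false
            }
            _ = event // convert and forward the object here (omitted in this sketch)
        }
    }
}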

linux-foundation-easycla bot commented Nov 3, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Utkarsh9571 / name: Utkarsh9571 (0314b23)

github-actions bot added the first-time contributor label on Nov 3, 2025
github-actions bot commented Nov 3, 2025

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better!

atoulme (Contributor) commented Nov 5, 2025

Please add a changelog, and is there a way to test this behavior?


Utkarsh9571 (Author)

@atoulme Thanks for the review! Here's the changelog entry and test strategy:

Changelog

  • receiver/k8sobjects: Restart watch loop on unexpected channel closure to improve resilience and avoid silent failure.

Test Strategy

  • Verified that the receiver logs unexpected channel closure and retries with backoff.
  • Confirmed behavior aligns with existing retry logic for 410 Gone.
  • Manual testing in a simulated cluster confirmed restart after channel drop.
  • No changes to config or external interfaces; patch is scoped to internal watch loop.

Let me know if you'd like a unit test for the restart logic or if integration coverage is sufficient.
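
As a rough idea, a unit test could look something like the following. It reuses the illustrative objectsWatcher/doneWatching names from the sketch in the description (not the receiver's real API) and the fake watcher from k8s.io/apimachinery/pkg/watch:

package k8sobjectsreceiver

import (
    "context"
    "testing"

    "go.uber.org/zap"
    "k8s.io/apimachinery/pkg/watch"
)

func TestWatchRestartsOnUnexpectedChannelClose(t *testing.T) {
    fw := watch.NewFake()
    fw.Stop() // closes the result channel, simulating a silent stream drop

    o := &objectsWatcher{logger: zap.NewNop()}
    if done := o.doneWatching(context.Background(), fw); done {
        t.Fatal("expected doneWatching to report false (restart) after an unexpected channel close")
    }
}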

Comment on lines +307 to +310
if !done {
    time.Sleep(2 * time.Second)
}

Contributor

@Utkarsh9571 Why are we sleeping here?

Contributor

nevermind, got it.

Member

May I ask that too :)? Is this supposed to give the watcher time to start? I don't see why it's needed.

Contributor

@ChrsMark looking at the PR description, it looks like this is adding some sort of backoff instead of retrying immediately.

Quoting @Utkarsh9571's comment: "The sleep is there to avoid tight restart loops when the channel closes silently. It gives the receiver a moment to reset before retrying."

Contributor

I think we will be okay without it, but I'll let @Utkarsh9571 reply to your question before proceeding.

Member

I see, thanks for elaborating! My question then is whether that's actually needed. It looks a bit arbitrary tbh, and I'm not sure it solves an existing problem.

Author

@ChrsMark @VihasMakwana I'm happy to clarify.

The time.Sleep(2 * time.Second) is a minimal backoff to prevent tight restart loops when the channel closes silently. Without it, the receiver can enter a rapid retry cycle that floods logs and consumes CPU unnecessarily, especially in edge cases where the watcher fails to initialize cleanly.

This sleep gives the receiver a moment to reset before retrying. It’s not about waiting for the watcher to start — it’s about avoiding aggressive retries when done is false due to a silent failure.

To be clear, the time.Sleep(2 * time.Second) wasn't directly requested by @atoulme; it was my interpretation of how to avoid tight restart loops after a silent channel close, based on behavior observed during testing, where the receiver would otherwise retry rapidly without any delay.

Let me know your preference and I’ll update the patch accordingly.
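
For example, if a fixed 2s delay feels too arbitrary, a capped exponential backoff would be one option. A minimal illustrative helper (not part of the current patch; the name and constants are hypothetical):

package k8sobjectsreceiver

import "time"

// waitBeforeRetry sleeps 1s, 2s, 4s, ... for successive failed attempts
// (attempt is assumed non-negative), capped at 30s, so repeated silent
// closures back off progressively instead of retrying at a fixed rate.
func waitBeforeRetry(attempt int) {
    const base, maxDelay = time.Second, 30 * time.Second
    d := base << attempt
    if d <= 0 || d > maxDelay {
        d = maxDelay
    }
    time.Sleep(d)
}

The watch loop would pass an attempt counter and reset it to zero once events flow again, so a healthy watch is not penalized.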

Utkarsh9571 (Author)

@VihasMakwana Thanks for checking! The sleep is there to avoid tight restart loops when the channel closes silently. It gives the receiver a moment to reset before retrying. Glad it makes sense now — let me know if you'd prefer a different strategy.

ChrsMark (Member) commented Nov 6, 2025

Regarding the proposed changelog entry above: for adding the changelog, see the "Adding a Changelog Entry" section in the repository's CONTRIBUTING.md.

Regarding the test strategy: I guess what @atoulme asked for is how this fix can be manually tested, what the steps to reproduce are, etc.

In general, I'm not sure we can ensure that it will fix #43928.
