Allow suppressing/downgrading the "Uncaught error during event processing" log without disabling retry (for exceptions handled in updateErrorStatus) #3418

@benkeil

Feature request

When a Reconciler throws an exception that is deliberately handled in updateErrorStatus and classified as expected and retryable (e.g. "waiting for a dependency that isn't present yet"), there is currently no way to keep JOSDK's native retry (exponential backoff via @GradualRetry) while suppressing or downgrading the EventProcessor "Uncaught error during event processing" log. Scheduling the retry and emitting that log happen on the same code path.

Why they can't be separated today

In ReconciliationDispatcher.handleErrorStatusHandler, the updateErrorStatus result only avoids re-throwing when isNoRetry() is set:

if (errorStatusUpdateControl.isNoRetry()) {
  ... // non-exception PostExecutionControl, optional reschedule
  return postExecutionControl;
}
throw e;

So any retry-enabled control rethrows e, which becomes exceptionDuringExecution, which triggers EventProcessor.handleRetryOnException. That single method both schedules the retry and emits the WARN via retryAwareErrorLogging. The only built-in escape is withNoRetry() (also implied by ErrorStatusUpdateControl.rescheduleAfter()), which suppresses the log but disables native retry: you lose the @GradualRetry exponential backoff and the retry counter, and would have to reimplement backoff on top of fixed-delay rescheduleAfter calls.
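To make the coupling concrete, here is a deliberately simplified model of the flow described above. It is not the JOSDK source: RetryLogCouplingSketch, Outcome and process are hypothetical stand-ins, and only the shape of the control flow (a no-retry control swallows the exception, anything else reaches one method that both retries and logs) is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in model, NOT the actual JOSDK classes. All names are hypothetical.
class RetryLogCouplingSketch {

  /** Records what the framework would do for one failed reconciliation. */
  static final class Outcome {
    boolean retryScheduled;
    final List<String> logLines = new ArrayList<>();
  }

  /** Models the dispatcher step: only a no-retry control avoids the rethrow. */
  static Outcome process(RuntimeException e, boolean noRetry) {
    Outcome outcome = new Outcome();
    if (noRetry) {
      // Exception is swallowed: no WARN, but no native retry either.
      return outcome;
    }
    // Retry-enabled control: the exception propagates to the retry handler.
    handleRetryOnException(e, outcome);
    return outcome;
  }

  /** Models the single method that both schedules the retry and logs. */
  static void handleRetryOnException(RuntimeException e, Outcome outcome) {
    outcome.retryScheduled = true;
    outcome.logLines.add("WARN Uncaught error during event processing: " + e.getMessage());
  }

  public static void main(String[] args) {
    RuntimeException expected = new IllegalStateException("dependency not ready");
    Outcome retrying = process(expected, false);
    Outcome noRetry = process(expected, true);
    System.out.println("retry-enabled: retry=" + retrying.retryScheduled
        + ", logLines=" + retrying.logLines.size());
    System.out.println("no-retry:      retry=" + noRetry.retryScheduled
        + ", logLines=" + noRetry.logLines.size());
  }
}
```

In this model there is no input that yields retryScheduled == true with an empty logLines list, which is the gap this request is about.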

With @GradualRetry(maxAttempts = -1) the problem is permanent: isLastAttempt() is never true, so every attempt hits the WARN branch. The result is a continuous stream of WARN + stack trace for an entirely expected condition, and the logging is duplicated, since the exception is already logged at the level chosen inside updateErrorStatus.

Workarounds and why they're insufficient

  • withNoRetry() / rescheduleAfter(...): removes the WARN but disables native exponential-backoff retry.
  • Lowering the EventProcessor logger to ERROR: global to the class, can't distinguish handled/expected from genuinely unexpected errors, and relies on logger-name coupling rather than the SDK API.
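For a sense of what the first workaround costs, the sketch below shows roughly what an operator would have to reimplement to get capped exponential delays out of rescheduleAfter. ManualBackoffSketch and delayMillis are hypothetical helpers, and the interval, multiplier and cap are illustrative values, not JOSDK defaults. With native retry disabled, the attempt count itself would also have to be tracked by the operator (for example in the resource status), which this sketch leaves out.

```java
// Hypothetical helper illustrating hand-rolled backoff. Not part of JOSDK.
class ManualBackoffSketch {

  /**
   * Delay before retry number {@code attempt} (0-based):
   * initialMillis * multiplier^attempt, capped at maxMillis.
   */
  static long delayMillis(int attempt, long initialMillis, double multiplier, long maxMillis) {
    double raw = initialMillis * Math.pow(multiplier, attempt);
    return (long) Math.min(raw, (double) maxMillis);
  }

  public static void main(String[] args) {
    // Illustrative parameters only: 1 s initial delay, doubling, capped at 10 s.
    for (int attempt = 0; attempt < 6; attempt++) {
      System.out.println("attempt " + attempt + " -> "
          + delayMillis(attempt, 1000L, 2.0, 10_000L) + " ms");
    }
  }
}
```

With these parameters the delays are 1000, 2000, 4000, 8000, 10000 and 10000 ms. The computed value would then be passed to rescheduleAfter, duplicating logic that @GradualRetry already provides.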

Precedent

The same area already does selective, level-aware downgrading: the status-patch failure path in handleErrorStatusHandler downgrades 409/422 to DEBUG when the next reconciliation is imminent, and retryAwareErrorLogging special-cases a 409 KubernetesClientException to INFO. Treating some handled exceptions as not-worth-a-WARN is already an established pattern; this request asks to make it user-controllable.

Proposed directions (open to design)

  1. Let ErrorStatusUpdateControl carry a logging hint while still retrying — e.g. withoutDefaultErrorLogging() or withErrorLogLevel(Level.INFO). Leaves isNoRetry() untouched (retry/backoff intact) and lets the place that already classifies the error decide how loud the framework should be.
  2. A configurable log level / logging callback for the "uncaught error" messages via ConfigurationService or controller configuration.
  3. Skip or downgrade the WARN when updateErrorStatus returned a non-default control (the user demonstrably handled it), as an opt-in so that current behavior stays the default.
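As one possible shape for direction 1, the following self-contained sketch models a control that carries a log-level hint while leaving retry untouched. Everything here is hypothetical: LogHintSketch, ErrorControl, handleFailure and withErrorLogLevel do not exist in JOSDK, and java.util.logging.Level is used only to keep the example dependency-free (the real framework logs through a different API).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

// Hypothetical API sketch for direction 1. None of these types or methods exist in JOSDK.
class LogHintSketch {

  /** Stand-in for ErrorStatusUpdateControl with an added, hypothetical logging hint. */
  static final class ErrorControl {
    private boolean noRetry = false;
    private Level errorLogLevel = Level.WARNING; // default mirrors today's WARN

    ErrorControl withNoRetry() { this.noRetry = true; return this; }
    ErrorControl withErrorLogLevel(Level level) { this.errorLogLevel = level; return this; }
    boolean isNoRetry() { return noRetry; }
    Level errorLogLevel() { return errorLogLevel; }
  }

  /** Records what the framework would do for one failed reconciliation. */
  static final class Outcome {
    boolean retryScheduled;
    final List<String> logLines = new ArrayList<>();
  }

  /** Retry decision and log level are now independent of each other. */
  static Outcome handleFailure(RuntimeException e, ErrorControl control) {
    Outcome outcome = new Outcome();
    if (control.isNoRetry()) {
      return outcome; // unchanged: no retry, no framework log
    }
    outcome.retryScheduled = true; // native retry/backoff stays intact
    if (control.errorLogLevel() != Level.OFF) {
      outcome.logLines.add(control.errorLogLevel()
          + " Uncaught error during event processing: " + e.getMessage());
    }
    return outcome;
  }

  public static void main(String[] args) {
    RuntimeException expected = new IllegalStateException("dependency not ready");
    Outcome quiet = handleFailure(expected, new ErrorControl().withErrorLogLevel(Level.INFO));
    System.out.println("retry=" + quiet.retryScheduled + ", log=" + quiet.logLines);
  }
}
```

The point of the design is that the place that already classifies the error (updateErrorStatus) also chooses the framework's log level, while a control without the hint behaves exactly as it does today.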

Environment

  • java-operator-sdk: 6.4.3 (behavior present on main).
