Open
Labels: enhancement (New feature or request)
Description
What would you like to be added/changed?
This came up in #2373 (comment). Right now the operator has no way to track crash-looping processes, e.g. processes that report to the cluster for a short time and then crash. Possible root causes include networking issues or resource constraints. Because the operator does not track the restart count, and the crash-looping of the fdbserver is not surfaced on the Pod (fdbmonitor keeps running and simply restarts the fdbserver), it is currently hard for the operator to replace such flaky Pods. Those Pods can cause issues for the cluster and block operations that expect the cluster to have been up for a specific time. In addition, it can be hard for a human operator to track down those crash-looping processes.