Description
This issue is based on #4408.
In the Caching service, it is difficult to verify the state. This mainly concerns Infinispan, but it could affect any distributed cache implementation.
The issue is that Infinispan (JGroups) has its own independent communication channel that is not monitored. Currently we check the state of the Caching service by checking the service status, but a status of UP doesn't mean that the service is working properly. To be accurate, we would have to look in the log for the GMS message that tells us the JGroups ports are bound. Even that is not enough to detect whether all nodes have joined: to do that, a user needs JGroups debug logging and has to analyze the communication to see whether all nodes are communicating with each other. This is very complicated, even for someone who knows how it works.
The aim of this issue is to simplify that.
The health endpoint should contain new checks:
- JGroups is listening
- list of connected nodes
- number of nodes
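A minimal sketch of the three proposed health details, assuming the service can obtain a snapshot of its JGroups state (whether the transport is listening and the member names in the current view). `ClusterSnapshot`, `CacheClusterHealth`, and the detail keys are hypothetical names for illustration, not an existing Infinispan or JGroups API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical snapshot of the JGroups layer as seen by the Caching service.
// In a real service these values would be read from the cache manager / JGroups channel.
final class ClusterSnapshot {
    final boolean listening;      // JGroups transport bound and listening
    final List<String> members;   // names of the nodes in the current cluster view

    ClusterSnapshot(boolean listening, List<String> members) {
        this.listening = listening;
        this.members = List.copyOf(members); // defensive, immutable copy
    }
}

final class CacheClusterHealth {
    private CacheClusterHealth() {}

    /** Builds the extra details proposed for the health endpoint (hypothetical keys). */
    static Map<String, Object> details(ClusterSnapshot snapshot) {
        Map<String, Object> details = new LinkedHashMap<>(); // keeps insertion order in the output
        details.put("jgroupsListening", snapshot.listening);
        details.put("connectedNodes", snapshot.members);
        details.put("nodeCount", snapshot.members.size());
        return details;
    }
}
```

The map could then be attached to whatever health response the service already produces; keeping it as plain data leaves the decision about the overall status to a separate rule.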
It is questionable whether JGroups should influence the service status, because the service should remain available even if some instances in an HA setup are down. However, it probably makes sense to change the service status during start-up. At minimum, the service should be DOWN until JGroups is listening. I would further suggest keeping the status DOWN until the cluster has been connected for the first time (i.e. the requirement is that the cluster be established at least once).
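The proposed start-up rule can be sketched as a small state machine: DOWN while JGroups is not listening, DOWN until the cluster has formed once, and afterwards UP even if some nodes drop out. This assumes "cluster connected" means the view has reached a configured expected size; `expectedMembers`, `ServiceStatus`, and `StartupGatedClusterStatus` are hypothetical names, not part of the existing service.

```java
// Hypothetical service status values.
enum ServiceStatus { UP, DOWN }

final class StartupGatedClusterStatus {
    // Latches to true the first time the cluster is considered established.
    private boolean everFormed = false;

    /**
     * @param listening       whether the JGroups transport is bound and listening
     * @param memberCount     number of members in the current cluster view
     * @param expectedMembers assumed configured cluster size that counts as "connected"
     */
    synchronized ServiceStatus evaluate(boolean listening, int memberCount, int expectedMembers) {
        if (!listening) {
            return ServiceStatus.DOWN; // never UP while JGroups is not listening
        }
        if (!everFormed && memberCount >= expectedMembers) {
            everFormed = true; // cluster established at least once
        }
        // After the first formation, losing HA instances no longer makes the service DOWN.
        return everFormed ? ServiceStatus.UP : ServiceStatus.DOWN;
    }
}
```

With `expectedMembers = 1` a single-node deployment becomes UP as soon as JGroups is listening; larger values delay UP until the whole cluster has joined once.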