Caching service health endpoint

This issue is based on #4408.

It is difficult to verify the state of the Caching service. This is mainly related to Infinispan, but in general it could be an issue for any distributed cache implementation.

The problem is that Infinispan (JGroups) uses its own independent communication channel, which is not monitored. Currently we check the state of the Caching service by checking the service status, but the status UP doesn't mean that the service is working properly. To be accurate, we have to look in the log for the `GMS` message that tells us the JGroups ports are bound. Even that is not enough to detect whether all nodes have joined. To do that, a user needs the JGroups debug log and has to analyze the communication to see whether all nodes are talking to each other. This is very complicated, even with knowledge of how it works.

The aim of this issue is to simplify that.

The health endpoint should contain new checks:
- JGroups is listening
- list of connected nodes
- number of nodes
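
To make the proposal more concrete, here is a minimal sketch of what the extra health details could look like. The class name `ClusterHealthDetails` and the keys `jgroupsListening`, `members` and `memberCount` are hypothetical, not existing names in the service. In a real implementation the values would presumably be read from the JGroups channel and its current view rather than passed in by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the three proposed checks; not an existing class in the project.
final class ClusterHealthDetails {
    private final boolean listening;
    private final List<String> members;

    ClusterHealthDetails(boolean listening, List<String> members) {
        this.listening = listening;
        // Defensive copy so later changes to the caller's list do not leak into the report.
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    // Renders the details as an ordered map, a shape that is easy to serialize in a health response.
    Map<String, Object> toDetails() {
        Map<String, Object> details = new LinkedHashMap<>();
        details.put("jgroupsListening", listening); // assumed key name
        details.put("members", members);            // assumed key name
        details.put("memberCount", members.size()); // assumed key name
        return details;
    }
}
```

With this shape, an operator could see at a glance whether all expected nodes are present, without turning on the JGroups debug log.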

It is questionable whether JGroups should influence the service status, because even if some instances in an HA setup are down, the service should remain available. However, it probably makes sense to change the service state during start-up. At a minimum, the service should be DOWN until JGroups is listening. I would then suggest keeping the status DOWN until the cluster has connected for the first time (that is, require the cluster to be established at least once).
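
The start-up rule suggested above could be sketched roughly as follows. Everything here is an assumption for illustration: the class `StartupClusterGate`, the `minimumMembers` threshold used to decide that "the cluster is connected", and the choice to report DOWN whenever JGroups is not listening (the text above only asks for this during start-up).

```java
// Hypothetical start-up gate; names and thresholds are illustrative only.
final class StartupClusterGate {
    enum Status { UP, DOWN }

    private final int minimumMembers;        // assumed definition of "cluster is connected"
    private boolean clusterEstablishedOnce;  // latched after the first successful cluster formation

    StartupClusterGate(int minimumMembers) {
        this.minimumMembers = minimumMembers;
    }

    // Called on each health check with the current observations.
    synchronized Status evaluate(boolean listening, int memberCount) {
        if (!listening) {
            return Status.DOWN; // JGroups is not listening yet
        }
        if (memberCount >= minimumMembers) {
            clusterEstablishedOnce = true; // remember that the cluster formed at least once
        }
        // Once established, nodes leaving later no longer force the status DOWN.
        return clusterEstablishedOnce ? Status.UP : Status.DOWN;
    }
}
```

The latch is the key design choice: after the cluster has formed once, a node dropping out of an HA setup leaves the status UP, which matches the concern that the service should stay available when some instances are down.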

Caching service health endpoint #4427
