Commit d16e703

add grpc server metrics.
Signed-off-by: morvencao <[email protected]>
1 parent 8ccbe0f commit d16e703

1 file changed

+126
-0
lines changed
enhancements/sig-architecture/141-grpc-based-registration/README.md

Lines changed: 126 additions & 0 deletions
@@ -163,6 +163,132 @@ The following security principles should be considered between the broker and so
163163
- The sources should be authorized by broker to avoid one source can consume event messages from other sources
164164
- The agent should be authorized by broker to avoid one agent can consume event messages from other clusters
165165

166+
### Metrics
167+
168+
The gRPC server exposes Prometheus metrics for monitoring its health and performance. They are grouped into two categories:
169+
170+
1. **General gRPC server metrics**
171+
2. **CloudEvents-specific gRPC server metrics**
172+
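The two categories can be told apart by metric name prefix. The following is a minimal sketch, assuming the `grpc_server` and `grpc_server_ce` prefixes used throughout this section; the helper name `metric_category` is hypothetical and not part of the server.

```python
def metric_category(name: str) -> str:
    """Return which documented group a metric name belongs to."""
    # Check the longer CloudEvents prefix first: every CloudEvents metric
    # name also starts with the general "grpc_server_" prefix.
    if name.startswith("grpc_server_ce_"):
        return "cloudevents"
    if name.startswith("grpc_server_"):
        return "general"
    return "other"
```

Because the prefixes overlap, a selector or dashboard filter that matches only on `grpc_server_` would also pick up the CloudEvents-specific metrics.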
173+
#### General gRPC server metrics
174+
175+
Common metrics for gRPC server health and performance. Their names start with `grpc_server`, the Prometheus subsystem name.
176+
177+
- **`grpc_server_active_connections`** (Gauge) - Current number of active connections.
178+
For example:
179+
```
180+
grpc_server_active_connections{local_addr="10.244.0.18:8090",remote_addr="10.244.0.16:45128"} 1
181+
```
182+
183+
- **`grpc_server_started_total`** (Counter) - Total number of RPCs started on the server.
184+
For example:
185+
```
186+
grpc_server_started_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
187+
grpc_server_started_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 4
188+
```
189+
190+
- **`grpc_server_msg_received_total`** (Counter) - Total number of RPC messages received on the server.
191+
For example:
192+
```
193+
grpc_server_msg_received_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
194+
grpc_server_msg_received_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 4
195+
```
196+
197+
- **`grpc_server_msg_sent_total`** (Counter) - Total number of RPC messages sent by the server.
198+
For example:
199+
```
200+
grpc_server_msg_sent_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
201+
grpc_server_msg_sent_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 1
202+
```
203+
204+
- **`grpc_server_msg_received_bytes_total`** (Counter) - Total number of message bytes received on the gRPC server.
205+
For example:
206+
```
207+
grpc_server_msg_received_bytes_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 1729
208+
grpc_server_msg_received_bytes_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 245
209+
```
210+
211+
- **`grpc_server_msg_sent_bytes_total`** (Counter) - Total number of message bytes sent by the gRPC server.
212+
For example:
213+
```
214+
grpc_server_msg_sent_bytes_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0
215+
grpc_server_msg_sent_bytes_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 1147
216+
```
217+
218+
- **`grpc_server_handled_total`** (Counter) - Total number of RPCs completed on the server, regardless of success or failure.
219+
For example:
220+
```
221+
grpc_server_handled_total{grpc_code="OK",grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
222+
```
223+
224+
- **`grpc_server_handling_seconds`** (Histogram) - Histogram of the duration of RPC handling by the gRPC server.
225+
For example:
226+
```
227+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.005"} 3
228+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.01"} 3
229+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.025"} 3
230+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.05"} 3
231+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.1"} 3
232+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.25"} 3
233+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.5"} 3
234+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="1"} 3
235+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="2.5"} 3
236+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="5"} 3
237+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="10"} 3
238+
grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="+Inf"} 3
239+
grpc_server_handling_seconds_sum{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0.0055182140000000005
240+
grpc_server_handling_seconds_count{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
241+
```
242+
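As an illustration of how these series can be combined, the sketch below parses sample lines in the Prometheus text format and derives two values: the number of RPCs that have started but not yet completed (`grpc_server_started_total` minus `grpc_server_handled_total`) and the mean handling time (`_sum` divided by `_count`). The helper names are hypothetical, the parser handles only simple lines like the ones shown above, and the subtraction assumes both counters come from the same scrape.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Sample lines copied from the examples in this section.
SAMPLES = """\
grpc_server_started_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
grpc_server_started_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 4
grpc_server_handled_total{grpc_code="OK",grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
grpc_server_handling_seconds_sum{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0.0055182140000000005
grpc_server_handling_seconds_count{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
""".splitlines()


def parse_sample(line):
    """Split one exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric sample: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def in_flight_by_method(lines):
    """RPCs started but not yet handled, keyed by grpc_method."""
    started = defaultdict(float)
    handled = defaultdict(float)
    for line in lines:
        name, labels, value = parse_sample(line)
        method = labels.get("grpc_method", "")
        if name == "grpc_server_started_total":
            started[method] += value
        elif name == "grpc_server_handled_total":
            handled[method] += value  # summed across all grpc_code values
    return {method: started[method] - handled[method] for method in started}


def mean_handling_seconds(lines, method):
    """Mean handling time for one method, or None if nothing was observed."""
    total = count = 0.0
    for line in lines:
        name, labels, value = parse_sample(line)
        if labels.get("grpc_method") != method:
            continue
        if name == "grpc_server_handling_seconds_sum":
            total += value
        elif name == "grpc_server_handling_seconds_count":
            count += value
    return total / count if count else None
```

With the sample values above, `in_flight_by_method(SAMPLES)` reports no outstanding `Publish` calls and four open `Subscribe` streams, and `mean_handling_seconds(SAMPLES, "Publish")` is roughly 1.8 ms.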
243+
#### CloudEvents-specific gRPC server metrics
244+
245+
Metrics specific to CloudEvents RPC calls. Their names start with `grpc_server_ce`, the Prometheus subsystem name.
246+
247+
- **`grpc_server_ce_called_total`** (Counter) - Total number of RPC requests called on the server.
248+
For example:
249+
```
250+
grpc_server_ce_called_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
251+
grpc_server_ce_called_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Subscribe"} 1
252+
```
253+
254+
- **`grpc_server_ce_msg_received_total`** (Counter) - Total number of messages received on the gRPC server.
255+
For example:
256+
```
257+
grpc_server_ce_message_received_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
258+
grpc_server_ce_message_received_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Subscribe"} 1
259+
```
260+
261+
- **`grpc_server_ce_msg_sent_total`** (Counter) - Total number of messages sent by the gRPC server.
262+
For example:
263+
```
264+
grpc_server_ce_message_sent_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
265+
```
266+
267+
- **`grpc_server_ce_processed_total`** (Counter) - Total number of CloudEvents RPC requests processed on the server, regardless of success or failure.
268+
For example:
269+
```
270+
grpc_server_ce_processed_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 1
271+
```
272+
273+
- **`grpc_server_ce_processed_duration_seconds`** (Histogram) - Histogram of the duration of CloudEvents RPC requests processed on the server.
274+
For example:
275+
```
276+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.005"} 0
277+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.01"} 1
278+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.025"} 1
279+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.05"} 1
280+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.1"} 1
281+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.25"} 1
282+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.5"} 1
283+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="1"} 1
284+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="2.5"} 1
285+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="5"} 1
286+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="10"} 1
287+
grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="+Inf"} 1
288+
grpc_server_ce_processed_duration_seconds_sum{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 0.001053519
289+
grpc_server_ce_processed_duration_seconds_count{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 1
290+
```
291+
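Because the CloudEvents metrics carry `cluster` and `grpc_code` labels, they can be broken down per cluster. The sketch below computes the share of processed requests that finished with `grpc_code="OK"` for each cluster, using `grpc_server_ce_processed_total`. The function name `success_ratio_by_cluster` is hypothetical, and the label matching is a simplification that assumes label values contain no escaped quotes.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Sample line copied from the example in this section.
SAMPLES = [
    'grpc_server_ce_processed_total{cluster="cluster1",'
    'data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",'
    'grpc_code="OK",method="Publish"} 1',
]


def success_ratio_by_cluster(lines):
    """Fraction of processed CloudEvents requests with grpc_code="OK", per cluster."""
    ok = defaultdict(float)
    total = defaultdict(float)
    for line in lines:
        line = line.strip()
        # The trailing "{" keeps the duration histogram series out.
        if not line.startswith("grpc_server_ce_processed_total{"):
            continue
        series, _, value = line.rpartition(" ")
        labels = dict(LABEL_RE.findall(series))
        count = float(value)
        cluster = labels.get("cluster", "")
        total[cluster] += count
        if labels.get("grpc_code") == "OK":
            ok[cluster] += count
    return {cluster: ok[cluster] / total[cluster] for cluster in total if total[cluster] > 0}
```

For the single sample line above this yields a ratio of 1.0 for `cluster1`. On a live server the same calculation would normally be done over counter increases within a time window rather than over raw totals.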
166292
### Test Plan
167293

168294
**Note:** *Section not required until targeted at a release.*
