@@ -163,6 +163,132 @@ The following security principles should be considered between the broker and so
163163- Each source should be authorized by the broker so that one source cannot consume event messages from other sources
164164- Each agent should be authorized by the broker so that one agent cannot consume event messages from other clusters
165165
166+ ### Metrics
167+
168+ The gRPC server exposes Prometheus metrics for monitoring its health and performance. They are grouped into two categories:
169+
170+ 1. **General gRPC server metrics**
171+ 2. **CloudEvents-specific gRPC server metrics**
172+
173+ #### General gRPC server metrics
174+
175+ Common metrics for gRPC server health and performance. Their names are prefixed with `grpc_server`, the Prometheus subsystem name.
176+
177+ - **`grpc_server_active_connections`** (Gauge) - Current number of active connections.
178+ For example:
179+ ```
180+ grpc_server_active_connections{local_addr="10.244.0.18:8090",remote_addr="10.244.0.16:45128"} 1
181+ ```
182+
183+ - **`grpc_server_started_total`** (Counter) - Total number of RPCs started on the server.
184+ For example:
185+ ```
186+ grpc_server_started_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
187+ grpc_server_started_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 4
188+ ```
189+
190+ - **`grpc_server_msg_received_total`** (Counter) - Total number of RPC messages received on the server.
191+ For example:
192+ ```
193+ grpc_server_msg_received_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
194+ grpc_server_msg_received_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 4
195+ ```
196+
197+ - **`grpc_server_msg_sent_total`** (Counter) - Total number of RPC messages sent by the server.
198+ For example:
199+ ```
200+ grpc_server_msg_sent_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
201+ grpc_server_msg_sent_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 1
202+ ```
203+
204+ - **`grpc_server_msg_received_bytes_total`** (Counter) - Total number of message bytes received on the gRPC server.
205+ For example:
206+ ```
207+ grpc_server_msg_received_bytes_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 1729
208+ grpc_server_msg_received_bytes_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 245
209+ ```
210+
211+ - **`grpc_server_msg_sent_bytes_total`** (Counter) - Total number of message bytes sent by the gRPC server.
212+ For example:
213+ ```
214+ grpc_server_msg_sent_bytes_total{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0
215+ grpc_server_msg_sent_bytes_total{grpc_method="Subscribe",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="server_stream"} 1147
216+ ```
217+
218+ - **`grpc_server_handled_total`** (Counter) - Total number of RPCs completed on the server, regardless of success or failure.
219+ For example:
220+ ```
221+ grpc_server_handled_total{grpc_code="OK",grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
222+ ```
223+
224+ - **`grpc_server_handling_seconds`** (Histogram) - Duration of RPC handling by the gRPC server, in seconds.
225+ For example:
226+ ```
227+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.005"} 3
228+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.01"} 3
229+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.025"} 3
230+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.05"} 3
231+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.1"} 3
232+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.25"} 3
233+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="0.5"} 3
234+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="1"} 3
235+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="2.5"} 3
236+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="5"} 3
237+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="10"} 3
238+ grpc_server_handling_seconds_bucket{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary",le="+Inf"} 3
239+ grpc_server_handling_seconds_sum{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0.0055182140000000005
240+ grpc_server_handling_seconds_count{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
241+ ```
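
As a rough illustration of how the `_sum` and `_count` series of `grpc_server_handling_seconds` can be combined, the following Python sketch parses the two sample lines shown above and derives the mean handling time for `Publish`. The helper names `parse_samples` and `mean_handling_seconds` are hypothetical and not part of the server, and the parser only covers the simple label syntax seen in these examples, not the full Prometheus text format.

```python
import re

# Matches `name{labels} value` or `name value`; a simplification of the
# Prometheus text format that is sufficient for the samples in this document.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def mean_handling_seconds(samples, method):
    """Mean RPC handling time for one gRPC method, or None if no RPC completed."""
    total = 0.0
    count = 0.0
    for name, labels, value in samples:
        if labels.get("grpc_method") != method:
            continue
        if name == "grpc_server_handling_seconds_sum":
            total += value
        elif name == "grpc_server_handling_seconds_count":
            count += value
    return total / count if count else None


# Sample lines copied from the example output above.
EXPOSITION = """
grpc_server_handling_seconds_sum{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 0.0055182140000000005
grpc_server_handling_seconds_count{grpc_method="Publish",grpc_service="io.cloudevents.v1.CloudEventService",grpc_type="unary"} 3
"""

samples = parse_samples(EXPOSITION)
publish_mean = mean_handling_seconds(samples, "Publish")
print(f"mean Publish handling time: {publish_mean:.6f}s")
```

With the sample values above, the mean works out to roughly 1.8 ms per `Publish` call. In a live deployment the same ratio would normally be computed over a time window from the rates of the two series rather than from their lifetime totals.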
242+
243+ #### CloudEvents-specific gRPC server metrics
244+
245+ Metrics specific to CloudEvents RPC calls. Their names are prefixed with `grpc_server_ce`, the Prometheus subsystem name.
246+
247+ - **`grpc_server_ce_called_total`** (Counter) - Total number of CloudEvents RPC calls made to the server.
248+ For example:
249+ ```
250+ grpc_server_ce_called_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
251+ grpc_server_ce_called_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Subscribe"} 1
252+ ```
253+
254+ - **`grpc_server_ce_msg_received_total`** (Counter) - Total number of messages received on the gRPC server.
255+ For example:
256+ ```
257+ grpc_server_ce_msg_received_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
258+ grpc_server_ce_msg_received_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Subscribe"} 1
259+ ```
260+
261+ - **`grpc_server_ce_msg_sent_total`** (Counter) - Total number of messages sent by the gRPC server.
262+ For example:
263+ ```
264+ grpc_server_ce_msg_sent_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",method="Publish"} 1
265+ ```
266+
267+ - **`grpc_server_ce_processed_total`** (Counter) - Total number of CloudEvents RPC requests processed on the server, labeled with the resulting gRPC status code.
268+ For example:
269+ ```
270+ grpc_server_ce_processed_total{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 1
271+ ```
272+
273+ - **`grpc_server_ce_processed_duration_seconds`** (Histogram) - Duration of CloudEvents RPC requests processed on the server, in seconds, exposed as `_bucket`, `_sum`, and `_count` series.
274+ For example:
275+ ```
276+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.005"} 0
277+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.01"} 1
278+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.025"} 1
279+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.05"} 1
280+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.1"} 1
281+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.25"} 1
282+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="0.5"} 1
283+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="1"} 1
284+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="2.5"} 1
285+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="5"} 1
286+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="10"} 1
287+ grpc_server_ce_processed_duration_seconds_bucket{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish",le="+Inf"} 1
288+ grpc_server_ce_processed_duration_seconds_sum{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 0.001053519
289+ grpc_server_ce_processed_duration_seconds_count{cluster="cluster1",data_type="io.open-cluster-management.works.v1alpha1.manifestbundles",grpc_code="OK",method="Publish"} 1
290+ ```
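
The cumulative `le` buckets above can be turned into an approximate latency quantile. The Python sketch below uses linear interpolation inside the bucket that contains the requested rank, which is the same general idea as PromQL's `histogram_quantile`; it is an illustration under that assumption rather than a reimplementation of it. The function name `estimate_quantile` is hypothetical, and the lowest bucket is assumed to start at 0.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs and must
    include the +Inf bucket. Returns None when the histogram is empty.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # assumption: observations are >= 0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # only reachable when rank is 0; avoids dividing by zero
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Bucket counts copied from the grpc_server_ce_processed_duration_seconds example above.
CE_PUBLISH_BUCKETS = [
    (0.005, 0), (0.01, 1), (0.025, 1), (0.05, 1), (0.1, 1), (0.25, 1),
    (0.5, 1), (1, 1), (2.5, 1), (5, 1), (10, 1), (math.inf, 1),
]

median = estimate_quantile(0.5, CE_PUBLISH_BUCKETS)
print(f"estimated median Publish duration: {median:.4f}s")
```

With a single observation, the estimate only says that the request fell somewhere between 5 ms and 10 ms; the recorded `_sum` of about 1 ms per this example cannot be recovered from the buckets alone, so quantile estimates become meaningful only once enough requests have been observed.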
291+
166292### Test Plan
167293
168294**Note:** *Section not required until targeted at a release.*