
Conversation

@kwannoel
Contributor

@kwannoel kwannoel commented Nov 5, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Artifacts generated in #23676

These metrics use rate, so the unit can't be per second; it should be a percentage. They reflect the percentage of each interval spent reading records, either via iteration or point gets.

sum(rate({table_metric('state_store_get_duration_bucket')
sum(rate({table_metric('state_store_iter_init_duration_bucket')
sum(rate({table_metric('state_store_iter_scan_duration_bucket')
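To make the claimed unit concrete, here is a minimal PromQL sketch. It is not quoted from this PR: the state_store_get_duration_sum series is assumed to exist alongside the _bucket series per the standard Prometheus histogram convention, and $__rate_interval is Grafana's rate-window variable.

# seconds spent in point gets per second of wall-clock time;
# a value of 0.25 means 25% of each interval was spent in gets
sum(rate(state_store_get_duration_sum[$__rate_interval]))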

Further, these metrics are all tracked at second-level granularity. Since rate is:

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector.

Therefore we don't need to apply normalization as we would for other metrics tracked at sub-second granularity (e.g. ns, ms, etc.).
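For contrast, a sketch of the normalization that would be needed if a duration counter were recorded in sub-second units; the nanosecond metric name below is hypothetical and not one touched by this PR.

# counter already in seconds: rate() alone yields a 0..1 fraction of time
sum(rate(some_duration_seconds_sum[$__rate_interval]))

# counter in nanoseconds: divide by 1e9 to get the same seconds-per-second fraction
sum(rate(some_duration_ns_sum[$__rate_interval])) / 1e9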

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

Contributor Author

kwannoel commented Nov 5, 2025

@kwannoel kwannoel changed the title from update storage metrics to fix(grafana): use percentage for hummock read metrics Nov 5, 2025
@github-actions github-actions bot added the type/fix (Type: Bug fix. Only for pull requests.) label and removed the Invalid PR Title label Nov 5, 2025
@kwannoel kwannoel marked this pull request as ready for review November 5, 2025 09:19
@kwannoel kwannoel requested review from Li0k and wenym1 November 5, 2025 09:19
@hzxa21
Collaborator

hzxa21 commented Nov 6, 2025

These metrics use rate, so the unit can't be per second; it should be a percentage. They reflect the percentage of each interval spent reading records, either via iteration or point gets.

Can you explain more on this? I think it returns the tail duration in time units, not a percentage.

For example, this returns the duration (in seconds) that 99% of your API calls completed within, calculated over the 5-minute window:

histogram_quantile(0.99, sum(rate(api_call_duration_seconds_bucket[5m])) by (le, path))

@kwannoel
Contributor Author

kwannoel commented Nov 6, 2025

These metrics use rate, so the unit can't be per second; it should be a percentage. They reflect the percentage of each interval spent reading records, either via iteration or point gets.

Can you explain more on this? I think it returns the tail duration in time units, not a percentage.

For example, this returns the duration (in seconds) that 99% of your API calls completed within, calculated over the 5-minute window:

histogram_quantile(0.99, sum(rate(api_call_duration_seconds_bucket[5m])) by (le, path))

Never mind, ignore me. I misinterpreted the metric api_call_duration_seconds_bucket as measuring seconds, rather than counts in duration buckets.
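For readers following along, a short sketch of the distinction being resolved here, reusing the example metric name from the comment above; the _sum series is assumed from the standard Prometheus histogram convention.

# _bucket series are cumulative observation counts per upper bound `le`;
# histogram_quantile over their rate returns a duration in seconds (e.g. the p99 latency)
histogram_quantile(0.99, sum(rate(api_call_duration_seconds_bucket[5m])) by (le, path))

# _sum accumulates the observed durations themselves (in seconds);
# its rate is seconds of call time per second, i.e. a fraction renderable as a percentage
sum(rate(api_call_duration_seconds_sum[5m]))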

