2.8.0-rc.0
Pre-release
This release contains 210 PRs from 53 authors, including new contributors Abdurrahman J. Allawala, Ashray Jain, Cyrill N, Daniel Barnes, Dave, David van der Spek, day4me, Devin Trejo, Dmitriy Okladin, Gabriel Santos, inbarpatashnik, Johannes Tandler, Julien Girard, KingJ, Miller, Rafał Boniecki, Raphael Ferreira, Raúl Marín, Ruslan Kovalov, Shagit Ziganshin, shanmugara, Wilfried ROSET. Thank you!
Grafana Mimir version 2.8.0-rc.0 release notes
Grafana Labs is excited to announce version 2.8 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bug fixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Changed default value of block storage retention period: the default value for `-blocks-storage.tsdb.retention-period` was `24h` and is now `13h`.
- Query-frontend cached results now contain a timestamp: this allows Mimir to check whether cached results are still valid based on the current TTL configured for the tenant. Results cached by a previous Mimir version are used until they expire from the cache, which can take up to 7 days. If you need the per-tenant TTL to take effect sooner, flush the results cache manually.
- Experimental support for using Redis as cache: Mimir can now use Redis for caching results, chunks, index, and metadata.
- Experimental support for fetching secrets from Vault for TLS configuration.
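The cached-results highlight says a stored timestamp lets Mimir re-check entries against the tenant's current TTL. The following Python sketch illustrates that idea only; the `CachedResult` type, `is_still_valid` function, and per-tenant TTL dictionary are hypothetical and do not reflect Mimir's Go implementation or its cache format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Default results cache TTL (7 days, per the changelog); per-tenant values override it.
DEFAULT_TTL = timedelta(days=7)


@dataclass
class CachedResult:
    """Hypothetical cache entry: a query result plus the time it was stored."""
    payload: bytes
    stored_at: Optional[datetime]  # None models an entry written by an older version


def is_still_valid(
    entry: CachedResult,
    tenant: str,
    tenant_ttls: Dict[str, timedelta],
    now: datetime,
) -> bool:
    """Return True if the entry may still be served under the tenant's current TTL."""
    if entry.stored_at is None:
        # No timestamp: the current TTL cannot be applied, so the entry is
        # served until the cache itself expires it.
        return True
    ttl = tenant_ttls.get(tenant, DEFAULT_TTL)
    return now - entry.stored_at < ttl


now = datetime(2023, 4, 1, 12, 0, tzinfo=timezone.utc)
ttls = {"team-a": timedelta(hours=1)}
print(is_still_valid(CachedResult(b"...", now - timedelta(hours=2)), "team-a", ttls, now))  # False
```

The `stored_at is None` branch mirrors why results cached by a previous version keep being served until they expire from the cache: without a timestamp, a shorter tenant TTL cannot be enforced.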
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.8 we have removed the following previously deprecated or experimental configuration options or metrics.
The following metrics have been removed:
- `cortex_bucket_store_series_get_all_duration_seconds`
- `cortex_bucket_store_series_merge_duration_seconds`
- `cortex_ingester_tsdb_wal_replay_duration_seconds`
The following configuration options are deprecated and will be removed in Grafana Mimir 2.10:
- The CLI flag `-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup` and its respective YAML configuration option `tsdb.max_tsdb_opening_concurrency_on_startup`.
The following experimental options and features are now stable:
- The protobuf internal query result payload format, which is now used by default.
Bug fixes
- Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. PR 4423
- Query-frontend: don't retry queries which error inside PromQL. PR 4643
- Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. PR 4671
- Native histograms: fix how IsFloatHistogram determines if mimirpb.Histogram is a float histogram. PR 4706
- Query-frontend: fix query sharding for native histograms. PR 4666
Changelog
2.8.0-rc.0
Grafana Mimir
- [CHANGE] Ingester: changed experimental CLI flag from `-out-of-order-blocks-external-label-enabled` to `-ingester.out-of-order-blocks-external-label-enabled`. #4440
- [CHANGE] Store-gateway: The following metrics have been removed: #4332
  - `cortex_bucket_store_series_get_all_duration_seconds`
  - `cortex_bucket_store_series_merge_duration_seconds`
- [CHANGE] Ingester: changed default value of `-blocks-storage.tsdb.retention-period` from `24h` to `13h`. If you're running Mimir with a custom configuration and you're overriding `-querier.query-store-after` to a value greater than the default `12h`, then you should increase `-blocks-storage.tsdb.retention-period` accordingly. #4382
- [CHANGE] Ingester: the configuration parameter `-blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup` has been deprecated and will be removed in Mimir 2.10. #4445
- [CHANGE] Query-frontend: Cached results now contain a timestamp, which allows Mimir to check whether cached results are still valid based on the current TTL configured for the tenant. Results cached by a previous Mimir version are used until they expire from the cache, which can take up to 7 days. If you need the per-tenant TTL to take effect sooner, flush the results cache manually. #4439
- [CHANGE] Ingester: the `cortex_ingester_tsdb_wal_replay_duration_seconds` metric has been removed. #4465
- [CHANGE] Query-frontend and ruler: use protobuf internal query result payload format by default. This feature is no longer considered experimental. #4557 #4709
- [CHANGE] Ruler: reject creating federated rule groups while tenant federation is disabled. Previously the rule groups would be silently dropped during bucket sync. #4555
- [CHANGE] Compactor: the `/api/v1/upload/block/{block}/finish` endpoint now returns a `429` status code when the compactor has reached the limit specified by `-compactor.max-block-upload-validation-concurrency`. #4598
- [CHANGE] Compactor: when starting a block upload, the maximum byte size of the block metadata provided in the request body is now limited to 1 MiB. If this limit is exceeded, a `413` status code is returned. #4683
- [CHANGE] Store-gateway: cache key format for expanded postings has changed. This will invalidate the expanded postings in the index cache when deployed. #4667
- [FEATURE] Cache: Introduce experimental support for using Redis for results, chunks, index, and metadata caches. #4371
- [FEATURE] Vault: Introduce experimental integration with Vault to fetch secrets used to configure TLS for clients. Server TLS secrets will still be read from a file. `tls-ca-path`, `tls-cert-path` and `tls-key-path` will denote the path in Vault for the following CLI flags when `-vault.enabled` is true: #4446
  - `-distributor.ha-tracker.etcd.*`
  - `-distributor.ring.etcd.*`
  - `-distributor.forwarding.grpc-client.*`
  - `-querier.store-gateway-client.*`
  - `-ingester.client.*`
  - `-ingester.ring.etcd.*`
  - `-querier.frontend-client.*`
  - `-query-frontend.grpc-client-config.*`
  - `-query-frontend.results-cache.redis.*`
  - `-blocks-storage.bucket-store.index-cache.redis.*`
  - `-blocks-storage.bucket-store.chunks-cache.redis.*`
  - `-blocks-storage.bucket-store.metadata-cache.redis.*`
  - `-compactor.ring.etcd.*`
  - `-store-gateway.sharding-ring.etcd.*`
  - `-ruler.client.*`
  - `-ruler.alertmanager-client.*`
  - `-ruler.ring.etcd.*`
  - `-ruler.query-frontend.grpc-client-config.*`
  - `-alertmanager.sharding-ring.etcd.*`
  - `-alertmanager.alertmanager-client.*`
  - `-memberlist.*`
  - `-query-scheduler.grpc-client-config.*`
  - `-query-scheduler.ring.etcd.*`
  - `-overrides-exporter.ring.etcd.*`
- [FEATURE] Distributor, ingester, querier, query-frontend, store-gateway: add experimental support for native histograms. Requires that the experimental protobuf query result response format is enabled by `-query-frontend.query-result-response-format=protobuf` on the query frontend. #4286 #4352 #4354 #4376 #4377 #4387 #4396 #4425 #4442 #4494 #4512 #4513 #4526
- [FEATURE] Added `-<prefix>.s3.storage-class` flag to configure the S3 storage class for objects written to S3 buckets. #4300
- [FEATURE] Add `freebsd` to the target OS when generating binaries for a Mimir release. #4654
- [FEATURE] Ingester: Add `prepare-shutdown` endpoint which can be used as part of Kubernetes scale down automations. #4718
- [ENHANCEMENT] Add timezone information to Alpine Docker images. #4583
- [ENHANCEMENT] Ruler: Sync rules when the ruler is JOINING the ring instead of ACTIVE, in order to reduce missed rule iterations during ruler restarts. #4451
- [ENHANCEMENT] Allow defining the service name used for tracing via the `JAEGER_SERVICE_NAME` environment variable. #4394
- [ENHANCEMENT] Querier and query-frontend: add experimental, more performant protobuf query result response format enabled with `-query-frontend.query-result-response-format=protobuf`. #4304 #4318 #4375
- [ENHANCEMENT] Compactor: added experimental configuration parameter `-compactor.first-level-compaction-wait-period`, to configure how long the compactor should wait before compacting 1st level blocks (uploaded by ingesters). This configuration option reduces the chance that the compactor begins compacting blocks before all ingesters have uploaded their blocks to the storage. #4401
- [ENHANCEMENT] Store-gateway: use more efficient chunks fetching and caching. #4255
- [ENHANCEMENT] Query-frontend and ruler: add experimental, more performant protobuf internal query result response format enabled with `-ruler.query-frontend.query-result-response-format=protobuf`. #4331
- [ENHANCEMENT] Ruler: increased tolerance for missed iterations on alerts, reducing the chances of flapping firing alerts during ruler restarts. #4432
- [ENHANCEMENT] Optimized `.*` and `.+` regular expression label matchers. #4432
- [ENHANCEMENT] Optimized regular expression label matchers with alternates (e.g. `a|b|c`). #4647
- [ENHANCEMENT] Added an in-memory cache for regular expression matchers, to avoid parsing and compiling the same expression multiple times when used in recurring queries. #4633
- [ENHANCEMENT] Query-frontend: results cache TTL is now configurable by using the `-query-frontend.results-cache-ttl` and `-query-frontend.results-cache-ttl-for-out-of-order-time-window` options. These values can also be specified per tenant. Default values are unchanged (7 days and 10 minutes respectively). #4385
- [ENHANCEMENT] Ingester: added advanced configuration parameter `-blocks-storage.tsdb.wal-replay-concurrency` representing the maximum number of CPUs used during WAL replay. #4445
- [ENHANCEMENT] Ingester: added metric `cortex_ingester_tsdb_open_duration_seconds_total` to measure the total time it takes to open all existing TSDBs. The time tracked by this metric also includes the TSDBs' WAL replay duration. #4465
- [ENHANCEMENT] Store-gateway: use streaming implementation for LabelNames RPC. The batch size for streaming is controlled by `-blocks-storage.bucket-store.batch-series-size`. #4464
- [ENHANCEMENT] Memcached: Add support for TLS or mTLS connections to cache servers. #4535
- [ENHANCEMENT] Compactor: blocks index files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4503
- [ENHANCEMENT] Compactor: block chunks and segment files are now validated for correctness for blocks uploaded via the TSDB block upload feature. #4549
- [ENHANCEMENT] Ingester: added configuration options to configure the "postings for matchers" cache of each compacted block queried from ingesters: #4561
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-ttl`
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-size`
  - `-blocks-storage.tsdb.block-postings-for-matchers-cache-force`
- [ENHANCEMENT] Compactor: validation of blocks uploaded via the TSDB block upload feature is now configurable on a per-tenant basis: #4585
  - `-compactor.block-upload-validation-enabled` has been added; `compactor_block_upload_validation_enabled` can be used to override it per tenant
  - `-compactor.block-upload.block-validation-enabled` was the previous global flag and has been removed
- [ENHANCEMENT] TSDB Block Upload: block upload validation concurrency can now be limited with `-compactor.max-block-upload-validation-concurrency`. #4598
- [ENHANCEMENT] OTLP: Add support for converting OTel exponential histograms to Prometheus native histograms. The ingestion of native histograms must be enabled; to do so, set `-ingester.native-histograms-ingestion-enabled` to `true`. #4063 #4639
- [ENHANCEMENT] Query-frontend: add metric `cortex_query_fetched_index_bytes_total` to measure TSDB index bytes fetched to execute a query. #4597
- [ENHANCEMENT] Query-frontend: add experimental limit to enforce a max query expression size in bytes via `-query-frontend.max-query-expression-size-bytes` or `max_query_expression_size_bytes`. #4604
- [ENHANCEMENT] Query-tee: improve message logged when comparing responses and one response contains a non-JSON payload. #4588
- [ENHANCEMENT] Distributor: add ability to set per-distributor limits via `distributor_limits` block in runtime configuration in addition to the existing configuration. #4619
- [ENHANCEMENT] Querier: reduce peak memory consumption for queries that touch a large number of chunks. #4625
- [ENHANCEMENT] Query-frontend: added experimental `-query-frontend.query-sharding-max-regexp-size-bytes` limit to query-frontend. When set to a value greater than 0, query-frontend disables query sharding for any query with a regexp matcher longer than the configured limit. #4632
- [ENHANCEMENT] Store-gateway: include statistics from LabelValues and LabelNames calls in `cortex_bucket_store_series*` metrics. #4673
- [ENHANCEMENT] Query-frontend: improve readability of distributed tracing spans. #4656
- [ENHANCEMENT] Update Docker base images from `alpine:3.17.2` to `alpine:3.17.3`. #4685
- [ENHANCEMENT] Querier: improve performance when shuffle sharding is enabled and the shard size is large. #4711
- [ENHANCEMENT] Ingester: improve performance when Active Series Tracker is in use. #4717
- [ENHANCEMENT] Store-gateway: optionally select `-blocks-storage.bucket-store.series-selection-strategy`, which can limit the impact of large posting lists (when many series share the same label name and value). #4667 #4695 #4698
- [ENHANCEMENT] Querier: cache the converted float histogram from the chunk iterator, so the chunk no longer has to be looked up every time the converted float histogram is needed. #4684
- [BUGFIX] Querier: Streaming remote read will now continue to return multiple chunks per frame after the first frame. #4423
- [BUGFIX] Store-gateway: the values for `stage="processed"` for the metrics `cortex_bucket_store_series_data_touched` and `cortex_bucket_store_series_data_size_touched_bytes` when using fine-grained chunks caching now report the correct values of chunks held in memory. #4449
- [BUGFIX] Compactor: fixed reporting a compaction error when the compactor is correctly shut down while populating blocks. #4580
- [BUGFIX] OTLP: Do not drop exemplars of the OTLP Monotonic Sum metric. #4063
- [BUGFIX] Packaging: flag `/etc/default/mimir` and `/etc/sysconfig/mimir` as config files to prevent them from being overwritten. #4587
- [BUGFIX] Query-frontend: don't retry queries which error inside PromQL. #4643
- [BUGFIX] Store-gateway & query-frontend: report more consistent statistics for fetched index bytes. #4671
- [BUGFIX] Native histograms: fix how IsFloatHistogram determines if mimirpb.Histogram is a float histogram. #4706
- [BUGFIX] Query-frontend: fix query sharding for native histograms. #4666
- [BUGFIX] Ring status page: fixed the owned tokens percentage value displayed. #4730
- [BUGFIX] Querier: fixed chunk iterator that can return sample with wrong timestamp. #4450
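Entry #4382 ties `-blocks-storage.tsdb.retention-period` to `-querier.query-store-after`. The Python sketch below is a hypothetical pre-deployment sanity check, not a Mimir tool. It assumes you want to keep the same 1h margin the new defaults have (`13h` retention vs. `12h` query-store-after). The margin, the function names, and the simplified duration parser are all assumptions for illustration.

```python
import re
from datetime import timedelta
from typing import List

# Simple Go-style durations only, e.g. "12h", "90m", "1h30m" (no fractions or "ms").
_PART = re.compile(r"(\d+)([hms])")
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}

# Assumed safety margin: the gap between the new defaults (13h - 12h).
MARGIN = timedelta(hours=1)


def parse_duration(text: str) -> timedelta:
    parts = _PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum((timedelta(**{_UNITS[u]: int(n)}) for n, u in parts), timedelta())


def check_retention(retention_period: str, query_store_after: str) -> List[str]:
    """Warn if ingester retention does not exceed query-store-after by MARGIN."""
    retention = parse_duration(retention_period)
    store_after = parse_duration(query_store_after)
    if retention >= store_after + MARGIN:
        return []
    return [
        f"-blocks-storage.tsdb.retention-period={retention_period} is shorter than "
        f"-querier.query-store-after={query_store_after} plus {MARGIN}; "
        "consider increasing the retention period"
    ]


print(check_retention("13h", "12h"))  # []
print(check_retention("13h", "16h"))  # one warning
```

Keeping a margin means the ingesters still hold data slightly older than the point where queriers start relying on the store-gateways. The exact margin you need is a deployment decision, not something this sketch determines.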
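Entries #4432, #4647 and #4633 describe three regular-expression matcher optimizations. The Python sketch below illustrates the general ideas under our own assumptions; it is not Mimir's Go code. In the sketch, `.*` and `.+` are answered without a regex engine, a pure alternation of literals such as `a|b|c` becomes a set lookup, and compiled matchers are memoized with `functools.lru_cache`, so a recurring query does not recompile the same expression. The function names and cache size are hypothetical. Matching is fully anchored, as with Prometheus label regexes. The sketch also assumes `.` matches any character, including newlines.

```python
import functools
import re
from typing import Callable

Matcher = Callable[[str], bool]


def _is_literal(text: str) -> bool:
    # Conservative: non-empty and free of characters re.escape would escape.
    return text != "" and re.escape(text) == text


@functools.lru_cache(maxsize=1024)  # cache size chosen arbitrarily for the sketch
def compile_matcher(pattern: str) -> Matcher:
    """Return a fully anchored label-value matcher, using fast paths when possible."""
    if pattern == ".*":
        return lambda value: True  # matches every value, including ""
    if pattern == ".+":
        return lambda value: value != ""  # any non-empty value
    alternates = pattern.split("|")
    if all(_is_literal(a) for a in alternates):
        allowed = frozenset(alternates)  # "a|b|c" becomes a set membership test
        return lambda value: value in allowed
    # General case: compile once (results are memoized by lru_cache);
    # DOTALL makes "." match newlines too, consistent with the ".*" fast path.
    compiled = re.compile(pattern, re.DOTALL)
    return lambda value: compiled.fullmatch(value) is not None


print(compile_matcher("a|b|c")("b"), compile_matcher("a|b|c")("ab"))  # True False
print(compile_matcher("a|b|c") is compile_matcher("a|b|c"))  # True: served from cache
```

Any pattern that is not exactly `.*`, `.+`, or a plain alternation of literals (for example `(a|b)` or `foo.*`) falls through to the general compiled path, so the fast paths never change which values match.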
Mixin
- [ENHANCEMENT] Queries: Display data touched per sec in bytes instead of number of items. #4492
- [ENHANCEMENT] `_config.job_names.<job>` values can now be arrays of regular expressions in addition to a single string. Strings are still supported and behave as before. #4543
- [ENHANCEMENT] Queries dashboard: remove mention of store-gateway "streaming enabled" in panels because the store-gateway only supports streaming series since Mimir 2.7. #4569
- [ENHANCEMENT] Ruler: Add panel description for Read QPS panel in Ruler dashboard to explain values when in remote ruler mode. #4675
- [BUGFIX] Ruler dashboard: show data for reads from ingesters. #4543
- [BUGFIX] Pod selector regex for deployments: change `(.*-mimir-)` to `(.*mimir-)`. #4603
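To see what the #4603 change affects: `(.*-mimir-)` requires a `-` immediately before `mimir-`, so a pod whose name starts with `mimir-` cannot match it, while `(.*mimir-)` accepts both forms. The Python sketch below demonstrates this with made-up pod names. The `distributor.*` suffix is an assumption for illustration and is not the dashboards' exact selector. Matching is fully anchored, as with Prometheus regex selectors.

```python
import re

# Hypothetical selectors: the prefix group from #4603 followed by an assumed
# "distributor.*" suffix; fullmatch mimics fully anchored Prometheus regexes.
OLD = re.compile(r"(.*-mimir-)distributor.*")
NEW = re.compile(r"(.*mimir-)distributor.*")


def matches(selector: re.Pattern, pod: str) -> bool:
    return selector.fullmatch(pod) is not None


for pod in ["prod-mimir-distributor-0", "mimir-distributor-0"]:
    print(pod, matches(OLD, pod), matches(NEW, pod))
# prod-mimir-distributor-0 True True
# mimir-distributor-0 False True
```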
Jsonnet
- [CHANGE] Ruler: changed ruler deployment max surge from `0` to `50%`, and max unavailable from `1` to `0`. #4381
- [CHANGE] Memcached connection parameters `-blocks-storage.bucket-store.index-cache.memcached.max-idle-connections`, `-blocks-storage.bucket-store.chunks-cache.memcached.max-idle-connections` and `-blocks-storage.bucket-store.metadata-cache.memcached.max-idle-connections` are now configured based on `max-get-multi-concurrency` and `max-async-concurrency`. #4591
- [CHANGE] Add support to use external Redis as cache. The following changes were made in the jsonnet config: #4386 #4640
  - Renamed `memcached_*_enabled` config options to `cache_*_enabled`
  - Renamed `memcached_*_max_item_size_mb` config options to `cache_*_max_item_size_mb`
  - Added `cache_*_backend` config options
- [CHANGE] Store-gateway StatefulSets with disabled multi-zone deployment are also unregistered from the ring on shutdown. This eliminates resharding during rollouts, at the cost of extra effort when scaling down store-gateways. For more information, see Scaling down store-gateways. #4713
- [ENHANCEMENT] Alertmanager: add `alertmanager_data_disk_size` and `alertmanager_data_disk_class` configuration options; by default, no storage class is set. #4389
- [ENHANCEMENT] Update `rollout-operator` to `v0.4.0`. #4524
- [ENHANCEMENT] Update memcached to `memcached:1.6.19-alpine`. #4581
- [ENHANCEMENT] Add support for mTLS connections to Memcached servers. #4553
- [ENHANCEMENT] Update the `memcached-exporter` to `v0.11.2`. #4570
- [ENHANCEMENT] Autoscaling: Add `autoscaling_query_frontend_memory_target_utilization`, `autoscaling_ruler_query_frontend_memory_target_utilization`, and `autoscaling_ruler_memory_target_utilization` configuration options, for controlling the corresponding autoscaler memory thresholds. Each has a default of 1, i.e. 100%. #4612
- [ENHANCEMENT] Distributor: add ability to set per-distributor limits via `distributor_instance_limits` using runtime configuration. #4627
- [BUGFIX] Add missing query sharding settings for user_24M and user_32M plans. #4374
Mimirtool
- [ENHANCEMENT] Backfill: mimirtool will now sleep and retry if it receives a 429 response while trying to finish an upload due to validation concurrency limits. #4598
- [ENHANCEMENT] `gauge` panel type is now supported in `mimirtool analyze dashboard`. #4679
- [ENHANCEMENT] Set a `User-Agent` header on requests to Mimir or Prometheus servers. #4700
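The backfill entry (#4598) says mimirtool sleeps and retries when finishing an upload returns `429` because the compactor has reached `-compactor.max-block-upload-validation-concurrency`. Here is a minimal Python sketch of that client-side pattern, not mimirtool's actual Go code. `finish_upload` is a hypothetical stand-in callable that returns an HTTP status code. The fixed delay and attempt cap are also assumptions chosen for illustration.

```python
import time
from typing import Callable


def finish_with_retry(
    finish_upload: Callable[[], int],
    max_attempts: int = 5,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call finish_upload until it returns something other than 429.

    Returns the last status code; gives up after max_attempts calls.
    """
    status = finish_upload()
    attempts = 1
    while status == 429 and attempts < max_attempts:
        # 429: the compactor is already validating as many blocks as allowed.
        sleep(delay_seconds)
        status = finish_upload()
        attempts += 1
    return status


responses = iter([429, 429, 200])
print(finish_with_retry(lambda: next(responses), sleep=lambda s: None))  # 200
```

Injecting `sleep` keeps the sketch testable without real delays. A real client might use exponential backoff instead of a fixed delay.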
Mimir Continuous Test
- [FEATURE] Allow continuous testing of native histograms as well by enabling the flag `-tests.write-read-series-test.histogram-samples-enabled`. The metrics exposed by the tool will now have a new label called `type` with possible values of `float`, `histogram_float_counter`, `histogram_float_gauge`, `histogram_int_counter`, `histogram_int_gauge`. The impacted metrics are: #4457
  - `mimir_continuous_test_writes_total`
  - `mimir_continuous_test_writes_failed_total`
  - `mimir_continuous_test_queries_total`
  - `mimir_continuous_test_queries_failed_total`
  - `mimir_continuous_test_query_result_checks_total`
  - `mimir_continuous_test_query_result_checks_failed_total`
- [ENHANCEMENT] Added a new metric `mimir_continuous_test_build_info` that reports version information, similar to the existing `cortex_build_info` metric exposed by other Mimir components. #4712
- [ENHANCEMENT] Add coherency for the selected ranges and instants of test queries. #4704
Documentation
- [CHANGE] Clarify what deprecation means in the lifecycle of configuration parameters. #4499
- [CHANGE] Update compactor `split-groups` and `split-and-merge-shards` recommendation on component page. #4623
- [FEATURE] Add instructions about how to configure native histograms. #4527
- [ENHANCEMENT] Runbook for MimirCompactorHasNotSuccessfullyRunCompaction extended to include scenario where compaction has fallen behind. #4609
- [ENHANCEMENT] Add explanation for QPS values for reads in remote ruler mode and writes generally, to the Ruler dashboard page. #4629
- [ENHANCEMENT] Expand zone-aware replication page to cover single physical availability zone deployments. #4631
- [FEATURE] Add instructions to use the Puppet module. #4610
Tools
- [ENHANCEMENT] tsdb-index: iteration over index is now faster when any equal matcher is supplied. #4515
All changes in this release: mimir-2.7.1...mimir-2.8.0-rc.0