Ingest storage architecture / Kafka topic retention #13354
I've updated a deployment to the new mimir-distributed chart v6 / Mimir v3 and thus switched to the new ingest storage architecture. However, with approx. 50k samples/second ingested, the Kafka disk fills up unexpectedly fast (using the Kafka default config from the chart). Previously, 3 ingester replicas with a 15GB disk each handled everything nicely; with the updated deployment, the assigned 50GB Kafka disk fills up within roughly half a day. This disk usage feels a bit extreme in comparison.

I've therefore looked into Kafka's data retention settings and noticed that it's deployed with a retention time of 24 hours (https://github.com/grafana/mimir/blob/mimir-distributed-6.0.1/operations/helm/charts/mimir-distributed/templates/kafka/kafka-statefulset.yaml#L101-L102), and the auto-created topic does not seem to override this. I couldn't find any information on retention requirements in the Mimir docs (which would also be useful when setting up your own/external Kafka instance). As far as I understand, the ingesters previously also generated new blocks every few hours and moved them to long-term storage, so the 24-hour local retention feels excessive.

Can anyone provide guidance on this? Can the retention settings in Kafka be lowered, and if so, which value would be acceptable? Or is this expected, and am I missing something else regarding (local) storage consumption?
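For reference, the topic-level and broker-level retention can be checked with the standard Kafka CLI. This is only a rough sketch: `<broker>` and `<ingest-topic>` are placeholders, since the actual topic name depends on your ingest-storage configuration, and the broker id may differ.

```sh
# Topic-level overrides (empty output means the broker default applies):
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type topics --entity-name <ingest-topic> \
  --describe

# Broker-level retention settings (log.retention.hours / .ms / .bytes);
# --all includes defaults and is available in recent Kafka versions:
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers --entity-name 0 \
  --describe --all | grep -E 'log.retention'
```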
Replies: 2 comments 2 replies
The data written to Kafka is pretty verbose because we write the full write requests, where series labels are repeated on each request. Ingesters typically consume from Kafka with a sub-second delay, so theoretically you would need just a few seconds of retention in Kafka. In practice, a longer retention is required to compensate for cases where ingesters are unhealthy or lagging behind.

I agree that 24h Kafka retention is very high and it's reasonable (and safe) to set the retention to a lower value.
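To make this concrete, here is a hedged sketch of lowering retention as a topic-level override rather than changing the broker default; `retention.ms` on the topic takes precedence over the broker's `log.retention.hours` and takes effect without a broker restart. `<broker>` and `<ingest-topic>` are placeholders, and the values shown are only illustrations.

```sh
# Set a topic-level retention of 2 hours (7,200,000 ms) on the ingest topic:
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type topics --entity-name <ingest-topic> \
  --alter --add-config retention.ms=7200000

# Optionally also cap retention by size so a traffic burst can't fill the disk:
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type topics --entity-name <ingest-topic> \
  --alter --add-config retention.bytes=32212254720   # ~30 GiB
```

The 2 hours is only an example; pick a value long enough to cover the longest ingester outage or consumer lag you want to be able to ride out, per the comment above.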
Do we have a working example of this configuration? I have added the following to the helm values; however, the kafka container now has two values for KAFKA_LOG_RETENTION_HOURS, one of them being KAFKA_LOG_RETENTION_HOURS: 24.
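As a sanity check (not a fix for the duplicated entry), you can inspect which value the container actually received and which retention the broker is actually applying. A rough sketch, with `<kafka-pod>` and `<broker>` as placeholders and assuming `env` is available in the container image:

```sh
# What the container's environment actually contains (duplicates will both show up):
kubectl exec <kafka-pod> -- env | grep KAFKA_LOG_RETENTION

# What the broker is effectively using:
kafka-configs.sh --bootstrap-server <broker>:9092 \
  --entity-type brokers --entity-name 0 \
  --describe --all | grep log.retention.hours
```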