Description
Is your proposal related to a problem?
Even though the Thanos engine can process a query by consuming streams of inputs, it still builds the whole result in memory as a single promql.Result containing a matrix or vector slice.
While Query can return the response to the requester over the PrometheusRequest gRPC as a stream, it still builds the entire result in memory before it starts streaming it.
This is generally OK when the data consumer is a human because there's a limited amount of data it's useful to present to a human. But it's a problem when the engine is being used to produce intermediate results, e.g. in individual Query instances servicing requests for an upstream Query instance in distributed mode. Such responses can be very large if the query cannot be aggregated on the individual Query instance level.
It's difficult to reliably size Query instances to prevent OOMs and crashes when queries may have high-cardinality outputs, large numbers of steps in range requests, and/or series with many long label values. Conservative limits intended to prevent OOMs can also block many perfectly reasonable queries from executing.
Describe the solution you'd like
I propose adopting a variant of promql.Query that yields an iterator for streaming out results as the engine generates them.
This might be done by defining the interface upstream in Prometheus and implementing it in the Prometheus executor, then implementing it in Thanos. Alternatively, Thanos might wrap promql.Query in its own extended interface that adds a streaming yield, and provide a wrapper that yields from the result vector for engines that don't implement it natively.
Either way, a new executor method like ExecStreaming that yields from an iterator would be added, and the rest of Thanos would use it instead of promql.Query's Exec(...) for both the Thanos engine and the Prometheus engine, the latter via a wrapper around Exec(...).
Thanos Query would then consume the iterator and, when serving a PrometheusRequest gRPC, stream each result directly to the client. When serving the Prometheus HTTP+JSON protocol, it would still have to build the whole JSON serialization of the result in memory, but it could avoid simultaneously holding the entire slice in promql parser struct form.
This would be particularly beneficial for Query instances that serve responses to an upstream distributed Query, but it would also help clients that directly adopt the Thanos gRPC PrometheusQuery (see PoC CLI here), and any future work to integrate the functionality of Frontend into Query.
This will not get rid of all memory-spike problems in Thanos. It will still often accumulate entire series in memory via Series gRPC requests (on downstream Query instances) and/or buffer whole responses from downstream PrometheusRequest gRPCs (on upstream distributed Query instances). But it will at least reduce unnecessary in-memory buffering of the result.
As a side-benefit, response latency and overall runtime should be reduced because clients using streamable gRPC can start reading and processing series while the executor is still running the query.
Describe alternatives you've considered
There aren't any, really. It's an optimisation.
Additional context
This ties in with
- Add iterator-returning variant of promql.Query engine interface prometheus/prometheus#17276
- Streaming Result for Query/QueryRange HTTP API prometheus/prometheus#10040
- Directly push-down PromQL to Prometheus from Query in distributed mode #8507
- Add sidecar flag bypass Prometheus response buffering and re-sorting #8487
- tools: Add 'thanos tools' 'query range' and 'query instant' subcommands for running PromQL queries over Thanos gRPC API #8501