Conversation

@GiedriusS (Member)

Enable streaming by putting external labels into the TSDB. If the external labels ever change then the head is pruned during startup.

@GiedriusS force-pushed the ext_as_normal branch 2 times, most recently from 7c7a1f2 to 2bfaa32, on October 28, 2025 14:35
Enable streaming by putting external labels into the TSDB. If the
external labels ever change then the head is pruned during startup.

Signed-off-by: Giedrius Statkevičius <[email protected]>
@GiedriusS marked this pull request as ready for review on October 29, 2025 12:13

@ringerc left a comment

If I understand correctly, this is a patch for thanos-receive to materialize external-labels in the TSDB rather than injecting them into Series gRPC.

It's analogous to the discussion we were involved in regarding Prometheus, external labels, Sidecar and remote-read. But this change helps with queries serviced by Receive, not Sidecar.

Sound about right?

If so: Won't changes also be required for the Store gateway to stream series from blocks known to contain embedded external labels? And in Compactor to preserve the external labels added by Receive, and to inject external labels when processing Sidecar-uploaded blocks?

Also, these extlabel-embedded blocks would only come from Receive, since Sidecar-uploaded blocks won't have the external labels. That means a Thanos deployment that mixes Receive and Sidecar will not benefit much from this change, because freshly uploaded, uncompacted blocks from Sidecar will not have the external labels embedded yet.

To support streaming Series gRPC from Sidecar, a similar change would be required in Prometheus itself to materialize external labels in the TSDB. And some assessment made of the cost/performance impact of doing so.

To benefit from this proposed change it seems like it'd be necessary to deploy one or more agent-mode Prometheus instances for scraping, and configure them to remote-write to Thanos Receive. Is that the intended use model here, where Sidecar is omitted and Receive handles Series gRPC for fresh samples?
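
To make the ordering problem concrete, here is a minimal, self-contained sketch (not code from this PR; it only uses the public prometheus/model/labels package, recent API assumed, and made-up label names) of why injecting the same external label into already-sorted TSDB output can flip the order of two series, which is what forces today's buffer-and-re-sort pass:

```go
package main

import (
	"fmt"

	"github.com/prometheus/prometheus/model/labels"
)

func main() {
	// Two series as the TSDB would return them, sorted by their stored labels.
	// a is a strict prefix of b, so a sorts first.
	a := labels.FromStrings("__name__", "up", "instance", "a:9100")
	b := labels.FromStrings("__name__", "up", "instance", "a:9100", "pod", "p-0")
	fmt.Println(labels.Compare(a, b) < 0) // true: a before b

	// Inject the same external label set {receive="r-0"} into both, as the
	// Series handler does today when external labels live outside the TSDB.
	inject := func(ls labels.Labels) labels.Labels {
		bld := labels.NewBuilder(ls)
		bld.Set("receive", "r-0")
		return bld.Labels()
	}
	a2, b2 := inject(a), inject(b)

	// The third label pair now differs: a2 has {receive="r-0"} where b2 has
	// {pod="p-0"}, and "pod" sorts before "receive", so the relative order
	// flips and the stream is no longer sorted.
	fmt.Println(labels.Compare(a2, b2) < 0) // false: b2 now before a2

	// If "receive" were stored in the TSDB as a normal label, the index would
	// already return the series in the merged order and the response could be
	// streamed without re-sorting.
}
```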

Comment on lines +84 to +85
```go
// WithExternalLabelsInTSDB enables putting external labels in the TSDB.
// This permits streaming from the TSDB to the querier.
```

```suggestion
// WithExternalLabelsInTSDB enables putting external labels in the TSDB.
// This ensures that external labels and normal labels are sorted correctly, so a Series gRPC request
// can be streamed without the need for a buffer-and-re-sort pass after injecting external labels.
```
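
For context, a rough sketch of what "putting external labels in the TSDB" amounts to at append time. The mergeExternalLabels helper and its collision handling are assumptions for illustration, not the PR's actual code, and a recent prometheus/model/labels API is assumed:

```go
package extlabels

import "github.com/prometheus/prometheus/model/labels"

// mergeExternalLabels merges a tenant's external labels into an incoming
// series' labels before the series is appended, so the TSDB index stores
// (and later returns) the merged, correctly sorted label set.
// Collision handling here (the series label wins) is only a placeholder.
func mergeExternalLabels(series, ext labels.Labels) labels.Labels {
	b := labels.NewBuilder(series)
	ext.Range(func(l labels.Label) {
		if !series.Has(l.Name) {
			b.Set(l.Name, l.Value)
		}
	})
	return b.Labels()
}
```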


### ext-labels-in-tsdb

If enabled then it will put the current external labels as "normal" labels inside of the TSDB. This also adds a special marker to the meta files in blocks so that it would be known whether external labels are part of the series inside of the TSDB.

@ringerc commented Oct 30, 2025

Can you check if this proposed addition to the readme for feature flags you've added looks sensible?:

...
If all blocks matched by a request had this feature flag enabled at the time they were written, Receive can stream responses to Series gRPC requests from Query instead of buffering them in memory. This lowers Receive's peak memory consumption for high-cardinality or high-sample-count requests.

However, storing external labels in the TSDB will increase TSDB index sizes because each external label must appear in the inverted index for every series. TSDB blocks will be somewhat larger and need more disk I/O to read, and HEAD chunks will require somewhat more memory.

If the configured external labels are changed, Receive will flush the current HEAD block to disk and start a new HEAD block. No data is discarded.

Comment on lines +750 to +755
```go
oldLset := labels.FromMap(m.Thanos.Labels)
if labels.Equal(oldLset, curLset) {
	return nil
}

level.Info(t.logger).Log("msg", "changed external labelset detected, dumping the head block", "newLset", curLset.String(), "oldLset", oldLset.String())
```

Consider "flushing ... to disk" instead of "dumping"; the latter makes it sound like the data is being discarded.

```go
}

if st.addExtLabelsToTSDB {
	allBlocksHaveExtension := func() {
```

```diff
+ // If all blocks have external labels embedded in them, we can skip building a matrix of results
+ // in-memory, injecting external labels and re-sorting them. The results may be streamed directly
+ // to the caller instead.
```
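
A self-contained sketch of the check this suggested comment describes; blockMeta, extLabelsInSeries and canStream are made-up names standing in for the marker the PR stores in each block's meta file:

```go
package extlabels

// blockMeta stands in for the per-block metadata; extLabelsInSeries is a
// placeholder for the marker written into a block's meta file when external
// labels are stored as normal labels (the PR's actual field names differ).
type blockMeta struct {
	extLabelsInSeries bool
}

// canStream reports whether every block that could serve the request carries
// the marker. A single old block without embedded external labels forces the
// buffered inject-and-re-sort path for the whole request.
func canStream(metas []blockMeta) bool {
	for _, m := range metas {
		if !m.extLabelsInSeries {
			return false
		}
	}
	return true
}
```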

@ringerc commented Oct 30, 2025

Consider adding the following to the PR description:

This relates to prior work on re-sorting responses with external labels in #8487, and to the memory-use increases discussed in #7395.

This change only affects blocks in Receive's memory, not blocks shipped to the object store and fetched via Store gateway, or blocks read from Prometheus via Sidecar. But Query will still benefit from the streaming response from Receive even if other components still need to buffer, inject labels and sort.

@fpetkovski (Contributor) commented Oct 31, 2025

I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted; I don't think where we store external labels changes this.

@MichaHoffmann (Contributor)

> I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted; I don't think where we store external labels changes this.

Agreed; I don't think this solves the issue. We should probably add external labels as a slice in the Series gRPC call and only allow dropping from that slice. Then we keep the series order and can still drop, I think.
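
A sketch of this alternative, with made-up types rather than the actual Store API messages: the stored labels stay untouched and sorted, the external labels travel in a separate slice, and "without replica labels" only ever removes entries from that slice:

```go
package extlabels

import "github.com/prometheus/prometheus/model/labels"

// seriesFrame is a stand-in for one streamed Series response: the stored
// labels remain exactly as the TSDB returned them, while external labels are
// carried in a separate slice of the message.
type seriesFrame struct {
	seriesLabels   labels.Labels  // sorted, as stored in the TSDB; never modified
	externalLabels []labels.Label // carried separately in the gRPC frame
}

// dropReplicaLabels removes the requested label names from the external slice
// only. Because seriesLabels is untouched, the order of the series stream is
// preserved and no re-sort is needed; deduplication happens downstream.
func dropReplicaLabels(f seriesFrame, drop map[string]struct{}) seriesFrame {
	kept := make([]labels.Label, 0, len(f.externalLabels))
	for _, l := range f.externalLabels {
		if _, isReplica := drop[l.Name]; !isReplica {
			kept = append(kept, l)
		}
	}
	f.externalLabels = kept
	return f
}
```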

@GiedriusS (Member, Author)

> I don't know if this solves the problem completely. The key issue is that dropping an internal label can cause the set to become unsorted; I don't think where we store external labels changes this.

We aren't dropping any labels anywhere with this patch. Could you elaborate a bit more?

@fpetkovski (Contributor)

I thought this PR was needed to enable streaming series without re-sorting, so my comment relates to that feature.

@GiedriusS (Member, Author)

> I thought this PR was needed to enable streaming series without re-sorting, so my comment relates to that feature.

Yeah, but with this patch "without replica labels" doesn't do anything anymore: we put sorted series in with external labels and get sorted series out. Your original comment said that dropping labels causes the set to become unsorted, but we don't do that anywhere, do we?

@ringerc commented Nov 30, 2025

> The key issue is that dropping an internal label can cause the set to become unsorted; I don't think where we store external labels changes this.

This is only true if the series data has collisions with the external labels.

If I understand correctly, @fpetkovski is probably saying that this change has to either clobber existing values by replacing them with the configured external labels, or it has to respect the existing user values and fail to enforce the external labels.

Is that right?


I increasingly think that conflicts between external and user labels should simply be made an error that results in ingestion, scrape, or query failure (depending on where the conflict arises). Or at least an annotation warning that query results may be invalid and incorrect. A pre-existing label with the same name as an external label would be permitted and accepted, but only if its value is the same as the configured external label.

That'd make the whole mess go away throughout the stack. A feature flag could introduce enforcement optionally, then be switched on by default while still letting people turn it off for a while, before the option is removed entirely. When enforcement is on, Sidecar wouldn't have to re-sort (it'd just check for and enforce labels instead, then inject them into the stream), so no need for #8487. And we could probably add a similar feature flag to Prometheus for its remote-read external labels, or simply stop using Prometheus-level external labels in favour of having Sidecar inject them as it processes the metric stream it is forwarding.

Receive, the subject of this PR, could drop samples with conflicting external labels and increment a suitable counter so this can be observed. It could then discard any label pairs that exactly match the configured external labels; there'd be no need to store them in the TSDB if they're already known from the external labels, since they can just be injected when streaming out the response without changing the sort order.
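
A sketch of that enforcement idea with invented names (neither the metric nor the helper exists in Thanos today): conflicting series are dropped and counted, and exact copies of external labels are stripped before storage since they can be re-injected on the way out without affecting sort order:

```go
package extlabels

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/prometheus/model/labels"
)

// Hypothetical metric; nothing like it exists in the PR.
var extLabelConflicts = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "receive_external_label_conflicts_total",
	Help: "Series dropped because a label conflicted with a configured external label.",
})

// sanitize returns the series labels with redundant external-label pairs
// removed, or ok=false when the series must be dropped because one of its
// labels carries a different value than the configured external label.
func sanitize(series, ext labels.Labels) (out labels.Labels, ok bool) {
	b := labels.NewBuilder(series)
	conflict := false
	ext.Range(func(e labels.Label) {
		switch series.Get(e.Name) {
		case "":
			// Not present on the series: nothing to store, the external label
			// is injected when the response is streamed out.
		case e.Value:
			// Exact copy of the external label: no need to store it either.
			b.Del(e.Name)
		default:
			conflict = true
		}
	})
	if conflict {
		extLabelConflicts.Inc()
		return labels.EmptyLabels(), false
	}
	return b.Labels(), true
}
```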

For cascaded setups, fan-outs, etc., the lower tiers would need one set of external label names and the higher tiers another. E.g. the lower tier might have {prometheus="foo",datacentre="bar"} and the upper tier might only have {datacentre="bar"}. So the upper tier would still be able to enforce datacentre="bar", and prometheus="foo" (or whatever value) would be promoted to a normal series label for ingestion, forwarding, etc. But my understanding is that this is basically how it should work anyway; nobody should be overriding external labels in lower tiers of a distributed query fan-out or similar. Correct?
