Skip to content

Conversation

@zenador
Copy link
Contributor

@zenador zenador commented Nov 9, 2025

What this PR does

Support the experimental info function in MQE.

Which issue(s) this PR fixes or relates to

Fixes https://github.com/grafana/mimir-squad/issues/3084

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Implements info() to enrich series with labels from info metrics, adds planning/selector wiring, and comprehensive tests.

  • Functions/Operators:
    • info() implemented via new InfoFunction operator to enrich base series with labels from matching info metrics (instance,job).
      • Resolves overlapping info by latest timestamp; ignores enriching info-series themselves; preserves base histograms; errors if info metrics have histograms.
  • Planner:
    • Normalizes info() args: defaults second arg to target_info; enforces vector selector; auto-adds __name__="target_info" if missing; marks FUNCTION_INFO as not needing dedup.
    • Registers FUNCTION_INFO with InfoFunctionOperatorFactory.
  • Selector:
    • Adds ReturnSampleTimestampsPreserveHistograms to InstantVectorSelector to expose timestamps as floats while preserving histograms (used by info()).
  • Tests:
    • Adds upstream and custom info.test covering label matching, lookback/@/offset, churn, conflicts, and error cases.

Written by Cursor Bugbot for commit 7f1aa4a. This will update automatically on new commits. Configure here.

@zenador zenador requested a review from a team as a code owner November 9, 2025 21:15

newLabelSets = append(newLabelSets, lb.Labels())
labelSetsOrder = append(labelSetsOrder, makeLabelSetsHash(labelSets))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Labels Builder Accumulates, Corrupting Label Combinations

The labels.Builder in combineLabels is missing a reset between iterations of the outer loop over labelSetsMap. After calling lb.Reset(innerSeries.Labels) once at line 379, the builder accumulates labels from each iteration without resetting. This causes labels from previous labelSets iterations to leak into subsequent iterations, creating incorrect combined label sets. The builder should be reset to innerSeries.Labels at the start of each iteration through labelSetsMap.

Fix in Cursor Fix in Web


timeRange types.QueryTimeRange
expressionPosition posrange.PositionRange
enableDelayedNameRemoval bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Comment on lines +506 to +509
dataLabelMatchersExpr, ok := expr.Args[1].(*parser.VectorSelector)
if !ok {
return nil, fmt.Errorf("expected second argument to 'info' function to be a VectorSelector, got %T", expr.Args[1])
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to add a special case to CSE to ensure it doesn't replace the second argument to info with a deduplicated expression.

Comment on lines +499 to +504
if len(expr.Args) == 1 {
infoExpr, err := parser.ParseExpr("target_info")
if err != nil {
return nil, err
}
expr.Args = append(expr.Args, infoExpr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to do this as a very early AST optimisation pass. This will allow all other optimisation passes to ignore the fact that info can have one or two arguments, and instead they can just handle the two argument case.

Comment on lines +77 to +82
ivs, ok := f.Info.(*selectors.InstantVectorSelector)
if !ok {
return nil, fmt.Errorf("info function 2nd argument is not an instant vector selector")
}
// Override float values to reflect original timestamps.
ivs.ReturnSampleTimestampsPreserveHistograms = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we require Info to be an InstantVectorSelector, then we should enforce this by changing the type of the parameter to NewInfoFunction.

Comment on lines +77 to +82
ivs, ok := f.Info.(*selectors.InstantVectorSelector)
if !ok {
return nil, fmt.Errorf("info function 2nd argument is not an instant vector selector")
}
// Override float values to reflect original timestamps.
ivs.ReturnSampleTimestampsPreserveHistograms = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than setting this here, we should set it on the vector selector node in nodeFromExpr. This will allow optimisation passes to respect that this selector needs to behave differently (eg. that it's not equivalent to another similar selector without this set).

(You'll then need to update the vector selector node's equivalence and description methods to match.)

Comment on lines +84 to +90
innerMetadata, err := f.Inner.SeriesMetadata(ctx, matchers)
if err != nil {
return nil, err
}
defer types.SeriesMetadataSlicePool.Put(&innerMetadata, f.MemoryConsumptionTracker)

infoMetadata, err := f.Info.SeriesMetadata(ctx, matchers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't pass matchers as-is here as they might apply to the labels from Inner or from Info, and so they could filter out all the series from the other selector.

(eg. imagine matchers contains a environment="prod" matcher, and the environment label only appears on Inner - passing this matcher to Info will cause it to incorrectly return no results)

Comment on lines +84 to +90
innerMetadata, err := f.Inner.SeriesMetadata(ctx, matchers)
if err != nil {
return nil, err
}
defer types.SeriesMetadataSlicePool.Put(&innerMetadata, f.MemoryConsumptionTracker)

infoMetadata, err := f.Info.SeriesMetadata(ctx, matchers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we do need to do here: we should generate some matchers for job and instance based on the series returned by Inner. Otherwise we could unnecessarily select all known target_info series during the query time range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rejig this to avoid loading all of the info series upfront, as this could be a lot of data.

It's OK for SeriesMetadata to return series that will later turn out to have no samples, so we can do something similar to what we do for binary operations.

I'm imagining something like this:

  • in SeriesMetadata:
    • load all series metadata from Inner
    • compute the possible set of instance and job labels
    • load all series metadata from Info, passing the matchers for instance and job
    • compute the cross product of all possible inner and info series, in the same order that the inner series are produced
  • in NextSeries:
    • if we don't have an active inner series:
      • read the next inner series
      • read all corresponding info series for this inner series, buffering any we read that will be needed for other inner series
      • compute all of the output series based on the inner series and the info series (in the same order these were returned by SeriesMetadata)
      • return the first output series for this inner series
    • if we have an active inner series, return the next output series we computed when we first read it

As a later improvement (not in this PR), if we can ask ingesters and store-gateways to sort returned series by arbitrary labels, then we can ask them to send series sorted by instance and job (and then by all other labels), and there should be almost no buffering required. This would also be beneficial for binary operations and aggregations - we could ask for series to be sorted by binary operation matching labels or aggregation grouping labels to reduce the amount of buffering these operators need to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants