-
Notifications
You must be signed in to change notification settings - Fork 669
MQE: Implement experimental info function #13443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…a so we can get from the pool just once with the right number
|
|
||
| newLabelSets = append(newLabelSets, lb.Labels()) | ||
| labelSetsOrder = append(labelSetsOrder, makeLabelSetsHash(labelSets)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Labels Builder Accumulates, Corrupting Label Combinations
The labels.Builder in combineLabels is missing a reset between iterations of the outer loop over labelSetsMap. After calling lb.Reset(innerSeries.Labels) once at line 379, the builder accumulates labels from each iteration without resetting. This causes labels from previous labelSets iterations to leak into subsequent iterations, creating incorrect combined label sets. The builder should be reset to innerSeries.Labels at the start of each iteration through labelSetsMap.
|
|
||
| timeRange types.QueryTimeRange | ||
| expressionPosition posrange.PositionRange | ||
| enableDelayedNameRemoval bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere?
| dataLabelMatchersExpr, ok := expr.Args[1].(*parser.VectorSelector) | ||
| if !ok { | ||
| return nil, fmt.Errorf("expected second argument to 'info' function to be a VectorSelector, got %T", expr.Args[1]) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to add a special case to CSE to ensure it doesn't replace the second argument to info with a deduplicated expression.
| if len(expr.Args) == 1 { | ||
| infoExpr, err := parser.ParseExpr("target_info") | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| expr.Args = append(expr.Args, infoExpr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to do this as a very early AST optimisation pass. This will allow all other optimisation passes to ignore the fact that info can have one or two arguments, and instead they can just handle the two argument case.
| ivs, ok := f.Info.(*selectors.InstantVectorSelector) | ||
| if !ok { | ||
| return nil, fmt.Errorf("info function 2nd argument is not an instant vector selector") | ||
| } | ||
| // Override float values to reflect original timestamps. | ||
| ivs.ReturnSampleTimestampsPreserveHistograms = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we require Info to be an InstantVectorSelector, then we should enforce this by changing the type of the parameter to NewInfoFunction.
| ivs, ok := f.Info.(*selectors.InstantVectorSelector) | ||
| if !ok { | ||
| return nil, fmt.Errorf("info function 2nd argument is not an instant vector selector") | ||
| } | ||
| // Override float values to reflect original timestamps. | ||
| ivs.ReturnSampleTimestampsPreserveHistograms = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than setting this here, we should set it on the vector selector node in nodeFromExpr. This will allow optimisation passes to respect that this selector needs to behave differently (eg. that it's not equivalent to another similar selector without this set).
(You'll then need to update the vector selector node's equivalence and description methods to match.)
| innerMetadata, err := f.Inner.SeriesMetadata(ctx, matchers) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer types.SeriesMetadataSlicePool.Put(&innerMetadata, f.MemoryConsumptionTracker) | ||
|
|
||
| infoMetadata, err := f.Info.SeriesMetadata(ctx, matchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't pass matchers as-is here as they might apply to the labels from Inner or from Info, and so they could filter out all the series from the other selector.
(eg. imagine matchers contains a environment="prod" matcher, and the environment label only appears on Inner - passing this matcher to Info will cause it to incorrectly return no results)
| innerMetadata, err := f.Inner.SeriesMetadata(ctx, matchers) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer types.SeriesMetadataSlicePool.Put(&innerMetadata, f.MemoryConsumptionTracker) | ||
|
|
||
| infoMetadata, err := f.Info.SeriesMetadata(ctx, matchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we do need to do here: we should generate some matchers for job and instance based on the series returned by Inner. Otherwise we could unnecessarily select all known target_info series during the query time range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rejig this to avoid loading all of the info series upfront, as this could be a lot of data.
It's OK for SeriesMetadata to return series that will later turn out to have no samples, so we can do something similar to what we do for binary operations.
I'm imagining something like this:
- in
SeriesMetadata:- load all series metadata from
Inner - compute the possible set of
instanceandjoblabels - load all series metadata from
Info, passing the matchers forinstanceandjob - compute the cross product of all possible inner and info series, in the same order that the inner series are produced
- load all series metadata from
- in
NextSeries:- if we don't have an active inner series:
- read the next inner series
- read all corresponding info series for this inner series, buffering any we read that will be needed for other inner series
- compute all of the output series based on the inner series and the info series (in the same order these were returned by
SeriesMetadata) - return the first output series for this inner series
- if we have an active inner series, return the next output series we computed when we first read it
- if we don't have an active inner series:
As a later improvement (not in this PR), if we can ask ingesters and store-gateways to sort returned series by arbitrary labels, then we can ask them to send series sorted by instance and job (and then by all other labels), and there should be almost no buffering required. This would also be beneficial for binary operations and aggregations - we could ask for series to be sorted by binary operation matching labels or aggregation grouping labels to reduce the amount of buffering these operators need to do.
What this PR does
Support the experimental info function in MQE.
Which issue(s) this PR fixes or relates to
Fixes https://github.com/grafana/mimir-squad/issues/3084
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Implements info() to enrich series with labels from info metrics, adds planning/selector wiring, and comprehensive tests.
info()implemented via newInfoFunctionoperator to enrich base series with labels from matching info metrics (instance,job).info()args: defaults second arg totarget_info; enforces vector selector; auto-adds__name__="target_info"if missing; marksFUNCTION_INFOas not needing dedup.FUNCTION_INFOwithInfoFunctionOperatorFactory.ReturnSampleTimestampsPreserveHistogramstoInstantVectorSelectorto expose timestamps as floats while preserving histograms (used byinfo()).info.testcovering label matching, lookback/@/offset, churn, conflicts, and error cases.Written by Cursor Bugbot for commit 7f1aa4a. This will update automatically on new commits. Configure here.