Skip to content

Conversation

@roaga
Copy link
Member

@roaga roaga commented Nov 22, 2025

Fixes bug in the spans attributes RPC where built-in fields like span.op and span.description were not included sometimes. This change simplifies the code for us and matches the attributes available on the frontend exactly.

This also serves as an RPC for logs and metrics attributes, just by passing in a different itemType.

Closes AIML-1635, AIML-1670, and AIML-1672

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Nov 22, 2025
@roaga roaga marked this pull request as ready for review November 24, 2025 13:00
@roaga roaga requested review from a team as code owners November 24, 2025 13:00
@roaga roaga requested review from aayush-se and aliu39 November 24, 2025 13:00
@codecov
Copy link

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/assisted_query/traces_tools.py 93.75% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #103874      +/-   ##
===========================================
- Coverage   80.58%    80.42%   -0.16%     
===========================================
  Files        9292      9299       +7     
  Lines      396718    398110    +1392     
  Branches    25286     25286              
===========================================
+ Hits       319681    320190     +509     
- Misses      76577     77460     +883     
  Partials      460       460              

"attributeType": attr_type,
"itemType": item_type,
"statsPeriod": stats_period,
"project": project_ids,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for errors and generally all rpcs I'm treating empty project_ids as all projects, should we do this here?

project_ids = project_ids or [ALL_ACCESS_PROJECTS]

params=query_params,
)

fields[attr_type] = [item["name"] for item in resp.data if "name" in item]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fields[attr_type] = [item["name"] for item in resp.data if "name" in item]
fields[attr_type] = [item["name"] for item in resp.data]

so this doesn't fail silently


query_params = {
"itemType": item_type,
"attributeType": "string",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there can be attrs w other types right? should we pass in the field's type, maybe return it in the _names rpc?

Copy link
Member Author

@roaga roaga Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old RPC this is replacing only supported strings, which makes sense because this is about searching values by substring

didn't want to make any breaking changes in this PR either way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, if we reuse this for metrics/logs (I think we should!) can do a refactor then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already re-usable by changing itemType, that's the intention, and the same thing applies where we can only do substring search on string fields

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as more context, it only makes sense to search for values of string types as booleans are also stored as strings and numerical search (int/float/datetimes) I believe aren't supported because of insane cardinality which is why even in sentry search it defaults to a few suggested values.

values.setdefault(field, set()).update(field_values_list[:limit])

# Convert sets to sorted lists for JSON serialization
return {"values": {field: sorted(field_values) for field, field_values in values.items()}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem like field_values is converted here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorted converts any iterable into a list

return {"consent": False, "consent_url": consent_url}


def get_attribute_names(*, org_id: int, project_ids: list[int], stats_period: str) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we'd need to clean these up in a follow up, to avoid breaking current query agent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR (at least what I intended, double check if me if wrong) just moves those functions into a new file and keeps the same input and output signatures, and I just change the internal logic, so nothing should break

}
"""
organization = Organization.objects.get(id=org_id)
api_key = ApiKey(organization_id=org_id, scope_list=API_KEY_SCOPES)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: client.get() uses user=None, leading to AnonymousUser() and permission failures for organization access.
Severity: CRITICAL | Confidence: High

🔍 Detailed Analysis

The client.get() method is invoked with user=None, which causes src/sentry/api/client.py to default to AnonymousUser(). This AnonymousUser() lacks organization membership, leading to permission checks failing within OrganizationEventsV2EndpointBase (specifically in get_snuba_params()) when the endpoint /organizations/{organization.slug}/trace-items/attributes/ is called. Consequently, the API calls will consistently fail with permission errors, rendering the feature non-functional.

💡 Suggested Fix

Provide an authenticated user with organization permissions to the user parameter of client.get().

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/seer/assisted_query/traces_tools.py#L39

Potential issue: The `client.get()` method is invoked with `user=None`, which causes
`src/sentry/api/client.py` to default to `AnonymousUser()`. This `AnonymousUser()` lacks
organization membership, leading to permission checks failing within
`OrganizationEventsV2EndpointBase` (specifically in `get_snuba_params()`) when the
endpoint `/organizations/{organization.slug}/trace-items/attributes/` is called.
Consequently, the API calls will consistently fail with permission errors, rendering the
feature non-functional.

Did we get this right? 👍 / 👎 to inform future reviews.
Reference ID: 3254778

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what we do everywhere with client.get

# Extract "value" from each item, filter out None/empty, and respect limit
field_values_list = [item["value"] for item in resp.data if item.get("value")]
# Merge with existing values if field already exists (multiple substrings for same field)
values.setdefault(field, set()).update(field_values_list[:limit])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Redundant limiting causes inefficient memory usage

The function applies [:limit] twice: once when adding values to the set (line 127) and again when returning the sorted result (line 131). When the same field is queried with multiple substrings, the set accumulates up to limit values from each query, potentially growing to several times the limit before being trimmed again. This wastes memory and processing by storing and sorting unnecessary values. The slicing on line 127 should be removed to collect all API responses, then apply the limit once on the final sorted output.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi limit lets us balance the results between the two fields somewhat which i kinda like

@roaga roaga merged commit 83b90b4 into master Nov 24, 2025
67 checks passed
@roaga roaga deleted the assisted-query/spans-attrs-rpc-fix branch November 24, 2025 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants