Skip to content

Conversation

@mifu67
Copy link
Contributor

@mifu67 mifu67 commented Oct 20, 2025

When a detector is deleted, we need to delete the rule data in Seer. Similar implementation to the legacy alert rule code, but uses detector ID instead of alert rule ID.

Note that because we are no longer able to use the post delete signal (the QuerySubscription model is deleted by the time we get to post delete so we cannot access the source_id needed) we cannot ensure Seer knows to delete data on their end during a cascade deletion - however Seer will delete data that doesn't receive an update for 90 days so it will eventually be cleaned up.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 20, 2025
@codecov
Copy link

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 81.57895% with 7 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/seer/anomaly_detection/delete_rule.py 77.27% 5 Missing ⚠️
src/sentry/incidents/logic.py 81.81% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #101843      +/-   ##
===========================================
+ Coverage   76.26%    80.67%   +4.40%     
===========================================
  Files        9200      9206       +6     
  Lines      393001    393451     +450     
  Branches    25000     25000              
===========================================
+ Hits       299740    317413   +17673     
+ Misses      92836     75613   -17223     
  Partials      425       425              

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

Copy link
Contributor

@saponifi3d saponifi3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there's a bit of code we can cleanup before merging, generally the setup for the hooks etc all looks good though!

return True


def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool:
Copy link
Contributor

@saponifi3d saponifi3d Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this method just invoke return delete_rule_in_seer(alert_rule.id)? (that way we can reduce code replication)

and

Suggested change
def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool:
def delete_rule_in_seer_legacy(alert_rule: AlertRule) -> bool:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two methods are slightly different in what they pass to Seer. I think we've done this code duplication for all anomaly detection methods so we can easily delete the legacy code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to maintain this code in the future, it might be a cool time to decompose the method and reuse what we can. That would also mean we don't need to maintain two code paths while we're waiting to remove the legacy code. Instead we cold take the opportunity to make this a bit easier to manage, and a lot easier for us to debug any issues in the mean time.

The way i tend to decompose things with two examples is to just look at those differences and try to determine where it would make the most sense to expose new methods. For example, maybe we could wrap the send to seer and handle errors as a method, then delete_rule_in_seer_legacy could compose those shared methods and make any tweaks it might need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we can delete the legacy code outright if every alert rule has a detector. The call only needs to happen once.
I see your point about decomp, will look into it.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

Copy link
Contributor

@saponifi3d saponifi3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think biggest feedback would be to make delete_rule_in_seer a little more debug friendly; we can either decompose the method and share it between alert rule / detector deletion or we can update the logs so we can easily differentiate (i think that'd just be changing rule to detector).

return True


def delete_rule_in_seer_legacy(alert_rule: "AlertRule") -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to maintain this code in the future, it might be a cool time to decompose the method and reuse what we can. That would also mean we don't need to maintain two code paths while we're waiting to remove the legacy code. Instead we cold take the opportunity to make this a bit easier to manage, and a lot easier for us to debug any issues in the mean time.

The way i tend to decompose things with two examples is to just look at those differences and try to determine where it would make the most sense to expose new methods. For example, maybe we could wrap the send to seer and handle errors as a method, then delete_rule_in_seer_legacy could compose those shared methods and make any tweaks it might need.

@mifu67
Copy link
Contributor Author

mifu67 commented Nov 6, 2025

@saponifi3d removed the legacy code. Let me know if you think we should still break down the error handling into a separate method.

extra={
"rule_id": alert_rule.id,
},
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When are we still hitting this code path? is it only for people using the legacy API? If so, don't we need to call this on delete too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the delete code in the detector lifecycle hook, so any time a detector is deleted the code will be called. If all alert rules are dual written, then we don't need the call in multiple places.
I didn't hook deletion into updates, however, so this is for users of the legacy API. Good callout that I should create an update hook as well.

)

@classmethod
def delete_data_in_seer(cls, instance: AlertRule, **kwargs: Any) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now require the source id (the query subscription id) instead of the rule id we can't execute this in post delete because the query subscription is deleted by this point. Instead we call it in logic.py's delete_alert_rule method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The post_delete is used for cascade deletions (like when a project gets deleted), during which the code in logic.py will not be executed 😭 I don't think we can take this out of here

Copy link
Contributor Author

@mifu67 mifu67 Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't matter if Seer has a cleanup task, though EDIT: since we're overriding the detector delete method, we don't need anything for the alert rule at all, so this is good to remove.


incidents = Incident.objects.filter(alert_rule=alert_rule)
if incidents.exists():
if alert_rule.detection_type == AlertRuleDetectionType.DYNAMIC:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're now deleting the rule in Seer regardless of whether an incident exists or not - previously we had it behind that condition because we were deleting it in the post delete signal and due to the order of deletions there would still be incidents after the rule was deleted so this was the only way we could make sure a dynamic alert rule with incidents sent the data to Seer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're overriding the delete() method on the detector, I don't think we need any logic to do it for the alert rule (given that every alert rule has a corresponding detector). Let's take this out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have it here for the legacy API use case - if they hit this directly, it won't hit the detector validator code path right? Worst case Seer gets 2 requests to delete the same data, but the 2nd one would just fail because it's already gone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see that we changed the validator and not the model itself, in which case this should stay.

},
)
return

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The delete_data_in_seer_for_detector() function incompletely deletes data for detectors with multiple data sources.
Severity: CRITICAL | Confidence: 1.00

🔍 Detailed Analysis

If a detector has multiple associated data sources, the delete_data_in_seer_for_detector() function will only clean up the first one from Seer due to its use of .first(). The remaining data sources will have orphaned data in Seer, causing data bloat and inconsistent state where some detector data remains after deletion, despite the operation appearing to succeed.

💡 Suggested Fix

Modify delete_data_in_seer_for_detector() to iterate over all DataSourceDetector objects associated with the detector, rather than using .first(), to ensure all related data in Seer is properly cleaned up.

🤖 Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/sentry/seer/anomaly_detection/delete_rule.py#L40

Potential issue: If a detector has multiple associated data sources, the
`delete_data_in_seer_for_detector()` function will only clean up the first one from Seer
due to its use of `.first()`. The remaining data sources will have orphaned data in
Seer, causing data bloat and inconsistent state where some detector data remains after
deletion, despite the operation appearing to succeed.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no detectors w/ multiple data sources

Delete accompanying data in Seer for anomaly detection rules
"""
try:
source_id = QuerySubscription.objects.get(snuba_query_id=snuba_query.id).id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Handle Multiple Subscriptions Gracefully

QuerySubscription.objects.get(snuba_query_id=snuba_query.id) will raise MultipleObjectsReturned when an alert rule has multiple projects, since each project has its own subscription pointing to the same SnubaQuery. The function should either use .first() instead of .get(), or iterate through all subscriptions if multiple Seer deletions are needed.

Fix in Cursor Fix in Web

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no multi-project alert rules/detectors exist

@mifu67
Copy link
Contributor Author

mifu67 commented Nov 12, 2025

Note for approver (I cannot approve since I started the PR): without the post-delete, we now have no way to clean up Seer data when a dynamic detector/alert rule is cascade deleted (such as during project or organization deletion). Seer has a cleanup task to delete data on their end for all alerts that haven't sent updates in 90 days, so it's not the end of the world, but I wanted to call out that we necessarily must lose some functionality here due to the switch to data source for the Seer key.

if status is None or status is not True:
logger.error(
"Request to delete alert rule from Seer was unsuccessful",
extra=extra_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When status is not True, Seer includes a message indicating what failed. It will be worth logging that here.

Copy link
Member

@ram-senth ram-senth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the seer interaction part and have one comment about logging failure. I do not have full context to review the other aspects like retrieving the alert details.

@ceorourke ceorourke merged commit 62b1f73 into master Nov 13, 2025
66 checks passed
@ceorourke ceorourke deleted the mifu67/aci/delete-data-in-seer branch November 13, 2025 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scope: Backend Automatically applied to PRs that change backend components

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants