@kian-thompson kian-thompson commented Dec 3, 2025

There is currently a race condition around when snapshots become available to a summarizer. While catching up, a Container may see that a newer snapshot exists that it should load from. If so, it will attempt to load the new snapshot and then close the container. See ContainerRuntime.fetchLatestSnapshotAndMaybeClose.

For normal summarizer cases, this is fine: the summarizer is simply recreated from the latest snapshot. For the on-demand flow, however, this may be unexpected behavior from the consumer's perspective. We should therefore retry when this is the reason the first on-demand attempt failed.
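
To illustrate the intended behavior, here is a minimal sketch of the retry, not the code in this PR. All names below (`SummarizeOutcome`, `closedForNewerSnapshot`, `summarizeOnDemandWithRetry`, `makeFlakyAttempt`) are hypothetical and are not part of the Fluid Framework API. The sketch assumes a failed attempt can report whether it failed because the container closed for a newer snapshot.

```typescript
// Hypothetical types for illustration only; not the actual Fluid Framework API.
type SummarizeOutcome =
  | { success: true }
  | { success: false; error: Error; closedForNewerSnapshot: boolean };

type SummarizeAttempt = () => Promise<SummarizeOutcome>;

/**
 * Runs an on-demand summary attempt and retries only when the failure was
 * caused by the container closing after it detected a newer snapshot.
 * Any other failure is returned to the caller unchanged.
 */
async function summarizeOnDemandWithRetry(
  attempt: SummarizeAttempt,
  maxAttempts = 2, // assumption: one retry is enough to pick up the newer snapshot
): Promise<{ outcome: SummarizeOutcome; attempts: number }> {
  let outcome = await attempt();
  let attempts = 1;
  while (
    !outcome.success &&
    outcome.closedForNewerSnapshot &&
    attempts < maxAttempts
  ) {
    outcome = await attempt();
    attempts++;
  }
  return { outcome, attempts };
}

// Stub that loses the race `failures` times before succeeding (test helper).
function makeFlakyAttempt(failures: number): SummarizeAttempt {
  let calls = 0;
  return async (): Promise<SummarizeOutcome> => {
    calls++;
    if (calls <= failures) {
      return {
        success: false,
        error: new Error("container closed: newer snapshot available"),
        closedForNewerSnapshot: true,
      };
    }
    return { success: true };
  };
}
```

With `makeFlakyAttempt(1)`, the first attempt fails because of the snapshot race and the second succeeds, so the consumer sees a single successful result. Failures for any other reason are not retried.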

This was caught by e2e tests against ODSP and FRS failing consistently.

AB#50569
AB#50568

@MarioJGMsoft MarioJGMsoft left a comment

LGTM

@steffenloesch steffenloesch left a comment

Looks good to me. Left one nit on the doc comment.

* Returns success/failure and an optional error for host-side handling.
*
* @legacy @alpha
*/

Nit: it would be good to mention the retry in the doc comment too.

@kian-thompson kian-thompson merged commit a92b8bd into microsoft:main Dec 4, 2025
34 checks passed
@kian-thompson kian-thompson deleted the 50569-ondemand-retry branch December 4, 2025 23:39

3 participants