fix(semantic): memory queue stall + permanent fs-error classification#1531
Open
ZaynJarvis wants to merge 8 commits intomainfrom
Open
fix(semantic): memory queue stall + permanent fs-error classification#1531ZaynJarvis wants to merge 8 commits intomainfrom
ZaynJarvis wants to merge 8 commits intomainfrom
Conversation
_process_memory_directory() had early return paths that could bypass report_success()/report_error() in on_dequeue(), leaving the queue's in_progress counter permanently stuck. This caused the semantic queue to appear stalled with pending items never being processed. All code paths now properly propagate to the completion callbacks. Fixes #864.
…inite retry Address review feedback: filesystem errors (FileNotFoundError, PermissionError, IsADirectoryError, NotADirectoryError) are now classified as permanent by classify_api_error(), so they hit report_error() instead of being infinitely re-enqueued. Tests updated to exercise real classifier behavior without mocking.
DequeueHandlerBase.set_callbacks now takes (on_success, on_requeue, on_error); the original PR #951 test harness called it with only (on_success, on_error). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3 tasks
|
Failed to generate code suggestions for PR |
Collaborator
Author
|
pending build fix. |
_mark_failed's two call sites were removed when _process_memory_directory started raising on error. The closure itself was left behind. Delete it — telemetry failure is now reported by on_dequeue's exception handler via get_request_wait_tracker().mark_semantic_failed(). Add test_memory_ls_transient_error_requeues to cover the transient branch of the memory path: a 500-class error from ls() must route through _reenqueue_semantic_msg() and fire report_requeue() + report_success(), not report_error(). The previous tests only exercised permanent errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rebased-on-main cherry-pick of #951 (by @deepakdevp) with minor adjustments to land on current
main. Original commits preserved viagit cherry-pickto keep the contributor's authorship intact.46a6961—_process_memory_directory()raises on ls/write failures instead of swallowing them, soon_dequeue()can always route toreport_success()/report_error()and the in-progress counter no longer gets stuck. Fixes Memory semantic queue stalls on context_type=memory jobs; pending backlog grows while processed stays at 0 #864.c252078— filesystem errors (FileNotFoundError,PermissionError,IsADirectoryError,NotADirectoryError) are now classified aspermanentso they hitreport_error()instead of being re-enqueued forever.8581232— test harness fixup:DequeueHandlerBase.set_callbacksnow takes(on_success, on_requeue, on_error); the original PR's tests called it with two args.Conflict resolution notes
classify_api_errormoved fromopenviking/utils/circuit_breaker.pytoopenviking/utils/model_retry.pyafter PR fix(semantic): ensure memory processing always reports completion status #951 was opened. The_PERMANENT_IO_ERRORSisinstance check was applied tomodel_retry.pyaccordingly._process_memory_directory()now releaselifecycle_lock_handle_idbefore re-raising, matching the new lock-ownership model onmain.Test plan
uv run pytest tests/utils/test_circuit_breaker.py tests/storage/test_memory_semantic_stall.py— 21 passed🤖 Generated with Claude Code