fix(kv-cache): drop superseded continued snapshots on store #175
Open
unsaltedbutter-ai wants to merge 1 commit into
A chat request whose prompt exceeds the continued-interval threshold writes nested snapshots at every 10240-token boundary (10k, 20k, 30k, ...), plus a final cold/evict snapshot at the trim length. Each longer continued snapshot strictly dominates the shorter ones on prefix lookup, yet all of them compete for the disk-cache budget. In production logs, the older snapshots are picked as eviction victims within seconds of being written, and the request ends with no usable cache entry.

Add kv_cache_prune_supersedes, called from the store path before kv_cache_evict for cold and continued stores. It walks the index and unlinks any CONTINUED entry whose text_bytes are a strict prefix of the new snapshot, verified by recomputing SHA-1 over the new text truncated to the older entry's byte length. Because the prune runs first, eviction's refresh sees the pruned directory and runs against a smaller candidate set. Cold/evict/shutdown entries on disk are intentional checkpoints (for example, the anchor cold snapshot at the chat-task boundary) and are left alone, so workloads that diverge past their length can still hit them.

The prune is skipped for evict and shutdown stores. Those stores save a live state at the moment it has just diverged from an incoming request, so the saved content can include post-divergence tokens that no future prompt will match. Shorter same-session continued snapshots, by contrast, remain strictly pre-divergence prefixes that still serve correctly. Deleting them on an evict store would replace a near-match disk hit with a much shorter one and waste all the prefill work from earlier in the same session.

Result: one snapshot per workload prefix instead of N nested copies. A 49k-token prefill that previously wrote 2.1 GiB of snapshots, all evicted within the request, now ends with a single 666 MiB entry on disk that the next same-prompt request can hit.
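To illustrate how the nesting arises, the sketch below enumerates the continued-snapshot offsets for a given prompt length. It rests on three assumptions: the continued-interval threshold equals the 10240-token interval, boundaries are taken strictly below the trim length (the final cold/evict snapshot covers the trim length itself), and "49k" means about 49,000 tokens. The names continued_boundaries and KV_CONTINUED_INTERVAL are hypothetical and are not taken from the PR's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant modeling the 10240-token continued interval. */
#define KV_CONTINUED_INTERVAL 10240u

/* Hypothetical helper: writes into out[] the token offsets at which
 * continued snapshots would be taken for a prompt of n_tokens tokens.
 * Assumes boundaries lie strictly below the trim length, because the final
 * cold/evict snapshot covers the trim length itself.
 * Returns the number of offsets written (never more than cap). */
static size_t continued_boundaries(size_t n_tokens, size_t *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t count = 0;
    for (size_t b = KV_CONTINUED_INTERVAL; b < n_tokens && count < cap;
         b += KV_CONTINUED_INTERVAL) {
        out[count++] = b; /* each offset is one nested continued snapshot */
    }
    return count;
}
```

Under these assumptions, a 49,000-token prompt yields boundaries at 10240, 20480, 30720 and 40960. That is four nested continued snapshots plus the final one, and each shorter snapshot is a prefix of every longer one.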
Cross-conversation hits on explicit cold checkpoints are preserved. Shorter pre-divergence snapshots survive evict stores, so a live miss followed by a near-match request still loads from the longest available prefix.

Adds four unit tests: strict-prefix continued entry unlinked, unrelated entry kept, self-skip, and cold checkpoint kept. test_kv_text_stub_file gains a reason parameter; both existing callers pass KV_REASON_COLD.
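The store-time rule and the four test scenarios can be modeled in C as shown here. This is a minimal hypothetical sketch, not the PR's implementation. The index is an in-memory array, and unlinking a snapshot file is modeled by setting a flag. Only KV_REASON_COLD is a name taken from the PR; the other reason codes, the struct, and all function names are assumptions. FNV-1a 64-bit stands in for the SHA-1 prefix check so the example stays self-contained. It is not collision resistant and is no substitute for SHA-1 in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes; only KV_REASON_COLD is named in the PR. */
enum kv_reason {
    KV_REASON_COLD,
    KV_REASON_CONTINUED,
    KV_REASON_EVICT,
    KV_REASON_SHUTDOWN,
};

/* Hypothetical in-memory view of one on-disk index entry. */
struct kv_entry {
    enum kv_reason reason;
    size_t text_bytes;  /* byte length of the snapshot's prompt text */
    uint64_t text_hash; /* stand-in for the stored SHA-1 digest */
    bool unlinked;      /* stand-in for unlinking the snapshot file */
};

/* FNV-1a 64-bit: a self-contained stand-in for SHA-1 (not collision resistant). */
static uint64_t fnv1a64(const char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Reason gating: prune only on cold and continued stores. Evict and shutdown
 * stores may save post-divergence state, so shorter snapshots are kept. */
static bool kv_should_prune(enum kv_reason store_reason)
{
    return store_reason == KV_REASON_COLD || store_reason == KV_REASON_CONTINUED;
}

/* Marks every CONTINUED entry whose text is a strict prefix of new_text.
 * self is the index of the entry just written, which is always skipped.
 * Returns the number of entries marked as unlinked. */
static size_t kv_prune_supersedes_sketch(struct kv_entry *idx, size_t n,
                                         size_t self, const char *new_text,
                                         size_t new_len)
{
    assert(new_text != NULL || new_len == 0);
    size_t pruned = 0;
    for (size_t i = 0; i < n; i++) {
        struct kv_entry *e = &idx[i];
        if (i == self || e->unlinked)
            continue; /* self-skip, or already removed */
        if (e->reason != KV_REASON_CONTINUED)
            continue; /* cold/evict/shutdown checkpoints are left alone */
        if (e->text_bytes >= new_len)
            continue; /* only strict prefixes qualify */
        if (fnv1a64(new_text, e->text_bytes) != e->text_hash)
            continue; /* hash of the new text at this length differs: unrelated */
        e->unlinked = true;
        pruned++;
    }
    return pruned;
}
```

A caller in this model would run the prune only when kv_should_prune returns true for the store's reason and would then proceed to eviction. Scanning the whole index keeps the rule simple at the cost of one hash per candidate entry.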
Fixes #174
Implements the supersession proposed in #174.
Adds kv_cache_prune_supersedes to the store path, called before kv_cache_evict for cold and continued stores. It is skipped for evict/shutdown stores, where the saved live state may diverge from upcoming requests. Includes four unit tests.