fix(kv-cache): drop superseded continued snapshots on store #175

Open
unsaltedbutter-ai wants to merge 1 commit into antirez:main from unsaltedbutter-ai:fix/kv-cache-prune-superseded

@unsaltedbutter-ai (Contributor)

Fixes #174

Implements the supersession proposed in #174.
Adds kv_cache_prune_supersedes to the store path, called before kv_cache_evict for cold and continued stores (skipped for evict/shutdown stores, where the saved live state may have diverged from upcoming requests). Includes four unit tests.

A chat request whose prompt exceeds the continued-interval threshold
writes nested snapshots at every 10240-token boundary (10k, 20k, 30k,
...) plus a final cold/evict snapshot at the trim length. On prefix
lookup each longer continued snapshot strictly dominates the shorter
ones, yet all of them compete for the disk-cache budget. In
production logs the older snapshots are picked as eviction victims
within seconds of being written, ending the request with no usable
cache entry.
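To make the nesting concrete, here is a minimal sketch of the boundary arithmetic. It assumes one continued snapshot per full 10240-token interval strictly below the prompt length and a final snapshot at a trim length no longer than the prompt; continued_boundaries and nested_tokens_stored are hypothetical helpers written for illustration, not code from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the continued interval is the 10240-token boundary described above. */
#define CONTINUED_INTERVAL 10240u

/* Hypothetical helper (not from the patch): write the token offsets at which
 * nested continued snapshots would land for a prompt of prompt_tokens,
 * assuming one snapshot per full interval strictly below the prompt length.
 * Returns how many offsets were written (at most cap). */
static size_t continued_boundaries(size_t prompt_tokens, size_t *out, size_t cap) {
    assert(out != NULL || cap == 0);
    size_t n = 0;
    for (size_t off = CONTINUED_INTERVAL; off < prompt_tokens && n < cap;
         off += CONTINUED_INTERVAL)
        out[n++] = off;
    return n;
}

/* Hypothetical helper: total tokens held on disk if every nested continued
 * snapshot survives alongside the final snapshot at trim_tokens.
 * Only the first 64 boundaries are counted in this sketch. */
static size_t nested_tokens_stored(size_t prompt_tokens, size_t trim_tokens) {
    size_t offs[64];
    size_t n = continued_boundaries(prompt_tokens, offs, 64);
    size_t total = trim_tokens;
    for (size_t i = 0; i < n; i++)
        total += offs[i];
    return total;
}
```

For an illustrative 49000-token prompt trimmed at 49000 tokens, continued_boundaries yields 10240, 20480, 30720 and 40960, and nested_tokens_stored returns 151400: roughly three times the tokens a single surviving snapshot needs, assuming snapshot size grows linearly with token count.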

Add kv_cache_prune_supersedes, called from the store path before
kv_cache_evict for cold and continued stores. It walks the index and
unlinks any CONTINUED entry whose text_bytes are a strict prefix of
the new snapshot (verified by recomputing SHA-1 over the new text at
the older entry's byte length). Cold/evict/shutdown entries on disk
are intentional checkpoints (for example the anchor cold at the
chat-task boundary) and are left alone so workloads that diverge
past their length can still hit them.
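The prune rule can be sketched over an in-memory index. This is a hypothetical illustration, not the patch's kv_cache_prune_supersedes: the kv_entry struct, the reason enum values other than KV_REASON_COLD, and prune_supersedes are assumptions, marking an entry removed stands in for unlinking its file, and 64-bit FNV-1a is swapped in for SHA-1 only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes; only KV_REASON_COLD is named in this PR. */
typedef enum { KV_REASON_COLD, KV_REASON_CONTINUED, KV_REASON_EVICT, KV_REASON_SHUTDOWN } kv_reason;

/* Hypothetical index entry; the real index and on-disk format may differ. */
typedef struct {
    kv_reason reason;
    size_t text_bytes;  /* length of the prompt text this snapshot covers */
    uint64_t hash;      /* hash of those text_bytes */
    bool removed;       /* stands in for unlinking the snapshot file */
} kv_entry;

/* Stand-in for SHA-1: 64-bit FNV-1a, used only to keep the sketch self-contained. */
static uint64_t text_hash(const char *text, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)text[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Remove every CONTINUED entry whose text is a strict prefix of the new
 * snapshot's text. self is the index slot of the new snapshot, which is
 * never considered. Returns the number of entries removed. */
static size_t prune_supersedes(kv_entry *idx, size_t n, size_t self,
                               const char *text, size_t text_bytes) {
    assert(idx != NULL && text != NULL);
    size_t pruned = 0;
    for (size_t i = 0; i < n; i++) {
        kv_entry *e = &idx[i];
        if (i == self || e->removed)
            continue;
        if (e->reason != KV_REASON_CONTINUED)
            continue; /* cold/evict/shutdown checkpoints are left alone */
        if (e->text_bytes >= text_bytes)
            continue; /* only strictly shorter entries can be strict prefixes */
        /* Re-hash the new text at the older entry's length and compare. */
        if (text_hash(text, e->text_bytes) != e->hash)
            continue;
        e->removed = true;
        pruned++;
    }
    return pruned;
}
```

With a new snapshot over "abcdefghij", a continued entry over "abcd" is removed, while a continued entry over "wxyz", a cold entry over "abcdef" and the new entry itself are kept. These four cases roughly mirror the four unit tests listed below.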

The prune is skipped for evict and shutdown stores because those
save a live state at the moment it has just diverged from an
incoming request. The saved content can include post-divergence
tokens that no future prompt will match, whereas the shorter
same-session continued snapshots are strictly pre-divergence
prefixes that still serve correctly. Deleting them on an evict store
would replace a near-match disk hit with a much shorter one, wasting
all the prefill work from earlier in the same session.

Eviction's refresh then sees the pruned directory and runs against
a smaller candidate set.

Result: one snapshot per workload prefix instead of N nested copies.
A 49k-token prefill that previously wrote 2.1 GiB of snapshots, all
evicted within the request, ends with a single 666 MiB entry on disk
that the next same-prompt request can hit. Cross-conversation hits
on explicit cold checkpoints are preserved, and shorter
pre-divergence snapshots survive evict stores, so a live-miss
followed by a near-match request still loads from the longest
available prefix.

Adds four unit tests: strict-prefix continued unlink, unrelated
entry kept, self-skip, cold-checkpoint kept. test_kv_text_stub_file
grows a reason parameter; both existing callers pass KV_REASON_COLD.

Successfully merging this pull request may close these issues: Disk KV cache: continued checkpoints evict each other during a long prefill