Skip to content

gc: shared delete budget + pagination-robust prefix discovery#58

Merged
jaredLunde merged 1 commit into
mainfrom
jared/gc-bugs
May 25, 2026
Merged

gc: shared delete budget + pagination-robust prefix discovery#58
jaredLunde merged 1 commit into
mainfrom
jared/gc-bugs

Conversation

@jaredLunde
Copy link
Copy Markdown
Contributor

Summary

Two independent bugs in the S3 GC path, each with a proof-by-regression-test:

  • discover_s3_prefixes truncated at the first list_with_delimiter page. The ObjectStore trait makes no pagination guarantee for list_with_delimiter — it's per-impl. A non-paginating backend silently dropped every export alphabetically past the first page. Switched to a list / list_with_offset walk that jumps past each export's subtree by setting the offset to {name}0 ('/' = 0x2F, '0' = 0x30). O(exports) round-trips on real S3 (start-after is server-side), correct by trait contract on any backend.
  • per_prefix_budget = max_deletes / prefixes.len() collapsed to zero. When prefixes outnumber the global delete cap (e.g. 100K+ exports with default max_deletes), GC found dead packs and deleted nothing. Replaced with a single Arc<AtomicUsize> claimed across all prefix tasks via try_claim_delete_slot. Side benefit: budget never gets stranded on prefixes with nothing to delete.

Proofs

Both regression tests were run against the original buggy code to confirm the bug, then against the fix to confirm it passes:

  • test_gc_shared_budget_caps_total_deletes_across_prefixes — 3 prefixes × 1 dead pack, max_deletes=2. Old division: 2/3 = 0 → 0 deletes. New shared budget: 2 deletes, 1 survivor.
  • test_discover_s3_prefixes_robust_to_non_paginating_backend — uses a new TruncatingDelimiterStore that caps list_with_delimiter at 2 entries with no continuation. Old code: 2 of 5 prefixes returned. New code: all 5.
  • test_discover_s3_prefixes_many_exports — sanity check against InMemory with 1500 exports.

Test plan

  • cargo test --lib -p glidefs --features test-utils cli::gc — 27/27 passing
  • Each new test exercised against the original buggy code path and verified failing

🤖 Generated with Claude Code

Two independent bugs in the S3 GC path:

* discover_s3_prefixes called list_with_delimiter once and relied on the
  backend to paginate. The trait makes no such guarantee; any non-paginating
  impl would silently truncate at the first page and skip exports
  alphabetically past it. Switch to a list / list_with_offset walk that
  jumps past each export's subtree (start-after on real S3), so it's
  O(exports) round-trips and correct by trait contract.

* run_gc divided max_deletes evenly across all discovered prefixes
  (per_prefix_budget = max_deletes / prefixes.len()). With more prefixes
  than the budget allows, this collapsed to zero and GC deleted nothing
  despite finding dead packs. Replace with a single Arc<AtomicUsize>
  claimed across all prefix tasks via try_claim_delete_slot.

Both bugs are covered by regression tests, including a TruncatingDelimiterStore
wrapper that proves robustness to non-paginating backends.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde merged commit ad2bb39 into main May 25, 2026
21 checks passed
@jaredLunde jaredLunde deleted the jared/gc-bugs branch May 25, 2026 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant