fix: prevent compactor from deleting blocks on transient upload failures #8554
Problem
When a block upload fails mid-operation with a transient error (e.g., S3 500 Internal Server Error):
- `BestEffortCleanAbortedPartialUploads()` deletes the block permanently 💥
Root Cause:
The `upload()` function in `pkg/block/block.go` had a comment saying "It makes sure cleanup is done on error to avoid partial block uploads", but no cleanup code actually existed.
Solution
Added defer-based cleanup logic that:
- `uploadStarted` flag
- `Delete()` to clean partial block from object storage
Changes
File: `pkg/block/block.go`
- `upload()` function signature to use named return `(err error)`
- `uploadStarted` flag tracking
- `Delete()` with isolated context
File: `pkg/block/block_test.go`
- `TestUploadCleanup` expectations:
Testing
- `TestUploadCleanup` - verifies partial block cleanup on failure
- `pkg/block` tests passing
Impact
Related Issues
Fixes #8548