-
Notifications
You must be signed in to change notification settings - Fork 30
Handle orphaned blobs better in blob eviction logic for blob storage #1712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ag-eitilt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a couple questions, but mostly about how things work rather than why they're this way. Definitely a good catch for why the blobs were getting orphaned; let's get this in and see if there's any other leaks!
| should_sleep = candidate_count == 0; | ||
|
|
||
| should_sleep = deleted == 0; | ||
| if candidate_count == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A) Does should_sleep get used later on in this scope, and B) whether or not it does, should we use that in the if?
|
|
||
| tracing::info!(%deleted, "N blobs deleted for eviction"); | ||
| // Delete from blob storage first to ensure no untracked files are left behind | ||
| let mut successfully_deleted_blob_ids = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know Rust. Is the mut saying that the contents of the vector might change, or is it overkill and indicates that the vector container itself might be switched out?
|
|
||
| for chunk in chunked { | ||
| let thread_store = blob_stores.clone(); | ||
| let mut chunk_success_ids = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
|
|
||
| store.delete_key(blob.key.clone()).await.unwrap_or_else(|err| { | ||
| let blob = blob.clone(); | ||
| // Process chunk synchronously to collect successful deletions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How critical is the synchronous nature? Unless there's a contents-comparison or something which depends or order, it seems like we could still allow async deletion. Assuming Rust allows objects to be shared between threads, of course, which now that I say it seems like there's a chance it breaks the ownership module or something. Or were we paying extra complexity for no real time/compute savings?
Currently there is an issue with the RSC where blobs are not evicted properly and remain orphaned in the blob store. This can lead to disk space issues over time. To properly handle this, we should modify the blob eviction logic to delete from the blob store first before deleting from the database, so that we can handle retries if deletion fails