Skip to content

Conversation

@Shubhranshu153
Copy link
Contributor

@Shubhranshu153 Shubhranshu153 commented Sep 12, 2025

Issue #, if available:

  1. Spancache and httpcache are not cleanedup
  2. Background fetcher doesn't stop when layer is getting cleanedup

Description of changes:

  1. Httpcache is not used in the codebase, Removed it.
  2. Spancache eviction was not called as the done was a idempotent call instead of reducing the count. And we dont call remove of the cache for requesting eviction.
  3. When all references of spancache is removed then the spanmanager has a finalizer is called which releases the cache resources. This is non-deterministic and scheduled by go runtime.
  4. Background fetcher doesnt fetch spans on restarts.

TODO:

  1. Add test to verify background fetcher is stopped
  2. Add cache cleanup on immediately after eviction rather than depending on finalizer, as it doesnt work well on restart once the context is lost.
  3. Cleanup of resources on restarts
  4. Add test to verify resource cleanup

Testing performed:

  1. Verification that cache gets cleared
Before cleanup
sudo du -chx --max-depth=1 /var/lib/soci-snap
shotter-grpc/soci/
200M    /var/lib/soci-snapshotter-grpc/soci/spancache
200M    /var/lib/soci-snapshotter-grpc/soci/
200M    total
After removing of the image (before the fix this won't get cleanedup
sudo du -chx --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/
4.0K    /var/lib/soci-snapshotter-grpc/soci/spancache
8.0K    /var/lib/soci-snapshotter-grpc/soci/
8.0K    total

  1. Verification that bg fetcher doesnt get called on restarts

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@Shubhranshu153 Shubhranshu153 requested a review from a team as a code owner September 12, 2025 18:01
@github-actions github-actions bot added go Pull requests that update Go code testing Unit and/or integration tests labels Sep 12, 2025
@Shubhranshu153 Shubhranshu153 force-pushed the fix-resource-cleanup branch 2 times, most recently from 1bd8d4a to 25c712f Compare September 12, 2025 21:41
@sondavidb
Copy link
Contributor

Merging this should close #951

@Shubhranshu153 Shubhranshu153 force-pushed the fix-resource-cleanup branch 2 times, most recently from 5be8401 to 00edc4f Compare September 15, 2025 16:43
Copy link
Contributor

@sondavidb sondavidb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a unit test? Not sure how useful a mock of this would be though.

@wswsmao
Copy link

wswsmao commented Nov 5, 2025

It appears the changes haven't taken effect. Testing with commit 00edc4f

Steps:

  1. Pull the image
nerdctl pull --snapshotter soci example.com/test:soci
  1. Wait for background process
Every 1.0s: du -h --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/spancache                   VM-33-248-tlinux: Wed Nov  5 02:42:00 2025

33M     /var/lib/soci-snapshotter-grpc/soci/spancache/1231516239
33M     /var/lib/soci-snapshotter-grpc/soci/spancache
  1. Remove the image
nerdctl images -q | xargs nerdctl rmi -f

spancache status:

Every 1.0s: du -h --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/spancache                   VM-33-248-tlinux: Wed Nov  5 02:42:16 2025

33M     /var/lib/soci-snapshotter-grpc/soci/spancache/1231516239
33M     /var/lib/soci-snapshotter-grpc/soci/spancache

@wswsmao
Copy link

wswsmao commented Nov 5, 2025

It appears the changes haven't taken effect. Testing with commit 00edc4f

Steps:

  1. Pull the image
nerdctl pull --snapshotter soci example.com/test:soci
  1. Wait for background process
Every 1.0s: du -h --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/spancache                   VM-33-248-tlinux: Wed Nov  5 02:42:00 2025

33M     /var/lib/soci-snapshotter-grpc/soci/spancache/1231516239
33M     /var/lib/soci-snapshotter-grpc/soci/spancache
  1. Remove the image
nerdctl images -q | xargs nerdctl rmi -f

spancache status:

Every 1.0s: du -h --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/spancache                   VM-33-248-tlinux: Wed Nov  5 02:42:16 2025

33M     /var/lib/soci-snapshotter-grpc/soci/spancache/1231516239
33M     /var/lib/soci-snapshotter-grpc/soci/spancache

maybe we can use this #1745

@Shubhranshu153
Copy link
Contributor Author

Shubhranshu153 commented Nov 5, 2025

@wswsmao

A finalizer is called by golang to clean up the span cache. It is not a deterministic call. So we need to wait for sometime to see the cache getting removed. As a result we dont immediately see a cleanup of the cache. We need to make sure all the raw pointers are released before we evict the span.

For testing with nerdctl after you do a cleanup wait for 10~20 seconds to check if the spancache is cleaned up.

Also thanks for working on it. Will review your PR.

Signed-off-by: Shubhranshu Mahapatra <[email protected]>
Signed-off-by: Shubhranshu Mahapatra <[email protected]>
@Shubhranshu153
Copy link
Contributor Author

Shubhranshu153 commented Nov 6, 2025

Pull the image

Nov 06 04:43:26 lima-finch soci-snapshotter-grpc[8070]: {"level":"debug","msg":"Spancache finalizer called. Cleaning up spancache","time":"2025-11-06T04:43:26.268460162-08:00"}

Remove the image

sudo nerdctl rmi $(sudo nerdctl image ls -q)

Finalizer call, finalizer calls are made async and handled by golang. (To see this log soci-snapshooter needs to be run in debug mode)

Nov 06 04:43:26 lima-finch soci-snapshotter-grpc[8070]: {"level":"debug","msg":"Spancache finalizer called. Cleaning up spancache","time":"2025-11-06T04:43:26.268460162-08:00"}

Cache

[shubhum@lima-finch soci-snapshotter]$ sudo du -chx --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/
85M     /var/lib/soci-snapshotter-grpc/soci/spancache
85M     /var/lib/soci-snapshotter-grpc/soci/
85M     total
[shubhum@lima-finch soci-snapshotter]$ sudo du -chx --max-depth=1 /var/lib/soci-snapshotter-grpc/soci/
4.0K    /var/lib/soci-snapshotter-grpc/soci/spancache
8.0K    /var/lib/soci-snapshotter-grpc/soci/
8.0K    total
[shubhum@lima-finch soci-snapshotter]$ 

The case it doesnt handle at the moment is restarts the snapshotter. From my testings the case you are seeing may be a case when we didnt actually wait for the cache to get filled up initially during the test and hence we dont see the cleanup compared to it. @wswsmao

@Shubhranshu153
Copy link
Contributor Author

This fix has been addressed by a different PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go Pull requests that update Go code testing Unit and/or integration tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants