@ChengyuZhu6
Member

@ChengyuZhu6 ChengyuZhu6 commented Jan 24, 2025

This PR proposes a hardlink feature for the caching mechanism to reduce memory usage, improve performance, and save disk space.

Fixes: #1953

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 3 times, most recently from ff5ecf3 to 3e921b6 Compare January 24, 2025 02:44
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from b879c95 to 457b45b Compare February 5, 2025 11:17
@ChengyuZhu6 ChengyuZhu6 requested a review from ktock February 5, 2025 11:18
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 2 times, most recently from 8829e66 to 92702af Compare February 10, 2025 06:20
@AkihiroSuda
Member

Needs rebase

@ChengyuZhu6
Member Author

> Needs rebase

Done.

@ChengyuZhu6
Member Author

Just retriggered CI; no code change.

@ChengyuZhu6
Member Author

@ktock I ran experiments with several basic images, converting each to the eStargz format and running it in a container with a simple 'echo "hello"' command. In these tests, only the stargz background threads pulled image contents to the local machine. Measuring overall memory and disk usage, I observed that hardlinks reduced both memory consumption and disk space by 20-30%.

[Two images attached]

@ChengyuZhu6
Member Author

Across different versions of the same image, hardlinks can deduplicate nearly 50% of memory and disk usage. I verified this by testing various development versions of the Golang image.

[Image attached]

@ChengyuZhu6
Member Author

Kindly ping @ktock

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 3 times, most recently from efbcc75 to 01b70ae Compare June 4, 2025 09:55
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 3 times, most recently from 1d0858b to 756bb86 Compare June 24, 2025 03:19
@ChengyuZhu6
Member Author

cc @ktock

@JulienBalestra

We're very interested in this PR. Is there any plan for review, @ktock?

@AkihiroSuda
Member

Needs rebase

}

// ChunkDigest option allows specifying a chunk digest for the cache
func ChunkDigest(digest string) Option {
Member

Why not use digest.Digest?
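
For context, here is a minimal sketch of what a typed digest parameter could look like. Only the ChunkDigest name and the Option return type come from the diff above; the cacheOpt struct, the apply helper, and the local Digest type are hypothetical stand-ins. The local type only imitates the idea of a typed, validatable digest (as in go-digest's digest.Digest) so that the example runs with the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// Digest is a local stand-in for a typed digest; it is NOT the real
// digest.Digest type, just an illustration of the same idea.
type Digest string

var sha256Re = regexp.MustCompile(`^sha256:[a-f0-9]{64}$`)

// Validate rejects strings that are not well-formed sha256 digests.
func (d Digest) Validate() error {
	if !sha256Re.MatchString(string(d)) {
		return errors.New("invalid digest: " + string(d))
	}
	return nil
}

// cacheOpt and Option are hypothetical; the PR's real types may differ.
type cacheOpt struct{ chunkDigest Digest }

type Option func(*cacheOpt)

// ChunkDigest takes a typed digest instead of a bare string, so callers
// cannot pass an arbitrary string without an explicit conversion.
func ChunkDigest(d Digest) Option {
	return func(o *cacheOpt) { o.chunkDigest = d }
}

// apply folds the given options into a cacheOpt value.
func apply(opts ...Option) cacheOpt {
	var o cacheOpt
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

// exampleDigest builds a syntactically valid placeholder digest.
func exampleDigest() Digest {
	return Digest("sha256:" + strings.Repeat("a", 64))
}

func main() {
	d := exampleDigest()
	fmt.Println(d.Validate() == nil, apply(ChunkDigest(d)).chunkDigest == d)
}
```

The point of the typed variant is that validation and formatting live with the type rather than with every caller that handles the string.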

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 4 times, most recently from 9669477 to 349ad77 Compare July 28, 2025 03:21
@ChengyuZhu6
Member Author

@ktock

@ktock
Member

ktock commented Jul 28, 2025

CI failure in Optimize will be fixed in #2094.

@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 5 times, most recently from 170ded5 to d62a0aa Compare August 15, 2025 08:12
@ChengyuZhu6 ChengyuZhu6 requested a review from ktock August 15, 2025 08:12
@ChengyuZhu6 ChengyuZhu6 force-pushed the hardlink branch 6 times, most recently from e85935b to f39d6a2 Compare August 15, 2025 09:23
@ChengyuZhu6
Member Author

ping @ktock

Implement hardlink management for Stargz Snapshotter cache to reduce
storage usage by deduplicating identical content chunks.

Implementation:
- Store canonical files in {root}/hardlinks/ directory
- Use nlink-based garbage collection (remove when nlink == 1)
- Track chunk digests with LRU cache for efficient lookups
- Create hardlinks from canonical files to cache locations

Configuration:
- Add HardlinkRoot option for hardlink storage directory

API:
- Enroll(): Register canonical files by digest
- Get(): Retrieve canonical file paths
- Add(): Create hardlinks to target locations

Signed-off-by: ChengyuZhu6 <[email protected]>
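
The mechanism described in the commit message can be illustrated with a small Go sketch. This is not the PR's code: the store type and the enroll, add, gc, and demo functions are hypothetical, a plain map stands in for the LRU digest index, and the link count is read through syscall.Stat_t, which assumes a Unix-like system and a filesystem that supports hardlinks.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// store is a hypothetical, simplified stand-in for the hardlink manager.
type store struct {
	dir   string            // plays the role of {root}/hardlinks/
	index map[string]string // chunk digest -> canonical path (plain map, not an LRU)
}

// enroll registers src as the canonical copy for digest by linking it into dir.
func (s *store) enroll(digest, src string) error {
	canonical := filepath.Join(s.dir, digest)
	if err := os.Link(src, canonical); err != nil && !os.IsExist(err) {
		return err
	}
	s.index[digest] = canonical
	return nil
}

// add hardlinks the canonical file for digest to target.
func (s *store) add(digest, target string) error {
	canonical, ok := s.index[digest]
	if !ok {
		return fmt.Errorf("digest %s is not enrolled", digest)
	}
	return os.Link(canonical, target)
}

// gc removes canonical files whose link count is 1, i.e. no cache entry uses them.
func (s *store) gc() (int, error) {
	removed := 0
	for digest, path := range s.index {
		fi, err := os.Stat(path)
		if err != nil {
			return removed, err
		}
		st, ok := fi.Sys().(*syscall.Stat_t) // Unix-only way to read nlink
		if !ok || st.Nlink != 1 {
			continue
		}
		if err := os.Remove(path); err != nil {
			return removed, err
		}
		delete(s.index, digest)
		removed++
	}
	return removed, nil
}

// demo enrolls one chunk, links a duplicate, and runs gc before and after
// the cache entries are deleted. It returns the two gc removal counts.
func demo() (int, int, error) {
	root, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(root)
	s := &store{dir: filepath.Join(root, "hardlinks"), index: map[string]string{}}
	if err := os.MkdirAll(s.dir, 0o700); err != nil {
		return 0, 0, err
	}
	src, dup := filepath.Join(root, "layerA"), filepath.Join(root, "layerB")
	if err := os.WriteFile(src, []byte("chunk"), 0o600); err != nil {
		return 0, 0, err
	}
	if err := s.enroll("abc123", src); err != nil {
		return 0, 0, err
	}
	if err := s.add("abc123", dup); err != nil {
		return 0, 0, err
	}
	before, err := s.gc() // three names share the inode, so nothing is removed
	if err != nil {
		return 0, 0, err
	}
	os.Remove(src)
	os.Remove(dup)
	after, err := s.gc() // only the canonical name is left (nlink == 1)
	return before, after, err
}

func main() {
	fmt.Println(demo())
}
```

In this sketch the two cache entries and the canonical file share one inode, so the chunk occupies disk space and page cache once; the canonical file becomes collectable only after every cache entry that links to it is gone.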
Development

Successfully merging this pull request may close these issues.

Implement Hardlink Feature for Cache Optimization and Data Deduplication

5 participants