fix: resolveByProviderId doesnt use getClientDataSetsWithDetails #487
base: master
Conversation
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | synapse-dev | e251dd5 | Commit Preview URL / Branch Preview URL | Dec 19 2025, 09:15 AM |
Pull request overview
This PR optimizes the resolveByProviderId method to significantly reduce RPC calls and improve performance when selecting existing datasets. The change replaces the expensive getClientDataSetsWithDetails call (which makes ~5 calls per dataset) with a lightweight getClientDataSets call, followed by targeted metadata and validation checks only for datasets that match.
Key Changes
- Replaced `getClientDataSetsWithDetails` with `getClientDataSets` for initial dataset fetching to reduce RPC overhead
- Implemented batched sequential search with early exit when a matching dataset with pieces is found
- Added `dataSetId` field to `getClientDataSets` return value and created `getNextPieceId` helper method in WarmStorageService
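The batched sequential search with early exit described above can be sketched roughly as follows. This is an illustrative sketch, not the SDK's actual code: `checkDataSet` stands in for the per-dataset metadata/validation RPC calls, and the types and batch size are assumptions.

```typescript
// Hypothetical shape of a dataset row; the real SDK type has more fields.
type DataSet = { dataSetId: number; providerId: number }

async function findMatch(
  dataSets: DataSet[],
  providerId: number,
  checkDataSet: (ds: DataSet) => Promise<boolean>,
  batchSize = 5
): Promise<DataSet | null> {
  // Only datasets for the requested provider are candidates.
  // Input is assumed ordered oldest-first, so the first match is the oldest.
  const candidates = dataSets.filter((ds) => ds.providerId === providerId)
  for (let i = 0; i < candidates.length; i += batchSize) {
    const batch = candidates.slice(i, i + batchSize)
    // Run the detailed metadata/validation checks for one batch in parallel...
    const results = await Promise.all(batch.map(checkDataSet))
    // ...and exit early at the first match, skipping detail calls
    // for every remaining dataset.
    const idx = results.findIndex((ok) => ok)
    if (idx !== -1) return batch[idx]
  }
  return null
}
```

The point of batching rather than checking everything at once is that a match in an early batch avoids issuing detail calls for the rest of the account's datasets.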
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/synapse-sdk/src/warm-storage/service.ts | Added dataSetId field to getClientDataSets return value and introduced getNextPieceId method to support the optimized selection logic |
| packages/synapse-sdk/src/storage/context.ts | Refactored resolveByProviderId to use lightweight getClientDataSets with batched parallel metadata/validation checks and deterministic oldest-first selection with early exit |
| packages/synapse-sdk/src/test/storage.test.ts | Updated test mock to use getClientDataSets instead of clientDataSets, and simplified test data structure for better maintainability |
I ran some benchmarks with hyperfine. For the common use case of an account with few (2) datasets, the old method is usually faster. That makes sense, because I'm trying to balance RPC-call reduction against speed by not just spamming for details on everything. However, for worst-case scenarios (non-matching dataset metadata, so every dataset gets checked) it's roughly the same speed. For an account with a large number of datasets (dealbot, filecoin-pin-website, and potentially future users who don't follow our expectations), the changes in this PR are significantly faster, with a serious reduction in RPC calls.
View actual hyperfine results: OLD vs NEW
I quickly changed the code to run all dataset calls at once, and I'm thinking the RPC calls are still reduced enough that we should probably just call them all at once. Hyperfine shows the speedup is consistent (as much as 10 runs and networking can show): it still reduces RPC calls by almost 50%, and is also faster.

2-dataset account:
- Common case: 1.59x faster
- Worst case: 1.11x faster
- Average speedup: 1.35x

639-dataset account:
- Common case: 3.51x faster
- Worst case: 2.39x faster
- Average speedup: 2.95x
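The "all at once" variant discussed here trades the batched early exit for a single parallel round of checks. A minimal sketch, with a hypothetical `checkDataSet` standing in for the per-dataset RPC calls (not the SDK's actual API):

```typescript
type Candidate = { dataSetId: number }

async function findMatchAllAtOnce(
  candidates: Candidate[],
  checkDataSet: (ds: Candidate) => Promise<boolean>
): Promise<Candidate | null> {
  // One round of latency for N parallel calls, at the cost of always
  // issuing a check for every candidate (no early exit).
  const results = await Promise.all(candidates.map(checkDataSet))
  // Input assumed ordered oldest-first, so the first match is the oldest.
  const idx = results.findIndex((ok) => ok)
  return idx === -1 ? null : candidates[idx]
}
```

This is faster wall-clock when most candidates must be checked anyway (the worst case above), while the batched version issues fewer total RPC calls when a match appears early.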
SgtPooki left a comment
self review
```ts
pdpEndEpoch: Number(ds.pdpEndEpoch),
providerId: Number(ds.providerId),
dataSetId: Number(ds.dataSetId),
```
What guarantees do we have that these are lower than MAX_SAFE_INTEGER?
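One way to address this question is to guard the `Number(...)` conversions explicitly. A hypothetical helper (not the SDK's actual code) that rejects bigints outside the safe-integer range:

```typescript
// Convert a bigint to number only if it fits in the safe-integer range;
// otherwise fail loudly instead of silently losing precision.
function toSafeNumber(value: bigint, field: string): number {
  const max = BigInt(Number.MAX_SAFE_INTEGER)
  if (value > max || value < -max) {
    throw new RangeError(`${field} (${value}) exceeds Number.MAX_SAFE_INTEGER`)
  }
  return Number(value)
}
```

Usage would look like `dataSetId: toSafeNumber(ds.dataSetId, 'dataSetId')`, turning a silent precision loss into an explicit error.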
@SgtPooki could you rebase this or merge master please?
b371086 to 5253489
5253489 to a96a2ed
Looks like I need to merge master again.. on it
rvagg left a comment
I decided to do my remaining critique (that I just noticed during this pass) in a separate PR because it involves a breaking change. #517, that's on top of this branch currently because it changes some code in here.
BREAKING: EnhancedDataSetInfo.nextPieceId removed, s/currentPieceCount/activePieceCount. This exposes getActivePieceCount from PDPVerifier, replaces the inaccurate nextPieceId-based counting with the actual active piece count, and removes nextPieceId from the data set details struct to avoid an extra eth_call for a low-value piece of information.
Summary
Fix `resolveByProviderId` performance and selection logic to match existing behavior while avoiding expensive `getClientDataSetsWithDetails` calls.
Problem
Issue #435: getClientDataSetsWithDetails makes ~5 calls per dataset (3 phases). With 639 datasets that’s ~3,195+ calls just to select one.
Changes
- `getClientDataSets` instead of `getClientDataSetsWithDetails`
Performance
Benchmarks on a 639-dataset fixture; both paths pick the same dataSetId (84):
Common case (metadata matches at dataset 84):
Worst case (metadata never matches, dataSetId=-1):
Testing
See testing results and code used at https://gist.github.com/SgtPooki/987fe31d3a8991ec73490d5df385b381
Related
FilOzone/dealbot#66
#438
#485 (for metadata match handling)