Conversation

@SgtPooki (Collaborator) commented Dec 5, 2025

Summary

Fix resolveByProviderId performance and selection logic to match existing behavior while avoiding expensive getClientDataSetsWithDetails calls.

Problem

Issue #435: getClientDataSetsWithDetails makes ~5 calls per dataset (across 3 phases). With 639 datasets, that's ~3,195+ calls just to select one dataset.

Changes

  • Use the lightweight getClientDataSets instead of getClientDataSetsWithDetails
  • Fixed sort: ascending by dataSetId (oldest first) to match existing behavior
  • Sequential batched search (batchSize = max(50, ceil(n/3))) with early exit
  • Metadata check first; bail early if no match (avoids unnecessary pieceCount/validate calls)
  • Parallel pieceCount + validateDataSet only for metadata matches
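The shape of the selection loop described above can be sketched as follows. This is an illustrative sketch only: the `metadataMatches` and `isUsable` callbacks stand in for the SDK's metadata check and the pieceCount/validateDataSet pair, and are not the real API.

```typescript
// Illustrative sketch: scan datasets oldest-first in batches, run the cheap
// metadata check first, and only run the expensive checks for matches.
type DataSet = { dataSetId: number }

async function findDataSet(
  dataSets: DataSet[],
  metadataMatches: (ds: DataSet) => Promise<boolean>, // cheap check
  isUsable: (ds: DataSet) => Promise<boolean> // expensive pieceCount + validate
): Promise<DataSet | null> {
  // Ascending by dataSetId: oldest first, matching existing behavior.
  const sorted = [...dataSets].sort((a, b) => a.dataSetId - b.dataSetId)
  const batchSize = Math.max(50, Math.ceil(sorted.length / 3))
  for (let i = 0; i < sorted.length; i += batchSize) {
    const batch = sorted.slice(i, i + batchSize)
    // Cheap metadata check for the whole batch in parallel.
    const flags = await Promise.all(batch.map(metadataMatches))
    const candidates = batch.filter((_, j) => flags[j])
    for (const ds of candidates) {
      if (await isUsable(ds)) return ds // early exit on first usable match
    }
  }
  return null
}
```

Because batches are processed sequentially, a match in an early batch means later batches are never fetched, which is where the bulk of the call reduction comes from.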

Performance

Benchmarks on a 639-dataset fixture; both paths pick the same dataSetId (84):

Common case (metadata matches at dataset 84):

  • Old: 6,431 calls, ~13.0s
  • New: 365 calls, ~9.6s
  • Result: 1.36x faster, 94.3% fewer calls ✅

Worst case (metadata never matches, dataSetId=-1):

  • Old: 6,429 calls, ~27.6s
  • New: 335 calls, ~18s
  • Result: 1.5x faster, 94.8% fewer calls ✅

Testing

See testing results and code used at https://gist.github.com/SgtPooki/987fe31d3a8991ec73490d5df385b381

Related

FilOzone/dealbot#66
#438
#485 (for metadata match handling)

@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Dec 5, 2025
@cloudflare-workers-and-pages bot commented Dec 5, 2025

Deploying with Cloudflare Workers

✅ Deployment successful! Project: synapse-dev, commit e251dd5. Updated Dec 19 2025, 09:15 AM (UTC).

Base automatically changed from fix/435 to master December 5, 2025 19:17
Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes the resolveByProviderId method to significantly reduce RPC calls and improve performance when selecting existing datasets. The change replaces the expensive getClientDataSetsWithDetails call (which makes ~5 calls per dataset) with a lightweight getClientDataSets call, followed by targeted metadata and validation checks only for datasets that match.

Key Changes

  • Replaced getClientDataSetsWithDetails with getClientDataSets for initial dataset fetching to reduce RPC overhead
  • Implemented batched sequential search with early exit when a matching dataset with pieces is found
  • Added dataSetId field to getClientDataSets return value and created getNextPieceId helper method in WarmStorageService

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/synapse-sdk/src/warm-storage/service.ts | Added dataSetId field to getClientDataSets return value and introduced getNextPieceId method to support the optimized selection logic |
| packages/synapse-sdk/src/storage/context.ts | Refactored resolveByProviderId to use lightweight getClientDataSets with batched parallel metadata/validation checks and deterministic oldest-first selection with early exit |
| packages/synapse-sdk/src/test/storage.test.ts | Updated test mock to use getClientDataSets instead of clientDataSets, and simplified test data structure for better maintainability |


@SgtPooki (Collaborator, Author) commented Dec 5, 2025

I ran some benchmarks with hyperfine. When an account has only a few (2) datasets, the old method is usually faster for the common use case. That makes sense, because I'm trying to balance RPC-call reduction against speed by not just spamming for details on everything. However, for worst-case scenarios (non-matching dataset metadata, so it checks all datasets) it's roughly the same speed.

For an account with a large number of datasets (dealbot, filecoin-pin-website, and potentially future users who don't follow our expectations), the changes in this PR are significantly faster, with a serious reduction in RPC calls.

| Key Scenario | Old mean ± σ (s) | New mean ± σ (s) | Speedup (new vs old) |
| --- | --- | --- | --- |
| SUB common | 2.590 ± 0.107 | 2.917 ± 0.349 | 0.89x (old ~13% faster) |
| SUB worst | 2.497 ± 0.227 | 2.428 ± 0.286 | 1.03x (new) |
| MAIN common | 15.583 ± 0.398 | 5.371 ± 1.233 | 2.90x (new) |
| MAIN worst | 17.245 ± 2.734 | 6.211 ± 1.554 | 2.78x (new) |
View actual hyperfine results

OLD vs NEW common case (2 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.590 s ±  0.107 s    [User: 0.749 s, System: 0.150 s]
  Range (min … max):    2.487 s …  2.879 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.917 s ±  0.349 s    [User: 0.732 s, System: 0.151 s]
  Range (min … max):    2.372 s …  3.355 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.13 ± 0.14 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW worst case (2 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.497 s ±  0.227 s    [User: 0.719 s, System: 0.155 s]
  Range (min … max):    2.289 s …  2.943 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.428 s ±  0.286 s    [User: 0.715 s, System: 0.148 s]
  Range (min … max):    2.070 s …  2.747 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.03 ± 0.15 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW common case (639 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):     15.583 s ±  0.398 s    [User: 3.642 s, System: 0.615 s]
  Range (min … max):   14.991 s … 16.192 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      5.371 s ±  1.233 s    [User: 1.489 s, System: 0.236 s]
  Range (min … max):    4.854 s …  8.863 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    2.90 ± 0.67 times faster than PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW worst case (639 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):     17.245 s ±  2.734 s    [User: 3.890 s, System: 0.639 s]
  Range (min … max):   15.209 s … 22.113 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (22.113 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      6.211 s ±  1.554 s    [User: 1.544 s, System: 0.229 s]
  Range (min … max):    5.609 s … 10.627 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    2.78 ± 0.82 times faster than PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

Question for @rvagg and @hugomrdias: Is RPC-call reduction or latency the priority? Do we need to balance them? I could Promise.all the getDataSetMetadata, getNextPieceId, and validateDataSet calls to speed things up while still reducing RPC calls, though not by as much.

I think the approach I took here is worth considering, but open to feedback.

My concern is the account with 2 datasets (PRIVATE_KEY_SUB) is slower for the common use-case, and I know that wouldn't be true if I made all the calls at once, but then we're doing RPC calls unnecessarily.
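The "all calls at once" variant discussed above would look roughly like this. This is a sketch only: the three call parameters are placeholders standing in for the SDK methods of the same names, and the return shape is illustrative.

```typescript
// Sketch: fire metadata, next-piece-id, and validation calls together per
// dataset instead of gating on metadata first. Call names are placeholders.
async function checkDataSet(
  dataSetId: number,
  getDataSetMetadata: (id: number) => Promise<Record<string, string>>,
  getNextPieceId: (id: number) => Promise<number>,
  validateDataSet: (id: number) => Promise<boolean>
): Promise<{ dataSetId: number; metadata: Record<string, string>; pieceCount: number; valid: boolean }> {
  // One round trip instead of two sequential ones; trades extra RPC calls
  // (for datasets whose metadata would not have matched) for lower latency.
  const [metadata, pieceCount, valid] = await Promise.all([
    getDataSetMetadata(dataSetId),
    getNextPieceId(dataSetId),
    validateDataSet(dataSetId),
  ])
  return { dataSetId, metadata, pieceCount, valid }
}
```

The trade-off is exactly the one raised in the question above: latency drops because the calls overlap, but pieceCount/validate calls are now issued even for datasets whose metadata would have failed the cheap check.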

@SgtPooki (Collaborator, Author) commented Dec 5, 2025

I quickly changed the code to run all per-dataset calls at once, and the RPC calls are still reduced enough that we should probably just call them all at once.

2-dataset account

common case

Speed: 1.59x faster
RPC calls: 48.3% reduction

worst case

Speed: 1.11x faster
RPC calls: 44.4% reduction

Avg results

Average speedup: 1.35x
Average RPC call reduction: 46.4%

And hyperfine shows the speedup is consistent (as much as 10 runs and networking can show):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.556 s ±  0.043 s    [User: 0.747 s, System: 0.152 s]
  Range (min … max):    2.463 s …  2.612 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.315 s ±  0.313 s    [User: 0.730 s, System: 0.151 s]
  Range (min … max):    2.082 s …  2.797 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.10 ± 0.15 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

So it still reduces RPC calls by almost 50%, and is also faster.

639-dataset account:

common case

Speed: 3.51x faster
RPC calls: 93.0% reduction

worst case

Speed: 2.39x faster
RPC calls: 79.5% reduction

Avg results

Average speedup: 2.95x
Average RPC call reduction: 86.3%

@SgtPooki SgtPooki self-assigned this Dec 5, 2025
@SgtPooki SgtPooki requested review from hugomrdias and rvagg December 5, 2025 20:28
@SgtPooki SgtPooki marked this pull request as ready for review December 5, 2025 20:28
@SgtPooki (Collaborator, Author) left a comment:

self review

Comment on lines 326 to +328
pdpEndEpoch: Number(ds.pdpEndEpoch),
providerId: Number(ds.providerId),
dataSetId: Number(ds.dataSetId),

what guarantees do we have that these are lower than MAX_SAFE_INTEGER?
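For context on that concern: converting a bigint to a JS number silently loses precision above Number.MAX_SAFE_INTEGER (2^53 - 1). A defensive conversion might look like this hypothetical helper (not part of the SDK):

```typescript
// Hypothetical helper: convert a bigint from a contract struct to a JS number,
// throwing instead of silently losing precision above Number.MAX_SAFE_INTEGER.
function toSafeNumber(value: bigint, field: string): number {
  if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < -BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError(`${field} (${value}) is outside the safe integer range`)
  }
  return Number(value)
}
```

In practice these fields (epochs, provider ids, dataset ids) are far below 2^53 on-chain today, so the question is whether any contract-level bound guarantees that, or whether a runtime check like the above is warranted.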

@rvagg (Collaborator) commented Dec 15, 2025

@SgtPooki could you rebase this or merge master please?

@SgtPooki SgtPooki force-pushed the fix/435-byProviderId branch from b371086 to 5253489 Compare December 15, 2025 13:53
@SgtPooki SgtPooki force-pushed the fix/435-byProviderId branch from 5253489 to a96a2ed Compare December 15, 2025 14:48
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FS Dec 17, 2025
@BigLep BigLep requested a review from rvagg December 17, 2025 07:00
@BigLep (Contributor) commented Dec 17, 2025

@SgtPooki : I've re-requested review from @rvagg as I assume that's the next step.

(Also, there is one failing check.)

@SgtPooki (Collaborator, Author) commented:
Looks like I need to merge master again.. on it

@rvagg (Collaborator) left a comment:

I decided to do my remaining critique (which I just noticed during this pass) in a separate PR, because it involves a breaking change: #517. That's on top of this branch currently because it changes some code in here.

SgtPooki and others added 2 commits December 18, 2025 10:37
BREAKING: EnhancedDataSetInfo.nextPieceId removed, s/currentPieceCount/activePieceCount

This exposes getActivePieceCount from PDPVerifier, replaces the inaccurate
nextPieceId-based counting with actual active piece count, and removes
nextPieceId from the data set details struct to avoid extra eth_call for a low-
value piece of information.
@rvagg (Collaborator) commented Dec 19, 2025

oh darn, I jumped the gun on #517 and merged that into here. @SgtPooki would you mind when you land this, squashing all your commits down into one with the msg matching the title of this issue and then rebase-merge this with the 2 commits intact? some important info in that commit I merged.
