Conversation

@SgtPooki (Collaborator) commented Dec 5, 2025

Summary

Fix resolveByProviderId performance and selection logic to match existing behavior while avoiding expensive getClientDataSetsWithDetails calls.

Problem

Issue #435: getClientDataSetsWithDetails makes ~5 calls per dataset (across 3 phases). With 639 datasets, that's ~3,195+ calls just to select one dataset.

Changes

  • Use the lightweight getClientDataSets instead of getClientDataSetsWithDetails
  • Fixed sort: ascending by dataSetId (oldest first) to match existing behavior
  • Sequential batched search (batchSize = max(50, ceil(n/3))) with early exit
  • Metadata check first; bail early if no match (avoids unnecessary pieceCount/validate calls)
  • Parallel pieceCount + validateDataSet only for metadata matches
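The shape of the selection loop described above can be sketched as follows. This is an illustrative sketch only: the `metadataMatches` and `isUsable` callbacks stand in for the SDK's metadata check and the pieceCount/validateDataSet pair, and are not the real API.

```typescript
// Illustrative sketch: scan datasets oldest-first in batches, run the cheap
// metadata check first, and only run the expensive checks for matches.
type DataSet = { dataSetId: number }

async function findDataSet(
  dataSets: DataSet[],
  metadataMatches: (ds: DataSet) => Promise<boolean>, // cheap check
  isUsable: (ds: DataSet) => Promise<boolean> // expensive pieceCount + validate
): Promise<DataSet | null> {
  // Ascending by dataSetId: oldest first, matching existing behavior.
  const sorted = [...dataSets].sort((a, b) => a.dataSetId - b.dataSetId)
  const batchSize = Math.max(50, Math.ceil(sorted.length / 3))
  for (let i = 0; i < sorted.length; i += batchSize) {
    const batch = sorted.slice(i, i + batchSize)
    // Cheap metadata check for the whole batch in parallel.
    const flags = await Promise.all(batch.map(metadataMatches))
    const candidates = batch.filter((_, j) => flags[j])
    for (const ds of candidates) {
      if (await isUsable(ds)) return ds // early exit on first usable match
    }
  }
  return null
}
```

Because batches are processed sequentially, a match in an early batch means later batches are never fetched, which is where the bulk of the call reduction comes from.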

Performance

Benchmarks on a 639-dataset fixture; both paths pick the same dataSetId (84):

Common case (metadata matches at dataset 84):

  • Old: 6,431 calls, ~13.0s
  • New: 365 calls, ~9.6s
  • Result: 1.36x faster, 94.3% fewer calls ✅

Worst case (metadata never matches, dataSetId=-1):

  • Old: 6,429 calls, ~27.6s
  • New: 335 calls, ~18s
  • Result: 1.5x faster, 94.8% fewer calls ✅

Testing

See testing results and code used at https://gist.github.com/SgtPooki/987fe31d3a8991ec73490d5df385b381

Related

FilOzone/dealbot#66
#438
#485 (for metadata match handling)

@github-project-automation github-project-automation bot moved this to 📌 Triage in FS Dec 5, 2025
@cloudflare-workers-and-pages bot commented Dec 5, 2025

Deploying with Cloudflare Workers

✅ Deployment successful! Project: synapse-dev, commit e251dd5. Updated Dec 19 2025, 09:15 AM (UTC).

Base automatically changed from fix/435 to master December 5, 2025 19:17
Copilot AI (Contributor) left a comment

Pull request overview

This PR optimizes the resolveByProviderId method to significantly reduce RPC calls and improve performance when selecting existing datasets. The change replaces the expensive getClientDataSetsWithDetails call (which makes ~5 calls per dataset) with a lightweight getClientDataSets call, followed by targeted metadata and validation checks only for datasets that match.

Key Changes

  • Replaced getClientDataSetsWithDetails with getClientDataSets for initial dataset fetching to reduce RPC overhead
  • Implemented batched sequential search with early exit when a matching dataset with pieces is found
  • Added dataSetId field to getClientDataSets return value and created getNextPieceId helper method in WarmStorageService

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/synapse-sdk/src/warm-storage/service.ts | Added dataSetId field to getClientDataSets return value and introduced getNextPieceId method to support the optimized selection logic |
| packages/synapse-sdk/src/storage/context.ts | Refactored resolveByProviderId to use lightweight getClientDataSets with batched parallel metadata/validation checks and deterministic oldest-first selection with early exit |
| packages/synapse-sdk/src/test/storage.test.ts | Updated test mock to use getClientDataSets instead of clientDataSets, and simplified test data structure for better maintainability |


@SgtPooki (Collaborator, Author) commented Dec 5, 2025

I ran some benchmarks with hyperfine. When an account has only a few (2) datasets, the old method is usually faster for the common use case. That makes sense, because I'm trying to balance RPC-call reduction against speed by not just spamming for details on everything. However, for worst-case scenarios (non-matching dataset metadata, so it checks all datasets) it's roughly the same speed.

For an account with a large number of datasets (dealbot, filecoin-pin-website, and potentially future users who don't follow our expectations), the changes in this PR are significantly faster, with a serious reduction in RPC calls.

| Key Scenario | Old mean ± σ (s) | New mean ± σ (s) | Speedup (new vs old) |
| --- | --- | --- | --- |
| SUB common | 2.590 ± 0.107 | 2.917 ± 0.349 | 0.89x (old ~13% faster) |
| SUB worst | 2.497 ± 0.227 | 2.428 ± 0.286 | 1.03x (new) |
| MAIN common | 15.583 ± 0.398 | 5.371 ± 1.233 | 2.90x (new) |
| MAIN worst | 17.245 ± 2.734 | 6.211 ± 1.554 | 2.78x (new) |
View actual hyperfine results

OLD vs NEW common case (2 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.590 s ±  0.107 s    [User: 0.749 s, System: 0.150 s]
  Range (min … max):    2.487 s …  2.879 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.917 s ±  0.349 s    [User: 0.732 s, System: 0.151 s]
  Range (min … max):    2.372 s …  3.355 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.13 ± 0.14 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW worst case (2 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.497 s ±  0.227 s    [User: 0.719 s, System: 0.155 s]
  Range (min … max):    2.289 s …  2.943 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.428 s ±  0.286 s    [User: 0.715 s, System: 0.148 s]
  Range (min … max):    2.070 s …  2.747 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.03 ± 0.15 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW common case (639 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):     15.583 s ±  0.398 s    [User: 3.642 s, System: 0.615 s]
  Range (min … max):   14.991 s … 16.192 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      5.371 s ±  1.233 s    [User: 1.489 s, System: 0.236 s]
  Range (min … max):    4.854 s …  8.863 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    2.90 ± 0.67 times faster than PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

OLD vs NEW worst case (639 data sets):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):     17.245 s ±  2.734 s    [User: 3.890 s, System: 0.639 s]
  Range (min … max):   15.209 s … 22.113 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (22.113 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      6.211 s ±  1.554 s    [User: 1.544 s, System: 0.229 s]
  Range (min … max):    5.609 s … 10.627 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=false RUN_NEW=true SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    2.78 ± 0.82 times faster than PRIVATE_KEY=$PRIVATE_KEY_MAIN RUN_OLD=true RUN_NEW=false SCENARIO=worst RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

Question for @rvagg and @hugomrdias: Is RPC-call reduction or latency the priority? Do we need to balance them? I could Promise.all the getDataSetMetadata, getNextPieceId, and validateDataSet calls to speed things up while still reducing RPC calls, though not by as much.

I think the approach I took here is worth considering, but open to feedback.

My concern is the account with 2 datasets (PRIVATE_KEY_SUB) is slower for the common use-case, and I know that wouldn't be true if I made all the calls at once, but then we're doing RPC calls unnecessarily.
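The "all calls at once" variant discussed above would look roughly like this. This is a sketch only: the three call parameters are placeholders standing in for the SDK methods of the same names, and the return shape is illustrative.

```typescript
// Sketch: fire metadata, next-piece-id, and validation calls together per
// dataset instead of gating on metadata first. Call names are placeholders.
async function checkDataSet(
  dataSetId: number,
  getDataSetMetadata: (id: number) => Promise<Record<string, string>>,
  getNextPieceId: (id: number) => Promise<number>,
  validateDataSet: (id: number) => Promise<boolean>
): Promise<{ dataSetId: number; metadata: Record<string, string>; pieceCount: number; valid: boolean }> {
  // One round trip instead of two sequential ones; trades extra RPC calls
  // (for datasets whose metadata would not have matched) for lower latency.
  const [metadata, pieceCount, valid] = await Promise.all([
    getDataSetMetadata(dataSetId),
    getNextPieceId(dataSetId),
    validateDataSet(dataSetId),
  ])
  return { dataSetId, metadata, pieceCount, valid }
}
```

The trade-off is exactly the one raised in the question above: latency drops because the calls overlap, but pieceCount/validate calls are now issued even for datasets whose metadata would have failed the cheap check.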

@SgtPooki (Collaborator, Author) commented Dec 5, 2025

I quickly changed the code to run all per-dataset calls at once, and the RPC calls are still reduced enough that we should probably just call them all at once.

2-dataset account

common case

Speed: 1.59x faster
RPC calls: 48.3% reduction

worst case

Speed: 1.11x faster
RPC calls: 44.4% reduction

Avg results

Average speedup: 1.35x
Average RPC call reduction: 46.4%

And hyperfine shows the speedup is consistent (as much as 10 runs and networking can show):

Benchmark 1: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.556 s ±  0.043 s    [User: 0.747 s, System: 0.152 s]
  Range (min … max):    2.463 s …  2.612 s    10 runs

Benchmark 2: PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js
  Time (mean ± σ):      2.315 s ±  0.313 s    [User: 0.730 s, System: 0.151 s]
  Range (min … max):    2.082 s …  2.797 s    10 runs

Summary
  PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=false RUN_NEW=true SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js ran
    1.10 ± 0.15 times faster than PRIVATE_KEY=$PRIVATE_KEY_SUB RUN_OLD=true RUN_NEW=false SCENARIO=common RPC_TYPE=wss PROVIDER_ID=2 node scripts/benchmark-provider-resolve.js

So it still reduces RPC calls by almost 50%, and is also faster.

639-dataset account:

common case

Speed: 3.51x faster
RPC calls: 93.0% reduction

worst case

Speed: 2.39x faster
RPC calls: 79.5% reduction

Avg results

Average speedup: 2.95x
Average RPC call reduction: 86.3%

@SgtPooki SgtPooki self-assigned this Dec 5, 2025
@SgtPooki SgtPooki requested review from hugomrdias and rvagg December 5, 2025 20:28
@SgtPooki SgtPooki marked this pull request as ready for review December 5, 2025 20:28
@SgtPooki (Collaborator, Author) left a comment:

self review

Comment on lines 326 to +328
pdpEndEpoch: Number(ds.pdpEndEpoch),
providerId: Number(ds.providerId),
dataSetId: Number(ds.dataSetId),

what guarantees do we have that these are lower than MAX_SAFE_INTEGER?
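For context on that concern: converting a bigint to a JS number silently loses precision above Number.MAX_SAFE_INTEGER (2^53 - 1). A defensive conversion might look like this hypothetical helper (not part of the SDK):

```typescript
// Hypothetical helper: convert a bigint from a contract struct to a JS number,
// throwing instead of silently losing precision above Number.MAX_SAFE_INTEGER.
function toSafeNumber(value: bigint, field: string): number {
  if (value > BigInt(Number.MAX_SAFE_INTEGER) || value < -BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError(`${field} (${value}) is outside the safe integer range`)
  }
  return Number(value)
}
```

In practice these fields (epochs, provider ids, dataset ids) are far below 2^53 on-chain today, so the question is whether any contract-level bound guarantees that, or whether a runtime check like the above is warranted.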

@rvagg (Collaborator) commented Dec 15, 2025

@SgtPooki could you rebase this or merge master please?

@SgtPooki SgtPooki force-pushed the fix/435-byProviderId branch from b371086 to 5253489 Compare December 15, 2025 13:53
@SgtPooki SgtPooki force-pushed the fix/435-byProviderId branch from 5253489 to a96a2ed Compare December 15, 2025 14:48
@BigLep BigLep moved this from 📌 Triage to 🔎 Awaiting review in FS Dec 17, 2025
@BigLep BigLep requested a review from rvagg December 17, 2025 07:00
@BigLep (Contributor) commented Dec 17, 2025

@SgtPooki : I've re-requested review from @rvagg as I assume that's the next step.

(Also, there is one failing check.)

@SgtPooki (Collaborator, Author) commented:
Looks like I need to merge master again.. on it

@rvagg (Collaborator) left a comment:

I decided to do my remaining critique (which I just noticed during this pass) in a separate PR, because it involves a breaking change: #517. That's on top of this branch currently because it changes some code in here.

SgtPooki and others added 2 commits December 18, 2025 10:37
BREAKING: EnhancedDataSetInfo.nextPieceId removed, s/currentPieceCount/activePieceCount

This exposes getActivePieceCount from PDPVerifier, replaces the inaccurate
nextPieceId-based counting with actual active piece count, and removes
nextPieceId from the data set details struct to avoid extra eth_call for a low-
value piece of information.
@rvagg (Collaborator) commented Dec 19, 2025

oh darn, I jumped the gun on #517 and merged that into here. @SgtPooki would you mind when you land this, squashing all your commits down into one with the msg matching the title of this issue and then rebase-merge this with the 2 commits intact? some important info in that commit I merged.
