-
Notifications
You must be signed in to change notification settings - Fork 49
feat: incorporate sds-r into reliable channels #2701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
size-limit report 📦
|
90fddd2 to
3ba50d4
Compare
3ba50d4 to
9d82564
Compare
| const log = new Logger("sdk:reliable-channel"); | ||
|
|
||
| const DEFAULT_SYNC_MIN_INTERVAL_MS = 30 * 1000; // 30 seconds | ||
| const DEFAULT_SYNC_MIN_INTERVAL_WITH_REPAIRS_MS = 10 * 1000; // 10 seconds when repairs pending |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting the review so it may not hold, but I wonder if instead we should use the "multiplier" concept for this, instead of different values.
| options?.processTaskMinElapseMs ?? DEFAULT_PROCESS_TASK_MIN_ELAPSE_MS; | ||
|
|
||
| if (this._retrieve) { | ||
| this.retrievalStrategy = options?.retrievalStrategy ?? "both"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are applying default twice, which is not idea. Make the retrievalStrategy a mandatory argument here, so it can be defaulted only once in create function.
| // Only start repair sweep if SDS-R is enabled | ||
| if (this.shouldUseSdsR()) { | ||
| this.startRepairSweepLoop(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's better to put the check inside startRepairSweepLoop to be simpler and safer.
| } | ||
|
|
||
| private stopRepairSweepLoop(): void { | ||
| if (this.sweepRepairInterval) clearInterval(this.sweepRepairInterval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (this.sweepRepairInterval) clearInterval(this.sweepRepairInterval); | |
| if (this.sweepRepairInterval) { | |
| clearInterval(this.sweepRepairInterval); | |
| this.sweepInBufInterval = undefined; | |
| } |
It's apparently better.
| const baseInterval = hasPendingRepairs | ||
| ? DEFAULT_SYNC_MIN_INTERVAL_WITH_REPAIRS_MS | ||
| : this.syncMinIntervalMs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about doing a multiplier of 0.3 or 0.5 instead?
| // Store retrieval (for 'both' and 'store-only' strategies) | ||
| // SDS-R repair happens automatically via RepairManager for 'both' and 'sds-r-only' | ||
| if ( | ||
| this.shouldUseStore() && | ||
| retrievalHint && | ||
| this.missingMessageRetriever | ||
| ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary, as missingMessageRetriever should not be instantiated if this.shouldUseStore is false.
Add SDS-R into reliable channels as optional parallel or singular strategy for recovery.
There are three main design choices that need review here:
SDS-R by default functions as a parallel mechanism to store retrieval: there is no lockstep mechanism where the one retrieval strategy serves as fallback if the other fails. This seems to me the most natural way of integrating SDS-R, after some consideration. In general, a reliable channels instance will attempt store retrieval after detecting a missed dependency without delay. SDS-R will kick in with some delay (generally min 30s). If by that point the missing dependency has been received (via relay or via Store retrieval), the pending repair will be removed from the buffer and never enter an SDS-R exchange. There is however a possible race condition that can cause a missing dependency to be requested (and retrieved) from a Store node and via SDS-R. This is not optimal, but I think a reasonable tradeoff for simplicity.
Store retrieval can now be disabled completely, adding some configuration complexity i.t.o.
retrievalStrategy("store only", "sds-r only", "both", "none"). I'm not quite convinced this level of configurability is necessary, but there seems to be some use cases for a store-less reliable channel based on recent Discord discussions.SDS-R works better if the sync period is shorter, so I've added a simple adaptive strategy that triggers Sync more often if there are pending repair requests. This has an impact on bandwidth and message rates.