fix: prevent Firo sync stall at ~65% by adding timeouts and retry logic #1296
Draft
reubenyap wants to merge 1 commit into cypherstack:staging from
Two root issues fixed:

1. Sync stall at ~65% requiring force-close: When any sub-operation during _refresh() hangs, refreshMutex is held forever. Added a 5-minute master timeout that guarantees mutex release. Fixed progress to fire 0.65 after updateUTXOs() completes.

2. Anonymity set downloads restart from scratch on interruption: All sectors were accumulated in memory and written to SQLite only after the entire download completed. Now each sector is persisted immediately. On resume, only remaining sectors are fetched.

Key design decisions verified against firod source (firoorg/firo@ccaf130):
- API uses absolute indices (0 = newest coin, counting backwards)
- blockHash pins the iteration start point (stable indices)
- Same-block resume: offset indices by prevSize to skip saved coins
- Cross-block resume: use `complete` flag on SparkSet to detect whether the previous download finished. Complete → use delta. Partial → full re-download (indices shifted, gap is unavoidable).
- INSERT OR IGNORE handles crash-recovery and cross-block overlap
- Progress uses consistent (indexOffset + fetched, meta.size)
- All groupIds processed every sync (removed skip optimization)
- Removed old all-or-nothing writer (dead code)
- Schema: added `complete` column + UNIQUE index on SparkSetCoins

https://claude.ai/code/session_01GF78pBWxrpN9rfsLEEwbMR
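The same-block resume and the idempotent write described in this commit message can be illustrated with a minimal sketch. It is not the PR's code: `Sector`, `remainingSectors`, `CoinStore`, and the sector size of 50 are hypothetical names and values chosen for illustration, and an in-memory `Set` stands in for the SQLite table with its UNIQUE index.

```dart
/// Hypothetical request window: coins at absolute indices [start, end).
class Sector {
  const Sector(this.start, this.end);
  final int start;
  final int end;

  @override
  String toString() => '[$start, $end)';
}

/// Sectors still to fetch when `prevSize` coins of a `metaSize`-coin set are
/// already saved for the same blockHash. Offsetting by `prevSize` skips them.
/// The sector size of 50 is an assumption, not a value from the PR.
List<Sector> remainingSectors(int prevSize, int metaSize,
    {int sectorSize = 50}) {
  final sectors = <Sector>[];
  for (var start = prevSize; start < metaSize; start += sectorSize) {
    final next = start + sectorSize;
    sectors.add(Sector(start, next < metaSize ? next : metaSize));
  }
  return sectors;
}

/// In-memory stand-in for a coins table with a UNIQUE index.
class CoinStore {
  final Set<String> _coins = <String>{};

  int get size => _coins.length;

  /// Mimics INSERT OR IGNORE: duplicates are skipped, not treated as errors.
  /// Returns the number of rows actually added.
  int insertOrIgnore(Iterable<String> coins) {
    var inserted = 0;
    for (final coin in coins) {
      if (_coins.add(coin)) inserted++;
    }
    return inserted;
  }
}
```

Because each sector is written as soon as it arrives, a crash between sectors loses at most one sector of work, and replaying a sector that was already saved adds no duplicate rows.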
Summary
Firo wallet sync frequently stalls at 65% and requires force-closing the app to recover. This PR fixes both symptoms with two targeted changes to `_refresh()` in `wallet.dart`.

Root cause

When any sub-operation during sync hangs (network issue, unresponsive server, OS suspending sockets), `refreshMutex` is never released because the `catch`/`finally` blocks only run when the future completes or throws; a permanently pending future does neither. All future sync attempts bail out immediately at the `if (refreshMutex.isLocked) return` check (line 620), so the wallet can never sync again until force-closed.

The 65% number specifically comes from progress being fired to 0.65 before `updateUTXOs()` is awaited, making it appear stuck at that number while the real work is still happening.

Changes
1. Master timeout on refresh (prevents permanent mutex lock)
Wrap the `_refresh()` body in a 5-minute timeout. If any sub-operation hangs, `TimeoutException` is caught by the existing `catch` block → `completer.completeError()` → `finally` releases `refreshMutex`. The next periodic sync (every 150s) gets a fresh attempt.

This is an intentionally blunt safety net, not a per-call timeout. Connection-level stall detection is handled by the electrum adapter's existing `connectionTimeout` and `aliveTimerDuration` (both 60s). The master timeout only fires when something slips through those layers.

2. Fix misleading progress reporting
Before: `0.6 → 0.65 → [wait UTXOs] → 0.70 → [wait txns]`
After: `0.6 → [wait UTXOs] → 0.65 → [wait txns] → 0.70`

Progress now reflects actual completion rather than jumping ahead.
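Both changes can be sketched together in a few lines of Dart. This is a minimal illustration under stated assumptions, not the code in `wallet.dart`: `SketchWallet`, `refreshLocked`, and the injected `updateUtxos`/`updateTransactions` callbacks are hypothetical, a plain boolean stands in for `refreshMutex`, and the completer-based error reporting is reduced to a boolean return value.

```dart
import 'dart:async';

/// Hypothetical stand-in for the wallet; it models only the locking and
/// progress behaviour described above.
class SketchWallet {
  SketchWallet({required this.updateUtxos, required this.updateTransactions});

  final Future<void> Function() updateUtxos;
  final Future<void> Function() updateTransactions;

  bool refreshLocked = false; // plays the role of refreshMutex
  final List<double> progress = <double>[]; // progress events in firing order

  /// Returns true if the refresh finished, false if it was skipped or timed out.
  Future<bool> refresh({
    Duration masterTimeout = const Duration(minutes: 5),
  }) async {
    if (refreshLocked) return false; // same early exit as the isLocked check
    refreshLocked = true;
    try {
      // The timeout bounds the whole body, so a permanently pending future
      // surfaces as a TimeoutException instead of blocking forever.
      await _refreshBody().timeout(masterTimeout);
      return true;
    } on TimeoutException {
      return false;
    } finally {
      refreshLocked = false; // reached on success, error, and timeout alike
    }
  }

  Future<void> _refreshBody() async {
    progress.add(0.6);
    await updateUtxos();
    progress.add(0.65); // fired only after the UTXO update has completed
    await updateTransactions();
    progress.add(0.70);
  }
}
```

With an `updateUtxos` that never completes, `refresh` returns after the timeout with the lock released and only 0.6 reported, so the next periodic sync can start. Note that `Future.timeout` stops waiting but does not cancel the underlying operation, which is consistent with treating the master timeout as a last-resort safety net and leaving stall detection to the connection layer.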
Test plan
`wallet.dart` is changed