improve(dataworker): treat SVM refund-leaf already-claimed races as benign #3423
Open
droplet-rl wants to merge 1 commit into
…enign

`Dataworker._executeRelayerRefundLeafSvm` was rethrowing every caught error, including the case where the on-chain `is_claimed` check at `programs/svm-spoke/src/instructions/bundle.rs` returned `CommonError::ClaimedMerkleLeaf` because another actor (concurrent dataworker, manual execution) landed the leaf between our pre-flight `getRelayerRefundExecutions()` filter and the simulate-and-send. The race window is small but real: a cluster of failures on 2026-05-23 and one today were all consistent with it, surfacing as `Process exited with error 🚨`.

EVM parity already exists via `MultiCallerClient.knownRevertReasons` (`"ClaimedMerkleLeaf"`, `"RelayFilled"`). SVM lacked an analog.

Changes:
- New `isSvmLeafAlreadyClaimedError(err)` helper in `LogUtils.ts`. Matches by the Anchor program-log line `Error Code: ClaimedMerkleLeaf` rather than by numeric code, because the SVM SpokePool declares two `#[error_code]` enums (`CommonError`, `SvmError`) that both default-start at offset 6000 and collide numerically. The deployed IDL only exposes `SvmError` constants, so name matching via the simulation logs is the only reliable disambiguator. Because `describeSolanaError` already surfaces `err.context.logs`, this is cheap to check.
- In the `_executeRelayerRefundLeafSvm` catch: if the helper returns true, log at debug instead of error, still deactivate the LUT, and return `undefined` in place of the signature so the caller knows nothing landed on our behalf.
- Caller (`_executeRelayerRefundLeaves` SVM branch) skips the `"Executed RelayerRefundLeaf"` info log when the signature is `undefined`.

Out of scope (follow-ups):
- Same treatment for the deferred-refund `claimRelayerRefund` path, which has its own potential races on `claimAccount`.
- Moving the helper into `@across-protocol/sdk` next to `describeSolanaError` (deferred until the SDK bump that includes `describeSolanaError` lands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
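The catch-branch behavior described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `Dataworker.ts`: `executeLeafSketch`, `reportLeafSketch`, and every injected parameter (`send`, `isAlreadyClaimed`, `deactivateLut`, `log`) are hypothetical names, and LUT handling on the success path is omitted because the commit message does not describe it.

```typescript
type LogFn = (message: string) => void;
type SketchLogger = { debug: LogFn; info: LogFn; error: LogFn };

// Hypothetical stand-in for the SVM execution path. All collaborators are
// injected so the control flow can be read (and exercised) in isolation.
async function executeLeafSketch(
  send: () => Promise<string>,
  isAlreadyClaimed: (err: unknown) => boolean,
  deactivateLut: () => Promise<void>,
  log: SketchLogger
): Promise<string | undefined> {
  try {
    // Success path: return the transaction signature.
    return await send();
  } catch (err) {
    if (isAlreadyClaimed(err)) {
      // Benign race: another actor landed the leaf first.
      log.debug("leaf already claimed by another actor");
      await deactivateLut();
      return undefined; // nothing landed on our behalf
    }
    // Any other failure keeps the previous behavior: log at error, rethrow.
    log.error("leaf execution failed");
    throw err;
  }
}

// Hypothetical caller: only report an execution when a signature came back.
async function reportLeafSketch(
  execute: () => Promise<string | undefined>,
  log: SketchLogger
): Promise<void> {
  const signature = await execute();
  if (signature !== undefined) {
    log.info("Executed RelayerRefundLeaf");
  }
}
```

Returning `undefined` rather than throwing a dedicated error keeps the benign case out of the caller's failure handling, at the cost of widening the return type to `Promise<string | undefined>`.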
Summary
Stacked on #3422 (the `describeSolanaError` log surfacing). Target the stack-base branch for now; switch base to `master` after #3422 merges.

Makes `Dataworker._executeRelayerRefundLeafSvm` robust to the race where another actor (concurrent dataworker instance, manual execution, etc.) lands the relayer-refund merkle leaf between our pre-flight `getRelayerRefundExecutions()` filter (`Dataworker.ts:2284`) and the simulate-and-send. On-chain this surfaces as `programs/svm-spoke/src/instructions/bundle.rs:124` returning `CommonError::ClaimedMerkleLeaf`; today the dataworker treats it like any other failure: `logger.error` + rethrow + `Process exited with code 1`. EVM already handles the analog via `MultiCallerClient.knownRevertReasons` (`"ClaimedMerkleLeaf"`, `"RelayFilled"`).

Incident pattern this fixes: a cluster of failures on 2026-05-23 and one today (run `3bc42214…`, bundle 10444 leaf 30) were all consistent with this race.

Change
- New `isSvmLeafAlreadyClaimedError(err)` in `src/utils/LogUtils.ts`. Returns true iff `err` is a SolanaError with `__code === SVM_TRANSACTION_PREFLIGHT_FAILURE` and its simulation `logs[]` contain Anchor's `"Error Code: ClaimedMerkleLeaf"` line.
- `_executeRelayerRefundLeafSvm` catch (`Dataworker.ts:3287`): branch on the helper. If benign → `logger.debug` with the same structured payload, still deactivate the LUT, return `undefined`. If not → unchanged (still `logger.error` + rethrow).
- `_executeRelayerRefundLeaves` SVM branch (`Dataworker.ts:2544`): skip the `"Executed RelayerRefundLeaf"` info log when the returned signature is `undefined` (we'd be claiming we executed a leaf we didn't).
- Return type widens from `Promise<string>` to `Promise<string | undefined>`.

Why match by log line instead of numeric code
The SVM SpokePool program declares two `#[error_code]` enums (`CommonError`, `SvmError`) in `programs/svm-spoke/src/error.rs`. Both default-start at offset 6000 in Anchor 0.31, so they collide numerically: `ClaimedMerkleLeaf` and `SvmError::InvalidRefund` both serialize to 6010 on the wire. The published `@across-protocol/contracts` IDL compounds this: `anchor idl build` only emits the last declared enum, so the generated client (`dist/src/svm/clients/SvmSpoke/errors/svmSpoke.js`) ships only `SvmError` constants, and no `SVM_SPOKE_ERROR__CLAIMED_MERKLE_LEAF` exists. The only reliable disambiguator is the Anchor program-log line, which `describeSolanaError` (added in #3422) now surfaces.

(A separate `across-protocol/contracts` PR has been proposed to post-process the IDL so both enums ship. Once that lands, this matcher can switch to a numeric code without the disambiguation problem.)

Out of scope (follow-ups)
- Same treatment for the deferred-refund `claimRelayerRefund` path (`Dataworker.ts:3232-3255`), which has its own race on `claimAccount`.
- Moving `isSvmLeafAlreadyClaimedError` into `@across-protocol/sdk` next to `describeSolanaError` (deferred until the SDK PR with `describeSolanaError` lands + bumps).
- EVM path: already covered by `knownRevertReasons`; no change here.

Doc updates
No `README.md`/`AGENTS.md` updates needed: this is an internal behavior tweak to a private catch path, with no interface or operator-facing contract change.

Test plan
- `yarn typecheck`
- `yarn lint`
- `npx hardhat test test/LogUtils.ts` (6 new tests for `isSvmLeafAlreadyClaimedError`: non-Solana, non-preflight, preflight without ClaimedMerkleLeaf, preflight with it, JSON round-tripped error, missing `logs` array)
- `npx hardhat test test/Dataworker.executeRelayerRefunds.ts` (existing EVM coverage still green)
- Production confirmation that the race logs at `debug` rather than `error` (rare; will surface naturally next time two dataworkers contend)

Thread: #bot-monitoring 1779869895.829509
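To make the six test cases concrete, here is a minimal stand-in for the matcher. It is a sketch, not the helper in `LogUtils.ts`: the name `isLeafAlreadyClaimedSketch` is hypothetical, the numeric value given to `SVM_TRANSACTION_PREFLIGHT_FAILURE` is a placeholder, and the error shape (`context.__code`, `context.logs`) is assumed from the description above. The checks are purely structural, which is one way an error that lost its prototype in a JSON round trip could still match.

```typescript
// Placeholder value for illustration only; the real constant comes from the
// Solana client library and may differ.
const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;

// The Anchor program-log fragment that names the error.
const CLAIMED_LEAF_LOG = "Error Code: ClaimedMerkleLeaf";

// Hypothetical stand-in for the helper. Assumes the error carries
// `context.__code` and `context.logs`; no instanceof check is used, so plain
// objects produced by JSON.parse are accepted too.
function isLeafAlreadyClaimedSketch(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;

  const context = (err as { context?: unknown }).context;
  if (typeof context !== "object" || context === null) return false;

  const { __code, logs } = context as { __code?: unknown; logs?: unknown };
  if (__code !== SVM_TRANSACTION_PREFLIGHT_FAILURE) return false;

  // A missing or malformed logs array means there is no name to match on.
  if (!Array.isArray(logs)) return false;

  return logs.some(
    (line) => typeof line === "string" && line.includes(CLAIMED_LEAF_LOG)
  );
}
```

Matching on the error name in the log line rather than on 6010 is what keeps `CommonError::ClaimedMerkleLeaf` from being confused with `SvmError::InvalidRefund`.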
🤖 Generated with Claude Code