improve(dataworker): treat SVM refund-leaf already-claimed races as benign #3423

Open
droplet-rl wants to merge 1 commit into droplet/T90K0AL22-C03GHT4RV42-1779869895-829509 from droplet/T90K0AL22-C03GHT4RV42-1779869895-829509-leaf-claimed

Summary

Stacked on #3422 (the describeSolanaError log surfacing). Target the stack-base branch for now; switch base to master after #3422 merges.

Makes Dataworker._executeRelayerRefundLeafSvm robust to the race where another actor (a concurrent dataworker instance, a manual execution, etc.) lands the relayer-refund merkle leaf between our pre-flight getRelayerRefundExecutions() filter (Dataworker.ts:2284) and the simulate-and-send. On-chain this surfaces as programs/svm-spoke/src/instructions/bundle.rs:124 returning CommonError::ClaimedMerkleLeaf. Today the dataworker treats it like any other failure: logger.error, rethrow, and Process exited with code 1. EVM already handles the analogous case via MultiCallerClient.knownRevertReasons ("ClaimedMerkleLeaf", "RelayFilled").

Incident pattern this fixes: a cluster of failures on 2026-05-23 and one today (run 3bc42214…, bundle 10444 leaf 30) were all consistent with this race.

Change

  • New isSvmLeafAlreadyClaimedError(err) in src/utils/LogUtils.ts. Returns true iff err is a SolanaError with __code === SVM_TRANSACTION_PREFLIGHT_FAILURE and its simulation logs[] contain Anchor's "Error Code: ClaimedMerkleLeaf" line.
  • In _executeRelayerRefundLeafSvm catch (Dataworker.ts:3287): branch on the helper. If benign → logger.debug with the same structured payload, still deactivate the LUT, return undefined. If not → unchanged (still logger.error + rethrow).
  • Caller in _executeRelayerRefundLeaves SVM branch (Dataworker.ts:2544): skip the "Executed RelayerRefundLeaf" info log when the returned signature is undefined (we'd be claiming we executed a leaf we didn't).
  • Return type widened from Promise<string> to Promise<string | undefined>.
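As a rough illustration of the helper described in the first bullet, here is a minimal duck-typed sketch. It is not the code from this PR: the numeric value of SVM_TRANSACTION_PREFLIGHT_FAILURE (-32002), the assumption that both `__code` and `logs` live on `err.context`, and the choice of structural checks over `instanceof` (so a JSON round-tripped error still matches) are all assumptions made for the example.

```typescript
// Assumption: JSON-RPC code for a failed sendTransaction preflight simulation.
// In the real code this constant would come from the Solana errors package.
const SVM_TRANSACTION_PREFLIGHT_FAILURE = -32002;

// The name-bearing fragment Anchor writes into the program logs.
const CLAIMED_LEAF_LOG_MARKER = "Error Code: ClaimedMerkleLeaf";

type UnknownRecord = Record<string, unknown>;

function isRecord(value: unknown): value is UnknownRecord {
  return typeof value === "object" && value !== null;
}

// Returns true only for a preflight failure whose simulation logs name
// ClaimedMerkleLeaf. Anything else (non-objects, other codes, missing or
// malformed logs) is reported as not benign.
function isSvmLeafAlreadyClaimedError(err: unknown): boolean {
  if (!isRecord(err)) return false;
  const context = err["context"];
  if (!isRecord(context)) return false;
  if (context["__code"] !== SVM_TRANSACTION_PREFLIGHT_FAILURE) return false;
  const logs = context["logs"];
  if (!Array.isArray(logs)) return false;
  return logs.some(
    (line) => typeof line === "string" && line.includes(CLAIMED_LEAF_LOG_MARKER)
  );
}
```

Defaulting to `false` on any unexpected shape keeps the failure mode conservative: an unrecognised error still goes down the existing logger.error + rethrow path.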

Why match by log line instead of numeric code

The SVM SpokePool program declares two #[error_code] enums (CommonError, SvmError) in programs/svm-spoke/src/error.rs. Both default-start at offset 6000 in Anchor 0.31, so they collide numerically — ClaimedMerkleLeaf and SvmError::InvalidRefund both serialize to 6010 on the wire. The published @across-protocol/contracts IDL further compounds this: anchor idl build only emits the last declared enum, so the generated client (dist/src/svm/clients/SvmSpoke/errors/svmSpoke.js) ships only SvmError constants — no SVM_SPOKE_ERROR__CLAIMED_MERKLE_LEAF exists. The only reliable disambiguator is the Anchor program-log line, which describeSolanaError (added in #3422) now surfaces.
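The collision can be made concrete with a small sketch. Anchor numbers custom errors as offset plus variant index, so two enums that both start at 6000 produce the same number for variants at the same position. The variant lists below are placeholders: only the two names at index 10 and the resulting 6010 come from this description, and the error message in the sample log line is likewise a placeholder.

```typescript
// Anchor assigns custom error numbers as offset + variant index.
const ANCHOR_ERROR_OFFSET = 6000;

// Placeholder variant lists; only the names at index 10 are taken from the PR text.
const commonErrorVariants: string[] = [
  ...Array.from({ length: 10 }, (_, i) => `CommonPlaceholder${i}`),
  "ClaimedMerkleLeaf",
];
const svmErrorVariants: string[] = [
  ...Array.from({ length: 10 }, (_, i) => `SvmPlaceholder${i}`),
  "InvalidRefund",
];

function errorNumber(variants: string[], name: string): number {
  const index = variants.indexOf(name);
  if (index < 0) throw new Error(`unknown variant: ${name}`);
  return ANCHOR_ERROR_OFFSET + index;
}

// Both names map to the same wire value, so the number alone is ambiguous.
const claimedNumber = errorNumber(commonErrorVariants, "ClaimedMerkleLeaf");
const invalidRefundNumber = errorNumber(svmErrorVariants, "InvalidRefund");

// The Anchor log line carries the variant name, which is unambiguous.
function anchorErrorName(logLine: string): string | undefined {
  const match = /Error Code: (\w+)\./.exec(logLine);
  return match?.[1];
}

// Illustrative log line; "<message>" stands in for the real error message.
const sampleLogLine =
  "Program log: AnchorError thrown in programs/svm-spoke/src/instructions/bundle.rs:124. " +
  "Error Code: ClaimedMerkleLeaf. Error Number: 6010. Error Message: <message>.";
const parsedName = anchorErrorName(sampleLogLine);
```

Here `claimedNumber` and `invalidRefundNumber` are equal, while `parsedName` recovers the variant name from the log line.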

(A separate across-protocol/contracts PR has been proposed to post-process the IDL so both enums ship. Once that lands, this matcher can switch to a numeric code without ambiguity.)

Out of scope (follow-ups)

  • Same treatment for the deferred-refund claimRelayerRefund path (Dataworker.ts:3232-3255) — has its own race on claimAccount.
  • Moving isSvmLeafAlreadyClaimedError into @across-protocol/sdk next to describeSolanaError (deferred until the SDK PR with describeSolanaError lands + bumps).
  • EVM-side already covered via knownRevertReasons; no change here.

Doc updates

No README.md / AGENTS.md updates needed — internal behavior tweak to a private catch path; no interface or operator-facing contract change.

Test plan

  • yarn typecheck
  • yarn lint
  • npx hardhat test test/LogUtils.ts (6 new tests for isSvmLeafAlreadyClaimedError: non-Solana, non-preflight, preflight without ClaimedMerkleLeaf, preflight with it, JSON round-tripped error, missing logs array)
  • npx hardhat test test/Dataworker.executeRelayerRefunds.ts (existing EVM coverage still green)
  • Deploy + observe a real production race fire as debug rather than error (rare; will surface naturally next time two dataworkers contend)

Thread: #bot-monitoring 1779869895.829509

🤖 Generated with Claude Code


`Dataworker._executeRelayerRefundLeafSvm` rethrew every caught error, including
the case where the on-chain `is_claimed` check at
`programs/svm-spoke/src/instructions/bundle.rs` returned
`CommonError::ClaimedMerkleLeaf` because another actor (concurrent dataworker,
manual execution) landed the leaf between our pre-flight
`getRelayerRefundExecutions()` filter and the simulate-and-send. The race window
is small but real: a cluster of failures on 2026-05-23 and one today were all
consistent with it, surfacing as `Process exited with error 🚨`.

EVM parity already exists via `MultiCallerClient.knownRevertReasons`
(`"ClaimedMerkleLeaf"`, `"RelayFilled"`). SVM lacked an analog.

Changes:
- New `isSvmLeafAlreadyClaimedError(err)` helper in `LogUtils.ts`. Matches by
  the Anchor program-log line `Error Code: ClaimedMerkleLeaf` rather than by
  numeric code, because the SVM SpokePool declares two `#[error_code]` enums
  (`CommonError`, `SvmError`) that both default-start at offset 6000 and collide
  numerically — the deployed IDL only exposes `SvmError` constants, so name
  matching via the simulation logs is the only reliable disambiguator. The fact
  that `describeSolanaError` already surfaces `err.context.logs` makes this
  cheap to check.
- In the `_executeRelayerRefundLeafSvm` catch: if the helper returns true, log
  at debug instead of error, still deactivate the LUT, and return `undefined`
  instead of a signature so the caller knows nothing landed on our behalf.
- Caller (`_executeRelayerRefundLeaves` SVM branch) skips the
  `"Executed RelayerRefundLeaf"` info log when the signature is `undefined`.
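The control flow in the last two bullets can be sketched as below. This is a simplified stand-in, not the Dataworker code: the injected dependencies (`sendLeaf`, `deactivateLut`, `isAlreadyClaimed`), the payload fields, and the ordering of logging versus LUT deactivation are all assumptions made so the example runs on its own.

```typescript
type Logger = {
  debug: (payload: object) => void;
  info: (payload: object) => void;
  error: (payload: object) => void;
};

// Hypothetical dependencies, injected so the sketch is self-contained.
interface LeafExecutionDeps {
  sendLeaf: () => Promise<string>; // simulate-and-send; resolves to the tx signature
  deactivateLut: () => Promise<void>; // lookup-table cleanup on failure
  isAlreadyClaimed: (err: unknown) => boolean; // stands in for isSvmLeafAlreadyClaimedError
  logger: Logger;
}

// Resolves to the signature on success, or undefined when another actor
// already claimed the leaf. Any other failure is logged at error and rethrown.
async function executeLeafSketch(deps: LeafExecutionDeps): Promise<string | undefined> {
  try {
    return await deps.sendLeaf();
  } catch (err) {
    const benign = deps.isAlreadyClaimed(err);
    // Placeholder payload; the same structure is used at both log levels.
    const payload = { message: "relayer refund leaf execution failed", error: String(err) };
    if (benign) {
      deps.logger.debug(payload);
    } else {
      deps.logger.error(payload);
    }
    await deps.deactivateLut();
    if (benign) return undefined;
    throw err;
  }
}

// Caller side: only report an execution when a signature actually came back.
async function executeAndReportSketch(deps: LeafExecutionDeps): Promise<void> {
  const signature = await executeLeafSketch(deps);
  if (signature !== undefined) {
    deps.logger.info({ message: "Executed RelayerRefundLeaf", signature });
  }
}
```

Returning `undefined` rather than a sentinel string forces the caller to handle the "nothing landed on our behalf" case at the type level, which is what the widened `Promise<string | undefined>` return type expresses.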

Out of scope (follow-ups):
- Same treatment for the deferred-refund `claimRelayerRefund` path — has its
  own potential races on `claimAccount`.
- Moving the helper into `@across-protocol/sdk` next to `describeSolanaError`
  (deferred until the SDK bump that includes `describeSolanaError` lands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>