improve(dataworker): surface SolanaError context + cause on refund-leaf failures#3422

Open
droplet-rl wants to merge 2 commits into master from droplet/T90K0AL22-C03GHT4RV42-1779869895-829509


@droplet-rl

Summary

Dataworker#_executeRelayerRefundLeafSvm catches SolanaErrors from @solana/kit and logs them via error: err. The @risk-labs/logger formatter collapses error to err.stack, which drops the two fields you actually need to diagnose a Solana refund-leaf failure:

  • err.context — for SEND_TRANSACTION_PREFLIGHT_FAILURE, this is the RpcSimulateTransactionResult with logs[], accounts, returnData, unitsConsumed.
  • err.cause — the wrapped TransactionError / InstructionError (e.g. InstructionError__CUSTOM with the program error code, INSUFFICIENT_FUNDS, ALREADY_PROCESSED, etc.).

Production logs today read only SolanaError: Transaction simulation failed at /across-relayer/node_modules/@solana/rpc-transformers/dist/index.node.cjs:332:22 … — no way to tell whether it was an already-executed leaf, a stale blockhash, a custom program error, or RPC flake. The exec error from earlier today (run 3bc42214…, bundle 10444 leaf 30) is one example; there were a dozen more on 2026-05-23.

Change

  • Add describeSolanaError(err): { solanaError?: { name, message?, code, context, cause? } } in src/utils/LogUtils.ts. No-op for non-Solana errors. Recursively unwraps nested SolanaError causes; falls back to { message } for non-Solana Error causes.
  • Spread it into the existing logger.error and the deactivateLut cleanup logger.warn in Dataworker._executeRelayerRefundLeafSvm. error: err is preserved (still emits the stack), and the new solanaError: {...} field carries the structured diagnostic.
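
A minimal sketch of the helper described above, not the code in this PR. It assumes a duck-typed guard (`isSolanaErrorLike`, a hypothetical name) in place of whatever check the real `LogUtils.ts` uses against `@solana/kit`, and it builds the example errors with `Object.assign` rather than the real `SolanaError` constructor.

```typescript
type SolanaErrorDescription = {
  name: string;
  message?: string;
  code: number;
  context: Record<string, unknown>;
  cause?: SolanaErrorDescription | { message: string };
};

// Hypothetical structural type: an Error carrying a `context` with a numeric `__code`.
type SolanaErrorLike = Error & {
  context: { __code: number } & Record<string, unknown>;
  cause?: unknown;
};

// Duck-typed guard (assumption); the real helper may rely on @solana/kit instead.
function isSolanaErrorLike(err: unknown): err is SolanaErrorLike {
  if (!(err instanceof Error) || err.name !== "SolanaError") return false;
  const context = (err as { context?: unknown }).context;
  return (
    typeof context === "object" &&
    context !== null &&
    typeof (context as { __code?: unknown }).__code === "number"
  );
}

// Builds the structured payload and recursively unwraps nested Solana causes.
function describeOne(err: SolanaErrorLike): SolanaErrorDescription {
  const description: SolanaErrorDescription = {
    name: err.name,
    message: err.message,
    code: err.context.__code,
    context: err.context,
  };
  const cause = err.cause;
  if (isSolanaErrorLike(cause)) {
    description.cause = describeOne(cause);
  } else if (cause instanceof Error) {
    // Non-Solana Error causes fall back to the message only.
    description.cause = { message: cause.message };
  }
  return description;
}

// Returns {} for anything that is not Solana-shaped, so it is safe to spread.
function describeSolanaError(err: unknown): { solanaError?: SolanaErrorDescription } {
  return isSolanaErrorLike(err) ? { solanaError: describeOne(err) } : {};
}

// Usage: spread next to the existing `error` key in a log payload.
const inner = Object.assign(new Error("custom program error"), {
  name: "SolanaError",
  context: { __code: 4615001, index: 3 },
});
const outer = Object.assign(new Error("Transaction simulation failed"), {
  name: "SolanaError",
  context: { __code: -32002, logs: ["Program log: refund leaf already executed"] },
  cause: inner,
});
const payload = { message: "refund leaf failed", error: outer, ...describeSolanaError(outer) };
```

Because the helper returns an empty object for non-Solana errors, the spread adds no key at all on other failure paths, which is what keeps the change additive.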

Why not in the formatter / a top-level catch

The risk-labs errorStackTracerFormatter only special-cases info.error and is consumed by every winston log; touching it is a much bigger blast radius than I want for a logging tweak. Adding solanaError as a sibling key passes through bigNumberFormatter and JsonTransport cleanly without touching the shared formatter contract. The dataworker is also the first catch in the chain: the top-level handler in index.ts rethrows via stringifyThrownValue, which would also flatten the context, so adding fields here is the right layer.

Related to #3409 (which renamed the same log's error key to cause, to similar effect; that rename was reverted in #3410). This change is additive, so it composes either way.

Sample shape

With the change, a preflight failure now logs (in addition to the stack under error):

{
  "solanaError": {
    "name": "SolanaError",
    "message": "Transaction simulation failed",
    "code": -32002,
    "context": {
      "__code": -32002,
      "logs": [
        "Program log: refund leaf already executed"
      ],
      "accounts": null,
      "unitsConsumed": "4321"
    },
    "cause": {
      "name": "SolanaError",
      "code": 4615001,
      "context": { "__code": 4615001, "index": 3 }
    }
  }
}

Doc updates

No README.md / AGENTS.md updates needed — this is a log-payload tweak, no interface or behavioral change visible to operators or other bots.

Test plan

  • yarn typecheck
  • yarn lint
  • npx hardhat test test/LogUtils.ts (new unit test for describeSolanaError: non-Solana error returns {}; SolanaError extracts code, context, message)
  • Deploy + wait for next refund-leaf failure to confirm the new payload shape in bots-across-3839 Cloud Run logs.

Thread: #bot-monitoring 1779869895.829509

🤖 Generated with Claude Code

…af failures

`Dataworker#_executeRelayerRefundLeafSvm` catches `SolanaError`s from the SDK
and logs them via `error: err`. The @risk-labs/logger formatter collapses
`error` to `err.stack`, which drops:
- `err.context` — for preflight failures this is the
  `RpcSimulateTransactionResult`, with `logs[]`, `accounts`, `unitsConsumed`.
- `err.cause` — the wrapped `TransactionError` / `InstructionError`.

That left production logs reading only `SolanaError: Transaction simulation
failed at ...` with no way to tell whether it was an already-executed leaf,
a stale blockhash, a custom program error, or RPC flake.

Add `describeSolanaError(err)` in `LogUtils.ts` that returns a structured
`{ solanaError: { code, context, cause? } }` payload (no-op for non-Solana
errors) and spread it into both the error log and the `deactivateLut`
cleanup warn log.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73a47fec8e

Comment thread src/utils/LogUtils.ts
  const solanaError: SolanaErrorDescription = {
    name: err.name,
    code: err.context.__code,
    context: err.context,

P2: Serialize Solana context before logging

When a Solana preflight failure includes unitsConsumed (typed as bigint by @solana/rpc-api, and even represented as 4321n in the new test), copying err.context directly leaves a BigInt in the log payload. The configured JSON/persistent alert transports call JSON.stringify(info) (for example node_modules/@risk-labs/logger/dist/logger/JsonTransport.js:25), which throws on BigInt, so this new logging path can raise while handling the original refund-leaf error and skip the cleanup/rethrow flow instead of producing diagnostics. Sanitize the context recursively, or at least stringify bigint values, before returning it.
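
One way the suggested sanitization could look, as a sketch only: `sanitizeForJson` is a hypothetical name, not something in this PR or in @risk-labs/logger. It copies plain objects and arrays recursively and turns every bigint into its decimal string, which matches the `"unitsConsumed": "4321"` form in the sample shape in the PR description. It does not guard against cyclic references, and class instances such as Date are reduced to their own enumerable properties.

```typescript
// Recursively copies a value, replacing each bigint with its decimal string
// so that a later JSON.stringify call cannot throw on it.
function sanitizeForJson(value: unknown): unknown {
  if (typeof value === "bigint") {
    return value.toString();
  }
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeForJson(item));
  }
  if (value !== null && typeof value === "object") {
    const copy: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      copy[key] = sanitizeForJson(inner);
    }
    return copy;
  }
  // Strings, numbers, booleans, null and undefined pass through unchanged.
  return value;
}

// Usage: a simulation-result-like context with a bigint field.
const rawContext = {
  __code: -32002,
  logs: ["Program log: refund leaf already executed"],
  accounts: null,
  unitsConsumed: BigInt(4321),
};
const safeContext = sanitizeForJson(rawContext);
const serialized = JSON.stringify(safeContext); // does not throw
```

Applying this to `err.context` inside `describeSolanaError` would keep the new logging path from throwing while the original refund-leaf error is still being handled.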
