improve(rebalancer): warn when pending order is pruned without finalization#3435
Conversation
…zation `BaseAdapter._redisCleanupPendingOrders` previously logged at `debug` when it discarded a status-set entry whose order-details key had aged past its `REBALANCER_PENDING_ORDER_TTL`. That path is the only way an in-flight order can be dropped without the adapter emitting its usual finalization log, so it should be surfaced rather than buried. Bump the log to `warn` and include the cloid, status name, account, and status-set key as structured fields so operators can detect and alert on abandoned orders. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad21c13c8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
_redisDeleteOrder removes the status-set member and details key via Promise.all, so a concurrent _redisCleanupPendingOrders could observe the details deletion before the set removal and emit a false-positive "pruned without finalization" warn. Re-check set membership before warning, and add an sIsMember helper on RedisCache for the lookup.
|
@codex addressed the false-positive concern in fbfa392 by re-checking |
|
Addressed the Codex feedback on Race Codex flagged: Fix: when details are Posted an |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fbfa392f9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // (e.g. from another rebalancer instance sharing this signer address) can land the details `del` before the | ||
| // set `sRem` is observed here, which would falsely flag a finalized order as abandoned. Re-check set | ||
| // membership against the live state to suppress that race; a genuine TTL prune still has the member present. | ||
| const stillMember = await this.redisCache.sIsMember(statusSetKey, cloid); |
There was a problem hiding this comment.
Avoid warning while finalization removal is still in flight
In the concurrent-finalization scenario this check is meant to suppress, the status member can still be present when the details key has already disappeared: cleanup can capture sMembers, the finalizer's del can become visible, and its sRem may not have landed by this sIsMember re-check yet. In that window stillMember is true, so we still emit the "without finalization" warning for an order that is actively finalizing; consider basing the warning on the result of the actual removal or otherwise waiting/confirming that no competing removal occurred.
Useful? React with 👍 / 👎.
Make _redisDeleteOrder sequential (sRem then del) and return whether the set member was actually removed. _redisCleanupPendingOrders uses that boolean as the authoritative signal: a finalize racing us will land its sRem before we even queue ours (sequential ordering means an observer seeing the details key gone has also seen sRem land), so our sRem returns 0 and we suppress; a true TTL prune still has the member present, sRem returns 1, and we warn. Drops the sIsMember helper that the previous re-check approach needed.
|
@codex took a stronger pass in 8245e4d. |
|
Reworked the race-resolution in 8245e4d to remove the residual window Codex flagged. Previous approach (insufficient): New approach:
This eliminates the race window: there's no path where details are observed deleted but sRem hasn't landed yet. Typecheck + lint pass; |
Review ResultNo further findings from my re-review of commit The updated approach addresses the race I flagged: No README/AGENTS updates are needed for this review-only follow-up because I did not change runtime behavior or configuration beyond validating the existing PR changes. Testing
I did not make code changes, so I did not create a new commit or PR. |
|
Acknowledged — Codex re-review on 8245e4d came back clean with no further findings. Nothing to do. |
|
Thanks for the approval, @nicholaspai — no inline comments to address. |
Summary
BaseAdapter._redisCleanupPendingOrdersis the path that drops a pending swap-rebalancer order whoseREBALANCER_PENDING_ORDER_TTL(default 1h) elapsed before the adapter saw it finalize. Until now it logged atdebug, so abandoned orders went silent — no✨ has finalizedlog and no warn either.This bumps the log to
warnand adds structuredcloid,status(name),account, andstatusSetKeyfields so operators can filter on it.Why this matters
Multi-hop Binance swap routes that bridge funds in via OFT/CCTP before the deposit (e.g.
USDT0HyperEVM → Arbitrum → Binance →USDC-BNBBNB) can exceed the 1h TTL when the first bridge leg is slow. When that happens today the only signal in Slack is a missing finalization message — surfaced by Nick in the originating thread (https://risklabs.slack.com/archives/C0AFB0BP3SB/p1780080752016549).Concrete recent example from
bots-across-3839logs (cloid0xa7baaec4e0bdc6effd2c833033b057e1, May 29 2026):11:46:38Z— order created inPENDING_BRIDGE_PRE_DEPOSIT(OFT 136,668 USDT0 HyperEVM → Arbitrum).12:46:14Z— Arbitrum balance11,186 USDTvs required136,668 USDT; OFT still in flight.12:47:00Z—_redisCleanupPendingOrdersran (debug log only). No finalization message ever followed.With this change that prune fires a
warninstead, so Datadog/Slack alerting can flag it.Caveats
getPendingRebalances(account)is called from several bots (zion-across-relayer-primary,zion-across-inventory-manager-primary,zion-across-finalizer-binance,zion-across-reporter,zion-across-swap-rebalancer,zion-across-relayer-sweeper,zion-across-fast-relayer-rebalancer). Each one races to call_redisCleanupPendingOrders, so the warn can fire from more than one bot per pruned cloid. Operators should dedupe oncloidwhen alerting.Test plan
⏰ Pruning expired pending order ...warn appears inbots-across-3839logs the next time a multi-hop Binance order's TTL expires._redisDeleteOrderdirectly, not via cleanup) still produce no warn.🤖 Generated with Claude Code