Skip to content

improve(rebalancer): warn when pending order is pruned without finalization#3435

Merged
nicholaspai merged 6 commits into
masterfrom
droplet/T90K0AL22-C0AFB0BP3SB-1780080752-016549
May 29, 2026
Merged

improve(rebalancer): warn when pending order is pruned without finalization#3435
nicholaspai merged 6 commits into
masterfrom
droplet/T90K0AL22-C0AFB0BP3SB-1780080752-016549

Conversation

@droplet-rl
Copy link
Copy Markdown
Contributor

Summary

BaseAdapter._redisCleanupPendingOrders is the path that drops a pending swap-rebalancer order whose REBALANCER_PENDING_ORDER_TTL (default 1h) elapsed before the adapter saw it finalize. Until now it logged at debug, so abandoned orders went silent — no ✨ has finalized log and no warn either.

This bumps the log to warn and adds structured cloid, status (name), account, and statusSetKey fields so operators can filter on it.

Why this matters

Multi-hop Binance swap routes that bridge funds in via OFT/CCTP before the deposit (e.g. USDT0 HyperEVM → Arbitrum → Binance → USDC-BNB BNB) can exceed the 1h TTL when the first bridge leg is slow. When that happens today the only signal in Slack is a missing finalization message — surfaced by Nick in the originating thread (https://risklabs.slack.com/archives/C0AFB0BP3SB/p1780080752016549).

Concrete recent example from bots-across-3839 logs (cloid 0xa7baaec4e0bdc6effd2c833033b057e1, May 29 2026):

  • 11:46:38Z — order created in PENDING_BRIDGE_PRE_DEPOSIT (OFT 136,668 USDT0 HyperEVM → Arbitrum).
  • 12:46:14Z — Arbitrum balance 11,186 USDT vs required 136,668 USDT; OFT still in flight.
  • 12:47:00Z_redisCleanupPendingOrders ran (debug log only). No finalization message ever followed.

With this change that prune fires a warn instead, so Datadog/Slack alerting can flag it.

Caveats

getPendingRebalances(account) is called from several bots (zion-across-relayer-primary, zion-across-inventory-manager-primary, zion-across-finalizer-binance, zion-across-reporter, zion-across-swap-rebalancer, zion-across-relayer-sweeper, zion-across-fast-relayer-rebalancer). Each one races to call _redisCleanupPendingOrders, so the warn can fire from more than one bot per pruned cloid. Operators should dedupe on cloid when alerting.

Test plan

  • Confirm the ⏰ Pruning expired pending order ... warn appears in bots-across-3839 logs the next time a multi-hop Binance order's TTL expires.
  • Confirm normal finalization paths (which go through _redisDeleteOrder directly, not via cleanup) still produce no warn.

🤖 Generated with Claude Code

…zation

`BaseAdapter._redisCleanupPendingOrders` previously logged at `debug` when
it discarded a status-set entry whose order-details key had aged past its
`REBALANCER_PENDING_ORDER_TTL`. That path is the only way an in-flight
order can be dropped without the adapter emitting its usual finalization
log, so it should be surfaced rather than buried. Bump the log to `warn`
and include the cloid, status name, account, and status-set key as
structured fields so operators can detect and alert on abandoned orders.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ad21c13c8f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/rebalancer/adapters/baseAdapter.ts Outdated
Comment thread src/rebalancer/adapters/baseAdapter.ts Outdated
Comment thread src/rebalancer/adapters/baseAdapter.ts Outdated
Comment thread src/rebalancer/adapters/baseAdapter.ts Outdated
nicholaspai
nicholaspai previously approved these changes May 29, 2026
_redisDeleteOrder removes the status-set member and details key via Promise.all,
so a concurrent _redisCleanupPendingOrders could observe the details deletion
before the set removal and emit a false-positive "pruned without finalization"
warn. Re-check set membership before warning, and add an sIsMember helper on
RedisCache for the lookup.
@droplet-rl
Copy link
Copy Markdown
Contributor Author

@codex addressed the false-positive concern in fbfa392 by re-checking sIsMember on the status set before emitting the warn. A concurrent _redisDeleteOrder will have removed the cloid from the set by the time we observe undefined order details, so the re-check suppresses the warn; a genuine TTL prune still has the member present and warns as intended. Please take another look.

@droplet-rl
Copy link
Copy Markdown
Contributor Author

Addressed the Codex feedback on src/rebalancer/adapters/baseAdapter.ts:273 in commit fbfa392.

Race Codex flagged: _redisDeleteOrder removes the status-set member and details key via Promise.all, so a concurrent _redisCleanupPendingOrders can capture sMembers before sRem lands and then read undefined after del lands → false-positive "pruned without finalization" warn.

Fix: when details are undefined, re-check sIsMember on the status set before warning. A concurrent finalize will have already removed the cloid (or will momentarily), so the re-check returns false and we suppress. A genuine TTL prune still has the member present, so the warn fires as intended. Added a lightweight sIsMember helper on RedisCache for the lookup.

Posted an @codex follow-up on the PR (#issuecomment-4579128391) so it can re-review. Typecheck + lint pass locally.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbfa392f9c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/rebalancer/adapters/baseAdapter.ts Outdated
// (e.g. from another rebalancer instance sharing this signer address) can land the details `del` before the
// set `sRem` is observed here, which would falsely flag a finalized order as abandoned. Re-check set
// membership against the live state to suppress that race; a genuine TTL prune still has the member present.
const stillMember = await this.redisCache.sIsMember(statusSetKey, cloid);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Avoid warning while finalization removal is still in flight

In the concurrent-finalization scenario this check is meant to suppress, the status member can still be present when the details key has already disappeared: cleanup can capture sMembers, the finalizer's del can become visible, and its sRem may not have landed by this sIsMember re-check yet. In that window stillMember is true, so we still emit the "without finalization" warning for an order that is actively finalizing; consider basing the warning on the result of the actual removal or otherwise waiting/confirming that no competing removal occurred.

Useful? React with 👍 / 👎.

Make _redisDeleteOrder sequential (sRem then del) and return whether the set
member was actually removed. _redisCleanupPendingOrders uses that boolean as
the authoritative signal: a finalize racing us will land its sRem before we
even queue ours (sequential ordering means an observer seeing the details key
gone has also seen sRem land), so our sRem returns 0 and we suppress; a true
TTL prune still has the member present, sRem returns 1, and we warn. Drops
the sIsMember helper that the previous re-check approach needed.
@droplet-rl
Copy link
Copy Markdown
Contributor Author

@codex took a stronger pass in 8245e4d. _redisDeleteOrder is now sequential (sRem then del) and returns whether sRem actually removed the member. _redisCleanupPendingOrders uses that boolean as the authority: a concurrent finalize lands its sRem before its del, so any observer that sees the details key gone has also seen sRem land — our sRem then returns 0 and we suppress. A genuine TTL prune still has the member present, sRem returns 1, and we warn. The previous sIsMember re-check is dropped. Please re-review.

@droplet-rl
Copy link
Copy Markdown
Contributor Author

Reworked the race-resolution in 8245e4d to remove the residual window Codex flagged.

Previous approach (insufficient): sIsMember re-check still depended on _redisDeleteOrder's Promise.all([sRem, del]) landing sRem before del at the server. If del was visible but sRem wasn't yet, sIsMember returned true and we still warned.

New approach:

  • _redisDeleteOrder is now sequential: await sRem; await del;. This makes the ordering explicit at the JS level, not dependent on pipelining semantics — once del has landed at the server, sRem has necessarily already returned to the finalizer.
  • _redisDeleteOrder now returns whether the set member was actually removed (sRem result > 0).
  • _redisCleanupPendingOrders uses that boolean as the authoritative signal. A concurrent finalize beats our sRem to it → returns 0 → suppress. A TTL prune has the member still present → returns 1 → warn. The previous sIsMember helper is removed.

This eliminates the race window: there's no path where details are observed deleted but sRem hasn't landed yet. Typecheck + lint pass; @codex notified at #issuecomment-4579224363.

@chatgpt-codex-connector
Copy link
Copy Markdown

Review Result

No further findings from my re-review of commit 8245e4d.

The updated approach addresses the race I flagged: _redisDeleteOrder now awaits sRem before del, and returns whether the status-set member was actually removed by that call. That gives cleanup an authoritative signal: if another finalizer already removed the set member, cleanup suppresses the warning; if cleanup itself removes the stale member, it emits the TTL-prune warning. src/rebalancer/adapters/baseAdapter.tsL240-L254 src/rebalancer/adapters/baseAdapter.tsL271-L286

No README/AGENTS updates are needed for this review-only follow-up because I did not change runtime behavior or configuration beyond validating the existing PR changes.

Testing

  • yarn typecheck
  • yarn eslint src/rebalancer/adapters/baseAdapter.ts

I did not make code changes, so I did not create a new commit or PR.

View task →

@droplet-rl
Copy link
Copy Markdown
Contributor Author

Acknowledged — Codex re-review on 8245e4d came back clean with no further findings. Nothing to do.

@droplet-rl
Copy link
Copy Markdown
Contributor Author

Thanks for the approval, @nicholaspai — no inline comments to address.

@nicholaspai nicholaspai merged commit b383f13 into master May 29, 2026
5 checks passed
@nicholaspai nicholaspai deleted the droplet/T90K0AL22-C0AFB0BP3SB-1780080752-016549 branch May 29, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants