fix(rebalancer): isolate per-adapter failures in getPendingRebalances by droplet-rl · Pull Request #3437 · across-protocol/relayer

droplet-rl · 2026-05-30T08:52:30Z

Summary

A transient outage in one adapter's read API used to propagate up through BaseRebalancerClient.getPendingRebalances -> InventoryClient.update and crash the relayer process. This change catches per-adapter errors, logs them at error level with the adapter name, and continues aggregating from the others so the relayer keeps filling on chains served by healthy adapters and operates on a partial view of pending rebalances for that cycle.

Motivation

We just saw zion-across-relayer-primary exit with:

HttpRequestError: 502 Bad Gateway - nginx/1.22.1
    at HttpTransport.request (@nktkas/hyperliquid/.../http/mod.js:109)
    at getUserNonFundingLedgerUpdates (src/utils/HyperliquidUtils.ts)
    at HyperliquidStablecoinSwapAdapter._getInitiatedWithdrawalsFromHypercore (src/rebalancer/adapters/hyperliquid.ts:1006)
    at HyperliquidStablecoinSwapAdapter._getUnfinalizedWithdrawalAmountFromHypercore (src/rebalancer/adapters/hyperliquid.ts:1040)
    at HyperliquidStablecoinSwapAdapter.getPendingRebalances (src/rebalancer/adapters/hyperliquid.ts:550)
    at BaseRebalancerClient.getPendingRebalances (src/rebalancer/clients/BaseRebalancerClient.ts:56)

The call chain bottoms out in InventoryClient.update(), which has no catch around the rebalancer-client call. A transient HL read-API outage gates every fill the relayer would otherwise do on Ethereum, Arbitrum, Base, etc. That's the root problem: treating one venue's read API as a hard precondition for the whole relay loop.

Change

BaseRebalancerClient.getPendingRebalances now wraps each adapter.getPendingRebalances(account) in try/catch:

on success: same aggregation as before;
on failure: log at error (at: "BaseRebalancerClient#getPendingRebalances", includes the adapter name and stringified error), skip that adapter for the cycle.

The forEachAsync loop was switched from Object.values(this.adapters) to Object.entries(this.adapters) so the adapter name is available for logging.

Behavioral effect

The exact 502 from HyperliquidStablecoinSwapAdapter.getPendingRebalances that crashed the bot now logs and skips, leaving InventoryClient.pendingRebalances partial (HL contribution absent) for that update cycle. The next cycle retries.

For a sustained HL outage, the relayer will still attempt to initiate new HL swap rebalances since rebalanceInventory doesn't gate on read-side health. That's intentionally out of scope here; will likely follow up with read-side health gating on HL rebalance initiation in a separate PR.

Ops note

Silent-by-default is by design here (relayer-up beats accounting-perfect), but operators should add a Datadog alert on the new BaseRebalancerClient#getPendingRebalances error log so a persistent adapter outage doesn't quietly degrade inventory accounting.

Test plan

tsc --noEmit clean
eslint + prettier clean on touched files
hardhat test test/RebalancerClient.cumulativeRebalancing.ts - 17/17 passing
New unit test "Isolates per-adapter failures when aggregating pending rebalances" constructs one healthy + one throwing adapter and asserts the healthy adapter's pending rebalances still come through and aggregation does not throw
Operator: confirm new error log surfaces in Datadog under the expected at: value and wire an alert

🤖 Generated with Claude Code

A transient outage in one adapter's read API (e.g. a 502 from Hyperliquid's InfoClient when fetching non-funding ledger updates) used to propagate up through BaseRebalancerClient.getPendingRebalances to InventoryClient.update and crash the relayer process. Catch the error per adapter, log it, and continue aggregating from the others so the relayer keeps filling on chains served by healthy adapters and operates on a partial view of pending rebalances for that cycle. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f43cf50182

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-30T08:55:37Z

+        // Isolate per-adapter failures: a transient outage in one venue's read API (e.g. a 502
+        // from Hyperliquid) must not propagate up to callers like InventoryClient.update and take
+        // the relayer down. The caller operates on a partial view for this cycle.


Document partial pending-rebalance reads

This changes getPendingRebalances from fail-fast aggregation to returning a partial view when any adapter read fails, but the rebalancer README still describes the client as aggregating the account's pending state without noting this failure mode. The repo's AGENTS.md requires relevant README/AGENTS updates whenever behavior or runtime flow changes; without that update, operators and consumers of ReadOnlyRebalancerClient can assume the returned accounting is complete during an adapter outage when it is intentionally incomplete.

Useful? React with 👍 / 👎.

…nces Addresses Codex review on #3437: AGENTS.md requires README updates when runtime behavior changes, and the new per-adapter failure isolation in BaseRebalancerClient.getPendingRebalances now returns a partial view during an adapter outage. Note the contract for ReadOnlyRebalancerClient consumers (InventoryClient, Monitor) and clarify that direct adapter callers (Binance finalizer) are unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

droplet-rl · 2026-05-30T08:58:10Z

@codex thanks — fair point. The repo's AGENTS.md (line 15) explicitly requires README.md/AGENTS.md updates when runtime behavior changes, and the new partial-aggregation semantics weren't reflected in src/rebalancer/README.md.

Pushed a0d60ba adding a "Partial-view semantics on adapter failure" subsection under Read-only mode: ReadOnlyRebalancerClient. It records:

BaseRebalancerClient.getPendingRebalances now isolates per-adapter failures, logs at error with at: "BaseRebalancerClient#getPendingRebalances" + adapter name, and returns a partial aggregate from the remaining adapters.
ReadOnlyRebalancerClient consumers (InventoryClient.update, Monitor) see an intentionally partial view during an adapter outage and operators should alert on the new error log so persistent outages don't quietly degrade inventory accounting.
Direct adapter callers that bypass BaseRebalancerClient (notably the Binance finalizer in src/finalizer/utils/binance.ts, which calls the Binance adapter directly) are unaffected — their errors still propagate, so the existing "errors surface normally" contract for that path is preserved.

Please re-review.

chatgpt-codex-connector · 2026-05-30T08:59:50Z

Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

droplet-rl · 2026-05-30T09:00:23Z

Acknowledged — Codex re-review on a0d60ba came back clean. No further changes.

pxrl · 2026-06-01T08:30:19Z

+          message: `Failed to fetch pending rebalances from ${adapterName} adapter; skipping for this cycle`,
+          error: (error as Error)?.toString() ?? String(error),
+        });
+        return;


Is it correct to return undefined here? Should we instead be returning 0 or something, in case the value is dereferenced?

Good question — it's safe: the return; exits only the per-iteration forEachAsync callback, not getPendingRebalances itself.

forEachAsync in the SDK is await mapAsync(array, callback) which is Promise.all(array.map(callback)) — the callback's return value is discarded, so returning early just skips aggregation for that adapter.

The outer getPendingRebalances always returns pendingRebalances, which is initialized to {} on entry and only ever gets keys added via pendingRebalances[chainId][token] = ... for adapters whose read succeeded. It is never undefined, and for the skipped adapter the corresponding chainId/token entry is simply absent (rather than zero) — i.e. a partial map.

Consumers handle that shape naturally:

InventoryClient (src/clients/InventoryClient.ts:307-317) reads via pendingRebalances[chainId]?.[l1TokenSymbol] with isDefined guards, so a missing chain/token entry contributes 0 to the balance computation — exactly the same effect a literal bnZero would have, without forcing us to enumerate which chain/token pairs the failed adapter would have reported.

Monitor similarly indexes via pendingRebalances[chainId]?.[token] with optional chaining.

So from a caller's perspective the choice is between "missing key" and "key present with value 0" — both are dereferenced safely, and "missing key" has the advantage that we don't have to guess the set of chain/token pairs the failed adapter would have populated (it differs per adapter and per route config). Happy to switch to explicit zero-fill if you'd rather have the dictionary shape be uniform across success/failure cases, but the current form keeps the failure semantics minimal — "this adapter contributed nothing" rather than "this adapter contributed zero to chains X, Y, Z."

droplet-rl · 2026-06-01T08:32:20Z

Thanks for the review @pxrl — replied to your inline question on the return;. TL;DR: it exits only the per-iteration forEachAsync callback (the SDK's forEachAsync discards callback return values), so the outer getPendingRebalances always returns the initialized pendingRebalances map. Skipped adapters leave their chain/token keys absent rather than zero, and existing consumers (InventoryClient, Monitor) already index with isDefined/optional chaining so a missing entry contributes 0. Happy to switch to explicit zero-fill if you'd prefer uniform shape — flagged that as the alternative in the inline reply.

droplet-rl requested review from bmzig, dijanin-brat, mrice32, nicholaspai and pxrl as code owners May 30, 2026 08:52

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

pxrl reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rebalancer): isolate per-adapter failures in getPendingRebalances#3437

fix(rebalancer): isolate per-adapter failures in getPendingRebalances#3437
droplet-rl wants to merge 2 commits into
masterfrom
droplet/T90K0AL22-C03GHT4RV42-1780130120-316509

droplet-rl commented May 30, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Uh oh!

droplet-rl commented May 30, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 30, 2026

Uh oh!

droplet-rl commented May 30, 2026

Uh oh!

pxrl Jun 1, 2026

Uh oh!

droplet-rl Jun 1, 2026

Uh oh!

droplet-rl commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

droplet-rl commented May 30, 2026

Summary

Motivation

Change

Behavioral effect

Ops note

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

droplet-rl commented May 30, 2026

Uh oh!

chatgpt-codex-connector Bot commented May 30, 2026

Uh oh!

droplet-rl commented May 30, 2026

Uh oh!

pxrl Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

droplet-rl Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

droplet-rl commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants