
feat(proxy): resilient relay retries with per-host adaptive cooldown #33

Open
mehrad-mz wants to merge 1 commit into masterking32:python_testing from mehrad-mz:python_testing



@mehrad-mz

Summary

Improves reliability when the Apps Script relay returns transient failures (e.g. short-lived throttling or overload). The proxy now retries safe, idempotent requests and applies per-host back-pressure, so bursty traffic (such as the many sequential GETs of an image gallery) is less likely to trigger repeated failures.
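The retry scope described here reduces to two checks: is the request safe to resend, and does the response look transient? A minimal Python sketch follows. The helper names are hypothetical, and the relay_error flag is an assumption standing in for however the proxy actually recognizes a relay-style 502 payload, which the PR does not spell out.

```python
from typing import Optional

# Hypothetical helpers: the PR does not show the internals of
# _relay_smart_with_transient_retry(), so these names are illustrative.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})
ALWAYS_TRANSIENT = frozenset({429, 503})


def is_retry_eligible(method: str, body: Optional[bytes]) -> bool:
    """Retry only safe, idempotent methods that carry no request body."""
    return method.upper() in SAFE_METHODS and not body


def is_transient(status: int, relay_error: bool, retry_on_502: bool = True) -> bool:
    """429 and 503 are always transient; a 502 counts only if the relay produced it.

    relay_error is an assumed flag for "this 502 is a relay-style payload";
    retry_on_502 mirrors the relay_retry_on_502 knob.
    """
    if status in ALWAYS_TRANSIENT:
        return True
    return status == 502 and relay_error and retry_on_502
```

Keeping the method check separate from the status check means a POST that hits a 503 is still reported as transient but is never resent.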

Motivation

Users observed intermittent 502 responses on the local relay path that often succeeded after a quick manual refresh. Sequential downloads made the problem worse, because bursts of requests hit the same host.

What changed

  • Wraps relay dispatch in _relay_smart_with_transient_retry(), an outer retry/cooldown layer, so _relay_smart() itself behaves exactly as before.
  • Retries only safe methods (GET, HEAD, OPTIONS) with empty request bodies; mutating requests are never retried.
  • Treats relay-style 502 payloads, 429, and 503 as transient.
  • Adds an adaptive per-host cooldown that grows exponentially up to a maximum, with jitter so concurrent clients do not retry in lockstep.
  • Resets a host's failure streak after a successful response.
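The cooldown behavior above can be sketched as a small per-host tracker. This is a hypothetical illustration, not the PR's code: the class name, the doubling factor, the jitter range (scaling each delay down by up to jitter × 100%), the mapping of base/cap onto relay_retry_cooldown_seconds/relay_retry_cooldown_max_seconds, and the assumption that it runs on a single asyncio event loop (so no locking) are all choices made here.

```python
import random
import time
from typing import Callable, Dict, Optional


class HostCooldown:
    """Per-host adaptive cooldown (hypothetical sketch).

    Each consecutive transient failure doubles the host's cooldown, starting at
    `base` seconds and capped at `cap`. Jitter scales the delay down by a random
    factor so concurrent clients do not retry in lockstep.
    Assumes single-event-loop use, so there is no locking.
    """

    def __init__(self, base: float, cap: float, jitter: float = 0.5,
                 rng: Optional[random.Random] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.base = base
        self.cap = cap
        self.jitter = jitter
        self._rng = rng or random.Random()
        self._clock = clock
        self._streak: Dict[str, int] = {}
        self._until: Dict[str, float] = {}

    def record_failure(self, host: str) -> float:
        """Grow the host's failure streak and return the new cooldown in seconds."""
        streak = self._streak.get(host, 0) + 1
        self._streak[host] = streak
        # Clamp the exponent so a very long streak cannot overflow a float.
        delay = min(self.cap, self.base * 2 ** min(streak - 1, 32))
        # Jitter: pick a value in [delay * (1 - jitter), delay], never above cap.
        delay *= self._rng.uniform(1.0 - self.jitter, 1.0)
        self._until[host] = self._clock() + delay
        return delay

    def record_success(self, host: str) -> None:
        """A successful response clears the host's streak and any pending cooldown."""
        self._streak.pop(host, None)
        self._until.pop(host, None)

    def remaining(self, host: str) -> float:
        """Seconds a new request to `host` should wait (0.0 if not cooling down)."""
        return max(0.0, self._until.get(host, 0.0) - self._clock())
```

A caller would wait remaining(host) before dispatching, then call record_failure() or record_success() once the response arrives. Scaling the delay down rather than up keeps the jittered value under the configured maximum.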

New / existing config knobs (optional overrides)

All keys are optional; defaults are built into the code, so no configuration is needed out of the box:

  • relay_retry_on_502 (bool)
  • relay_retry_max_attempts (int)
  • relay_retry_backoff_seconds (float)
  • relay_retry_cooldown_seconds (float)
  • relay_retry_cooldown_max_seconds (float)
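One way to read these optional keys with in-code fallbacks is a frozen dataclass, sketched below. The default values are placeholders, since the PR does not list the actual defaults, and the field names (which simply drop the relay_retry_ prefix) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RetryConfig:
    # Placeholder defaults: the real in-code defaults are not listed in the PR.
    retry_on_502: bool = True
    max_attempts: int = 3
    backoff_seconds: float = 0.5
    cooldown_seconds: float = 1.0
    cooldown_max_seconds: float = 8.0

    @classmethod
    def from_config(cls, cfg: Mapping[str, Any]) -> "RetryConfig":
        """Apply any relay_retry_* overrides present in cfg; keep defaults otherwise."""
        d = cls()
        return cls(
            # Assumes a JSON-style config where booleans are already bool values.
            retry_on_502=bool(cfg.get("relay_retry_on_502", d.retry_on_502)),
            # At least one attempt, otherwise the request would never be sent.
            max_attempts=max(1, int(cfg.get("relay_retry_max_attempts", d.max_attempts))),
            backoff_seconds=float(cfg.get("relay_retry_backoff_seconds", d.backoff_seconds)),
            cooldown_seconds=float(cfg.get("relay_retry_cooldown_seconds", d.cooldown_seconds)),
            cooldown_max_seconds=float(
                cfg.get("relay_retry_cooldown_max_seconds", d.cooldown_max_seconds)
            ),
        )
```

An empty config yields the defaults unchanged, and numeric values are coerced, so a string such as "5" for relay_retry_max_attempts is accepted.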

Trade-offs

Successful requests follow the same path as before. Requests to a temporarily failing host may see slightly higher latency from backoff and cooldown; this is intentional, in exchange for a lower error rate.
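To show where the extra latency comes from, here is a minimal asyncio retry loop. It assumes a backoff that doubles from relay_retry_backoff_seconds, which is an assumption since the PR does not state the schedule. The send and is_transient callables and the injectable sleep are hypothetical, and any per-host cooldown wait would add to the total reported here.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def relay_with_retry(
    send: Callable[[], Awaitable[int]],
    is_transient: Callable[[int], bool],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Tuple[int, float]:
    """Call send() until it returns a non-transient status or attempts run out.

    Returns (final_status, total_seconds_spent_backing_off).
    """
    waited = 0.0
    status = await send()
    for attempt in range(1, max_attempts):
        if not is_transient(status):
            break
        # Assumed schedule: backoff_seconds, then double it on each retry.
        delay = backoff_seconds * 2 ** (attempt - 1)
        await sleep(delay)
        waited += delay
        status = await send()
    return status, waited
```

With max_attempts=3 and backoff_seconds=0.5, a host that keeps failing adds at most 0.5 + 1.0 = 1.5 s of backoff before the last error is returned, while a request that succeeds on the first try adds none.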

Testing

  • Manual browsing under bursty loads
  • Sequential image download scenario (100+ assets): fewer transient failures reported

Notes for reviewers

  • Please sanity-check that the retry scope (methods and transient-status detection) matches the project's safety expectations.
