v0.8.66: Throttle Zhipu/GLM-5.2 request concurrency to avoid SSE stream timeouts (esp. under Fleet/sub-agents) #3496

Description

@Hmbown

Problem

CodeWhale running against the Zhipu/GLM coding plan (GLM-5.2) hits repeated provider failures under concurrent load:

Transient provider failure after 3 API attempt(s): SSE stream request did not receive response headers after 45s.

The Zhipu coding plan advertises a concurrency limit of 10, but sustained parallel SSE streams (e.g. a parent session + 3 Fleet/sub-agent children all streaming GLM-5.2) trip header timeouts and "waiting for model" stalls well before 10. Single-stream requests stay responsive.

Evidence

During v0.8.65 work, fanning out 3 implementer sub-agents alongside the parent (4 concurrent GLM-5.2 streams) led the provider to interrupt two sub-agents after ~18-21 steps with the SSE header timeout above, leaving them stalled at "waiting for model." Dropping back to a single stream restored responsiveness. Sub-agent fan-out itself is fine; the ceiling is the provider's effective concurrency under SSE.

Proposal (v0.8.66)

  • Per-provider max-concurrency setting on the provider/route config, with a conservative default for Zhipu/GLM (~3) below the advertised 10.
  • Apply it across the request path AND Fleet/sub-agent fan-out so parent + children share ONE bounded in-flight budget per provider (not independent unbounded queues).
  • Surface effective limit + active in-flight count in the /provider readiness dashboard (v0.8.65: /provider readiness dashboard from route/catalog projections #3083).
  • Optional: adaptive backoff when SSE header latency rises.
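
A minimal sketch of the shared per-provider budget described in the first three bullets, assuming an asyncio-based request path. `ProviderLimiter`, `slot`, `snapshot`, the `"zhipu"` key, and the fallback limit are hypothetical names and values for illustration, not existing CodeWhale APIs; only the Zhipu default of 3 comes from the proposal.

```python
import asyncio
from contextlib import asynccontextmanager

# Hypothetical defaults: only the Zhipu/GLM value (~3) comes from the proposal.
DEFAULT_MAX_CONCURRENCY = {"zhipu": 3}
FALLBACK_MAX_CONCURRENCY = 8  # assumed value for providers with no explicit limit


class ProviderLimiter:
    """One bounded in-flight budget per provider, shared by parent and sub-agents."""

    def __init__(self, limits=None, fallback=FALLBACK_MAX_CONCURRENCY):
        self._limits = dict(DEFAULT_MAX_CONCURRENCY if limits is None else limits)
        self._fallback = fallback
        self._semaphores = {}
        self._in_flight = {}

    def limit_for(self, provider):
        return self._limits.get(provider, self._fallback)

    @asynccontextmanager
    async def slot(self, provider):
        """Hold one in-flight slot for the duration of a streaming request."""
        sem = self._semaphores.get(provider)
        if sem is None:
            sem = asyncio.Semaphore(self.limit_for(provider))
            self._semaphores[provider] = sem
        async with sem:
            self._in_flight[provider] = self._in_flight.get(provider, 0) + 1
            try:
                yield
            finally:
                self._in_flight[provider] -= 1

    def snapshot(self, provider):
        """Values a readiness dashboard could surface."""
        return {
            "limit": self.limit_for(provider),
            "in_flight": self._in_flight.get(provider, 0),
        }


async def fake_stream(limiter, provider, observed):
    # Stand-in for an SSE request; records the in-flight count it sees.
    async with limiter.slot(provider):
        observed.append(limiter.snapshot(provider)["in_flight"])
        await asyncio.sleep(0.01)


async def demo():
    # Parent + 3 children all go through the SAME limiter instance.
    limiter = ProviderLimiter()
    observed = []
    await asyncio.gather(*(fake_stream(limiter, "zhipu", observed) for _ in range(4)))
    return max(observed), limiter.snapshot("zhipu")


if __name__ == "__main__":
    peak, snap = asyncio.run(demo())
    print("peak in-flight:", peak, "final snapshot:", snap)
```

The point of the design is that the parent session and every Fleet child receive the same limiter instance, so the fourth stream waits for a free slot instead of opening a fourth concurrent SSE connection. Adaptive backoff is left out of this sketch.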

Ties into

#3439 (Zhipu first-class descriptor; the throttle lives on provider config), Fleet concurrency (#3154/#3205), provider routing (#2608/#3084).

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
    enhancement: New feature or request
    reliability: Reliability, flaky behavior, retries, fallbacks, and robustness
    v0.8.66: Targeting v0.8.66

    Projects

    Status
    Backlog
