You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CodeWhale running against the Zhipu/GLM coding plan (GLM-5.2) hits repeated provider failures under concurrent load:
Transient provider failure after 3 API attempt(s): SSE stream request did not receive response headers after 45s.
The Zhipu coding plan advertises a concurrency limit of 10, but sustained parallel SSE streams (e.g. a parent session + 3 Fleet/sub-agent children all streaming GLM-5.2) trip header timeouts and "waiting for model" stalls well before 10. Single-stream requests stay responsive.
Evidence
During v0.8.65 work, fanning out 3 implementer sub-agents alongside the parent (4 concurrent GLM-5.2 streams) caused two sub-agents to be interrupted by the provider after ~18-21 steps with the SSE-headers timeout above; agents stalled "waiting for model." Reducing to a single stream restored responsiveness. Sub-agent fan-out itself is fine — the ceiling is the provider's effective concurrency under SSE.
Proposal (v0.8.66)
Per-provider max-concurrency setting on the provider/route config, with a conservative default for Zhipu/GLM (~3) below the advertised 10.
Apply it across the request path AND Fleet/sub-agent fan-out so parent + children share ONE bounded in-flight budget per provider (not independent unbounded queues).
Problem
CodeWhale running against the Zhipu/GLM coding plan (GLM-5.2) hits repeated provider failures under concurrent load:
The Zhipu coding plan advertises a concurrency limit of 10, but sustained parallel SSE streams (e.g. a parent session + 3 Fleet/sub-agent children all streaming GLM-5.2) trip header timeouts and "waiting for model" stalls well before 10. Single-stream requests stay responsive.
Evidence
During v0.8.65 work, fanning out 3 implementer sub-agents alongside the parent (4 concurrent GLM-5.2 streams) caused two sub-agents to be interrupted by the provider after ~18-21 steps with the SSE-headers timeout above; agents stalled "waiting for model." Reducing to a single stream restored responsiveness. Sub-agent fan-out itself is fine — the ceiling is the provider's effective concurrency under SSE.
Proposal (v0.8.66)
/providerreadiness dashboard (v0.8.65: /provider readiness dashboard from route/catalog projections #3083).Ties into
#3439 (Zhipu first-class descriptor — the throttle lives on provider config), Fleet concurrency (#3154/#3205), provider routing (#2608/#3084).