v0.8.66: Throttle Zhipu/GLM-5.2 request concurrency to avoid SSE stream timeouts (esp. under Fleet/sub-agents) #3496

## Problem
CodeWhale running against the Zhipu/GLM coding plan (GLM-5.2) hits repeated provider failures under concurrent load:

> Transient provider failure after 3 API attempt(s): SSE stream request did not receive response headers after 45s.

The Zhipu coding plan advertises a concurrency limit of 10, but sustained parallel SSE streams (e.g. a parent session plus 3 Fleet/sub-agent children, all streaming GLM-5.2) trip header timeouts and "waiting for model" stalls well below 10 concurrent streams. Single-stream requests stay responsive.

## Evidence
During v0.8.65 work, fanning out 3 implementer sub-agents alongside the parent (4 concurrent GLM-5.2 streams) caused the provider to interrupt two of the sub-agents after ~18-21 steps with the SSE-headers timeout above, and the agents stalled "waiting for model." Reducing to a single stream restored responsiveness. Sub-agent fan-out itself is not the problem; the ceiling is the provider's effective concurrency for SSE streams.

## Proposal (v0.8.66)
- Per-provider **max-concurrency** setting on the provider/route config, with a conservative default for Zhipu/GLM (~3) below the advertised 10.
- Apply it across the request path AND Fleet/sub-agent fan-out so parent + children share ONE bounded in-flight budget per provider (not independent unbounded queues).
- Surface effective limit + active in-flight count in the `/provider` readiness dashboard (#3083).
- Optional: adaptive backoff when SSE header latency rises.
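A minimal sketch of the shared in-flight budget, assuming an asyncio-style request path (the issue doesn't say what language or config schema CodeWhale uses). `ProviderLimiter`, `ProviderBudget`, `limit_for`, `status`, the `"zhipu"` key, and the fallback limit of 8 are all hypothetical; the default of 3 mirrors the ~3 proposed for Zhipu/GLM. Parent and sub-agent requests look up the same per-provider budget, so they contend for one semaphore rather than each getting an independent queue, and `status()` returns the (limit, in-flight) pair a readiness view could display.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator

# Hypothetical defaults: ~3 for Zhipu/GLM per the proposal; 8 is an arbitrary fallback.
DEFAULT_MAX_CONCURRENCY = {"zhipu": 3}
FALLBACK_MAX_CONCURRENCY = 8


class ProviderBudget:
    """One bounded in-flight budget for a single provider (hypothetical)."""

    def __init__(self, max_concurrency: int) -> None:
        if max_concurrency < 1:
            raise ValueError("max_concurrency must be >= 1")
        self.max_concurrency = max_concurrency
        self.in_flight = 0
        # Create this inside a running event loop (see ProviderLimiter.budget).
        self._sem = asyncio.Semaphore(max_concurrency)

    @asynccontextmanager
    async def slot(self) -> AsyncIterator[None]:
        # Wait for a free slot; the count is dropped before the slot is released.
        async with self._sem:
            self.in_flight += 1
            try:
                yield
            finally:
                self.in_flight -= 1


class ProviderLimiter:
    """Process-wide registry: parent and sub-agents share one budget per provider."""

    def __init__(self, overrides: dict[str, int] | None = None) -> None:
        self._limits = {**DEFAULT_MAX_CONCURRENCY, **(overrides or {})}
        self._budgets: dict[str, ProviderBudget] = {}

    def limit_for(self, provider: str) -> int:
        return self._limits.get(provider, FALLBACK_MAX_CONCURRENCY)

    def budget(self, provider: str) -> ProviderBudget:
        # Lazily create a single shared budget per provider.
        if provider not in self._budgets:
            self._budgets[provider] = ProviderBudget(self.limit_for(provider))
        return self._budgets[provider]

    def status(self, provider: str) -> tuple[int, int]:
        """(effective limit, active in-flight count), e.g. for a readiness view."""
        b = self.budget(provider)
        return b.max_concurrency, b.in_flight


async def fake_stream(limiter: ProviderLimiter, provider: str, peak: list[int]) -> None:
    budget = limiter.budget(provider)
    async with budget.slot():
        peak[0] = max(peak[0], budget.in_flight)
        await asyncio.sleep(0.01)  # stands in for an SSE stream


async def main() -> tuple[int, tuple[int, int]]:
    limiter = ProviderLimiter()
    peak = [0]
    # Parent + 3 sub-agents issuing 8 requests in total, all against one budget.
    await asyncio.gather(*(fake_stream(limiter, "zhipu", peak) for _ in range(8)))
    return peak[0], limiter.status("zhipu")


if __name__ == "__main__":
    print(asyncio.run(main()))  # (3, (3, 0))
```

With 8 simulated streams sharing the Zhipu budget, at most 3 are ever in flight; the rest wait for a slot instead of opening a fourth SSE stream. Adaptive backoff (e.g. shrinking the limit when header latency rises) is left out of this sketch.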

## Ties into
#3439 (Zhipu first-class descriptor; the throttle lives on the provider config), Fleet concurrency (#3154/#3205), provider routing (#2608/#3084).