perf: trim API payloads and fix client hot paths by gary149 · Pull Request #2408 · huggingface/chat-ui

gary149 · 2026-07-02T15:14:20Z

Payload and hot-path fixes from profiling hf.co/chat user journeys (initial load, conversation switching, new conversation, streaming) with Chrome instrumentation, then verifying each candidate fix against the code.

Conversation GET payload trim

GET /api/v2/conversations/[id] returned everything MongoDB stores. Measured on prod: a fresh conversation with a single 4.5 KB assistant answer shipped as 20.9 KB of JSON, and the same payload is embedded a second time in the SSR document.

The response now drops, per message:

reasoning: server-side accumulator, zero client reads (rendering parses <think> blocks from message.content)
reasoning stream token updates: never rendered from updates
for tool-less messages only: per-token stream markers and the FinalAnswer.text duplicate of message.content. ChatMessage rebuilds tool-less messages from message.content via its existing fallbacks; messages with tool calls keep their markers and final text since those drive text/tool interleaving.

Status and FinalAnswer entries stay (blanked text) because generationState checks their presence. This is read-path only: stored documents are untouched, and the legacy /api/conversation/[id] endpoint still returns full updates. Notably the trim is deliberately not in resolveConversation, because the message-delete endpoint writes conversation.messages back to MongoDB after passing through it.
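The trim rules above can be sketched as a pure function applied on the read path. This is a minimal illustration, not the PR's code: the `Update` and `StoredMessage` shapes and the name `trimMessageForClient` are simplified assumptions, and the real chat-ui types carry more fields.

```typescript
// Hypothetical, simplified shapes (assumptions for illustration only).
type Update =
  | { type: "status"; status: string }
  | { type: "stream"; token: string }
  | { type: "reasoning"; subtype: "stream" | "status"; token?: string }
  | { type: "tool"; name: string }
  | { type: "finalAnswer"; text: string };

interface StoredMessage {
  id: string;
  content: string;
  reasoning?: string; // server-side accumulator, never read by the client
  updates?: Update[];
}

// Returns a trimmed copy; the stored message object is not mutated.
function trimMessageForClient(message: StoredMessage): StoredMessage {
  // Copy every field except the reasoning accumulator.
  const { reasoning: _reasoning, ...rest } = message;
  const updates = message.updates ?? [];
  const hasTools = updates.some((u) => u.type === "tool");

  const trimmed = updates
    // Reasoning stream tokens are never rendered from updates.
    .filter((u) => !(u.type === "reasoning" && u.subtype === "stream"))
    // Tool-less messages are rebuilt from message.content, so drop stream markers.
    .filter((u) => hasTools || u.type !== "stream")
    // Keep the FinalAnswer entry (its presence matters) but blank the duplicate text.
    .map((u) => (!hasTools && u.type === "finalAnswer" ? { ...u, text: "" } : u));

  return { ...rest, updates: trimmed };
}
```

Because the function returns a copy and is only called when building the response, a write path such as message deletion would still see the full stored updates.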

Two new tests cover the trim (tool-less) and the preservation (tool messages).

Models API trim

/api/v2/models is fetched in the root layout and embedded in every page (181 KB of the 318 KB prod HTML document). Removed:

promptExamples: never read client-side, the landing chips come from hardcoded constants
parameters: only used server-side (buildPrompt)

providers, preprompt and multimodalAcceptedMimetypes stay (settings page, ChatWindow). The legacy /api/models endpoint is unchanged for external consumers. One server file (endpoints.ts) typed generateSettings against the frontend Model type and now uses BackendModel.
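One way to express this kind of trim is to derive the client-facing type from the backend type with `Omit`, so the compiler flags any client code that still reads a removed field. The sketch below is an assumption-laden illustration: `BackendModelLike`, `PublicModel`, and `toPublicModel` are hypothetical names, and the field shapes are simplified.

```typescript
// Hypothetical, simplified backend model shape (not the real BackendModel type).
interface BackendModelLike {
  id: string;
  displayName: string;
  preprompt?: string;
  multimodalAcceptedMimetypes?: string[];
  providers?: { provider: string }[];
  promptExamples?: { title: string; prompt: string }[]; // never read client-side
  parameters?: Record<string, unknown>; // only used server-side
}

// The client-facing type simply cannot carry the removed fields.
type PublicModel = Omit<BackendModelLike, "promptExamples" | "parameters">;

function toPublicModel(model: BackendModelLike): PublicModel {
  // Destructure the server-only fields away and return the rest.
  const { promptExamples: _examples, parameters: _parameters, ...publicFields } = model;
  return publicFields;
}
```

Server code that needs `parameters` would keep using the backend type, which matches the note above about typing `generateSettings` against `BackendModel`.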

Hot-path fixes

Session hook: the rolling-expiry sessions.updateOne was awaited before resolve() on every POST, adding a serial MongoDB round-trip to message sends. Now fire-and-forget with error logging.
InfiniteScroll: the 250 ms setInterval could trigger duplicate page fetches while one was in flight (handleVisible has no concurrency guard). Replaced with a busy-guarded, once-per-intersection observer that re-arms after each load, so pagination still continues when the sentinel remains visible after new items render.
Markdown worker pool: navigating away mid-render terminated the in-flight worker, forcing a full respawn (bundle re-eval, katex/hljs init) on the next conversation. The existing cancelled flag already discards stale results, so the worker is now recycled instead.
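The InfiniteScroll guard can be illustrated without any DOM code. In this sketch, `createGuardedLoader` and its `rearm` callback are hypothetical names; in a browser, `rearm` would presumably unobserve and re-observe the sentinel on an `IntersectionObserver` so that a still-visible sentinel triggers the next page.

```typescript
type LoadMore = () => Promise<void>;

// Wraps a page loader so that overlapping visibility events cannot start
// a second fetch while one is in flight. `rearm` runs after every load.
function createGuardedLoader(loadMore: LoadMore, rearm: () => void) {
  let busy = false;

  // Resolves to true if this call started a load, false if it was ignored.
  return async function onVisible(): Promise<boolean> {
    if (busy) return false; // a page fetch is already in flight
    busy = true;
    try {
      await loadMore();
    } finally {
      busy = false;
      rearm(); // re-arm even when the load failed, so pagination can retry
    }
    return true;
  };
}
```

Compared with a polling interval, the guard makes duplicate fetches impossible by construction rather than unlikely by timing.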

Verification

npm run check (0 errors), npm run lint, npm test: 392 tests pass including the 2 new ones.

Conversation GET: stop shipping fields the client never reads. Drops the server-side reasoning accumulator and raw reasoning stream tokens; for tool-less messages also drops per-token stream markers and blanks the FinalAnswer text duplicate of message.content. Read path only: MongoDB documents and the legacy /api/conversation/[id] endpoint are unchanged. A 4.5 KB assistant answer measured on prod shipped as 20.9 KB of JSON.

Models API: remove promptExamples and parameters from /api/v2/models. promptExamples is never read client-side (landing chips are hardcoded constants) and parameters is only used server-side. This payload is embedded in every page via the root layout.

Session hook: the rolling-expiry updateOne no longer blocks POSTs. Previously it was a serial MongoDB round-trip before resolve() on every message send.

InfiniteScroll: replace the 250 ms interval with a busy-guarded once-per-intersection observer that re-arms after each load. The interval could fire duplicate page fetches while one was in flight.

Markdown worker pool: cancelling a client no longer terminates the in-flight worker. The cancelled flag already discards stale results, so recycling avoids a full worker respawn on every conversation switch.
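The session-hook change follows the usual fire-and-forget pattern: start the write, attach an error handler, and do not await it. A minimal sketch, assuming a hypothetical `SessionStore` interface in place of the real MongoDB collection and a caller-supplied `logError`:

```typescript
// Hypothetical minimal interface standing in for a MongoDB collection.
interface SessionStore {
  updateOne(
    filter: { sessionId: string },
    update: { $set: { expiresAt: Date } }
  ): Promise<unknown>;
}

// Starts the rolling-expiry write without blocking the caller.
// Rejections are routed to logError instead of becoming unhandled.
function refreshSessionInBackground(
  sessions: SessionStore,
  sessionId: string,
  expiresAt: Date,
  logError: (err: unknown) => void
): void {
  // Deliberately not awaited: the request proceeds while the write completes.
  void sessions.updateOne({ sessionId }, { $set: { expiresAt } }).catch(logError);
}
```

The trade-off is that a failed refresh no longer fails the request; it only shows up in logs, which seems acceptable for an expiry bump.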

gary149 · 2026-07-02T23:26:23Z

Superseded by the perf series #2410 (models payload), #2412 (conversation switching and create-response seeding), #2413 (stream update batching) and #2414 (send handler round trips), which land the same goals split into independently mergeable changes. All seven PRs in the series are now merged.

gary149 closed this Jul 2, 2026

gary149 deleted the perf/trim-payloads-hot-paths branch July 2, 2026 23:26

gary149 wants to merge 1 commit into main from perf/trim-payloads-hot-paths
