
perf: trim API payloads and fix client hot paths#2408

Closed
gary149 wants to merge 1 commit into main from perf/trim-payloads-hot-paths

gary149 (Collaborator) commented Jul 2, 2026

Payload and hot-path fixes found by profiling hf.co/chat user journeys (initial load, conversation switching, new conversation, streaming) with Chrome instrumentation; each candidate fix was then verified against the code.

Conversation GET payload trim

GET /api/v2/conversations/[id] returned everything MongoDB stores. Measured on prod: a fresh conversation with a single 4.5 KB assistant answer shipped as 20.9 KB of JSON, and the same payload is embedded a second time in the SSR document.

The response now drops, per message:

  • reasoning: a server-side accumulator the client never reads (rendering parses <think> blocks from message.content)
  • reasoning stream token updates: the client never renders from these updates
  • for tool-less messages only: per-token stream markers and the FinalAnswer.text duplicate of message.content. ChatMessage rebuilds tool-less messages from message.content via its existing fallbacks; messages with tool calls keep their markers and final text since those drive text/tool interleaving.

Status and FinalAnswer entries stay, with their text blanked, because generationState checks their presence. This is read-path only: stored documents are untouched, and the legacy /api/conversation/[id] endpoint still returns full updates. The trim is deliberately not done in resolveConversation, because the message-delete endpoint passes conversation.messages through it and then writes them back to MongoDB; trimming there would persist the stripped fields.
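The per-message rules above can be sketched as a pure function. This is a minimal illustration, not the PR's code: the Message and MessageUpdate shapes below are hypothetical simplifications, and trimMessageForClient is an illustrative name. It returns new objects so the stored document is never mutated.

```typescript
// Hypothetical, simplified shapes for illustration; chat-ui's real Message and
// MessageUpdate types have more variants and fields.
type MessageUpdate =
  | { type: "status"; status: string }
  | { type: "stream"; token: string }
  | { type: "reasoning"; subtype: "stream" | "status"; token?: string; status?: string }
  | { type: "tool"; name: string }
  | { type: "finalAnswer"; text: string };

interface Message {
  id: string;
  from: "user" | "assistant";
  content: string;
  reasoning?: string;
  updates?: MessageUpdate[];
}

// Read-path trim: builds new objects and never mutates the stored message,
// so it must not sit anywhere whose output is written back to the database.
function trimMessageForClient(message: Message): Message {
  // Drop the server-side reasoning accumulator; the client parses <think>
  // blocks out of message.content instead.
  const { reasoning: _reasoning, ...rest } = message;
  if (!message.updates) return rest;

  // Tool calls need the per-token markers to interleave text and tool output.
  const hasTools = message.updates.some((u) => u.type === "tool");
  const updates: MessageUpdate[] = [];

  for (const u of message.updates) {
    // Reasoning stream tokens are never rendered from updates.
    if (u.type === "reasoning" && u.subtype === "stream") continue;
    if (!hasTools) {
      // Tool-less: the client rebuilds the text from message.content.
      if (u.type === "stream") continue;
      if (u.type === "finalAnswer") {
        // Keep the entry (its presence is checked) but blank the duplicate text.
        updates.push({ ...u, text: "" });
        continue;
      }
    }
    updates.push(u);
  }
  return { ...rest, updates };
}
```

A GET handler would map this over conversation.messages just before serializing the response, which keeps the trim out of any helper whose result is later persisted.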

Two new tests cover the trim (tool-less) and the preservation (tool messages).

Models API trim

/api/v2/models is fetched in the root layout and embedded in every page (181 KB of the 318 KB prod HTML document). Removed:

  • promptExamples: never read client-side; the landing chips come from hardcoded constants
  • parameters: only used server-side (buildPrompt)

providers, preprompt and multimodalAcceptedMimetypes stay (settings page, ChatWindow). The legacy /api/models endpoint is unchanged for external consumers. One server file (endpoints.ts) typed generateSettings against the frontend Model type and now uses BackendModel.
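A minimal sketch of the models trim, assuming a hypothetical, heavily simplified BackendModel; toClientModel and ClientModel are illustrative names, not the PR's. The server-only fields are destructured away and everything else passes through.

```typescript
// Hypothetical, simplified backend model shape; the real BackendModel type in
// chat-ui has many more fields.
interface BackendModel {
  id: string;
  name: string;
  preprompt?: string;
  multimodalAcceptedMimetypes?: string[];
  providers?: unknown[];
  promptExamples?: { title: string; prompt: string }[];
  parameters?: Record<string, unknown>;
}

// Client-facing shape: everything except the server-only fields.
type ClientModel = Omit<BackendModel, "promptExamples" | "parameters">;

function toClientModel(model: BackendModel): ClientModel {
  // Destructure the server-only fields away; the rest is what ships to the page.
  const { promptExamples: _promptExamples, parameters: _parameters, ...clientModel } = model;
  return clientModel;
}
```

This is a denylist: any field later added to BackendModel ships to every page by default. An explicit allowlist would be stricter, at the cost of updating it whenever the client needs a new field.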

Hot-path fixes

  • Session hook: the rolling-expiry sessions.updateOne was awaited before resolve() on every POST, adding a serial MongoDB round-trip to message sends. Now fire-and-forget with error logging.
  • InfiniteScroll: the 250 ms setInterval could trigger duplicate page fetches while one was in flight (handleVisible has no concurrency guard). Replaced with a busy-guarded, once-per-intersection observer that re-arms after each load, so pagination still continues when the sentinel remains visible after new items render.
  • Markdown worker pool: navigating away mid-render terminated the in-flight worker, forcing a full respawn (bundle re-eval, katex/hljs init) on the next conversation. The existing cancelled flag already discards stale results, so the worker is now recycled instead.
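The session-hook change in the first bullet amounts to starting the write without awaiting it. A minimal sketch, assuming a hypothetical SessionStore interface in place of the MongoDB collection and an illustrative TTL; refreshSessionExpiry and handleRequest are not the hook's real names.

```typescript
// Hypothetical minimal interface standing in for the MongoDB sessions
// collection; only the call used here is modelled.
interface SessionStore {
  updateOne(
    filter: { sessionId: string },
    update: { $set: { expiresAt: Date } },
  ): Promise<unknown>;
}

// Assumed rolling-expiry window, for illustration only.
const SESSION_TTL_MS = 14 * 24 * 60 * 60 * 1000;

// Fire-and-forget: start the write, attach an error handler, and return
// immediately without awaiting it.
function refreshSessionExpiry(
  store: SessionStore,
  sessionId: string,
  logError: (err: unknown) => void = console.error,
): void {
  store
    .updateOne({ sessionId }, { $set: { expiresAt: new Date(Date.now() + SESSION_TTL_MS) } })
    .catch(logError); // the catch prevents an unhandled rejection
}

// Hook-shaped wrapper: the expiry refresh no longer sits in front of resolve().
async function handleRequest<T>(
  store: SessionStore,
  sessionId: string,
  resolve: () => Promise<T>,
): Promise<T> {
  refreshSessionExpiry(store, sessionId);
  return resolve();
}
```

The trade-off is small: a failed refresh only means the expiry is not extended on that request, and the next request tries again.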

Verification

npm run check (0 errors), npm run lint, npm test: 392 tests pass including the 2 new ones.

Conversation GET: stop shipping fields the client never reads. Drops the
server-side reasoning accumulator and raw reasoning stream tokens; for
tool-less messages also drops per-token stream markers and blanks the
FinalAnswer text duplicate of message.content. Read path only: MongoDB
documents and the legacy /api/conversation/[id] endpoint are unchanged.
A 4.5KB assistant answer measured on prod shipped as 20.9KB of JSON.

Models API: remove promptExamples and parameters from /api/v2/models.
promptExamples is never read client-side (landing chips are hardcoded
constants) and parameters is only used server-side. This payload is
embedded in every page via the root layout.

Session hook: the rolling-expiry updateOne no longer blocks POSTs; it
was a serial MongoDB round-trip before resolve() on every message send.

InfiniteScroll: replace the 250ms interval with a busy-guarded
once-per-intersection observer that re-arms after each load. The
interval could fire duplicate page fetches while one was in flight.
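The busy-guarded, once-per-intersection pattern can be sketched with the guard logic separated from IntersectionObserver so it runs outside a browser. createIntersectionHandler, arm and disarm are hypothetical names; the wiring comment shows how they would map onto observe() and unobserve(). Re-arming works because an IntersectionObserver delivers an initial notification when it starts observing a target, so a sentinel that is still visible triggers another load.

```typescript
// Hypothetical helper, not the chat-ui component: the guard logic is kept
// separate from IntersectionObserver so it can run outside a browser.
function createIntersectionHandler(
  loadMore: () => Promise<void>,
  arm: () => void, // e.g. observer.observe(sentinel)
  disarm: () => void, // e.g. observer.unobserve(sentinel)
): (isIntersecting: boolean) => Promise<void> {
  let busy = false;
  return async (isIntersecting) => {
    // Busy guard: ignore intersections while a page fetch is in flight.
    if (!isIntersecting || busy) return;
    busy = true;
    disarm(); // once per intersection
    try {
      await loadMore();
    } finally {
      busy = false;
      // Re-observing delivers a fresh notification, so a sentinel that is
      // still visible after new items render triggers the next page.
      arm();
    }
  };
}

// Browser wiring (sketch):
//   const observer = new IntersectionObserver((entries) => {
//     for (const e of entries) void onIntersect(e.isIntersecting);
//   });
//   const onIntersect = createIntersectionHandler(
//     loadNextPage, () => observer.observe(sentinel), () => observer.unobserve(sentinel));
//   observer.observe(sentinel);
```

A real component would also stop re-arming once the last page has been loaded; otherwise a visible sentinel keeps requesting empty pages.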

Markdown worker pool: cancelling a client no longer terminates the
in-flight worker. The cancelled flag already discards stale results, so
recycling avoids a full worker respawn on every conversation switch.
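A minimal sketch of recycling rather than terminating, assuming a hypothetical WorkerLike interface in place of a real Web Worker; MarkdownWorkerPool is an illustrative name and the class is not the chat-ui pool. Cancelling only sets the flag; when the worker replies, its output is discarded and the warm worker goes back to the idle list.

```typescript
// Hypothetical stand-in for a Web Worker so the sketch runs outside a browser.
interface WorkerLike {
  onmessage: ((html: string) => void) | null;
  postMessage(markdown: string): void;
  terminate(): void;
}

// Minimal pool sketch (no concurrency cap or queue): cancellation only flags
// the job, and the worker returns to the idle list when it finishes.
class MarkdownWorkerPool {
  private idle: WorkerLike[] = [];
  private spawn: () => WorkerLike;

  constructor(spawn: () => WorkerLike) {
    this.spawn = spawn;
  }

  render(markdown: string): { result: Promise<string | null>; cancel: () => void } {
    // Reuse a warm worker when one is idle; spawning is the expensive path.
    const worker = this.idle.pop() ?? this.spawn();
    let cancelled = false;
    const result = new Promise<string | null>((resolve) => {
      worker.onmessage = (html) => {
        worker.onmessage = null;
        this.idle.push(worker); // recycle instead of worker.terminate()
        resolve(cancelled ? null : html); // stale output is discarded
      };
    });
    worker.postMessage(markdown);
    return {
      result,
      cancel: () => {
        cancelled = true;
      },
    };
  }
}
```

The cost of this choice is that a cancelled job keeps its worker busy until the render finishes; a fuller pool would queue the next job or spawn up to a cap rather than wait.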

gary149 (Collaborator, Author) commented Jul 2, 2026

Superseded by the perf series #2410 (models payload), #2412 (conversation switching and create-response seeding), #2413 (stream update batching) and #2414 (send handler round trips), which land the same goals split into independently mergeable changes. All seven PRs in the series are now merged.

gary149 closed this Jul 2, 2026
gary149 deleted the perf/trim-payloads-hot-paths branch July 2, 2026 23:26