perf: trim API payloads and fix client hot paths #2408
Closed
gary149 wants to merge 1 commit into
Conversation GET: stop shipping fields the client never reads. Drops the server-side reasoning accumulator and raw reasoning stream tokens; for tool-less messages, also drops per-token stream markers and blanks the FinalAnswer text duplicate of message.content. This affects the read path only: MongoDB documents and the legacy /api/conversation/[id] endpoint are unchanged. A 4.5 KB assistant answer measured on prod shipped as 20.9 KB of JSON.

Models API: remove promptExamples and parameters from /api/v2/models. promptExamples is never read client-side (landing chips are hardcoded constants) and parameters is only used server-side. This payload is embedded in every page via the root layout.

Session hook: the rolling-expiry updateOne no longer blocks POSTs. Previously it was a serial MongoDB round-trip before resolve() on every message send.

InfiniteScroll: replace the 250 ms interval with a busy-guarded, once-per-intersection observer that re-arms after each load. The interval could fire duplicate page fetches while one was in flight.

Markdown worker pool: cancelling a client no longer terminates the in-flight worker. The cancelled flag already discards stale results, so recycling avoids a full worker respawn on every conversation switch.
Payload and hot-path fixes from profiling hf.co/chat user journeys (initial load, conversation switching, new conversation, streaming) with Chrome instrumentation, then verifying each candidate fix against the code.
Conversation GET payload trim
`GET /api/v2/conversations/[id]` returned everything MongoDB stores. Measured on prod: a fresh conversation with a single 4.5 KB assistant answer shipped as 20.9 KB of JSON, and the same payload is embedded a second time in the SSR document.

The response now drops, per message:
- `reasoning`: server-side accumulator with zero client reads (rendering parses `<think>` blocks from `message.content`)
- `updates`: raw reasoning stream tokens and, for tool-less messages, the per-token stream markers and the `FinalAnswer.text` duplicate of `message.content`

`ChatMessage` rebuilds tool-less messages from `message.content` via its existing fallbacks; messages with tool calls keep their markers and final text, since those drive text/tool interleaving. `Status` and `FinalAnswer` entries stay (the latter with blanked text) because `generationState` checks their presence.

This is read-path only: stored documents are untouched, and the legacy `/api/conversation/[id]` endpoint still returns full updates. The trim is deliberately not in `resolveConversation`, because the message-delete endpoint writes `conversation.messages` back to MongoDB after passing through it.

Two new tests cover the trim (tool-less) and the preservation (tool messages).
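To make the per-message rules concrete, here is a minimal sketch of such a trim. It is not the code from this PR: the `Message` and `Update` shapes, the update type names, and the function name `trimMessageForClient` are all hypothetical stand-ins for the real types. It returns a copy, which mirrors the read-path-only constraint described above.

```typescript
// Hypothetical, simplified shapes; the real types in the codebase are richer.
type Update =
  | { type: "status"; status: string }
  | { type: "stream"; token: string }
  | { type: "reasoning"; token: string }
  | { type: "tool"; name: string }
  | { type: "finalAnswer"; text: string };

interface Message {
  id: string;
  content: string;
  reasoning?: string;
  updates?: Update[];
}

// Returns a trimmed copy for the client; the input (and so the stored
// document) is never mutated.
function trimMessageForClient(message: Message): Message {
  const copy: Message = { ...message };
  delete copy.reasoning; // server-side accumulator, assumed unused by the client

  if (!message.updates) return copy;

  const hasTools = message.updates.some((u) => u.type === "tool");
  copy.updates = message.updates
    // Raw reasoning tokens are dropped for every message.
    .filter((u) => u.type !== "reasoning")
    // Per-token stream markers are dropped only for tool-less messages.
    .filter((u) => hasTools || u.type !== "stream")
    // The final answer entry stays, but its text is blanked for tool-less
    // messages because it duplicates message.content.
    .map((u): Update =>
      !hasTools && u.type === "finalAnswer" ? { ...u, text: "" } : u
    );
  return copy;
}
```

Keeping the blanked final answer entry rather than removing it reflects the point above that presence checks still need to see it.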
Models API trim
`/api/v2/models` is fetched in the root layout and embedded in every page (181 KB of the 318 KB prod HTML document). Removed:

- `promptExamples`: never read client-side; the landing chips come from hardcoded constants
- `parameters`: only used server-side (`buildPrompt`)

`providers`, `preprompt` and `multimodalAcceptedMimetypes` stay (settings page, ChatWindow). The legacy `/api/models` endpoint is unchanged for external consumers. One server file (`endpoints.ts`) typed `generateSettings` against the frontend `Model` type and now uses `BackendModel`.

Hot-path fixes
- Session hook: the rolling-expiry `sessions.updateOne` was awaited before `resolve()` on every POST, adding a serial MongoDB round-trip to message sends. Now fire-and-forget with error logging.
- InfiniteScroll: the 250 ms `setInterval` could trigger duplicate page fetches while one was in flight (`handleVisible` has no concurrency guard). Replaced with a busy-guarded, once-per-intersection observer that re-arms after each load, so pagination still continues when the sentinel remains visible after new items render.
- Markdown worker pool: cancelling a client used to terminate the in-flight worker, forcing a full worker respawn on every conversation switch. The `cancelled` flag already discards stale results, so the worker is now recycled instead.

Verification
`npm run check` (0 errors), `npm run lint`, `npm test`: 392 tests pass including the 2 new ones.
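For reviewers who want to see the idea behind the InfiniteScroll change in isolation, the busy guard can be sketched roughly as follows. This is an illustration, not the component code from this PR: `createBusyGuard` and `Loader` are hypothetical names, and the `IntersectionObserver` wiring is left out so the guard can be read and exercised on its own.

```typescript
// Hypothetical helper: wraps a page loader so that triggers arriving while a
// load is still in flight are ignored instead of starting a duplicate fetch.
type Loader = () => Promise<void>;

function createBusyGuard(load: Loader): () => Promise<boolean> {
  let busy = false;

  // Resolves to true if this call ran the loader, false if it was skipped.
  return async function trigger(): Promise<boolean> {
    if (busy) return false; // a load is already in flight: drop this trigger
    busy = true;
    try {
      await load();
      return true;
    } finally {
      busy = false; // re-arm whether the load succeeded or threw
    }
  };
}
```

In a component, `trigger` would presumably be called from the observer callback, and the sentinel re-observed once the returned promise settles, so that a sentinel still in view after new items render starts the next page load.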