Skip to content

Performance: [#7756 follow-up] Room-broadcast NEW_CHANGES fan-out to drop ~5-7% per-recipient packet construction #7780

@JohnMcLear

Description

@JohnMcLear

Background

CPU profiling against develop (and against #7775 + #7776 stacked, which removes the upstream log4js cost) attributes ~10% of total process CPU at the dive cliff to socket.io's per-recipient packet construction chain inside PadMessageHandler.updatePadClients:

3.36% emit                [socket.io/dist/socket.js]
3.56% packet              [socket.io/dist/socket.js]
3.31% _packet             [socket.io/dist/client.js]
2.49% writeToEngine       [socket.io/dist/client.js]
2.23% sendPacket          [engine.io/build/socket.js]

The fan-out loop today does socket.emit('message', msg) once per recipient per missed revision:

// src/node/handler/PadMessageHandler.ts:950-1008
exports.updatePadClients = async (pad: PadType) => {
  const roomSockets = _getRoomSockets(pad.id);
  // ...
  await Promise.all(roomSockets.map(async (socket) => {
    while (sessioninfo.rev < pad.getHeadRevisionNumber()) {
      // ... build msg ...
      socket.emit('message', msg);  // packet construction per recipient
      sessioninfo.rev = r;
    }
  }));
};

In steady state every socket is at `head - 1`, so for N sockets behind by 1 rev that's N packet constructions of essentially identical content. The only per-recipient differing fields in `msg` are `timeDelta` and `currentTime` — and those fields are only consumed by the timeslider (`broadcast.ts`), not the live editor (`collab_client.ts:191-209` reads `{newRev, changeset, author, apool}` and nothing else).

The opportunity

socket.io's `room.emit()` / `io.in(roomId).emit()` uses the adapter `broadcast()` path, which calls the encoder once and then loops `writeToEngine` per recipient:

// socket.io-adapter/dist/in-memory-adapter.js:112
broadcast(packet, opts) {
  const encodedPackets = this._encode(packet, packetOpts);  // ONCE
  // ...
  socket.client.writeToEngine(encodedPackets, packetOpts);  // per socket
}

Switching the steady-state path from `socket.emit` per recipient to `io.in(padId).emit` would collapse N packet constructions into 1. The realistic CPU savings at the dive cliff are 5-7% (some portion of the ~10% socket.io overhead — encoding and packet construction are shared, but the per-socket transport write still happens N times).

Why this needs design before implementing

Naively swapping `socket.emit` for `io.in(padId).emit` in the loop breaks the catch-up case:

  • Steady state: all sockets at `head - 1`, one new rev → `room.emit(head)` works cleanly.
  • Catch-up: one socket at `head - 5`, others at `head - 1` → a single `room.emit(head)` would be silently dropped by the lagger (`collab_client.ts:202-205` enforces `newRev === rev + 1` and `return`s on mismatch).

Two viable shapes:

Shape A — split steady-state from catch-up.

  • For the most-recent rev only: `io.in(padId).except(stragglers).emit(...)` to all sockets at `head - 1`.
  • For each straggler: individual `socket.emit` loop catches them up.
  • Track stragglers by walking sessioninfos at fan-out entry.

Shape B — push catch-up out of `updatePadClients`.

  • updatePadClients only ever broadcasts the latest rev to the room.
  • A client that detects it's behind (`newRev !== rev + 1`) requests resend via a new `CLIENT_REQUEST_RESEND` message, which the server handles in a separate code path.
  • More invasive but cleaner long-term: keeps the hot path simple and pushes catch-up to a slow path.

I'd favour Shape A as a first cut — local change, no protocol additions, the straggler-tracking is cheap (`sessioninfo.rev !== headBefore` check per socket once per fan-out).

Acceptance

  • Settings flag (settings.roomBroadcastNewChanges default false) so it can be A/B tested in the dive.
  • N=3 dive measurement before merging.
  • No client-side change required for Shape A.

Out of scope

  • The historicalAuthorData / pool fan-out path (covered by #7769 — closed, but the design problem is the same).
  • Worker-thread offload of OT — falsified, see experiment/worker-thread-applytotext branch.
  • engine.io transport-level packing — see #7774 for the existing flush-deferral lever.

Discovered as part of the #7756 scaling dive (dive doc PR #7765).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions