Performance: [#7756 follow-up] Room-broadcast NEW_CHANGES fan-out to drop ~5-7% per-recipient packet construction

## Background

CPU profiling against develop (and against #7775 + #7776 stacked, which removes the upstream log4js cost) attributes ~10% of total process CPU at the dive cliff to socket.io's per-recipient packet construction chain inside `PadMessageHandler.updatePadClients`:

```
3.36% emit                [socket.io/dist/socket.js]
3.56% packet              [socket.io/dist/socket.js]
3.31% _packet             [socket.io/dist/client.js]
2.49% writeToEngine       [socket.io/dist/client.js]
2.23% sendPacket          [engine.io/build/socket.js]
```

The fan-out loop today does `socket.emit('message', msg)` once per recipient per missed revision:

```ts
// src/node/handler/PadMessageHandler.ts:950-1008
exports.updatePadClients = async (pad: PadType) => {
  const roomSockets = _getRoomSockets(pad.id);
  // ...
  await Promise.all(roomSockets.map(async (socket) => {
    while (sessioninfo.rev < pad.getHeadRevisionNumber()) {
      // ... build msg ...
      socket.emit('message', msg);  // packet construction per recipient
      sessioninfo.rev = r;
    }
  }));
};
```

In steady state every socket is at \`head - 1\`, so for N sockets behind by 1 rev that's **N packet constructions of essentially identical content**. The only per-recipient differing fields in \`msg\` are \`timeDelta\` and \`currentTime\` — and those fields are **only consumed by the timeslider** (\`broadcast.ts\`), not the live editor (\`collab_client.ts:191-209\` reads \`{newRev, changeset, author, apool}\` and nothing else).

## The opportunity

socket.io's \`room.emit()\` / \`io.in(roomId).emit()\` uses the adapter \`broadcast()\` path, which calls the encoder **once** and then loops \`writeToEngine\` per recipient:

```js
// socket.io-adapter/dist/in-memory-adapter.js:112
broadcast(packet, opts) {
  const encodedPackets = this._encode(packet, packetOpts);  // ONCE
  // ...
  socket.client.writeToEngine(encodedPackets, packetOpts);  // per socket
}
```

Switching the steady-state path from \`socket.emit\` per recipient to \`io.in(padId).emit\` would collapse N packet constructions into 1. The realistic CPU savings at the dive cliff are 5-7% (some portion of the ~10% socket.io overhead — encoding and packet construction are shared, but the per-socket transport write still happens N times).

## Why this needs design before implementing

Naively swapping \`socket.emit\` for \`io.in(padId).emit\` in the loop breaks the **catch-up case**:

- Steady state: all sockets at \`head - 1\`, one new rev → \`room.emit(head)\` works cleanly.
- Catch-up: one socket at \`head - 5\`, others at \`head - 1\` → a single \`room.emit(head)\` would be silently dropped by the lagger (\`collab_client.ts:202-205\` enforces \`newRev === rev + 1\` and \`return\`s on mismatch).

Two viable shapes:

**Shape A — split steady-state from catch-up.**
- For the most-recent rev only: \`io.in(padId).except(stragglers).emit(...)\` to all sockets at \`head - 1\`.
- For each straggler: individual \`socket.emit\` loop catches them up.
- Track stragglers by walking sessioninfos at fan-out entry.

**Shape B — push catch-up out of \`updatePadClients\`.**
- ``updatePadClients`` only ever broadcasts the latest rev to the room.
- A client that detects it's behind (\`newRev !== rev + 1\`) requests resend via a new \`CLIENT_REQUEST_RESEND\` message, which the server handles in a separate code path.
- More invasive but cleaner long-term: keeps the hot path simple and pushes catch-up to a slow path.

I'd favour **Shape A** as a first cut — local change, no protocol additions, the straggler-tracking is cheap (\`sessioninfo.rev !== headBefore\` check per socket once per fan-out).

## Acceptance

- Settings flag (``settings.roomBroadcastNewChanges`` default false) so it can be A/B tested in the dive.
- N=3 dive measurement before merging.
- No client-side change required for Shape A.

## Out of scope

- The historicalAuthorData / pool fan-out path (covered by [#7769](https://github.com/ether/etherpad/pull/7769) — closed, but the design problem is the same).
- Worker-thread offload of OT — falsified, see ``experiment/worker-thread-applytotext`` branch.
- engine.io transport-level packing — see [#7774](https://github.com/ether/etherpad/pull/7774) for the existing flush-deferral lever.

Discovered as part of the #7756 scaling dive ([dive doc PR #7765](https://github.com/ether/etherpad/pull/7765)).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Performance: [#7756 follow-up] Room-broadcast NEW_CHANGES fan-out to drop ~5-7% per-recipient packet construction #7780

Background

The opportunity

Why this needs design before implementing

Acceptance

Out of scope

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Performance: [#7756 follow-up] Room-broadcast NEW_CHANGES fan-out to drop ~5-7% per-recipient packet construction #7780

Description

Background

The opportunity

Why this needs design before implementing

Acceptance

Out of scope

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions