Original Request
Add graceful shutdown handler for SIGTERM/SIGINT
Context: The server in packages/server/src/index.ts doesn't handle SIGTERM or SIGINT signals. On Toolforge, when the container is restarted or scaled, connections drop abruptly — SSE clients lose their stream mid-event, MongoDB connections may not close cleanly, and in-flight requests are terminated without response. Adding signal handlers to: (1) stop accepting new connections, (2) close SSE streams with a "server-shutdown" event so clients can reconnect, (3) flush pending bulk writes, and (4) close MongoDB connection pool — would improve reliability during deployments and container restarts.
Agent's Two Cents (could be wrong)
Everything below is the AI agent's best guess based on the current codebase.
Take with a grain of salt — the original request above is the only thing that came from a human.
Problem / Motivation
The standalone server (packages/server/src/index.ts) currently calls serve() and never captures the returned server handle, nor registers any signal handlers. When Toolforge sends SIGTERM during a container restart or scale-down, Node.js terminates immediately — dropping all active SSE streams in events.ts, leaving Mongoose connections in a dirty state, and abandoning any in-flight HTTP responses. This causes client-visible errors and potential data loss on every deployment.
Proposed Solution
Register process.on("SIGTERM", ...) and process.on("SIGINT", ...) handlers in the server startup block. On signal receipt the handler should:
- Stop accepting new connections: call server.close() on the handle returned by @hono/node-server's serve().
- Notify SSE clients: emit a shutdown event on the eventBus so connected SSE streams in events.ts can send a final server-shutdown event and close gracefully, giving clients a cue to reconnect.
- Close the revert-risk EventSource stream: shut down the Wikimedia EventSource opened by startRevertRiskStream() so the process isn't held open by it.
- Flush pending writes: if any bulk-write batches are pending, flush them before closing.
- Close the MongoDB connection pool: call mongoose.connection.close().
- Exit: call process.exit(0) after cleanup (with a hard timeout of ~10 s as a safety net).
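The ordering above can be sketched as a small orchestrator. This is a minimal illustration, not the project's code: ShutdownSteps and createShutdownHandler are hypothetical names, and every step is injected as a plain function so the sketch runs without Hono, Mongoose, or a real server. It assumes the safety-net timer is armed before cleanup begins so that a hung step cannot block exit.

```typescript
import { setTimeout, clearTimeout } from "node:timers";

// Hypothetical dependency bundle: each field stands in for a real call
// named in the list above, injected so the sketch runs without a server.
interface ShutdownSteps {
  closeServer: () => Promise<void>;        // would wrap server.close(callback)
  notifySseClients: () => void;            // would emit the shutdown event on eventBus
  stopRevertRiskStream: () => void;        // assumed new export from revertRiskStream.ts
  flushPendingWrites: () => Promise<void>; // no-op if no write buffer exists
  closeDatabase: () => Promise<void>;      // would call mongoose.connection.close()
  exit: (code: number) => void;            // process.exit in production
}

function createShutdownHandler(steps: ShutdownSteps, timeoutMs = 10_000) {
  let shuttingDown = false;

  return async function gracefulShutdown(signal: string): Promise<void> {
    if (shuttingDown) return; // a repeated signal must not restart cleanup
    shuttingDown = true;
    console.log(`Received ${signal}, starting graceful shutdown`);

    // Armed before any cleanup so a hung step cannot block exit forever.
    const safetyNet = setTimeout(() => steps.exit(1), timeoutMs);
    safetyNet.unref();

    try {
      // Stop accepting immediately; the promise settles once open connections end.
      const serverClosed = steps.closeServer();
      steps.notifySseClients();
      steps.stopRevertRiskStream();
      await serverClosed;
      await steps.flushPendingWrites();
      await steps.closeDatabase();
      clearTimeout(safetyNet);
      steps.exit(0);
    } catch (err) {
      console.error("Graceful shutdown failed:", err);
      clearTimeout(safetyNet);
      steps.exit(1);
    }
  };
}
```

In index.ts the returned function could be registered with process.on("SIGTERM", () => void gracefulShutdown("SIGTERM")) and the same for SIGINT, passing process.exit as the exit step.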
Dependencies & Potential Blockers
- @hono/node-server's serve() returns a Node http.Server; need to confirm it's captured and that server.close() stops new connections as expected.
- The eventBus in packages/server/src/lib/eventBus.ts currently has no "shutdown" event type; a new event type will need to be added.
- startRevertRiskStream() does not currently export a stop() function; it will need to expose one (the eventSource variable is module-scoped but not exported).
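A rough sketch of what the new shutdown event and its SSE consumer might look like. The real eventBus.ts and events.ts are not shown here, so this assumes the bus is a plain Node EventEmitter; onShutdown, emitShutdown, formatSseEvent, and attachShutdownNotice are hypothetical helper names. The frame layout (an event: line, a data: line, then a blank line) follows the standard SSE wire format.

```typescript
import { EventEmitter } from "node:events";

// Assumption: the bus is a plain EventEmitter; the real eventBus.ts may differ.
const eventBus = new EventEmitter();

type ShutdownListener = (reason: string) => void;

// Subscribe once to the new "shutdown" event; returns an unsubscribe function
// so a stream that closes early can detach its listener.
function onShutdown(listener: ShutdownListener): () => void {
  eventBus.once("shutdown", listener);
  return () => {
    eventBus.off("shutdown", listener);
  };
}

// Emit the new event; returns true if at least one listener was registered.
function emitShutdown(reason: string): boolean {
  return eventBus.emit("shutdown", reason);
}

// Serialise one SSE frame: named event, JSON payload, blank-line terminator.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// What an SSE route could do per connection: write a final server-shutdown
// frame, then end the stream so the client can reconnect elsewhere.
function attachShutdownNotice(
  write: (chunk: string) => void,
  close: () => void,
): () => void {
  return onShutdown((reason) => {
    write(formatSseEvent("server-shutdown", { reason }));
    close();
  });
}
```

Returning the unsubscribe function matters for long-lived servers: each SSE connection that ends normally should detach its listener, otherwise listeners accumulate on the bus.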
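How to Validate
- Confirm that a connected SSE client receives a server-shutdown event before the connection closes.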
Scope Estimate
Small — the change is confined to the server startup block and a few supporting modules; no API surface changes.
Key Files/Modules Likely Involved
packages/server/src/index.ts — main startup block where signal handlers will be registered
packages/server/src/routes/events.ts — SSE endpoint; needs to listen for shutdown event
packages/server/src/lib/eventBus.ts — needs a new shutdown event type
packages/server/src/lib/revertRiskStream.ts — needs to export a stopRevertRiskStream() function
packages/server/src/db/connection.ts — may add a disconnectDB() helper
Rough Implementation Sketch
- Capture the return value of serve() in a variable (const server = serve(...))
- Create an async function gracefulShutdown(signal: string) that:
  - Logs the signal
  - Sets a setTimeout(() => process.exit(1), 10_000) safety net before any cleanup starts, and calls .unref() on it, so a hung step cannot block exit
  - Calls server.close()
  - Emits shutdown on eventBus
  - Calls a new stopRevertRiskStream()
  - Calls mongoose.connection.close()
  - Calls process.exit(0) on success
- Register process.on("SIGTERM", () => gracefulShutdown("SIGTERM")) and the same for SIGINT
- In events.ts, listen for the shutdown event on eventBus and write a final server-shutdown SSE event before resolving the stream promise
- In revertRiskStream.ts, export stopRevertRiskStream() that calls eventSource?.close() and clears the retry timer
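The last step could look roughly like the following. The actual contents of revertRiskStream.ts are not visible here, so this assumes a module-scoped eventSource plus a retry timer as described; ClosableSource, scheduleRetry, and isStreamActive are hypothetical, and the source is passed in as a factory so the sketch does not depend on a particular EventSource package.

```typescript
import { setTimeout, clearTimeout } from "node:timers";

// Minimal structural type: anything with close(), such as an EventSource.
interface ClosableSource {
  close(): void;
}

let eventSource: ClosableSource | null = null;
let retryTimer: ReturnType<typeof setTimeout> | null = null;
let stopped = false;

// Hypothetical start function: opens the stream unless shutdown has begun.
function startRevertRiskStream(open: () => ClosableSource): void {
  if (stopped) return;
  eventSource = open();
}

// Hypothetical reconnect path: schedules another start after a delay.
function scheduleRetry(open: () => ClosableSource, delayMs: number): void {
  if (stopped) return;
  retryTimer = setTimeout(() => startRevertRiskStream(open), delayMs);
}

// The new export: close the stream, cancel any pending reconnect, and
// block later restarts so a late retry cannot reopen the connection.
function stopRevertRiskStream(): void {
  stopped = true;
  if (retryTimer !== null) {
    clearTimeout(retryTimer);
    retryTimer = null;
  }
  eventSource?.close();
  eventSource = null;
}

// Small inspection helper, useful in tests and health checks.
function isStreamActive(): boolean {
  return eventSource !== null || retryTimer !== null;
}
```

The stopped flag is a design choice rather than something the issue asks for: without it, a retry callback already queued at shutdown time could reopen the EventSource after it was closed.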
Open Questions
- Should the shutdown handler wait for in-flight HTTP requests to finish (drain), or just close after SSE/DB cleanup? A drain timeout of a few seconds might be appropriate.
- Is there a bulk-write buffer anywhere beyond the revert-risk ring buffer that needs flushing? (The ring buffer is in-memory and read-only, so it can be dropped safely.)
- Should the health endpoint return a "shutting down" status during the grace period to help load balancers route away?
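If the answers to the drain and health-check questions turn out to be yes, one possible shape is sketched below. Everything here is an assumption for illustration: lifecycle, markShuttingDown, healthCheck, trackRequest, and drain are hypothetical names, the 503 status is one common convention for signalling "do not route here", and the in-flight counter stands in for whatever request tracking the server would actually use.

```typescript
// Hypothetical lifecycle state shared by the health route and the shutdown handler.
const lifecycle = { shuttingDown: false, inFlight: 0 };

type HealthStatus = {
  status: "ok" | "shutting-down";
  httpStatus: 200 | 503;
};

function markShuttingDown(): void {
  lifecycle.shuttingDown = true;
}

// Health handler body: report 503 during the grace period so a load
// balancer that polls this endpoint can stop routing new traffic here.
function healthCheck(): HealthStatus {
  return lifecycle.shuttingDown
    ? { status: "shutting-down", httpStatus: 503 }
    : { status: "ok", httpStatus: 200 };
}

// Wrap request handling so the shutdown path knows how much work is pending.
async function trackRequest<T>(work: () => Promise<T>): Promise<T> {
  lifecycle.inFlight += 1;
  try {
    return await work();
  } finally {
    lifecycle.inFlight -= 1;
  }
}

// Wait until in-flight work reaches zero or the drain budget runs out.
// Resolves true when fully drained, false when the deadline passed first.
async function drain(timeoutMs: number, pollMs = 50): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (lifecycle.inFlight > 0 && Date.now() < deadline) {
    await new Promise<void>((resolve) => {
      setTimeout(resolve, pollMs);
    });
  }
  return lifecycle.inFlight === 0;
}
```

A shutdown handler could call markShuttingDown() first, then await drain(3000) before closing the database, keeping the drain budget well below the overall 10 s safety net.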
Potential Risks or Gotchas
- If serve() from @hono/node-server doesn't return a standard http.Server, the server.close() approach may need adjustment.
- The safety-net setTimeout that calls process.exit(1) must be unref()'d so that the timer doesn't itself keep the process alive if everything else closes cleanly.
- On Toolforge, Kubernetes sends SIGTERM and then SIGKILL after 30 s by default. The 10 s internal timeout is well within that window, but worth documenting.
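The timer-related risks can be made concrete with a small helper. This is a sketch under stated assumptions: armSafetyNet and KUBERNETES_GRACE_MS are hypothetical names, and the 30 s figure is the Kubernetes default grace period mentioned above, which a given deployment may override. The calls used (setTimeout from node:timers, unref, hasRef) are standard Node.js timer APIs.

```typescript
import { setTimeout } from "node:timers";

// Default Kubernetes grace period between SIGTERM and SIGKILL (assumed not overridden).
const KUBERNETES_GRACE_MS = 30_000;

// Arm a forced-exit timer that cannot itself keep the event loop alive.
// The exit function is injected; production code would pass process.exit.
function armSafetyNet(exit: (code: number) => void, timeoutMs = 10_000) {
  if (timeoutMs >= KUBERNETES_GRACE_MS) {
    // A timeout at or past the grace period would never fire before SIGKILL.
    throw new RangeError(
      `timeoutMs (${timeoutMs}) must be below the ${KUBERNETES_GRACE_MS} ms grace period`,
    );
  }
  const timer = setTimeout(() => {
    console.error(`Cleanup exceeded ${timeoutMs} ms, forcing exit`);
    exit(1);
  }, timeoutMs);
  // unref(): if all other handles close cleanly, the process exits on its
  // own instead of idling until this timer fires.
  timer.unref();
  return timer;
}
```

Returning the timer lets the success path call clearTimeout on it before process.exit(0), and timer.hasRef() can be used in a test to confirm the unref took effect.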