Add graceful shutdown handler for SIGTERM/SIGINT

## Original Request

> Add graceful shutdown handler for SIGTERM/SIGINT
>
> Context: The server in packages/server/src/index.ts doesn't handle SIGTERM or SIGINT signals. On Toolforge, when the container is restarted or scaled, connections drop abruptly — SSE clients lose their stream mid-event, MongoDB connections may not close cleanly, and in-flight requests are terminated without response. Adding signal handlers to: (1) stop accepting new connections, (2) close SSE streams with a "server-shutdown" event so clients can reconnect, (3) flush pending bulk writes, and (4) close MongoDB connection pool — would improve reliability during deployments and container restarts.

## Agent's Two Cents (could be wrong)

> Everything below is the AI agent's best guess based on the current codebase.
> Take with a grain of salt — the original request above is the only thing that came from a human.

### Problem / Motivation

The standalone server (`packages/server/src/index.ts`) currently calls `serve()` and never captures the returned server handle, nor registers any signal handlers. When Toolforge sends SIGTERM during a container restart or scale-down, Node.js terminates immediately — dropping all active SSE streams in `events.ts`, leaving Mongoose connections in a dirty state, and abandoning any in-flight HTTP responses. This causes client-visible errors and potential data loss on every deployment.

### Proposed Solution

Register `process.on("SIGTERM", ...)` and `process.on("SIGINT", ...)` handlers in the server startup block. On signal receipt the handler should:

1. **Stop accepting new connections** — call `server.close()` on the handle returned by `@hono/node-server`'s `serve()`.
2. **Notify SSE clients** — emit a `server-shutdown` event on the `eventBus` so connected SSE streams in `events.ts` can send a final event and close gracefully, giving clients a cue to reconnect.
3. **Close the revert-risk EventSource stream** — shut down the Wikimedia EventSource opened by `startRevertRiskStream()` so the process isn't held open by it.
4. **Flush pending writes** — if any bulk-write batches are pending, flush them before closing.
5. **Close the MongoDB connection pool** — call `mongoose.connection.close()`.
6. **Exit** — `process.exit(0)` after cleanup (with a hard timeout of ~10 s as a safety net).

### Dependencies & Potential Blockers

- `@hono/node-server`'s `serve()` returns a Node `http.Server` — need to confirm it's captured and that `server.close()` stops new connections as expected.
- The `eventBus` in `packages/server/src/lib/eventBus.ts` currently has no "shutdown" event type — a new event type will need to be added.
- `startRevertRiskStream()` does not currently export a `stop()` function — it will need to expose one (the `eventSource` variable is module-scoped but not exported).

### How to Validate

- [ ] Send SIGTERM to the running server; verify it logs a shutdown message and exits with code 0 within ~10 s.
- [ ] While an SSE client is connected, send SIGTERM; verify the client receives a `server-shutdown` event before the connection closes.
- [ ] After SIGTERM, confirm no new HTTP requests are accepted (connection refused).
- [ ] Check MongoDB logs or Mongoose debug output to confirm the connection pool closes cleanly (no "topology was destroyed" warnings on next startup).
- [ ] Start the server, trigger SIGINT (Ctrl-C); same graceful behavior as SIGTERM.

### Scope Estimate

Small — the change is confined to the server startup block and a few supporting modules; no API surface changes.

### Key Files/Modules Likely Involved

- `packages/server/src/index.ts` — main startup block where signal handlers will be registered
- `packages/server/src/routes/events.ts` — SSE endpoint; needs to listen for shutdown event
- `packages/server/src/lib/eventBus.ts` — needs a new `shutdown` event type
- `packages/server/src/lib/revertRiskStream.ts` — needs to export a `stopRevertRiskStream()` function
- `packages/server/src/db/connection.ts` — may add a `disconnectDB()` helper

### Rough Implementation Sketch

- Capture the return value of `serve()` in a variable (`const server = serve(...)`)
- Create an `async function gracefulShutdown(signal: string)` that:
  - Logs the signal
  - Calls `server.close()`
  - Emits `shutdown` on `eventBus`
  - Calls a new `stopRevertRiskStream()`
  - Calls `mongoose.connection.close()`
  - Sets a `setTimeout(() => process.exit(1), 10_000)` safety net
  - Calls `process.exit(0)` on success
- Register `process.on("SIGTERM", () => gracefulShutdown("SIGTERM"))` and same for SIGINT
- In `events.ts`, listen for the `shutdown` event on `eventBus` and write a final `server-shutdown` SSE event before resolving the stream promise
- In `revertRiskStream.ts`, export `stopRevertRiskStream()` that calls `eventSource?.close()` and clears the retry timer

### Open Questions

- Should the shutdown handler wait for in-flight HTTP requests to finish (drain), or just close after SSE/DB cleanup? A drain timeout of a few seconds might be appropriate.
- Is there a bulk-write buffer anywhere beyond the revert-risk ring buffer that needs flushing? (The ring buffer is in-memory and read-only, so it can be dropped safely.)
- Should the health endpoint return a "shutting down" status during the grace period to help load balancers route away?

### Potential Risks or Gotchas

- If `serve()` from `@hono/node-server` doesn't return a standard `http.Server`, the `server.close()` approach may need adjustment.
- The safety-net `setTimeout` must call `process.exit(1)` with `.unref()` so it doesn't itself keep the process alive if everything else closes cleanly.
- On Toolforge, Kubernetes sends SIGTERM and then SIGKILL after 30 s by default. The 10 s internal timeout is well within that window, but worth documenting.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add graceful shutdown handler for SIGTERM/SIGINT #9

Original Request

Agent's Two Cents (could be wrong)

Problem / Motivation

Proposed Solution

Dependencies & Potential Blockers

How to Validate

Scope Estimate

Key Files/Modules Likely Involved

Rough Implementation Sketch

Open Questions

Potential Risks or Gotchas

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add graceful shutdown handler for SIGTERM/SIGINT #9

Description

Original Request

Agent's Two Cents (could be wrong)

Problem / Motivation

Proposed Solution

Dependencies & Potential Blockers

How to Validate

Scope Estimate

Key Files/Modules Likely Involved

Rough Implementation Sketch

Open Questions

Potential Risks or Gotchas

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions