MCP servers leaking in long-running clients with multiple sessions #1163

Hey,

I've been debugging an issue where MCP server processes build up on my long-lived `copilot-sdk` pods.
More context on what I built is [here](https://karlstoney.com/from-pr-review-bot-atai-cli-to-an-autonomous-work-queue/), but at a high level we pop work off a queue and process it with `copilot-cli` via `copilot-sdk`.

This is the symptom:
```
[atcloud@atai-765cbcc889-wzlxs atai]$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
atcloud        1  0.3  1.6 74733124 262104 ?     Ssl  10:58   0:17 /usr/local/autotrader/atai/atai start-worker
atcloud       15  0.4  1.6 74737288 269632 ?     Ssl  10:58   0:22 /usr/local/autotrader/atai/atai start-worker-child
atcloud       27  0.1  1.9 18088604 320432 ?     Sl   10:58   0:08 node /usr/local/autotrader/atai/node_modules/@github/copilot/index.js --allow-all-tools --allow-all-urls --allow-all-paths --disable-mcp-server=gi
atcloud      140  0.2  2.2 74770060 360172 ?     Sl   11:12   0:09 /usr/local/autotrader/atai/atai mcp
atcloud     3434  0.2  2.0 74737356 332484 ?     Sl   11:37   0:06 /usr/local/autotrader/atai/atai mcp
atcloud     9684  0.2  1.9 74737292 314188 ?     Sl   11:57   0:04 /usr/local/autotrader/atai/atai mcp
atcloud    13970  0.0  0.0   9028  3328 pts/0    R+   12:27   0:00 ps aux
```

As you can see, several `atai mcp` server processes have accumulated alongside the single Copilot CLI process.
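To put a number on the build-up, the `atai mcp` rows can be pulled out of the `ps aux` output and their RSS summed. This is a minimal sketch: `parsePsAux` and `summariseMcp` are hypothetical helpers that assume the standard `ps aux` column layout (RSS in KiB, command in the last column). RSS counts shared pages in every process, so the sum overstates unique memory use.

```typescript
// Hypothetical helpers: summarise MCP server processes from `ps aux` output.
// Assumes the standard column order:
// USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
interface ProcRow {
  pid: number;
  rssKiB: number;
  command: string;
}

function parsePsAux(output: string): ProcRow[] {
  const rows: ProcRow[] = [];
  for (const line of output.split("\n")) {
    const cols = line.trim().split(/\s+/);
    // Skip blank lines and the header row (its PID column is not a number).
    if (cols.length < 11 || Number.isNaN(Number(cols[1]))) continue;
    rows.push({
      pid: Number(cols[1]),
      rssKiB: Number(cols[5]),
      command: cols.slice(10).join(" "), // the command may contain spaces
    });
  }
  return rows;
}

function summariseMcp(output: string, needle = "atai mcp") {
  const matches = parsePsAux(output).filter((r) => r.command.includes(needle));
  const totalRssKiB = matches.reduce((sum, r) => sum + r.rssKiB, 0);
  return { count: matches.length, pids: matches.map((r) => r.pid), totalRssKiB };
}
```

Against the output above, this reports three processes (PIDs 140, 3434 and 9684) with a combined RSS of 1,006,844 KiB, roughly 983 MiB.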

Our flow is basically:
- start Copilot
- receive a work item
- create a session
- process the request
- stop and delete the session

The Copilot process lives on, but sessions are per request. Since the MCP configuration is tied to the session, I'd expect session teardown to stop the MCP server, but it does not.

I asked Copilot to investigate this across my app, the SDK and the CLI. Here's its report, which I think explains it better than I can!

## Summary

`client.deleteSession()` / `session.disconnect()` leave session-scoped stdio MCP subprocesses running until `client.stop()` / `client.forceStop()` is called.

When a session is created with `mcpServers`, calling the SDK's session cleanup APIs does not tear down the stdio MCP subprocesses started for that session.

In my case the MCP server is `atai mcp`, but I do not think this is specific to that server. The important part is that the MCP server is configured per session via `mcpServers` and started as a stdio subprocess.

`client.deleteSession(sessionId)` succeeds, but the MCP child process remains alive under the long-lived Copilot CLI process. The child only disappears when `client.stop()` or `client.forceStop()` terminates the CLI process itself.

## Why I am filing this in `@github/copilot-sdk`

This is observable through the SDK's public lifecycle APIs:

- `session.disconnect()` sends `session.destroy`
- `client.deleteSession(sessionId)` sends `session.delete`
- both look like the correct session-scoped cleanup APIs for session-owned resources
- neither actually tears down the session-owned MCP subprocesses

So from an SDK consumer's point of view, the session lifecycle contract is incomplete or at least surprising.

That said, after tracing the installed code, I think the implementation fix most likely belongs in the bundled `@github/copilot` CLI, because that is where `session.destroy` / `session.delete` are handled and where MCP teardown appears to be missing.

## Environment

- `@github/copilot-sdk` `0.3.0`
- `@github/copilot` `1.0.36`
- `useStdio: true`
- session-scoped stdio MCP server configured via `mcpServers`
- reproduced in a long-lived worker and in a standalone local repro script

## Usage Pattern

My real application has:

- one long-lived `CopilotClient`
- one new session per job
- a session-scoped stdio MCP server configured through `mcpServers`
- per-job cleanup of `session.abort()` followed by `client.deleteSession(session.sessionId)`
- whole-worker cleanup of `client.forceStop()` only when the worker exits

That initially looked like it might be an application bug, but the same leak reproduces in a standalone script that only creates a client, creates one session, sends one prompt, deletes the session, and inspects the child processes.
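The per-job pattern above can be sketched as follows. To keep it testable without a live CLI, it is written against small structural interfaces rather than the real SDK types; `JobClient`, `JobSession` and `runJob` are hypothetical and only mirror the method names used in this issue (`createSession`, `sendAndWait`, `abort`, `deleteSession`).

```typescript
// Hypothetical structural interfaces mirroring only the SDK methods named in this issue.
interface JobSession {
  readonly sessionId: string;
  sendAndWait(options: { prompt: string }, timeout?: number): Promise<unknown>;
  abort(): Promise<void>;
}

interface JobClient {
  createSession(config: Record<string, unknown>): Promise<JobSession>;
  deleteSession(sessionId: string): Promise<void>;
}

// One session per job; cleanup runs even if the prompt fails or times out.
async function runJob(
  client: JobClient,
  config: Record<string, unknown>,
  prompt: string,
  timeoutMs = 30_000,
): Promise<unknown> {
  const session = await client.createSession(config);
  try {
    return await session.sendAndWait({ prompt }, timeoutMs);
  } finally {
    // Per-job cleanup: abort, then delete the session.
    // An abort failure is ignored so that deleteSession still runs.
    await session.abort().catch(() => undefined);
    await client.deleteSession(session.sessionId);
    // With the behaviour reported here, the session's stdio MCP
    // subprocess is still running after this point.
  }
}
```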

## Minimal Repro

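```ts
import { spawnSync } from "node:child_process";
import { CopilotClient, approveAll } from "@github/copilot-sdk";

// List running processes whose command line contains `pattern` (uses `ps`; Linux/macOS).
function findProcesses(pattern: string): string[] {
  const { stdout } = spawnSync("ps", ["-eo", "pid,ppid,args"], { encoding: "utf8" });
  return stdout.split("\n").filter((line) => line.includes(pattern));
}

const client = new CopilotClient({
  useLoggedInUser: true,
  useStdio: true,
  cwd: process.cwd(),
  cliArgs: [
    "--allow-all-tools",
    "--allow-all-urls",
    "--allow-all-paths",
    "--autopilot",
    "--no-ask-user",
    "--no-auto-update"
  ]
});

const session = await client.createSession({
  clientName: "repro",
  model: "gpt-5.4-mini",
  onPermissionRequest: approveAll,
  streaming: false,
  workingDirectory: process.cwd(),
  infiniteSessions: { enabled: true },
  mcpServers: {
    atai: {
      type: "stdio",
      // Any stdio MCP server should do; mine is `atai mcp` (see the ps output above).
      // Add env/cwd here if your server needs them.
      command: "atai",
      args: ["mcp"],
      tools: ["*"]
    }
  }
});

await session.sendAndWait({ prompt: "Reply with exactly ok." }, 30000);

// At this point the MCP subprocess is running under the Copilot CLI process.
console.log("after sendAndWait:", findProcesses("atai mcp"));

await client.deleteSession(session.sessionId);

// Expected: the session-scoped MCP subprocess exits here.
// Actual: the MCP subprocess is still alive.
console.log("after deleteSession:", findProcesses("atai mcp"));

await client.forceStop();

// Only here does the MCP subprocess disappear (give it a moment to exit first).
await new Promise((resolve) => setTimeout(resolve, 1000));
console.log("after forceStop:", findProcesses("atai mcp"));
```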

## Observed Behavior

1. Create a session with a stdio MCP server in `mcpServers`.
2. Send a prompt so the MCP server is actually started.
3. Call `client.deleteSession(sessionId)`.
4. Observe that the MCP subprocess is still alive.
5. Call `client.stop()` or `client.forceStop()`.
6. Observe that the MCP subprocess finally disappears when the entire Copilot CLI process is terminated.

I also checked `session.disconnect()`. It behaves the same way, which makes sense if both `session.destroy` and `session.delete` share the same CLI cleanup path.
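To turn steps 3 and 4 into a check rather than a manual `ps`, one option is to snapshot matching PIDs before the session is created and again after cleanup; anything new that survives `deleteSession` has leaked. This is a minimal sketch, assuming a `ps` that accepts `-eo pid=,ppid=,args=` (as procps on Linux does); `parsePs`, `snapshot` and `leakedPids` are hypothetical helpers, not SDK APIs.

```typescript
import { spawnSync } from "node:child_process";

interface PsRow {
  pid: number;
  ppid: number;
  args: string;
}

// Parse `ps -eo pid=,ppid=,args=` output (no header, one process per line).
function parsePs(output: string): PsRow[] {
  return output
    .split("\n")
    .map((line) => line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map((m) => ({ pid: Number(m[1]), ppid: Number(m[2]), args: m[3] }));
}

// Snapshot the PIDs of processes whose command line contains `needle`.
function snapshot(needle: string): Set<number> {
  const { stdout } = spawnSync("ps", ["-eo", "pid=,ppid=,args="], { encoding: "utf8" });
  const rows = parsePs(stdout ?? "");
  return new Set(rows.filter((r) => r.args.includes(needle)).map((r) => r.pid));
}

// PIDs present after cleanup that did not exist before the session was created.
function leakedPids(before: Set<number>, after: Set<number>): number[] {
  return Array.from(after).filter((pid) => !before.has(pid));
}
```

In the repro, `const before = snapshot("atai mcp")` would go before `createSession`, and `leakedPids(before, snapshot("atai mcp"))` after `deleteSession`; with the behaviour described here, the result would be expected to contain the new `atai mcp` PID.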

## Expected Behavior

I would expect one of these to be true:

1. `session.disconnect()` and `client.deleteSession()` should fully tear down session-owned MCP servers and subprocesses.
2. Or the SDK documentation should explicitly state that these APIs only perform logical session cleanup, and that process cleanup for session-owned MCP servers requires client-level teardown.

I think the first behavior is the correct one, because `mcpServers` are configured per session and feel like session-owned resources.

## What I Found In The Installed Code

### SDK Side

From the installed SDK code:

- `session.disconnect()` sends `session.destroy`
- `session.abort()` sends `session.abort`
- `client.deleteSession()` sends `session.delete`
- `client.stop()` and `client.forceStop()` are the only SDK paths I found that terminate the spawned Copilot CLI process itself

So on the SDK side, the lifecycle methods are thin RPC wrappers around CLI behavior.
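Put differently, each SDK lifecycle method listed above reduces to one request to the CLI. The sketch below models that shape with a generic `rpc` function so the mapping is explicit. The `RpcCall` type, the `SessionApi` class and the `{ sessionId }` parameter shape are assumptions for illustration, not the SDK's actual internals; only the method-to-RPC pairs come from the list above.

```typescript
// Hypothetical transport: one JSON-RPC request to the CLI, resolving with its result.
type RpcCall = (method: string, params: Record<string, unknown>) => Promise<unknown>;

// Illustrative wrappers: each lifecycle method forwards exactly one RPC,
// so whether any processes get cleaned up is decided entirely by the CLI.
class SessionApi {
  readonly sessionId: string;
  private readonly rpc: RpcCall;

  constructor(rpc: RpcCall, sessionId: string) {
    this.rpc = rpc;
    this.sessionId = sessionId;
  }

  disconnect(): Promise<unknown> {
    return this.rpc("session.destroy", { sessionId: this.sessionId });
  }

  abort(): Promise<unknown> {
    return this.rpc("session.abort", { sessionId: this.sessionId });
  }
}

function deleteSession(rpc: RpcCall, sessionId: string): Promise<unknown> {
  return rpc("session.delete", { sessionId });
}
```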

### CLI Side (`@github/copilot`)

From the installed bundled CLI code:

- `session.destroy` and `session.delete` both route through the same session cleanup path
- that cleanup path removes listeners, exporters, and bookkeeping
- I could not find session-owned MCP teardown in that cleanup path
- the concrete session `dispose()` implementation I found shuts down shell context and telemetry, but I could not find MCP teardown there either
- there is also a separate `mcpHostCache.cleanup()` helper for cached MCP hosts, and I could not find a caller for that either

So the user-visible problem appears at the SDK API boundary, but the likely code change belongs in `@github/copilot` session teardown.
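To illustrate where the missing step would sit, here is a minimal sketch of a session registry in which `session.destroy` and `session.delete` share one cleanup path that also stops the session's stdio MCP children. Everything here (`McpChild`, `SessionRecord`, `SessionRegistry`) is hypothetical and is not the CLI's code. The shutdown order follows the MCP specification's stdio guidance (close the server's stdin, then SIGTERM, then SIGKILL if it still has not exited); the waiting and SIGKILL escalation are omitted for brevity.

```typescript
// Subset of Node's ChildProcess that this sketch relies on.
interface McpChild {
  readonly pid?: number;
  readonly stdin: { end(): void } | null;
  kill(signal?: string): boolean;
}

// Hypothetical per-session state: existing bookkeeping plus the MCP children it owns.
interface SessionRecord {
  disposers: Array<() => void>; // listeners, exporters, telemetry, ...
  mcpChildren: McpChild[];
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();

  register(sessionId: string, record: SessionRecord): void {
    this.sessions.set(sessionId, record);
  }

  // One cleanup path for both `session.destroy` and `session.delete`.
  // Returns how many MCP children were signalled.
  cleanup(sessionId: string): number {
    const record = this.sessions.get(sessionId);
    if (!record) return 0; // unknown or already cleaned up
    this.sessions.delete(sessionId);

    // Existing bookkeeping teardown.
    for (const dispose of record.disposers) dispose();

    // The step reported as missing: stop session-owned stdio MCP servers.
    let signalled = 0;
    for (const child of record.mcpChildren) {
      child.stdin?.end(); // close the server's input stream first
      if (child.kill("SIGTERM")) signalled++;
    }
    return signalled;
  }
}
```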

## Why This Matters

In a long-lived worker or daemon process, this leaks one or more MCP subprocesses per completed session.

Over time that can mean:

- stale MCP servers remaining alive after session completion
- unnecessary process and memory growth
- a risk of cross-session state surviving longer than expected
- pressure to recreate the entire `CopilotClient` per job, which defeats the point of keeping a long-lived client around

## Temporary Workaround

The only reliable workaround I found is to tear down the whole client process:

- `await client.stop()`
- or `await client.forceStop()`

That works because it terminates the parent Copilot CLI process, which also takes the leaked MCP subprocess with it. But it is a much heavier operation than session cleanup and changes client lifetime semantics.
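If the heavier workaround is acceptable, it can at least be scoped so that every job gets its own client and that client is always stopped, even when the job fails. This is a minimal sketch; `withFreshClient` and the `StoppableClient` interface are hypothetical, and `stop()` is the SDK method named above (typed loosely here as returning `Promise<unknown>`).

```typescript
// Only the method this sketch needs from the client.
interface StoppableClient {
  stop(): Promise<unknown>;
}

// Create a client, run one job with it, and always stop it afterwards so the
// CLI process (and any MCP subprocesses it spawned) goes away with the job.
async function withFreshClient<C extends StoppableClient, T>(
  createClient: () => C,
  job: (client: C) => Promise<T>,
): Promise<T> {
  const client = createClient();
  try {
    return await job(client);
  } finally {
    await client.stop();
  }
}
```

The trade-off is the one described above: CLI start-up cost on every job, in exchange for bounded process growth.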

## Overlap With `@github/copilot`

I think this belongs in both places, but for different reasons:

- `@github/copilot-sdk`: the problem is exposed through the public SDK lifecycle APIs, and the current behavior is surprising for SDK consumers
- `@github/copilot`: the implementation fix probably needs to happen in the CLI bundle's session teardown path, because that is where `session.destroy` / `session.delete` are handled and where MCP teardown appears to be missing

If you would prefer, I can file the companion issue in `@github/copilot` instead and link the two.
