
MCP servers leak in long-running clients with multiple sessions #1163

@Stono

Description


Hey,
I've been debugging an issue where MCP server processes build up on my long-lived copilot-sdk pods.
There's more context on what I built here, but at a high level we pop work off a queue and process it with copilot-cli via copilot-sdk.

This is the symptom:

[atcloud@atai-765cbcc889-wzlxs atai]$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
atcloud        1  0.3  1.6 74733124 262104 ?     Ssl  10:58   0:17 /usr/local/autotrader/atai/atai start-worker
atcloud       15  0.4  1.6 74737288 269632 ?     Ssl  10:58   0:22 /usr/local/autotrader/atai/atai start-worker-child
atcloud       27  0.1  1.9 18088604 320432 ?     Sl   10:58   0:08 node /usr/local/autotrader/atai/node_modules/@github/copilot/index.js --allow-all-tools --allow-all-urls --allow-all-paths --disable-mcp-server=gi
atcloud      140  0.2  2.2 74770060 360172 ?     Sl   11:12   0:09 /usr/local/autotrader/atai/atai mcp
atcloud     3434  0.2  2.0 74737356 332484 ?     Sl   11:37   0:06 /usr/local/autotrader/atai/atai mcp
atcloud     9684  0.2  1.9 74737292 314188 ?     Sl   11:57   0:04 /usr/local/autotrader/atai/atai mcp
atcloud    13970  0.0  0.0   9028  3328 pts/0    R+   12:27   0:00 ps aux

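As you can see, there are several atai mcp server processes (PIDs 140, 3434 and 9684), each started at a different time and all still running.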

Our flow is basically:

  • start Copilot
  • receive a work item
  • create a session
  • process request
  • stop and delete session

The Copilot process lives on, but sessions are per request. Since MCP configuration is tied to the session, I'd expect session teardown to stop the MCP server, but it does not.

I asked Copilot to investigate this across my app, the SDK and the CLI. Here's its report, which I think explains it better than I can!

Summary

client.deleteSession() / session.disconnect() leave session-scoped stdio MCP subprocesses running until client.stop() / client.forceStop()

When a session is created with mcpServers, calling the SDK's session cleanup APIs does not tear down the stdio MCP subprocesses started for that session.

In my case the MCP server is atai mcp, but I do not think this is specific to that server. The important part is that the MCP server is configured per session via mcpServers and started as a stdio subprocess.

client.deleteSession(sessionId) succeeds, but the MCP child process remains alive under the long-lived Copilot CLI process. The child only disappears when client.stop() or client.forceStop() terminates the CLI process itself.

Why I am filing this in @github/copilot-sdk

This is observable through the SDK's public lifecycle APIs:

  • session.disconnect() sends session.destroy
  • client.deleteSession(sessionId) sends session.delete
  • both look like the correct session-scoped cleanup APIs for session-owned resources
  • neither actually tears down the session-owned MCP subprocesses

So from an SDK consumer's point of view, the session lifecycle contract is incomplete or at least surprising.

That said, after tracing the installed code, the likely implementation fix is in the bundled @github/copilot CLI, because that is where session.destroy / session.delete are handled and where MCP teardown appears to be missing.

Environment

  • @github/copilot-sdk 0.3.0
  • @github/copilot 1.0.36
  • useStdio: true
  • session-scoped stdio MCP server configured via mcpServers
  • reproduced in a long-lived worker and in a standalone local repro script

Usage Pattern

My real application has:

  • one long-lived CopilotClient
  • one new session per job
  • a session-scoped stdio MCP server configured through mcpServers
  • per-job cleanup of session.abort() followed by client.deleteSession(session.sessionId)
  • whole-worker cleanup of client.forceStop() only when the worker exits
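A minimal sketch of that per-job pattern, assuming jobs run one at a time. To keep it self-contained it is typed against small structural interfaces (JobClient and JobSession are hypothetical names) that mirror only the SDK calls named in this report; in the real worker, a CopilotClient and its session are assumed to fill these roles. Cleanup sits in a finally block so it runs even when a job fails, and with the current CLI even this complete per-job cleanup leaves the MCP subprocess behind.

```typescript
// Hypothetical structural types mirroring only the SDK calls used in this report.
interface JobSession {
  sessionId: string;
  sendAndWait(options: { prompt: string }, timeout?: number): Promise<unknown>;
  abort(): Promise<void>;
}

interface JobClient {
  createSession(config: Record<string, unknown>): Promise<JobSession>;
  deleteSession(sessionId: string): Promise<void>;
}

// Run one job on a fresh session and always clean the session up afterwards.
async function runJob(
  client: JobClient,
  sessionConfig: Record<string, unknown>,
  prompt: string,
  timeoutMs = 30000,
): Promise<unknown> {
  const session = await client.createSession(sessionConfig);
  try {
    return await session.sendAndWait({ prompt }, timeoutMs);
  } finally {
    // Per-job cleanup from the report: abort in-flight work, then delete.
    // Abort errors are ignored so deleteSession() still runs.
    await session.abort().catch(() => undefined);
    // With the current CLI this does NOT stop session-scoped stdio MCP servers.
    await client.deleteSession(session.sessionId);
  }
}
```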

That initially looked like it might be an application bug, but the same leak reproduces in a standalone script that only creates a client, creates one session, sends one prompt, deletes the session, and inspects the child processes.

Minimal Repro

import { spawnSync } from "node:child_process";
import { CopilotClient, approveAll } from "@github/copilot-sdk";

const client = new CopilotClient({
  useLoggedInUser: true,
  useStdio: true,
  cwd: process.cwd(),
  cliArgs: [
    "--allow-all-tools",
    "--allow-all-urls",
    "--allow-all-paths",
    "--autopilot",
    "--no-ask-user",
    "--no-auto-update"
  ]
});

const session = await client.createSession({
  clientName: "repro",
  model: "gpt-5.4-mini",
  onPermissionRequest: approveAll,
  streaming: false,
  workingDirectory: process.cwd(),
  infiniteSessions: { enabled: true },
  mcpServers: {
    atai: {
      type: "stdio",
      ...whatever,
      tools: ["*"]
    }
  }
});

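await session.sendAndWait({ prompt: "Reply with exactly ok." }, 30000);

// Illustrative helper: list processes whose command line ends in " mcp".
// Uses Linux/macOS ps; adjust the match for your own MCP server command.
const listMcpProcesses = () =>
  spawnSync("ps", ["-eo", "pid,ppid,args"], { encoding: "utf8" })
    .stdout.split("\n")
    .filter((line) => line.trimEnd().endsWith(" mcp"));

// At this point the MCP subprocess is running under the Copilot CLI process.
console.log("after sendAndWait:", listMcpProcesses());

await client.deleteSession(session.sessionId);

// Expected: the session-scoped MCP subprocess exits here.
// Actual: the MCP subprocess is still alive.
console.log("after deleteSession:", listMcpProcesses());

await client.forceStop();

// Only here does the MCP subprocess disappear.
console.log("after forceStop:", listMcpProcesses());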

Observed Behavior

  1. Create a session with a stdio MCP server in mcpServers.
  2. Send a prompt so the MCP server is actually started.
  3. Call client.deleteSession(sessionId).
  4. Observe that the MCP subprocess is still alive.
  5. Call client.stop() or client.forceStop().
  6. Observe that the MCP subprocess finally disappears when the entire Copilot CLI process is terminated.

I also checked session.disconnect(). It behaves the same way, which makes sense if both session.destroy and session.delete share the same CLI cleanup path.
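Steps 4 and 6 can be checked programmatically instead of by eye. The sketch below assumes a ps that accepts -eo pid=,ppid=,args= (procps on Linux); the helper names parsePs, childProcesses and snapshot are hypothetical. It also assumes the MCP server is a direct child of the Copilot CLI process, as it was here, and matching on a trailing " mcp" fits the atai mcp processes in the listing above but would need adjusting for other servers.

```typescript
import { spawnSync } from "node:child_process";

interface ProcInfo {
  pid: number;
  ppid: number;
  args: string;
}

// Parse `ps -eo pid=,ppid=,args=` output (no header line) into records.
function parsePs(output: string): ProcInfo[] {
  const procs: ProcInfo[] = [];
  for (const line of output.split("\n")) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/);
    if (m) procs.push({ pid: Number(m[1]), ppid: Number(m[2]), args: m[3] });
  }
  return procs;
}

// Direct children of parentPid whose command line matches pattern.
// Pass a non-global RegExp: a /g pattern keeps lastIndex state between test() calls.
function childProcesses(procs: ProcInfo[], parentPid: number, pattern: RegExp): ProcInfo[] {
  return procs.filter((p) => p.ppid === parentPid && pattern.test(p.args));
}

// Snapshot the live process table; returns [] if ps cannot be run.
function snapshot(): ProcInfo[] {
  const res = spawnSync("ps", ["-eo", "pid=,ppid=,args="], { encoding: "utf8" });
  return parsePs(res.stdout ?? "");
}
```

For example, childProcesses(snapshot(), cliPid, /\smcp$/) would be empty after client.deleteSession() if session teardown stopped the server; with the current behavior it still lists one entry per leaked session. Obtaining cliPid, the Copilot CLI's PID, is left to the caller.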

Expected Behavior

I would expect one of these to be true:

  1. session.disconnect() and client.deleteSession() should fully tear down session-owned MCP servers and subprocesses.
  2. Or the SDK documentation should explicitly state that these APIs only perform logical session cleanup, and that process cleanup for session-owned MCP servers requires client-level teardown.

I think the first behavior is the correct one, because mcpServers are configured per session and feel like session-owned resources.
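If behavior 1 were implemented, a regression check would still need to give the subprocess a moment to exit after deleteSession() resolves, rather than sampling the process table exactly once. A minimal polling helper for that, as a sketch (the name waitFor and the default interval are illustrative choices, not part of the SDK):

```typescript
// Poll check() until it returns true or timeoutMs elapses.
// Resolves true on success, false on timeout.
async function waitFor(check: () => boolean, timeoutMs: number, intervalMs = 100): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The check passed in would typically list the Copilot CLI's child processes and return true once no MCP server remains; the timeout keeps a genuine leak from hanging the test.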

What I Found In The Installed Code

SDK Side

From the installed SDK code:

  • session.disconnect() sends session.destroy
  • session.abort() sends session.abort
  • client.deleteSession() sends session.delete
  • client.stop() and client.forceStop() are the only SDK paths I found that terminate the spawned Copilot CLI process itself

So on the SDK side, the lifecycle methods are thin RPC wrappers around CLI behavior.

CLI Side (@github/copilot)

From the installed bundled CLI code:

  • session.destroy and session.delete both route through the same session cleanup path
  • that cleanup path removes listeners, exporters, and bookkeeping
  • I could not find session-owned MCP teardown in that cleanup path
  • the concrete session dispose() implementation I found shuts down shell context and telemetry, but I could not find MCP teardown there either
  • there is also a separate mcpHostCache.cleanup() helper for cached MCP hosts, and I could not find a caller for that either

So the user-visible problem appears at the SDK API boundary, but the likely code change belongs in @github/copilot session teardown.
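To make the suggested fix concrete, here is a hypothetical sketch of session teardown that owns its MCP connections. It is not the CLI's actual code: McpConnection and SessionResources are invented names, and close() stands in for whatever shuts a stdio MCP server down (for example, closing its stdin and killing the child if it lingers). The design points are that session.destroy and session.delete would both call the same dispose(), that one server failing to close does not prevent the others from closing, and that a second dispose() is a no-op.

```typescript
// Hypothetical handle to one stdio MCP server started for a session.
interface McpConnection {
  name: string;
  close(): Promise<void>;
}

// Hypothetical per-session owner of MCP connections.
class SessionResources {
  private mcp = new Map<string, McpConnection>();
  private disposed = false;

  register(conn: McpConnection): void {
    if (this.disposed) throw new Error("session already disposed");
    this.mcp.set(conn.name, conn);
  }

  // Shared teardown path for both session.destroy and session.delete.
  // Returns a description of each server that failed to close.
  async dispose(): Promise<string[]> {
    if (this.disposed) return [];
    this.disposed = true;
    const conns = Array.from(this.mcp.values());
    this.mcp.clear();
    const failures: string[] = [];
    // Close all servers concurrently; a failure in one does not skip the rest.
    await Promise.all(
      conns.map(async (c) => {
        try {
          await c.close();
        } catch (err) {
          failures.push(`${c.name}: ${String(err)}`);
        }
      }),
    );
    return failures;
  }
}
```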

Why This Matters

In a long-lived worker or daemon process, this leaks one or more MCP subprocesses per completed session.

Over time that can mean:

  • stale MCP servers remaining alive after session completion
  • unnecessary process and memory growth
  • a risk of cross-session state surviving longer than expected
  • pressure to recreate the entire CopilotClient per job, which defeats the point of keeping a long-lived client around

Temporary Workaround

The only reliable workaround I found is to tear down the whole client process:

  • await client.stop()
  • or await client.forceStop()

That works because it terminates the parent Copilot CLI process, which also takes the leaked MCP subprocess with it. But it is a much heavier operation than session cleanup and changes client lifetime semantics.
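Until this is fixed, one way to bound the leak without giving up on a long-lived client entirely is to recycle the client every N jobs, so at most N sessions' worth of MCP servers accumulate before client.stop() reaps them. A sketch, assuming jobs run one at a time (recycling while another job is in flight would stop its session). RecyclingClientHolder and StoppableClient are hypothetical names, and the factory is assumed to construct (and, if needed, start) a CopilotClient.

```typescript
// Hypothetical structural type; the report names client.stop() as the reaping call.
interface StoppableClient {
  stop(): Promise<unknown>;
}

// Replaces the client after maxJobs jobs, capping how many leaked MCP
// processes can pile up at the cost of one CLI restart per batch.
class RecyclingClientHolder<C extends StoppableClient> {
  private client: C | undefined;
  private jobs = 0;

  constructor(
    private readonly create: () => Promise<C> | C,
    private readonly maxJobs: number,
  ) {}

  async acquire(): Promise<C> {
    if (this.client && this.jobs >= this.maxJobs) {
      const old = this.client;
      this.client = undefined;
      this.jobs = 0;
      await old.stop(); // stopping the CLI also takes its leaked MCP children with it
    }
    if (!this.client) this.client = await this.create();
    this.jobs++;
    return this.client;
  }

  async shutdown(): Promise<void> {
    const c = this.client;
    this.client = undefined;
    if (c) await c.stop();
  }
}
```

A worker would call acquire() at the start of each job and shutdown() on exit, in place of the single client.forceStop(). The trade-off is one CLI restart per N jobs versus up to N sessions' worth of idle MCP processes.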

Overlap With @github/copilot

I think this belongs in both places, but for different reasons:

  • @github/copilot-sdk: the problem is exposed through the public SDK lifecycle APIs, and the current behavior is surprising for SDK consumers
  • @github/copilot: the implementation fix probably needs to happen in the CLI bundle's session teardown path, because that is where session.destroy / session.delete are handled and where MCP teardown appears to be missing

If you would prefer, I can file the companion issue in @github/copilot instead and link the two.
