Orphaned eca server processes after eca-stop/eca-restart (non-blocking SIGTERM, no kill-emacs-hook) #243

@agzam

Describe the bug

eca-process-stop sends SIGTERM via kill-process and proceeds immediately without verifying the underlying eca server actually exited. If the server hangs or stalls during shutdown, Emacs cleans up its session state via eca-delete-session while the CLI process keeps running indefinitely.

There is also no kill-emacs-hook registered to stop sessions on Emacs exit, so daemon-mode users leak every session that is still open when Emacs shuts down.

Over time, orphaned eca server processes accumulate. They continue to hold OAuth state in memory, periodically attempt token refresh, and compete with other ECA processes for the shared ~/.cache/eca/db.transit.json. This compounds the OAuth refresh race condition described in editor-code-assistant/eca#462.

Diagnostic data

$ ps -eo pid,etime,command | grep "[e]ca server"
99996  1-23:55  /opt/homebrew/bin/eca server
97576  2-00:35  /opt/homebrew/bin/eca server
10027    04:35  /opt/homebrew/bin/eca server
29094       59  /opt/homebrew/bin/eca server
29434       55  /opt/homebrew/bin/eca server

At the time I captured this, my live Emacs (daemon mode) had only 3 sessions in eca--sessions:

(mapcar (lambda (pair)
          (let ((s (cdr pair)))
            (list :id (eca--session-id s)
                  :workspaces (eca--session-workspace-folders s)
                  :pid (when (process-live-p (eca--session-process s))
                         (process-id (eca--session-process s))))))
        eca--sessions)
;; => ((:id 8 :workspaces (".../eca-emacs/master/") :pid 29434)
;;     (:id 7 :workspaces (".../prisma.el/")        :pid 29094)
;;     (:id 6 :workspaces (".../singer-io/")        :pid 10027))

The two 2-day-old processes (PIDs 97576 and 99996) had no live Emacs session referencing them. They were orphaned.
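The comparison above was done by eye. A minimal sketch of automating it follows, assuming that `ps -eo pid,command` is available and that matching the string "eca server" in the command line is a good enough filter. All `my/...` names are hypothetical helpers, not part of eca-emacs.

```emacs-lisp
(require 'seq)

(defun my/eca-parse-ps-line (line)
  "Return the PID at the start of LINE if its command mentions \"eca server\".
LINE is one row of `ps -eo pid,command' output.  Return nil otherwise."
  (when (string-match "\\`[ \t]*\\([0-9]+\\)[ \t]+.*eca server" line)
    (string-to-number (match-string 1 line))))

(defun my/eca-running-server-pids ()
  "Return the PIDs of all running eca server processes, according to ps."
  (delq nil (mapcar #'my/eca-parse-ps-line
                    (process-lines "ps" "-eo" "pid,command"))))

(defun my/eca-orphan-pids (server-pids session-pids)
  "Return the members of SERVER-PIDS that do not appear in SESSION-PIDS.
Nil entries in SESSION-PIDS (sessions without a live process) are ignored."
  (seq-difference server-pids (remq nil session-pids)))
```

Feeding `(my/eca-running-server-pids)` and the `:pid` values collected from `eca--sessions` into `my/eca-orphan-pids` should yield the PIDs that no live session references. Note that this would also flag servers owned by a second Emacs instance, so the result is a list of candidates, not a kill list.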

Root cause

In eca-process.el:

(defun eca-process-stop (session)
  "Stop the eca process for SESSION if running."
  (when session
    (kill-process (eca--session-process session))   ; SIGTERM, non-blocking
    (kill-buffer (eca-process--buffer-name session))
    ...))

kill-process is non-blocking on POSIX. Although eca-stop sends a shutdown request and an exit notification first, the protocol relies entirely on the server's graceful-exit path. If the server hangs in cleanup (e.g. waiting on an MCP server, a blocked HTTP call, or a deadlock), nothing escalates the kill.

eca.el does not install any kill-emacs-hook, so daemon-mode Emacs leaks all sessions on Emacs shutdown.

Proposed fix

  1. In eca-process-stop, after kill-process, poll process-live-p for a short timeout (e.g. 2 seconds). If still alive, escalate via SIGKILL:

    (defun eca-process-stop (session)
      (when session
        (let* ((proc (eca--session-process session))
               (pid  (and proc (process-id proc))))
          (when (process-live-p proc)
            (kill-process proc))
          ;; Wait briefly for graceful exit
          (let ((deadline (+ (float-time) 2.0)))
            (while (and (process-live-p proc) (< (float-time) deadline))
              (sleep-for 0.1)))
          ;; Escalate if still alive
          (when (and pid (process-live-p proc))
            (signal-process pid 'SIGKILL))
          ...)))

  2. Register a kill-emacs-hook (on first eca invocation or in eca-mode initialization) that calls eca-stop on every entry in eca--sessions. This makes ECA a good citizen on Emacs exit and reduces orphan accumulation for non-daemon users.

  3. Optional: provide M-x eca-kill-orphaned-servers, which lists running eca server processes not referenced by any session in eca--sessions and offers interactive cleanup. This is the only practical recovery for users who already have orphans.
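For item 2, a rough sketch of the hook might look like the following. It assumes that `eca--sessions` is an alist whose cdrs are session objects (as the diagnostic snippet suggests) and it calls `eca-process-stop`, whose one-argument signature is shown in the excerpt above. A real implementation would probably go through eca-stop so that the shutdown request and exit notification are sent first. `my/eca-stop-all-sessions` is a hypothetical name.

```emacs-lisp
(defun my/eca-stop-all-sessions ()
  "Stop the server process of every session in `eca--sessions'.
Intended for `kill-emacs-hook'.  Errors are swallowed per session so
that one failing session cannot prevent the others from being stopped
or block Emacs from exiting."
  (when (boundp 'eca--sessions)
    ;; Iterate over a copy in case stopping a session mutates the list.
    (dolist (pair (copy-sequence eca--sessions))
      (ignore-errors
        (eca-process-stop (cdr pair))))))

(add-hook 'kill-emacs-hook #'my/eca-stop-all-sessions)
```

Because kill-emacs-hook functions run while Emacs is exiting, any wait inside `eca-process-stop` should stay short. The 2-second timeout from item 1 would apply once per open session.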

Reproduction (rough)

The bug is timing-dependent. Conceptual repro:

  1. Start an ECA session.
  2. Cause the eca server's shutdown handler to hang (attach a debugger, or suspend the server with kill -STOP <pid>).
  3. Call M-x eca-stop.
  4. Observe: Emacs side cleans up immediately; the CLI process continues running.

In real-world usage, this manifests as long-uptime eca server processes that no Emacs session references. I cannot reliably reproduce the underlying server hang, only the symptom.
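Since the server hang itself is hard to trigger, the escalation logic can be exercised against a stand-in child instead. The sketch below is not the eca-emacs code: it starts a shell that ignores SIGTERM, which imitates a server that never finishes its graceful shutdown, and sends both signals explicitly with `signal-process`. All `my/...` names are hypothetical, and a POSIX `sh` is assumed.

```emacs-lisp
(defun my/wait-for-exit (proc timeout)
  "Wait up to TIMEOUT seconds for PROC to exit; return non-nil if it did."
  (let ((deadline (+ (float-time) timeout)))
    (while (and (process-live-p proc) (< (float-time) deadline))
      ;; Let Emacs process pending subprocess events while waiting.
      (accept-process-output nil 0.1))
    (not (process-live-p proc))))

(defun my/terminate-with-escalation (proc grace)
  "Send SIGTERM to PROC, then SIGKILL if it is still alive after GRACE seconds.
Return the symbol `graceful' or `killed' to say which path was taken."
  (signal-process proc 'SIGTERM)
  (if (my/wait-for-exit proc grace)
      'graceful
    (signal-process proc 'SIGKILL)
    (my/wait-for-exit proc grace)
    'killed))

(defun my/demo-hung-server ()
  "Start a stand-in child that ignores SIGTERM and return its process object."
  (let ((proc (make-process :name "hung-server"
                            :command '("sh" "-c" "trap '' TERM; sleep 5")
                            :connection-type 'pipe
                            :noquery t)))
    ;; Give the shell time to install its trap before any signal arrives.
    (sleep-for 0.3)
    proc))
```

Evaluating `(my/terminate-with-escalation (my/demo-hung-server) 1.0)` should take the `killed` path, because the stand-in never reacts to SIGTERM. A well-behaved child would take the `graceful` path and never receive SIGKILL.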

Severity

By itself this is resource leakage (memory, file descriptors, periodic background HTTP calls). Combined with editor-code-assistant/eca#462, every orphan becomes a competitor in the OAuth refresh race, directly causing the daily auth failures described there.

Environment

  • macOS, emacs --daemon
  • ECA latest (eca-emacs master, ECA server installed via Homebrew)
