feat: send SIGHUP to MCP container after config updates (#316)
Conversation
After `MCPConfigResource.Update()` writes a new `config.yaml`, send SIGHUP to the running MCP container so it hot-reloads without a full restart.

- Add `ContainerSignal()` to the Docker client wrapper, using the `ContainerKill` API with an arbitrary signal
- Add `signalConfigReload()` to `MCPConfigResource`, called after the config write on Update only (not Create, since the container doesn't exist yet)
- Signal failure is non-fatal: the config file is already correct on disk and will be picked up on the next container restart
📝 Walkthrough: Adds a Docker client method to send arbitrary signals to containers, and changes the MCP config resource to regenerate the config and signal the running container on update.
🧹 Nitpick comments (1)
server/internal/docker/docker.go (1)
536-542: Consider wrapping errors for consistency with other methods.

Other methods in this file (e.g., `ContainerStop`, `ContainerRemove`, `ContainerInspect`) wrap errors with `errTranslate` and contextual messages. While the caller treats errors as non-fatal warnings, consistent error handling would improve debuggability.

♻️ Suggested improvement

```diff
 func (d *Docker) ContainerSignal(ctx context.Context, containerID string, signal string) error {
-	return d.client.ContainerKill(ctx, containerID, signal)
+	if err := d.client.ContainerKill(ctx, containerID, signal); err != nil {
+		return fmt.Errorf("failed to signal container: %w", errTranslate(err))
+	}
+	return nil
 }
```
🧹 Nitpick comments (1)
server/internal/orchestrator/swarm/mcp_config_resource.go (1)
282-329: Well-documented best-effort implementation.

The method correctly implements the best-effort semantics with clear differentiation between systemic errors (injector failures) and transient issues (container not found, signal failed). The TODO referencing PLAT-527 for the SIGTERM→SIGHUP transition is helpful for tracking.

Consider adding `ServiceInstanceID` to log entries for observability. Adding the service instance ID as a structured field would improve traceability in multi-service environments, especially for the debug log at line 310 where no container ID is available.

♻️ Optional: Add ServiceInstanceID to log entries

```diff
 container, err := GetServiceContainer(ctx, dockerClient, r.ServiceInstanceID)
 if err != nil {
 	if errors.Is(err, ErrNoServiceContainer) {
-		logger.Debug().Msg("no running MCP container found, skipping config reload signal")
+		logger.Debug().
+			Str("service_instance_id", r.ServiceInstanceID).
+			Msg("no running MCP container found, skipping config reload signal")
 		return nil
 	}
-	logger.Warn().Err(err).Msg("failed to find service container for config reload signal")
+	logger.Warn().Err(err).
+		Str("service_instance_id", r.ServiceInstanceID).
+		Msg("failed to find service container for config reload signal")
 	return nil
 }

 if err := dockerClient.ContainerSignal(ctx, container.ID, "SIGTERM"); err != nil {
 	logger.Warn().Err(err).
 		Str("container_id", container.ID).
+		Str("service_instance_id", r.ServiceInstanceID).
 		Msg("failed to send SIGTERM to MCP container")
 	return nil
 }

 logger.Info().
 	Str("container_id", container.ID).
+	Str("service_instance_id", r.ServiceInstanceID).
 	Msg("sent SIGTERM to MCP container for config reload")
```
```go
		Str("container_id", container.ID).
		Str("signal", signal).
		Msg("failed to signal MCP container")
	return nil
```
Are we treating this return as a success?
Dave fixed the root issue in MCP, so I was able to revert back to a simpler version of the code with no workarounds. However, sending the signals is still "best-effort" - if it fails, we log a message and move on.
Force-pushed 03bdcf1 to a820845
mmols left a comment:
Works from my end - I tested various config updates and updates to target_nodes.
Summary

- When `MCPConfigResource.Update()` rewrites `config.yaml`, the running MCP container now receives `SIGHUP` to trigger a hot config reload, so no container restart is needed
- Also applies to forced updates (`force_update=true`)
- Adds a `ContainerSignal(ctx, containerID, signal)` method to the Docker client wrapper, usable by any service type

Test plan

- Verified that `docker kill --signal SIGHUP` triggers a reload in the MCP container logs (`Received SIGHUP, reloading configuration...`)
- Verified that an `allow_writes` flip (RO → RW credential switch) works via SIGHUP: the MCP server closes old DB connections and reconnects with new credentials

PLAT-480