Conversation
Adds an MCP server that manages a single Docker container per process
lifetime, exposing a shell_exec tool for running arbitrary CLI commands
in an isolated environment with an optional host workspace mount.
Three profiles are provided, each with a Dockerfile, toolbox YAML, and
demo taskflow:
- base: debian:bookworm-slim + binutils, file, xxd, python3, curl, git
- malware-analysis: extends base with radare2, binwalk, yara, exiftool,
checksec, capstone, pwntools, volatility3
- network-analysis: extends base with nmap, tcpdump, tshark, netcat,
dnsutils, jq, httpie
New files:
- src/seclab_taskflows/mcp_servers/container_shell.py
- src/seclab_taskflows/containers/{base,malware_analysis,network_analysis}/Dockerfile
- src/seclab_taskflows/toolboxes/container_shell_{base,malware_analysis,network_analysis}.yaml
- src/seclab_taskflows/taskflows/container_shell/{README.md,demo_base,demo_malware_analysis,demo_network_analysis}.yaml
- scripts/build_container_images.sh
- scripts/run_container_shell_demo.sh
- tests/test_container_shell.py (14 tests, all mocked)
There was a problem hiding this comment.
Pull request overview
Adds a new container_shell MCP server plus supporting Docker images, toolboxes, and demo taskflows to enable running confirmed shell commands inside a long-lived (per-process) Docker container with an optional host workspace mount.
Changes:
- Introduces
src/seclab_taskflows/mcp_servers/container_shell.py(shell_exec) that lazily starts/stops a Docker container and executes commands viadocker exec. - Adds three container profiles (base / malware-analysis / network-analysis) with Dockerfiles and corresponding toolbox YAMLs.
- Adds demo taskflows + README and scripts to build images and run the demos; includes mocked unit tests for the new server/toolboxes.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/seclab_taskflows/mcp_servers/container_shell.py |
New MCP server implementing lazy-start container execution via Docker. |
tests/test_container_shell.py |
Mocked unit tests for container lifecycle and shell_exec, plus toolbox YAML loading. |
src/seclab_taskflows/toolboxes/container_shell_base.yaml |
Toolbox wiring for base container image + confirmation guardrail. |
src/seclab_taskflows/toolboxes/container_shell_malware_analysis.yaml |
Toolbox wiring for malware-analysis container profile. |
src/seclab_taskflows/toolboxes/container_shell_network_analysis.yaml |
Toolbox wiring for network-analysis container profile. |
src/seclab_taskflows/taskflows/container_shell/demo_base.yaml |
Headless demo taskflow using the base container toolbox. |
src/seclab_taskflows/taskflows/container_shell/demo_malware_analysis.yaml |
Headless demo taskflow for static malware triage workflow. |
src/seclab_taskflows/taskflows/container_shell/demo_network_analysis.yaml |
Headless demo taskflow for pcap analysis workflow. |
src/seclab_taskflows/taskflows/container_shell/README.md |
Documentation for profiles, building images, and running demos. |
src/seclab_taskflows/containers/base/Dockerfile |
Base Debian image with core CLI + Python utilities. |
src/seclab_taskflows/containers/malware_analysis/Dockerfile |
Malware analysis image extending base with RE/forensics tooling. |
src/seclab_taskflows/containers/network_analysis/Dockerfile |
Network analysis image extending base with recon/pcap tooling. |
scripts/build_container_images.sh |
Helper script to build the container images. |
scripts/run_container_shell_demo.sh |
Helper script to run one of the demo taskflows end-to-end. |
Comments suppressed due to low confidence (1)
src/seclab_taskflows/containers/malware_analysis/Dockerfile:12
- This Dockerfile downloads and installs a
.debfrom GitHub Releases usingcurland the movingreleases/latestendpoint without any checksum or signature verification. If the GitHub repo or the network path is compromised, an attacker can serve a malicious.debthat will be installed as root into your analysis image, leading to arbitrary code execution in every container built from it. Pin the radare2 artifact to a specific immutable version and verify its integrity (e.g., via a known-good checksum or publisher signature) before installing.
# radare2 is not in Debian bookworm apt; install prebuilt deb from GitHub releases
RUN ARCH=$(dpkg --print-architecture) \
&& R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
| grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
&& R2_VER="${R2_TAG#v}" \
&& curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
-o /tmp/r2.deb \
&& apt-get install -y /tmp/r2.deb \
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _container_id: str | None = None | ||
|
|
||
| CONTAINER_IMAGE = os.environ.get("CONTAINER_IMAGE", "") |
There was a problem hiding this comment.
_container_id is actually storing the container name (generated via --name), not the Docker container ID returned on stdout. Renaming this (e.g., _container_name/_container_ref) would make the code easier to reason about and avoid confusion in future changes (especially if you later decide to use the real ID).
| subprocess.run( | ||
| ["docker", "stop", "--time", "5", _container_id], | ||
| capture_output=True, | ||
| text=True, | ||
| ) |
There was a problem hiding this comment.
_stop_container() ignores the result of docker stop and unconditionally sets _container_id = None. If docker stop fails (e.g., daemon unavailable), the container may keep running but the server will lose track of it. Consider checking returncode and logging a warning (and/or keeping _container_id set when stop fails).
| subprocess.run( | |
| ["docker", "stop", "--time", "5", _container_id], | |
| capture_output=True, | |
| text=True, | |
| ) | |
| try: | |
| result = subprocess.run( | |
| ["docker", "stop", "--time", "5", _container_id], | |
| capture_output=True, | |
| text=True, | |
| ) | |
| except Exception as exc: | |
| logging.warning( | |
| "Failed to execute 'docker stop' for container %s: %s", | |
| _container_id, | |
| exc, | |
| ) | |
| return | |
| if result.returncode != 0: | |
| logging.warning( | |
| "docker stop for container %s failed with return code %s: %s", | |
| _container_id, | |
| result.returncode, | |
| (result.stderr or "").strip(), | |
| ) | |
| return | |
| logging.debug("Container stopped successfully: %s", _container_id) |
| import importlib | ||
| import subprocess | ||
| from unittest.mock import MagicMock, patch |
There was a problem hiding this comment.
Add the standard SPDX header at the top of this new test file. Other tests (e.g., tests/test_00.py) include SPDX copyright + license lines, but this file currently starts directly with imports.
See below for a potential fix:
@ -1,3 +1,5 @@
# SPDX-FileCopyrightText: 2024 Secure AI Labs, Inc.
# SPDX-License-Identifier: Apache-2.0
| @@ -0,0 +1,188 @@ | |||
| import importlib | |||
There was a problem hiding this comment.
importlib is imported but never used in this test module. Please remove the unused import to keep the test file lint-clean.
| import importlib |
| import atexit | ||
| import logging | ||
| import os | ||
| import subprocess | ||
| import uuid | ||
| from typing import Annotated |
There was a problem hiding this comment.
Add the standard SPDX license header at the top of this new module. Most Python files in this repo start with # SPDX-FileCopyrightText: GitHub, Inc. and # SPDX-License-Identifier: MIT, but this file currently starts directly with imports.
A colon in the workspace path breaks Docker's volume mount syntax (host:container[:options]), silently changing mount behaviour. Raise RuntimeError early in _start_container() if the colon is present. Adds a corresponding test.
SLF001 (private member accessed) is expected in tests that exercise module internals directly. Suppress it via per-file-ignores for tests/*. PLW0603 (global statement used for assignment) is the correct pattern for the module-level container ID state. Add to the global ignore list alongside the existing PLW0602 exemption.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
src/seclab_taskflows/containers/malware_analysis/Dockerfile:13
- This Dockerfile downloads and installs a prebuilt
radare2.debdirectly from GitHub using the mutablereleases/latesttag with no checksum or signature verification (curl ... radare2_${R2_VER}_${ARCH}.deb→apt-get install -y /tmp/r2.deb). If the GitHub account, release artifacts, or the network path are compromised, a malicious.debwould be transparently installed into your analysis image and run with container privileges, enabling supply-chain compromise of any environment using this image. Pin the download to a specific trusted release (e.g., by hard-coding a tag/version) and verify the artifact’s integrity (checksum or signature) before installation.
RUN ARCH=$(dpkg --print-architecture) \
&& R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
| grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
&& R2_VER="${R2_TAG#v}" \
&& curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
-o /tmp/r2.deb \
&& apt-get install -y /tmp/r2.deb \
&& rm /tmp/r2.deb
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| CONTAINER_IMAGE = os.environ.get("CONTAINER_IMAGE", "") | ||
| CONTAINER_WORKSPACE = os.environ.get("CONTAINER_WORKSPACE", "") | ||
| CONTAINER_TIMEOUT = int(os.environ.get("CONTAINER_TIMEOUT", "30")) | ||
|
|
There was a problem hiding this comment.
CONTAINER_IMAGE defaults to an empty string; if the env var isn't set, _start_container() will invoke docker run ... '' tail -f /dev/null which fails with a fairly opaque Docker error. Consider validating CONTAINER_IMAGE early (e.g., raise a clear RuntimeError when it's missing) so misconfiguration is easier to diagnose.
| cmd = ["docker", "run", "-d", "--rm", "--name", name] | ||
| if CONTAINER_WORKSPACE: | ||
| cmd += ["-v", f"{CONTAINER_WORKSPACE}:/workspace"] | ||
| cmd += [CONTAINER_IMAGE, "tail", "-f", "/dev/null"] | ||
| logging.debug(f"Starting container: {' '.join(cmd)}") | ||
| result = subprocess.run(cmd, capture_output=True, text=True) | ||
| if result.returncode != 0: | ||
| raise RuntimeError(f"docker run failed: {result.stderr.strip()}") |
There was a problem hiding this comment.
subprocess.run() when starting the container has no timeout, so a hung Docker daemon/CLI can block the MCP server indefinitely on first use. Consider adding a bounded timeout for docker run (separate from command execution timeout) and returning a clear error if it expires.
| subprocess.run( | ||
| ["docker", "stop", "--time", "5", _container_id], | ||
| capture_output=True, | ||
| text=True, | ||
| ) |
There was a problem hiding this comment.
_stop_container() ignores the result of docker stop and always sets _container_id = None. If docker stop fails, this can leave a container running but make it impossible to retry cleanup later in-process. Consider checking/logging returncode (and optionally retrying or leaving _container_id intact on failure).
| subprocess.run( | |
| ["docker", "stop", "--time", "5", _container_id], | |
| capture_output=True, | |
| text=True, | |
| ) | |
| result = subprocess.run( | |
| ["docker", "stop", "--time", "5", _container_id], | |
| capture_output=True, | |
| text=True, | |
| ) | |
| if result.returncode != 0: | |
| logging.error( | |
| "docker stop failed for container %s: %s", | |
| _container_id, | |
| result.stderr.strip(), | |
| ) | |
| return | |
| logging.debug(f"Container stopped: {_container_id}") |
| import atexit | ||
| import logging | ||
| import os |
There was a problem hiding this comment.
This new MCP server module is missing the SPDX license header that most src/seclab_taskflows/mcp_servers/*.py files include (e.g., local_file_viewer.py, repo_context.py). Consider adding the standard SPDX copyright + MIT identifier header at the top for consistency with the repo's licensing convention.
See below for a potential fix:
@ -1,3 +1,5 @@
# SPDX-FileCopyrightText: 2024-present OpenAI
# SPDX-License-Identifier: MIT
| -o /tmp/r2.deb \ | ||
| && apt-get install -y /tmp/r2.deb \ | ||
| && rm /tmp/r2.deb | ||
| RUN pip3 install --no-cache-dir --break-system-packages pwntools capstone volatility3 |
There was a problem hiding this comment.
pip3 install is unpinned here, so builds can change over time (and potentially break) as pwntools/capstone/volatility3 release new versions. Consider pinning versions (or constraints file) to keep container builds reproducible.
| RUN pip3 install --no-cache-dir --break-system-packages pwntools capstone volatility3 | |
| RUN pip3 install --no-cache-dir --break-system-packages \ | |
| pwntools==4.12.0 \ | |
| capstone==5.0.1 \ | |
| volatility3==2.6.1 |
S101 started firing after a ruff version bump in CI, including against the pre-existing test_00.py. Use of assert is standard pytest practice; suppress it for tests/* alongside SLF001.
- seclab-shell-sast image extends base with semgrep, pyan3, universal-ctags, GNU global, cscope, graphviz, ripgrep, fd, tree - Toolbox YAML with server_prompt documenting Python and C call graph workflows - Demo taskflow: tree, fd, semgrep, ctags, pyan3, gtags then summarise findings - Runner generates a demo Python file with a shell=True anti-pattern if workspace is empty, so semgrep has something to find out of the box - build_container_images.sh and run_container_shell_demo.sh updated for sast target - test_toolbox_yaml_valid_sast added (16/16 passing)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (2)
src/seclab_taskflows/taskflows/container_shell/README.md:9
- PR description says “Three profiles…”, but this README (and the added toolboxes/taskflows) describe four profiles including
sast. Please align the PR description and documentation so they describe the same set of supported profiles.
Four container profiles are provided. Each has its own Dockerfile, toolbox
YAML, and demo taskflow.
src/seclab_taskflows/containers/malware_analysis/Dockerfile:12
- This Dockerfile downloads and installs a radare2
.debdirectly from GitHub using thereleases/latestendpoint andcurlwithout any version pinning or integrity verification. If theradareorg/radare2project or its release artifacts are compromised, a malicious package could be pulled into your image and executed at build time and whenever this container runs. To harden the supply chain, pin radare2 to a specific trusted version and verify the downloaded artifact (for example with a checksum or signature) before installing it, or obtain it from a vetted package repository.
RUN ARCH=$(dpkg --print-architecture) \
&& R2_TAG=$(curl -fsSL "https://api.github.com/repos/radareorg/radare2/releases/latest" \
| grep -o '"tag_name": *"[^"]*"' | grep -o '"[^"]*"$' | tr -d '"') \
&& R2_VER="${R2_TAG#v}" \
&& curl -fsSL "https://github.com/radareorg/radare2/releases/download/${R2_TAG}/radare2_${R2_VER}_${ARCH}.deb" \
-o /tmp/r2.deb \
&& apt-get install -y /tmp/r2.deb \
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| _container_id: str | None = None | ||
|
|
||
| CONTAINER_IMAGE = os.environ.get("CONTAINER_IMAGE", "") | ||
| CONTAINER_WORKSPACE = os.environ.get("CONTAINER_WORKSPACE", "") | ||
| CONTAINER_TIMEOUT = int(os.environ.get("CONTAINER_TIMEOUT", "30")) | ||
|
|
||
|
|
||
| def _start_container() -> str: | ||
| """Start the Docker container and return its name.""" | ||
| if CONTAINER_WORKSPACE and ":" in CONTAINER_WORKSPACE: | ||
| raise RuntimeError(f"CONTAINER_WORKSPACE must not contain a colon: {CONTAINER_WORKSPACE!r}") | ||
| name = f"seclab-shell-{uuid.uuid4().hex[:8]}" |
There was a problem hiding this comment.
The global is named _container_id, but it actually stores the Docker container name (as created by --name). Renaming this to _container_name (and adjusting log messages accordingly) would make the code easier to follow and reduce confusion when debugging.
| FROM seclab-shell-base:latest | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ |
There was a problem hiding this comment.
This Dockerfile is missing the SPDX header (the sast/Dockerfile in this PR includes it). Please add SPDX-FileCopyrightText and SPDX-License-Identifier at the top for consistency.
See below for a potential fix:
@ -1,3 +1,5 @@
# SPDX-FileCopyrightText: 2024 seclab contributors
# SPDX-License-Identifier: Apache-2.0
| # Must be run from the root of the seclab-taskflows repository. | ||
| # Images must be rebuilt whenever a Dockerfile changes. | ||
| # | ||
| # Usage: ./scripts/build_container_images.sh [base|malware|network|all] |
There was a problem hiding this comment.
The usage comment at the top lists [base|malware|network|all], but this script also supports sast (and the usage message below includes it). Please update the header comment to match the actual supported targets.
| # Usage: ./scripts/build_container_images.sh [base|malware|network|all] | |
| # Usage: ./scripts/build_container_images.sh [base|malware|network|sast|all] |
| if [ ! -d "$workspace" ] && [ ! -f "${workspace}/${target}" ]; then | ||
| echo "No source found at ${workspace}/${target}" >&2 |
There was a problem hiding this comment.
In the sast case, this existence check is ineffective because workspace is always a directory after mkdir -p "$workspace", so [ ! -d "$workspace" ] will always be false. As a result, missing ${workspace}/${target} won’t be caught. Please change the condition to validate the actual target path (directory/file) under the workspace.
| if [ ! -d "$workspace" ] && [ ! -f "${workspace}/${target}" ]; then | |
| echo "No source found at ${workspace}/${target}" >&2 | |
| target_path="${workspace}/${target}" | |
| if [ ! -d "$target_path" ] && [ ! -f "$target_path" ]; then | |
| echo "No source found at ${target_path}" >&2 |
| @@ -0,0 +1,202 @@ | |||
| import importlib | |||
| import subprocess | |||
There was a problem hiding this comment.
importlib is imported but never used in this test file. Please remove the unused import (or use it) to keep the tests clean and avoid relying on global Ruff ignores like F401.
| import subprocess |
| FROM seclab-shell-base:latest | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ |
There was a problem hiding this comment.
This Dockerfile is missing the SPDX header (the sast/Dockerfile in this PR includes it). Please add SPDX-FileCopyrightText and SPDX-License-Identifier at the top for consistency.
See below for a potential fix:
@ -1,3 +1,5 @@
# SPDX-FileCopyrightText: 2024 Example Project Authors
# SPDX-License-Identifier: Apache-2.0
| CONTAINER_IMAGE = os.environ.get("CONTAINER_IMAGE", "") | ||
| CONTAINER_WORKSPACE = os.environ.get("CONTAINER_WORKSPACE", "") | ||
| CONTAINER_TIMEOUT = int(os.environ.get("CONTAINER_TIMEOUT", "30")) | ||
|
|
||
|
|
||
| def _start_container() -> str: | ||
| """Start the Docker container and return its name.""" | ||
| if CONTAINER_WORKSPACE and ":" in CONTAINER_WORKSPACE: | ||
| raise RuntimeError(f"CONTAINER_WORKSPACE must not contain a colon: {CONTAINER_WORKSPACE!r}") | ||
| name = f"seclab-shell-{uuid.uuid4().hex[:8]}" | ||
| cmd = ["docker", "run", "-d", "--rm", "--name", name] | ||
| if CONTAINER_WORKSPACE: | ||
| cmd += ["-v", f"{CONTAINER_WORKSPACE}:/workspace"] | ||
| cmd += [CONTAINER_IMAGE, "tail", "-f", "/dev/null"] |
There was a problem hiding this comment.
When CONTAINER_IMAGE is unset/empty, _start_container() will build a docker run command with a missing image argument, which produces a confusing Docker error. Consider validating CONTAINER_IMAGE early and raising a clear RuntimeError (so shell_exec can return a helpful message).
| `CONTAINER_TIMEOUT` — default command timeout in seconds. Defaults to 30 (base | ||
| and network) or 60 (malware analysis). | ||
|
|
There was a problem hiding this comment.
The README states CONTAINER_TIMEOUT defaults are 30 (base/network) or 60 (malware), but this PR also includes a sast profile whose toolbox sets CONTAINER_TIMEOUT to 60. Please update this section to include the SAST default (or describe the default per-profile more generally).
This issue also appears on line 7 of the same file.
| _DEFAULT_WORKDIR = "/workspace" if CONTAINER_WORKSPACE else "/" | ||
|
|
||
|
|
||
| @mcp.tool() | ||
| def shell_exec( | ||
| command: Annotated[str, Field(description="Shell command to execute inside the container")], | ||
| timeout: Annotated[int, Field(description="Timeout in seconds")] = CONTAINER_TIMEOUT, | ||
| workdir: Annotated[str, Field(description="Working directory inside the container")] = _DEFAULT_WORKDIR, |
There was a problem hiding this comment.
_DEFAULT_WORKDIR defaults to "/" when CONTAINER_WORKSPACE is unset, but the provided images set WORKDIR /workspace and the toolbox prompts describe /workspace as the working directory. Consider defaulting workdir to "/workspace" unconditionally (or make it image-configurable) to match the container images and prompts.
| ## Container Shell (malware analysis) | ||
| You have access to an isolated Docker container for binary and malware analysis. | ||
| The working directory is /workspace — place samples here before analysis. | ||
| ALL tooling runs inside the container, keeping the host safe. |
There was a problem hiding this comment.
The malware-analysis toolbox prompt claims “keeping the host safe”, but container_shell.py bind-mounts the host workspace read-write by default. This is misleading because container commands can still modify host files through the mount. Please soften/clarify this wording (or switch the default mount to read-only).
| ALL tooling runs inside the container, keeping the host safe. | |
| ALL tooling runs inside the container to reduce risk to the host. Note that /workspace may be shared with the host, so changes there can affect host files. |
Adds an MCP server (
container_shell.py) that manages a single Docker container per process lifetime. The container starts on the firstshell_execcall and stops on exit viaatexit. An optional host directory is mounted at/workspaceviaCONTAINER_WORKSPACE.Four profiles, each with a Dockerfile, toolbox YAML, and demo taskflow:
shell_execis listed underconfirm:in all toolbox YAMLs. Setheadless: trueon a task to skip confirmation in automated pipelines.Images are built with
scripts/build_container_images.sh. Demos run withscripts/run_container_shell_demo.sh <base|malware|network|sast>.16 unit tests, all mocked (no Docker required to run tests).