feat(bridge): OpenFang tool surface v2 over MCP bridge (+ Stage 9 hardening)#1205
feat(bridge): OpenFang tool surface v2 over MCP bridge (+ Stage 9 hardening)#1205benhoverter wants to merge 44 commits into
Conversation
Standalone crate exposing OpenFang's tool surface to MCP clients (primarily
Claude Code subprocesses) over stdio. Per architectural decision in ANAI-22:
not folded into openfang-runtime — keeps the protocol adapter out of the
kernel/compactor blast radius and the dep graph clean.
This commit is scaffolding only:
* Cargo manifest with rmcp 1.x (server, transport-io, macros)
* lib.rs: ToolDispatcher seam trait (runtime-implements, bridge-consumes,
one-way dep), ToolDispatchError enum, Bridge struct wrapping an
Optional<Arc<dyn ToolDispatcher>>, single stub `ping` tool
* main.rs: stdio MCP server entrypoint, tracing -> stderr (stdout is the
transport), no dispatcher attached
* Workspace members updated
Identity is bound at Bridge construction time, not per-call — the security
invariant tracked by ANAI-31. Real tool surface mapping lands in ANAI-30.
cargo check -p openfang-mcp-bridge: clean.
cargo check --workspace: clean (pre-existing imap-proto future-incompat
warning unrelated).
Refs: ANAI-22, ANAI-29
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the daemon-side foundation for the MCP bridge per the ANAI-30 plan
(topology 1b: daemon → CC → bridge → unix socket → daemon dispatcher).
- New `protocol` module in openfang-mcp-bridge: Frame/Hello/HelloAck/
CallRequest/CallResponse types with length-prefixed JSON framing
(1 MiB cap, 4-byte BE length prefix). Gated by `ipc-codec` feature
so type-only consumers can drop the tokio io traits.
- New `bridge_ipc` module in openfang-api: BridgeIpcServer binds
<home_dir>/run/bridge.sock (0600), accept loop with graceful
shutdown via Notify, per-connection Hello validation and CallRequest
→ CallResponse loop.
- run_daemon spawns the listener; failure is non-fatal (HTTP keeps
serving; bridge just unavailable). Socket file removed on shutdown.
Step 1 stub: the dispatcher returns CallResult::Error
("not yet wired"). Step 2 replaces this with a call into
openfang_runtime::tool_runner::execute_tool, scoped to the four-tool
allowlist (file_read, file_list, agent_list, channel_send). Identity
binding + token-table auth land in ANAI-31.
Tests: 3 protocol roundtrip tests + 4 IPC handler tests
(handshake/dispatch end-to-end via tempfile socket, version mismatch
rejection, empty-token rejection).
Refs ANAI-30, ANAI-22.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the step-1 stub in `BridgeIpcServer` with a real call into
`openfang_runtime::tool_runner::execute_tool`, mirroring the argument
bundle used by the HTTP /mcp endpoint in routes.rs.
- Added ALLOWED_TOOLS allowlist: file_read, file_list, agent_list,
channel_send. Rejection happens at the protocol layer (CallResult::Error)
before any kernel touch.
- Added dispatch_call(): snapshots the skill registry, builds a
KernelHandle from Arc<OpenFangKernel>, and invokes execute_tool.
- ToolResult mapped to CallResult::Ok { content, is_error }, preserving
the Ok/Error distinction (Error = bridge couldn't dispatch; Ok with
is_error = tool ran but returned an error).
- Identity stub: caller_agent_id taken at face value from
CallRequest::agent_id. Real per-spawn token-bound identity lands in
ANAI-31.
Test: ipc_handshake_and_allowlist_gate verifies wire shape end-to-end:
disallowed tool gets allowlist Error, allowed tool gets Ok response. Real
execute_tool integration tests come once the daemon spawns the bridge
for real (ANAI-31).
…l surface
Replaces the stub `ping` tool with the four ANAI-30 tools (file_read,
file_list, agent_list, channel_send) and wires the bridge binary to forward
each `tools/call` over the daemon IPC socket established in step 1.
Library (lib.rs):
- ToolDispatcher::call now returns DispatchOk { content, is_error }
preserving the tool-error-vs-dispatch-error distinction across the seam
- built_in_tools() declares the four-tool slice; schemas mirror
runtime::tool_runner::builtin_tool_definitions() (kept in lockstep)
- Bridge: manual ServerHandler impl (drops the #[tool_router] macro). Filters
advertised tools by intersecting built_in_tools() with
ToolDispatcher::allowed_tools(); double-checks before dispatch
- Bridge::new now requires a dispatcher (was Option<_>)
Binary (main.rs):
- Reads OPENFANG_BRIDGE_SOCKET / TOKEN / AGENT_ID env vars (last is stub for
ANAI-30; ANAI-31 derives identity from token)
- Connects to daemon, performs Hello/HelloAck handshake, exits on rejection
- IpcDispatcher: bridge-side ToolDispatcher impl. Forwards each call via mpsc
to an actor task that owns the stream; correlation-by-request_id with a
PendingMap<u64, oneshot> so concurrent tools/call invocations don't
serialize at the dispatcher layer
- Reader task drains pending oneshots with an error on connection close so
in-flight calls don't hang; production path exits the process so CC
notices and tears down (gated behind cfg(not(test)))
Tests:
- lib: built_in_tools_has_anai30_slice, permitted_tools_intersects_with_dispatcher_allowed
- main: ipc_dispatcher_round_trip_and_correlation — fake daemon listener,
full handshake, two concurrent calls, verifies per-id correlation and the
NotPermitted gate
Workspace check clean. Daemon-side bridge_ipc tests still pass (4/4).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end topology now exists at the type level:
daemon → claude (per-prompt) → openfang-mcp-bridge → IPC → daemon
- Add `caller_agent_id: Option<String>` to CompletionRequest. Plumbed
through all construction sites; agent_loop populates it with
session.agent_id, everywhere else passes None.
- Daemon (`server.rs::run_daemon`): after BridgeIpcServer starts,
publish OPENFANG_BRIDGE_SOCKET and OPENFANG_BRIDGE_BIN as process env
for subprocess drivers to discover. Bridge bin defaults to a sibling
of current_exe; operators can override with OPENFANG_BRIDGE_BIN. Both
set with `unsafe` (edition 2024) but only during single-threaded
daemon startup, before any subprocess spawns.
- BridgeIpcServer gains `socket_path()` accessor.
- ClaudeCodeDriver: per-spawn `try_build_bridge_mcp_config`. When
caller_agent_id is set AND both discovery env vars are present,
generate a UUID token, write `<home>/run/cc-mcp-<uuid>.json` (0600),
and add `--mcp-config <path> --strict-mcp-config` to the claude args.
RAII guard removes the file on drop so per-spawn token lifetime is
bounded by the CC subprocess.
- apply_env_filter extended to strip OPENFANG_BRIDGE_* from CC's child
env. Bridge gets these only via the explicit `env` map in the
mcp-config — CC inheriting them would risk a stray bridge picking up
the daemon socket without a fresh per-spawn token.
- Tests:
- test_build_bridge_mcp_config_shape — verifies wire shape claude
expects: mcpServers.openfang.{command,args,env} with exactly the
three discovery vars in env (no extras to leak state).
- test_apply_env_filter_strips_bridge_discovery_vars — confirms
filter removes all four bridge vars from CC's child env.
- test_bridge_mcp_config_drop_removes_file — RAII cleanup invariant.
Stub points still flagged: token validated as non-empty (ANAI-31
replaces with daemon-issued per-spawn token table); agent_id taken
in-band from CallRequest (ANAI-31 derives from token).
11 CC driver tests pass. bridge_ipc (4) and bridge crate (6) tests
unchanged. Workspace check clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…NABLED
Default-off kill switch so we can deploy the bridge code path without
inlining it into every CC invocation. When the gate is unset or not in
{1, true}, try_build_bridge_mcp_config returns None and CC is spawned
exactly as it was pre-step-4 — no --mcp-config, no temp file, no bridge
child. Validation flow: deploy with gate off (sanity), launchctl setenv
OPENFANG_BRIDGE_ENABLED 1, bounce daemon, observe; if anything regresses,
flip back to 0 and bounce for instant recovery.
Daemon still starts the IPC listener and publishes BRIDGE_SOCKET/BIN env
unconditionally — both are harmless without a bridge child connecting.
Pure additive switch; zero behavior change when off.
Test exercises the full truth table for bridge_enabled() (unset, truthy
variants, falsy/garbage variants) and confirms the gate suppresses
config generation regardless of other env. Single test owns the global
env var so no serial_test infra needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridge IPC handshake works standalone (bridge binary connects + Hello/HelloAck ok against the live socket), and the daemon-side `wired CC --mcp-config for OpenFang bridge` debug line confirms the flag is being passed to claude. But no `bridge IPC accepted connection` events ever fire — meaning claude is launched with `--mcp-config` but isn't spawning the MCP server subprocess. Without `--debug`, claude swallows MCP launch errors silently. And we drop CC's stderr on success spawns, so any silent rejection is invisible. Add (both spawn paths): - `--debug` flag when bridge config is wired, so MCP errors print to stderr. - Always log a 4 KB tail of CC stderr at info when bridge_wired, regardless of success/failure. Streaming path now drains stderr concurrently to avoid pipe deadlock under chatty --debug output. Existing 12/12 claude_code unit tests still pass; workspace check clean. Diagnostic only — once the cause is identified we'll pare back to bounded on-demand logging.
- bridge_ipc: promote handshake/dispatch events to INFO and add an `accepted connection` log on accept. Operators can now observe the full bridge lifecycle from daemon stderr without crawling through ~/.claude/debug/<uuid>.txt. - claude_code driver: gate --debug + the 4 KB CC-stderr-tail diagnostic behind a new OPENFANG_BRIDGE_DEBUG env var (off by default). With proper INFO logs daemon-side, the noisy --debug output and the per-spawn ~/.claude/debug/ files are no longer load-bearing. - server: validate operator-supplied OPENFANG_BRIDGE_BIN path at boot and log the resolution outcome (override vs. probe). Catches deploy ordering bugs where the env points at a binary that doesn't exist. Stderr is still drained concurrently in the streaming path — required whenever --debug might be on, cheap when it isn't.
The MCP bridge IPC is unix-domain-socket-only by construction (daemon
listens on a unix socket; bridge subprocess connects to it). The bridge
crate and the daemon-side `bridge_ipc` module unconditionally imported
`tokio::net::{UnixStream, UnixListener}`, which broke Windows CI with
E0432 unresolved-import errors in `openfang-mcp-bridge::main` and
`openfang-api::bridge_ipc`.
Gates:
- `openfang-mcp-bridge::main` — entire body cfg-gated to `unix`; on
non-unix the binary is a no-op stub that prints a clear message and
exits non-zero. Tests gated `cfg(all(test, unix))`.
- `openfang-api::lib` — `pub mod bridge_ipc` gated to `unix`.
- `openfang-api::server::run_daemon` — `BridgeIpcServer::start` call
gated to `unix`; non-unix logs a single info line and proceeds without
bridge IPC. The CC driver's existing missing-socket fallthrough means
CC subprocesses spawn without `--mcp-config` on Windows, matching the
bridge-disabled path.
No behavioral change on Linux/macOS. Windows users get a daemon that
boots without bridge support; MCP-routed tools are unavailable until a
Windows-native transport (named pipes / TCP loopback) lands as a
follow-up.
Verified: cargo check --workspace, cargo check --workspace --tests,
cargo test -p openfang-mcp-bridge -p openfang-api --lib, cargo fmt
--check, and cargo clippy all clean on macOS.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Introduce the security primitives that will gate bridge IPC handshake: - openfang-types::bridge_auth::Token — 32-byte CSPRNG-generated opaque token. No Debug impl (anti-leak); only an 8-hex-char fingerprint() for logs. Constant-time equality, hex round-trip. 8 unit tests. - openfang-runtime::bridge_auth — TokenIssuer trait (issue/revoke) and SpawnGuard. Guard holds Arc<dyn TokenIssuer>; Drop revokes. Trait lives here so the Claude Code driver (phase B) can hold Arc<dyn TokenIssuer> without a circular dep on openfang-api. 2 unit tests via stub issuer. - openfang-api::bridge_auth::BridgeAuthority — concrete TokenIssuer impl. Mutex<HashMap<Token, AgentId>> behind Arc::new_cyclic so issued guards carry a Weak<Self> back for self-revocation. Inherent resolve() and live_spawn_count() for the IPC dispatcher and Debug impl. Manual Debug redacts to spawn count (Token has none by design). 7 unit tests, including an Arc<dyn TokenIssuer> round-trip that exercises the exact abstraction the driver will hold. Dep direction respected: openfang-api -> openfang-runtime -> openfang-types. No wiring yet; phase B threads Arc<dyn TokenIssuer> into ClaudeCodeDriver, phase C constructs the Arc<BridgeAuthority> in the daemon and hands it to both BridgeIpcServer and the driver factory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire the spawn side of the bridge handshake. The driver can now mint
authenticated bridge tokens via the daemon-side authority, with the legacy
UUID path preserved so dev builds without an issuer don't regress.
- ClaudeCodeDriver gains token_issuer: Option<Arc<dyn TokenIssuer>>.
Builder with_token_issuer(...) for phase C wiring; new()/with_timeout()
default to None.
- BridgeMcpConfig gains _guard: Option<SpawnGuard>. Lifetime-only field
(underscore-prefixed) — when the spawned process exits and the config
drops, SpawnGuard::drop revokes the token from the authority's spawn
table.
- try_build_bridge_mcp_config is now a &self method with two branches:
* issuer present -> parse caller_agent_id as AgentId, refuse to wire
if it doesn't parse (no anonymous tokens), then issuer.issue(...) and
emit guard.token().to_hex() (64-char hex) on OPENFANG_BRIDGE_TOKEN.
Debug log carries token fingerprint.
* issuer absent -> legacy UUID path (renamed
generate_legacy_bridge_token) — preserves current ANAI-30 behavior
where any non-empty token is treated as authenticated.
- Both call sites in complete() and stream_complete() updated to call
through self. Env var name OPENFANG_BRIDGE_TOKEN unchanged.
- Two test fixtures updated to construct via the driver and to set
_guard: None on synthetic BridgeMcpConfig values. claude_code tests:
21/21 green.
Phase C will construct Arc<BridgeAuthority> in openfang-api::server and
hand it to both BridgeIpcServer::start and the driver factory.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase C1 of bridge auth wiring. create_driver now accepts an optional TokenIssuer; the Claude Code driver consumes it via with_token_issuer. OpenFangKernel gains a token_issuer slot + setter/getter; resolve_driver threads the issuer through to fresh driver builds. Boot-time and one-shot probe sites keep the legacy UUID path (None). Agent-loop fallback sites are marked for C2.
…loop Phase C2 of bridge auth wiring. KernelHandle gains a default-None token_issuer() accessor; OpenFangKernel overrides it to expose the daemon's BridgeAuthority. The agent loop reads the issuer via KernelHandle and threads it into call_with_retry / stream_with_retry, which forward it to create_driver on the ModelNotFound fallback path so fallback drivers also get hardened bridge tokens. server.rs constructs the BridgeAuthority at daemon startup (unix-gated) and hands it to the kernel before background agents start.
…r connection Phase D of bridge auth wiring. validate_hello is renamed to authenticate_hello and now resolves the presented token through the daemon's BridgeAuthority. Well-formed hex tokens that the authority issued resolve to an AgentId that overrides the bridge's claimed agent_id on every subsequent CallRequest (with a warn! on mismatch). Well-formed hex tokens the authority never issued are rejected. Non-hex tokens fall through to the ANAI-30 legacy path for back-compat with spawn sites that haven't yet been wired through the TokenIssuer (boot- time create_driver in particular). Handshake log line now includes the token fingerprint and authenticated agent id for audit correlation. BridgeIpcServer::start gains the authority argument; server.rs hoists the authority out of the C2 cfg block so it can be threaded in.
…gents Phase E of the bridge tool surface v2 work closes a boot-time loophole left by phases C1/C2/D: agents brought up during kernel boot (autostart + persisted) were instantiated before the post-boot `set_token_issuer` call, so their drivers were baked with legacy-UUID identity even though every later code path was already on the hardened BridgeAuthority/TokenIssuer track. Changes: - kernel: add `boot_with_config_and_issuer(config, Option<Arc<dyn TokenIssuer>>)`; the existing `boot_with_config` becomes a thin wrapper passing `None`. The issuer is stashed into the kernel's `token_issuer` slot before the boot-time driver chain is built, so the three in-boot `create_driver` sites see the hardened path. Adds two unit tests covering both entry points. - cli/main: construct `BridgeAuthority` before kernel boot and pass `Some(issuer)` into the new entrypoint; thread the same `Arc` into `run_daemon`. - server::run_daemon: gain `#[cfg(unix)] bridge_authority` param; remove the now-redundant post-boot `set_token_issuer` call. - bridge_auth: add `as_token_issuer()` ergonomic helper. - bridge_ipc: comment refresh - the legacy lane is now reserved for non-unix / non-daemon callers only. Tests: 307/307 green, including the two new boot_with_config tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…IDGE_ALLOWED The bridge subprocess has been deafer than the rest of the system: every CC spawn fell back to the static four-tool DEFAULT_ALLOWED set inside `openfang-mcp-bridge`, regardless of what the calling agent's `agent.toml` actually permits. Capability/RBAC decisions made by the kernel and agent loop didn't reach the bridge. This commit plumbs the existing source of truth — `agent.toml` → kernel-resolved `available_tools` → agent loop — into the bridge's env via `OPENFANG_BRIDGE_ALLOWED`: - Add `allowed_tools: Option<Vec<String>>` to `CompletionRequest`. - Populate at both agent_loop call sites from `available_tools`. - Other 9 construction sites (compactor, routing tests, driver tests, HTTP probe, kernel router probe, fallback test helper) pass `None` — they don't drive bridge subprocesses or don't have an agent context. - Thread the list through `try_build_bridge_mcp_config` → `build_bridge_mcp_config_value`, which now emits `OPENFANG_BRIDGE_ALLOWED` in the per-spawn `--mcp-config` env map. - Strip `OPENFANG_BRIDGE_ALLOWED` in `apply_env_filter` so a CC child never inherits a stale allowlist. - `None` → env absent → bridge falls back to its hard-coded default, matching today's behavior. Empty list is meaningful: it emits the env var as the empty string so the bridge produces an explicit zero-tool surface instead of silently defaulting. Tool surface change: zero. This is pure plumbing; the bridge still only knows about the ANAI-30 four-tool slice. Adding `agent_send` to that slice ships in the next commit. Tests: - New `test_build_bridge_mcp_config_emits_allowed_tools` asserts the comma-separated env var lands in the config. - New `test_build_bridge_mcp_config_emits_empty_allowed_tools_explicitly` pins the empty-list-vs-absent distinction. - Existing `test_build_bridge_mcp_config_shape` updated to assert `OPENFANG_BRIDGE_ALLOWED` is *absent* when no list is supplied. - Existing env-filter test extended to assert the new var is stripped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First tool added to the bridge surface past the original ANAI-30 slice. Now that the previous commit makes the bridge honor per-agent capability gates (via OPENFANG_BRIDGE_ALLOWED sourced from each agent's agent.toml), expanding the bridge's *capability* set is safe: agents whose toml does not list agent_send will not see or be able to dispatch it. Schema mirrors `openfang_runtime::tool_runner::builtin_tool_definitions` → agent_send entry. Kept in sync by hand; the bridge crate is runtime-free by design and can't import the source. - `built_in_tools()` in openfang-mcp-bridge gains the agent_send Tool. - `DEFAULT_ALLOWED` in openfang-mcp-bridge/src/main.rs gains agent_send so the legacy/dev path (env var unset) still advertises every tool the bridge can dispatch. Production daemon spawns always set the env. - `bridge_ipc::ALLOWED_TOOLS` (daemon-side ceiling) gains agent_send so the dispatch_call gate doesn't reject it. - `built_in_tools_has_anai30_slice` → `built_in_tools_surface`: now asserts the five-tool set and includes a drift-warning comment. - `permitted_tools_intersects_with_dispatcher_allowed` swaps `agent_send` for `shell_exec` as its "unknown to the bridge" example, since agent_send is no longer unknown. - Bridge `instructions` string updated: drops the ANAI-30 enumeration, explains the gating model instead. Tool surface effect: agents whose agent.toml grants `agent_send` (which already includes the coder agents that have been using it via the legacy path) can now call it through MCP. Agents that don't list it remain unable to. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The bridge IPC path (`bridge_ipc::dispatch_call`) and HTTP `/mcp` endpoint both passed `workspace_root: None` to `execute_tool`, which caused `resolve_file_path` to fall through to a `..`-only validation check. Absolute paths bypassed it entirely and bare-relative paths resolved against the daemon CWD (`~/.openfang`), allowing any agent advertised `mcp__openfang__file_read`/`file_list` to read every sibling workspace plus `secrets.env` and GCP service-account JSON sitting at the openfang root. Fix: - bridge_ipc: look up workspace via authenticated `AgentId` → registry manifest. Refuse FS tools when no workspace is registered rather than silently falling back to an unscoped view. - routes (`POST /mcp`): same gate; HTTP has no ambient agent identity, so callers must pass `_agent_id` in arguments to scope FS calls. Kernel-level tools (agent_list, channel_send, etc.) keep working without `_agent_id` since they don't touch the filesystem. - Both paths now thread `workspace_root` through to `execute_tool`, so `workspace_sandbox::resolve_sandbox_path` actually runs — canonicalize- then-prefix-check blocks absolute escapes and symlink traversal; the existing `..` denial covers relative traversal. Smoke (post-rebuild, from `coder-learn-rust` with file_read/list): - `secrets.env` (bare relative) → ENOENT inside sandboxed workspace - `/Users/rlyeh/.openfang/secrets.env` → "Access denied: resolves outside workspace" - `../assistant/IDENTITY.md` → "Path traversal denied" - `IDENTITY.md` → success (workspace-relative legitimate read works) - `file_list .` → returns the agent's own workspace contents only; zero openfang-root entries Deferred: defensive log on disallowed-tool invocation at execution time (thread C), and removing `tool_runner`'s `workspace_root: None` legacy fallback (used only by tests). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Defense-in-depth against a misbehaving subprocess crafting JSON-RPC calls to tools outside its advertised set. Restructures dispatch_call into three ordered gates: 1. Static bridge allowlist (unchanged) — reject unknown tools pre-identity. 2. Identity resolution (hardened) — authenticated AgentId wins; legacy lane requires claimed string to parse as AgentId AND have a registry entry. Closes the "trust the claimed string" loophole (ANAI-30 follow-up). 3. Per-agent permission gate (new) — calls the canonical resolver kernel.available_tools_with_registry + AgentMode::filter_tools, the exact pair agent_loop uses to build OPENFANG_BRIDGE_ALLOWED at spawn. Advertise-time and execute-time gates cannot drift. 4. Workspace sandbox (D-fix, unchanged) — FS tools require a workspace. Snapshot construction now matches agent_loop's bundled→global→workspace layering, so workspace skill overrides are visible bridge-side. Rejections log warn! with request_id, tool, agent, mode, permitted_count. Caller sees a generic "tool 'X' not permitted for this agent". Promotes available_tools_with_registry from fn to pub fn. Verified end-to-end via raw IPC frames: rejection on agent_list for coder-learn-rust (Gate 2), positive path on file_read, identity gate rejection for unregistered UUID and unparseable legacy claim, and Gate 1 rejection for shell_exec outside bridge surface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add file_write and web_fetch to the bridge tool surface so CC-backed agents reach parity with API-backed agents for FS-write and HTTP-out. Both inherit the existing four-gate pipeline (static allowlist, identity, per-agent permission via available_tools_with_registry, FS sandbox). - openfang-mcp-bridge: hand-mirrored schemas in built_in_tools(); drift catcher test updated. Both added to DEFAULT_ALLOWED. - openfang-api/bridge_ipc: both added to ALLOWED_TOOLS. file_write added to FS_SANDBOXED_TOOLS; web_fetch deliberately not (no FS touch — SSRF guardrails live in the runtime impl, including the external-content delimiter wrapper). Smoke (raw-IPC harness): permitted positive + unpermitted negative for each tool, all four pass with matching daemon warn lines.
Adds the four remaining agent-lifecycle tools to the MCP bridge surface, bringing parity with OpenFang's native kernel tool set for agent control. - openfang-mcp-bridge/lib.rs: register all four in built_in_tools(); drift-catcher test bumped to 11 tools. - openfang-mcp-bridge/main.rs: add to DEFAULT_ALLOWED. - openfang-api/bridge_ipc.rs: add to ALLOWED_TOOLS. Not added to FS_SANDBOXED_TOOLS (kernel-only, no filesystem touch). Gate behavior is inherited from existing plumbing: - Gate 1 (static allowlist): all four now pass. - Gate 2 (per-agent capability via agent.toml): enforced through the canonical resolver, no extra hardcoded checks. - Gate 3 (FS sandbox): skipped by design. Smoke: 8/8 scenarios green (positives via coder-openfang against a throwaway coder-smoketest agent; negatives via code-reviewer hit Gate 2 cleanly with permitted_count=5 matching her agent.toml). Full spawn → activate → kill lifecycle verified end-to-end through the bridge. Smoketest workspace + registry entry cleaned up.
Brings the MCP bridge surface to 13 tools, leaving only shell_exec pending the Claude Code Bash-disable spike. Memory tools are gated per-agent via agent.toml; not sandboxed at the FS layer since memory writes route through kernel-managed workspace memory. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Material change: OpenFang now authoritatively blocks Claude Code's native filesystem, shell, and web tool surface for every CC subprocess it spawns, by injecting a per-spawn settings.json (written under the bridge socket dir, removed on drop) and passing it via `claude --settings <file>`. Why this works alongside `--dangerously-skip-permissions`: per Anthropic's settings documentation, `permissions.deny` rules are enforced even when the YOLO flag is set — that flag only bypasses allow/ask prompts, not security-critical denies. The daemon keeps non-interactive operation (no TTY to answer prompts) while still authoritatively blocking the native surface. Deny set: Bash, Read, Write, Edit, MultiEdit, NotebookEdit, WebFetch, WebSearch, Glob, Grep. The MCP namespace (`mcp__openfang__*`) lives in a separate pattern space and is untouched — that's the point: replace the bypassable native surface with the gated, RBAC-checked bridge surface. TodoWrite and Task (subagent) are deliberately left alone: agent-internal control flow with no escape to the host. Gated on `bridge_enabled()` so the deny set co-travels with the bridge wiring — either both or neither. Without the bridge wired the agent has no `mcp__openfang__*` fallback, so blanket-denying native tools would yield a useless agent. Also retroactively earns the docstring claim in `llm_driver.rs::skip_permissions`: before this commit the claim that "OpenFang's RBAC layer makes YOLO safe" was true at the bridge wire but false at CC's wide-open native surface. With native tools denied, the bridge is now the only path to host resources, and per-agent agent.toml capabilities become authoritative for CC subprocesses. Tests: - test_build_cc_settings_shape: wire shape, bare tool names only - test_cc_native_deny_includes_glob_grep: pins the deny set decisions - test_cc_settings_file_drop_removes_file: RAII cleanup Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commit 13a denies CC's native Bash via --settings; every CC-driven agent
now needs shell access through the OpenFang bridge instead. This adds
`shell_exec` to the bridge allowlist and threads the two pieces of
context the runtime needs to dispatch it safely:
* `exec_policy` resolved from the registered manifest (per-agent
override or kernel fallback, already populated by the kernel at
agent boot — kernel.rs:1440-1447). Drives the allowlist/full/deny
decision plus shell-metacharacter rejection in tool_runner.
* `allowed_env_vars` read from `manifest.metadata["hand_allowed_env"]`,
mirroring agent_loop.rs:320 — same lookup, same semantics, so the
bridge path and the in-proc path see identical hand-granted env.
`shell_exec` joins `FS_SANDBOXED_TOOLS`: tool_runner sets the command's
cwd to the workspace_root, so without a registered workspace the shell
would default to the daemon CWD (`~/.openfang`, where secrets.env and
the GCP service-account JSON live). Refusing the call when no workspace
is registered keeps that closed.
Tests:
* `allowlist_contains_shell_exec` — name-locked allowlist membership
* `shell_exec_is_workspace_sandboxed` — name-locked sandbox membership
* Updated `ipc_handshake_and_allowlist_gate` to use a synthetic
non-tool name as the "not on allowlist" example; shell_exec is now
permitted there.
Backcompat: agents that already declared `shell_exec` in their
capabilities get it via the bridge with the same gating they had
in-proc. Agents that didn't, can't — gate 2 still rejects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commit 13a's deny set blocks CC's native `WebSearch`; researcher agents
(researcher, researcher-autumn-business, researcher-autumn-medical)
depend on this surface for primary research. Restoring it through the
bridge means the request flows through OpenFang's multi-provider chain
(Tavily → Brave → Perplexity → DDG) instead of CC's own provider, which
is also a sturdier behavior win — we control the retry/fallback policy.
Implementation is a one-line allowlist add. Every prerequisite was
already in place:
* `web_ctx` is already passed to `execute_tool` at the dispatch site.
* `web_search` is already a fully-implemented native tool with a
ToolDefinition (tool_runner.rs:645) and provider chain
(tool_runner.rs:233).
* Per-agent capability gating already honors `web_search` against the
agent.toml's resolved surface (gate 2 at bridge_ipc.rs:475).
No new context arg, no new FS sandbox concerns (pure-net tool). The
smallest bridge add to date.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commits 9bd3e48 (13b) and 64078d0 (13d) added shell_exec and web_search to the daemon-side ALLOWED_TOOLS allowlist in bridge_ipc, but missed the two corresponding files on the bridge subprocess side: - crates/openfang-mcp-bridge/src/lib.rs :: built_in_tools() The MCP tools/list surface advertised to CC subprocesses. Without an entry here, CC never sees the tool name or schema; permitted_tools() intersects this list with the dispatcher's allowed_tools, so the daemon's allowlist alone is not enough. - crates/openfang-mcp-bridge/src/main.rs :: DEFAULT_ALLOWED Fallback allowlist used when OPENFANG_BRIDGE_ALLOWED is unset (legacy / dev spawn path). Mirrors the daemon's bridge_ipc list. Net effect: with 13a deploying the deny of CC's native Bash/WebSearch tools, CC agents would have lost shell + web search entirely on deploy. This commit closes that gap so the bridge actually delivers what 13b/13d advertised. Surface drift test (built_in_tools_surface) updated with both names. The permitted_tools intersection test now uses a synthetic 'not_a_real_tool' name since shell_exec is no longer an outsider. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridges OpenFang's native apply_patch tool (registered in tool_runner.rs at the ToolDefinition table, dispatched at line 209) end-to-end so CC subprocesses can perform surgical multi-hunk edits instead of paying the token + drift cost of whole-file file_write rewrites. This is the closest analogue to CC's native `Edit` tool that OpenFang ships today — slightly higher emit cost (unified diff vs string substitution) but the same core ergonomic: change only what changes. A native `string_edit` follow-up may replace this as the primary edit ergonomic for agents; for now apply_patch closes the worst of the "no Edit" gap left by 13a's CC native-tool deny set. Three files, mirroring every prior bridge-tool add since 908c39c: - crates/openfang-api/src/bridge_ipc.rs • apply_patch added to ALLOWED_TOOLS • apply_patch added to FS_SANDBOXED_TOOLS — tool_apply_patch resolves Add/Update/Delete paths embedded in the patch against workspace_root, so the no-workspace fail-closed gate is critical (same sibling-leak class as file_write) • Two name-locked tests: allowlist + sandbox membership - crates/openfang-mcp-bridge/src/lib.rs • Tool::new("apply_patch", ...) added to built_in_tools() (mirrors the runtime schema by hand, as the bridge crate is runtime-free by design) • built_in_tools_surface test updated - crates/openfang-mcp-bridge/src/main.rs • apply_patch added to DEFAULT_ALLOWED No new plumbing required — workspace_root + the kernel handle are already threaded into the bridge's execute_tool call site, and the native apply_patch implementation already enforces the sandbox via crate::apply_patch::apply_patch(&ops, root). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commit 18 closes the gaps surfaced by the deny-set audit (plan §10d). Adds 14 entries to CC_NATIVE_DENY, bringing it from 10 → 24 names, grouped into five categories: - Bash adjuncts / shell streamer: BashOutput, KillShell, KillBash, Monitor. Inert today (Bash is denied) but locked down as defense in depth — Monitor in particular is Bash-with-streaming and bypasses shell_exec's exec_policy gating entirely. - FS escape via worktree: EnterWorktree, ExitWorktree. Direct FS-escape primitive; native workspace-aware variant is on the follow-up backlog. - Notebook read: NotebookRead. Symmetry with NotebookEdit; forward-compat against future CC versions that ship it. - Skill substrate: SlashCommand. CC's parallel skill curation surface; OF's is canonical. - OpenFang-First scheduling / control plane: CronCreate, CronDelete, CronList, ScheduleWakeup, RemoteTrigger, plus PushNotification on the comms-routing axis. Deliberately NOT denied (pinned in tests): TodoWrite, Task, Skill, AskUserQuestion, EnterPlanMode, ExitPlanMode, TaskOutput, TaskStop, ToolSearch — agent-internal control flow / UX with no escape surface. Tests: existing pins extended; new test_cc_native_deny_covers_audit_gaps locks in each new addition with a category-rationale assertion. 20/20 in drivers::claude_code green; 1005/1005 openfang-runtime lib green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Locks in the invariant that the bridge tool surface is described by three co-equal lists, all of which must agree: 1. openfang_api::bridge_ipc::ALLOWED_TOOLS (daemon dispatch gate) 2. openfang_mcp_bridge::built_in_tools() (MCP advertise surface) 3. openfang_mcp_bridge::DEFAULT_ALLOWED (bridge process default) 13b/13d shipped with a tool dispatchable on the daemon side but absent from the MCP advertise surface — invisible to CC. The existing IPC smoke tests exercised the daemon path directly and never hit `tools/list`, so the gap shipped silently in two commits in a row. This commit closes that class of failure permanently. Changes: - Move `DEFAULT_ALLOWED` from `main.rs` (binary, untestable) into `lib.rs` as `pub const`, so the drift-catcher can reach all three sets from `openfang-api` tests. `main.rs` now imports it. - Add `allowlist_three_way_correspondence` in `bridge_ipc::tests` — asserts the three sets are equal, with diff output identifying which set is missing what when it fires. - Add `allowlist_count_is_sixteen` — pins cardinality on all three sets so an off-by-one add/remove on any single side fails loudly. Both tests are documented with the lesson from 13b/13d and the three-file pattern any future bridge tool add must follow. Test results: - openfang-api lib: 13/13 bridge_ipc tests pass (was 11; +2 new) - openfang-mcp-bridge lib: 5/5 - openfang-runtime lib: 1005/1005 - openfang-runtime drivers::claude_code: 20/20 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The exec_policy deny path in shell_exec and process_start formatted the
rejection message as "{reason}. Current exec_policy.mode = ...", but
validate_command_allowlist already terminates its reason string with a
period. Result was "safe_bins.. Current ..." in every deny payload.
Trim a trailing "." off the reason before interpolating so the rendered
message has a single, correctly-placed period.
Cosmetic only - no behavior change to the deny gate itself.
Closes S1-11 / S3-03 / S8-02 (FS-sandbox drift) and the FS-half of S3-01
(HTTP `/mcp` parity) by collapsing two drifted copies of the FS-sandbox
allowlist into one canonical `pub const FS_SANDBOXED_TOOLS` in
`openfang_runtime::tool_runner` (6-tool union).
Before: `bridge_ipc.rs` carried a 5-tool list (file_read, file_list,
file_write, shell_exec, apply_patch); `routes.rs` carried a 4-tool list
(file_read, file_list, file_write, create_directory). Both lists gate
whether a tool call's path args get rewritten against the caller's
`workspace_root` and whether the call fail-closes when no workspace is
registered. Drift meant:
- HTTP `/mcp` invoking `shell_exec` did **not** force cwd to
`workspace_root` — sibling-workspace / secrets.env leak.
- HTTP `/mcp` invoking `apply_patch` did **not** sandbox patch-embedded
paths — same leak surface.
- IPC invoking `create_directory` did **not** sandbox the target path
(when reachable) — same leak surface inverted.
The union closes all three holes in one place. Both surfaces now consult
the same constant; future FS-touching tools land in one file.
Side effect: `create_directory` is added to `ALLOWED_TOOLS`,
`DEFAULT_ALLOWED`, and `built_in_tools()`. The PR's stated objective is
full native-tool parity on the bridge; its absence from those three lists
was an oversight, not a deliberate gate. Audit of
`~/.openfang/agents/*/agent.toml` confirms zero references to
`create_directory` — no agent was relying on it being unreachable.
Tests:
- New `fs_sandboxed_tools_subset_of_allowed_tools` (belt-and-braces):
catches "added an FS tool to the sandbox set without putting it on
the bridge surface" or the inverse.
- Existing `allowlist_three_way_correspondence` continues to enforce
daemon-dispatch ≡ MCP-advertise ≡ bridge-default sets.
- Cardinality assertion bumped 16 → 17 in lockstep.
Test results: cargo test -p openfang-api -p openfang-mcp-bridge
-p openfang-runtime → 1170 passed, 0 failed.
… (S9-09) Tier B: PowerShell -EncodedCommand / -ec / -e payloads are base64(UTF-16LE) encoded scripts that bypass the existing inline-flag allowlist. Decode them and feed the inner script back through the same wrapper-extraction logic so they are validated against allowed_commands / safe_bins like any other inline script. Recursion is capped at MAX_SHELL_RECURSION_DEPTH = 2 (one outer wrapper plus one nested wrapper), which is more than any legitimate pwsh-in-pwsh use case needs and prevents algorithmic-DoS via deeply-nested payloads. Tier C: hard-deny load-from-disk and interactive flags on every known shell wrapper, regardless of allowlist contents — the validator cannot see what those flags will load. - pwsh / powershell: -File, -PSConsoleFile - bash / sh / zsh: --rcfile, --init-file, -i - bash / sh / zsh: -O extdebug (two-token form, handled separately) These checks run before any other parsing in validate_command_allowlist, so a flag like `bash -i -c "..."` is rejected outright rather than relying on the inline-script extractor. Internal API: extract_shell_wrapper_commands and extract_inner_script_commands now return Result<Vec<String>, String> so decode failures, depth violations, and disk-load flag violations propagate as clear errors. Public surface (validate_command_allowlist) signature is unchanged. Tests: 13 new unit tests covering both tiers, plus a roundtrip and odd-length test for the UTF-16LE decoder. Existing 50 subprocess_sandbox tests unchanged. cargo test -p openfang-runtime --lib subprocess_sandbox → 63 passed, 0 failed. Closes S9-09.
…rp tools (S9-08) Closes S9-08. Three new gates in Allowlist mode: 1. WRAPPER_BINARIES_RECURSE (env, sudo, nice, nohup, timeout). The validator now unwraps these and validates the inner binary against the allowlist — plugging the `env FOO=bar /bin/evil` and `sudo /bin/evil` bypass where only the outer wrapper was previously checked. Per-wrapper flag-parsers handle the common flag-with-value forms (`-u`, `-g`, `--unset=`, `-n`, `-s SIG`, `-k DUR`, `--`, `KEY=VALUE` env assignments, `timeout`'s DURATION positional). Fail-closed: unrecognized flag patterns or missing inner command → reject. 2. WRAPPER_BINARIES_DENY (xargs, find, strace, gdb, chroot, unshare, setsid, stdbuf, flock, time) — hard-denied regardless of allowlist contents. Sentinel-token execution (`xargs`, `find -exec`), process-tracing (`strace`, `gdb`), and namespace/buffer manipulation have no legitimate use through LLM-driven `shell_exec`. 3. INLINE_SCRIPT_INTERPRETERS — `python -c`, `node -e/--eval/-p/--print`, `perl -e/-E`, `ruby -e`. Inline scripts in these languages are not parseable for allowlist validation; hard-denied. Operators can still pass a script file path (a plain arg, not a command) or use Full mode. Recursion: wrapper-binary chains share the existing MAX_SHELL_RECURSION_DEPTH cap (2). Inner segments that are themselves shell wrappers route through `extract_shell_wrapper_inner`; nested wrapper binaries recurse through `extract_wrapper_binary_chain`. Gates also fire on segments extracted from shell-wrapper inline scripts, so `bash -c "sudo evil"` and `bash -c "xargs ls"` are caught. Tests: 25 new (88 total in subprocess_sandbox; 1044 in -runtime; 113 in -api; 33 in -mcp-bridge). All passing. No exposure change on any surface — this is allowlist-validation tightening only.
Closes S3-01 from the bridge-v2 audit. Before this commit, `shell_exec`
dispatched through the HTTP `/mcp` endpoint ran with
`exec_policy = None` and an empty hand-allowed-env list — degrading to
the daemon-global `ExecPolicy` (typically `Full` on developer setups)
and silently bypassing every per-agent manifest gate the operator had
authored. The bridge IPC path applied the gate correctly; HTTP did not.
This commit:
1. Extracts the per-agent resolution into a shared helper module
`openfang_runtime::agent_tool_context` exposing:
- `AgentExecContext::from_manifest(&AgentManifest)` — pure read-only
derivation of `exec_policy` and `hand_allowed_env` from the
manifest, matching the existing pattern at agent_loop.rs:320,
agent_loop.rs:1572, and bridge_ipc.rs:553.
- `EXEC_POLICY_REQUIRED_TOOLS` + `requires_exec_policy(tool)` —
the tight set of tools (currently `shell_exec`) whose dispatch
is unsafe without a manifest-bound policy.
2. Wires the helper into `bridge_ipc::dispatch_call`, replacing the
inline 16-line extraction with a 4-line shared call. Behavior
unchanged on the bridge IPC path.
3. Wires the helper into `routes::mcp_http`:
- Resolves the registry entry once so workspace lookup and
exec_context lookup share a manifest snapshot (no TOCTOU).
- Fail-loud (-32602) when `requires_exec_policy(tool_name)` and
no manifest-bound `exec_policy` is available — distinct error
messages for "no `_agent_id` supplied", "agent has no registry
entry", and "manifest has no `[exec_policy]`". Per Ben's call:
loud during the security-fix window so misconfigured callers
surface immediately rather than silently degrade.
- Threads `allowed_env_arg` and `effective_exec_policy` into
`execute_tool` from `AgentExecContext` instead of hard-coded
`None, None`.
Tests (7 new in `agent_tool_context::tests`):
- Default manifest → no policy, no env.
- Manifest with `exec_policy` propagates intact (mode + allowed_commands).
- `hand_allowed_env` array reads from metadata correctly.
- Malformed `hand_allowed_env` value (string instead of array) →
fail-closed to empty (NOT partial-bind).
- Empty `hand_allowed_env` array → `allowed_env()` returns `None`
(parity with bridge_ipc.rs:559's "fall through to runtime default"
semantics, not "explicitly grant nothing").
- `requires_exec_policy` flags `shell_exec` and nothing else.
- Drift-catcher: `EXEC_POLICY_REQUIRED_TOOLS` still contains
`shell_exec` (forces intentional change if someone removes it).
Verification:
- `cargo test -p openfang-runtime --lib agent_tool_context`
→ 7 passed, 0 failed.
- `cargo test -p openfang-api --lib` → 113 passed, 0 failed
(bridge_ipc tests unchanged after refactor).
- `cargo test -p openfang-runtime --lib subprocess_sandbox`
→ 88 passed (prior Stage 9 commits unaffected).
- `cargo build -p openfang-api` clean.
Files touched: 4 (1 new). +143/-19 LOC including the new helper
module and its tests.
Bridge-v2 audit: S3-01 (HIGH, block-this-PR). Stage 9 rollup
unaffected.
… routes (S6-04)
`/api/agents/{id}/message` and `/api/agents/{id}/message/stream` accepted
`sender_id` / `sender_name` from the request body and threaded them straight
into the kernel, where they land in the agent's prompt context as the
authoritative "this turn came from $sender". Neither route has a
provenance/auth path today (no `BridgeAuthority` resolve, no per-spawn
token, no signed envelope), so any local POST could forge any channel
identity — e.g. claim to be a specific WhatsApp number or Telegram user-id
and the LLM would treat the turn as that user's. This is the impersonation
primitive behind the S6-04 → S6-09 → S4-03 chain described in
`stage-6-findings.md`.
Tight fix: refuse the assertion entirely. Both HTTP routes now hard-fail
(400) with a stable error string when `sender_id` or `sender_name` is
present, and pass `None`/`None` to the kernel unconditionally. Empty
strings are treated as absence (defensive). Channel adapters (Telegram,
WhatsApp, Discord, ...) do **not** call these routes — they reach the
kernel directly via the channel router — so legitimate channel flows are
unaffected.
The follow-up that *restores* the legitimate identity-forwarding capability
under a proper trust model is filed as **FU-01** in
`_shared/openfang/findings/bridge-v2/followups.md` (Option B from the
design discussion): add a bridge-auth-style provenance path so verified
callers can supply identity, with hard-fail on mismatch, and tag inbound
messages with a remote-origin envelope (merges naturally with the S6-09 /
S5-01 family). Bundled together because that's a single design across
three patch points; cheap to do once, expensive to retrofit later.
Implementation:
- New `pub(crate) fn reject_unauthenticated_sender_identity(&MessageRequest)
-> Option<&'static str>` with stable error strings (test-pinned).
- `send_message` and `send_message_stream` both validate before kernel
dispatch; on rejection, log a structured warn with the agent id and the
reason, return 400 JSON `{"error": <reason>}`.
- Kernel calls now pass `None, None` for sender identity in both routes —
validator guarantees the fields are absent/empty, but we pass `None`
explicitly to be defensive (belt-and-suspenders).
Tests (6 new, all passing):
- `s604_no_sender_fields_is_allowed`
- `s604_empty_sender_fields_is_allowed` (empty-string == absence)
- `s604_sender_id_alone_is_rejected`
- `s604_sender_name_alone_is_rejected`
- `s604_both_fields_rejected_with_combined_message`
- `s604_reason_strings_are_stable` (pins exact error strings — breaking
change for callers if these are bumped)
`cargo test -p openfang-api --lib` → **119 passed** (113 prior + 6 new).
Closes S6-04 (HIGH, block-this-PR).
Refs: FU-01 in `_shared/openfang/findings/bridge-v2/followups.md`
… allowlist (S7-06/S4-02)
The bridge's `DEFAULT_ALLOWED` constant — used by `openfang-mcp-bridge`'s
`main.rs` when `OPENFANG_BRIDGE_ALLOWED` is unset (legacy/dev/test fallback) —
previously included `agent_spawn`, `agent_kill`, and `agent_activate`. In
the fallback path any agent could spawn/kill/activate sibling agents
regardless of its `agent.toml` capabilities.
The production runtime path is unaffected: `agent_loop.rs` already threads
the manifest-derived `available_tools` into `OPENFANG_BRIDGE_ALLOWED`, so
agents that opt in via manifest still reach these tools. The daemon-side
dispatch (`ALLOWED_TOOLS`) and MCP advertise surface (`built_in_tools()`)
retain all 17 tools so opted-in agents can still dispatch them.
Changes:
- `openfang-mcp-bridge::DEFAULT_ALLOWED` shrinks from 17 → 14 entries
(drops `agent_spawn`, `agent_kill`, `agent_activate`).
- New `openfang-mcp-bridge::PRIVILEGED_DEFAULT_DENY` lists the three
excluded tools with the rationale doc-pinned in source.
- Drift-catcher in `bridge_ipc::tests` evolves from three-way equality to:
- Invariant A (equality): `ALLOWED_TOOLS == built_in_tools()`.
- Invariant B (safe-by-default subset):
`DEFAULT_ALLOWED ⊂ ALLOWED_TOOLS`,
`PRIVILEGED_DEFAULT_DENY ⊂ ALLOWED_TOOLS`,
`DEFAULT_ALLOWED ∩ PRIVILEGED_DEFAULT_DENY = ∅`,
`DEFAULT_ALLOWED ∪ PRIVILEGED_DEFAULT_DENY = ALLOWED_TOOLS`
(so every new tool must be classified default-safe or privileged-deny).
- New name-level regression guard `privileged_lifecycle_tools_excluded_from_default`
hard-pins the three tool names so a grep for `agent_spawn` lands on the
guard.
Tests: `cargo test -p openfang-mcp-bridge --lib` → 5 passed.
`cargo test -p openfang-api --lib` → 120 passed (was 119; +1 new).
Closes S7-06 / S4-02 (bridge-side), the last block-this-PR HIGH on
`feat/bridge-tool-surface-v2`.
…andbox Defense-in-depth for the sandbox: in addition to confining agent file operations to `workspace_root`, `resolve_sandbox_path` now hard-denies a curated set of sensitive paths under the OpenFang home, regardless of the agent's workspace_root configuration. Motivation: the workspace-confinement check is sufficient only when each agent's `workspace_root` is correctly scoped to its own subdir under `workspaces/`. A misconfigured or privileged manifest (e.g. `workspace_root = "~/.openfang"` or `"~"`), or a future tool surface that bypasses workspace confinement, could otherwise read or overwrite credentials, runtime tokens, and daemon state. Generalizes jaberjaber's review feedback on the closed RightNow-AI#1162 to the right layer. Deny categories (returned as stable reason strings for audit): - config config.toml - config-backup config.toml.bak* - secrets secrets.env, .env - daemon-state daemon.json - credential-file gcp-key*, *.pem, *.key, *.p12, *.pfx - runtime-tokens run/ (bridge sockets, mcp-config-*.json) - credential-vault vault/ - paired-devices paired_devices.json - daemon-log daemon.stderr.log*, daemon.stdout.log* Workspaces, skills, bin, scripts, and other non-sensitive subdirs of the OpenFang home remain unaffected. Implementation notes: - New private `is_sensitive_openfang_path()` predicate, called after canonicalization (symlink escapes are resolved before the check). - `openfang_home()` honors `OPENFANG_HOME` env var, falls back to `$HOME/.openfang`. No new crate dependency (uses std::env only). - Denials emit `tracing::warn!` with target `openfang_runtime::sandbox`, the user-supplied path, the resolved path, and the reason string, for auditability. Tests: 8 new unit tests covering each deny category, non-sensitive passthroughs, the misconfigured-workspace_root case, and the absolute-path escape attempt. All 14 workspace_sandbox tests pass.
Clears RUSTSEC-2026-0141 (CVSS 9.1 — TLS hostname verification disabled with Boring TLS backend). Workspace uses tokio1-rustls-tls so the vuln is not exploitable in this config, but the advisory still gates cargo audit so we bump to clear CI.
Pre-flight for upstream CI on PR RightNow-AI#1205. Three clippy lints fixed alongside a workspace-wide cargo fmt --all run: doc_lazy_continuation in openfang-mcp-bridge/src/lib.rs (paragraph break before the 'Default tool allowlist' doc block), manual_inspect at four call sites in runtime/drivers/claude_code.rs (switch .map to .inspect for side-effect-only closures over Option of TempPath), and field_reassign_with_default in runtime/agent_tool_context.rs (collapsed mut policy assignment into struct-update-syntax init). No behavior changes.
Windows clippy CI was failing with -D unused-imports because `tokio::io::BufReader` and `tracing_subscriber::EnvFilter` are only referenced inside the `#[cfg(unix)] async fn main`. The items they support were already gated; the imports were not. Add `#[cfg(unix)]` to both `use` lines so the lint matches usage on non-unix targets.
Document the full local gate (fmt, clippy, test, audit) plus the cross-platform cfg-gating gotcha that bit us on RightNow-AI#1205. Anyone opening a PR should run all of this locally first so CI feedback loops stay short.
`tracing::warn!` is only invoked from `#[cfg(unix)]` blocks in this
file (signal-handler shutdown paths). On Windows, the bare `warn`
import was flagged unused under `-D warnings`.
Verified locally via cross-compile:
cargo clippy --target x86_64-pc-windows-gnu --workspace \
--all-targets -- -D warnings
Automates the Windows cross-compile sanity check we've been running manually for the last few rounds of cfg-gate fixes. Verifies the x86_64-pc-windows-gnu rustup target + mingw-w64 are present, sets linker/CC env vars, then runs clippy (default), check, or build against the Windows target. Referenced from the PR template pre-PR checklist.
|
👋 Friendly nudge on this one — it's been open ~2 weeks and I want to make it as easy as possible to pick up. Status: still I know +6.4k / 35 files is a big ask to review cold, so here's a risk map to make it scannable:
Happy to do whatever lowers the cost of review:
A fair bit of follow-on work is stacked behind this, so even a "looks fine, just needs eyes" or a pointer to what would unblock it would help a lot. Thanks for the time — I know maintainer bandwidth is the scarce resource here. 🙏 |
…_DENY Closes a subprocess-sandbox gap missed during bridge tool-surface work. Three CC-native tools bypassed the mcp-bridge enforcement seam by construction. - Task/Skill are parallel control-plane substrates. OpenFang owns orchestration via agent_spawn/agent_send and skill curation via skill_execute. This reconciles with the already-denied SlashCommand, which is the same skill room. - AskUserQuestion is headless-inert under -p/stdin-null. An explicit deny tool_result replaces the silent dead-end that plausibly fed phantom picker-failed openers. Removed the three from the must_allow test loop and added positive deny assertions. Loop-critical control flow stays allowed: TodoWrite, EnterPlanMode/ExitPlanMode, TaskOutput/TaskStop, ToolSearch. (cherry picked from commit 9b8424c8098848ce0fcac834f94f35847eebcf17)
|
Added a follow-up to this branch: What it doesExtends
What it deliberately keepsThe six loop-critical natives stay allowed: Verification
One file, +40/-8. No schema or workspace-layout changes. |
Summary
Lands the full OpenFang tool surface (file / memory / agent / shell / web /
patch) over the MCP bridge so Claude Code subprocess agents act on
OpenFang's authoritative tool implementations instead of CC's native ones.
Stage 9 of the same branch closes every audit finding raised against that
new surface — shipping the surface without those closures would ship a
known-bad feature, so they land together as a single concern: land the
bridge surface safely. 37 commits ahead of
upstream/main(
8405c4a..421e5d9); near-single-file commits; workspace tests green atHEAD; live-smoked end-to-end.
Changes
Bridge tool surface (
openfang-mcp-bridge+ driver wiring):openfang-mcp-bridgecrate; per-spawn IPC listener on the daemonside, bridge-side
RuntimeDispatcherclient.--mcp-config, gated onOPENFANG_BRIDGE_ENABLED.BridgeAuthority+TokenIssuer: Hello frames authenticated,AgentIdbound per connection,
TokenIssuerconstructed before kernel boot soautostart agents are hardened from t=0.
CompletionRequest → OPENFANG_BRIDGE_ALLOWED;execute-time permission gate honors
agent.tomlcapabilities.agent_send,file_write,web_fetch,agent_spawn/kill/activate/find,memory_store/recall,shell_exec(workspace + exec_policy gated),web_search,apply_patch(workspace sandboxed).
--settings(
CC_NATIVE_DENYexpansion), so the only path for those ops is backthrough OpenFang.
Hardening (Stage 9 — audit closures):
ee7edd5521391aFS_SANDBOXED_TOOLSgate; bridgecreate_directorye59cdaf478ac5a48a7741-EncodedCommanddecode + recurse; load-from-disk / interactive hard-deny52e13abenv/sudo/nice/nohup/timeout); hard-denyxargs/find/strace/gdb/chroot/unshare/setsid/stdbuf/flock/time; inline-interpreter-c/-edeny8c74500/mcpexec_policy parity with bridge IPC (TOCTOU closed)f01700csender_id/sender_nameb421318421e5d9421e5d9generalizes the PR #1162 review feedback (config exfil via outboundattach) to the sandbox layer, so the protection is independent of any
single tool surface and survives
workspace_rootmisconfiguration.Categories:
config/config-backup/secrets/daemon-state/credential-file(gcp-key*/*.pem/*.key/*.p12/*.pfx) /runtime-tokens(run/) /credential-vault(vault/) /paired-devices/daemon-log(rotated variants included).workspaces/,skills/,bin/,scripts/,src/pass through untouched.Dependencies: no new workspace dependencies.
421e5d9usesstd::envrather than pulling
dirs.Deferred / out of scope (tracked in maintainer notes, do not block
merge):
shell_execsubprocess bypass ofresolve_sandbox_path(
cat/headunderexec_policy = Full). Layering property, not aregression introduced by this PR. Fix shape: argv-path inspection +
default
exec_policy = Nonefor non-developer agents.AgentManifest::workspacesilent-ignore. Field defined butno loader threads it into
SandboxConfig.workspace_root; no#[serde(deny_unknown_fields)]. Fix shape: thread it through, or removethe field and add
deny_unknown_fields.session
Messagedeserialization drift) — also deferred, documented.Testing
cargo clippy --workspace --all-targets -- -D warningspassescargo test --workspacepassesNotes:
cargo build -p openfang-runtimeclean at HEAD; clippy clean on the newcode. Pre-existing warnings in
agent_tool_context.rsanddrivers/claude_code.rsare unchanged by this PR.subprocess_sandboxmodule: 88 tests passing (25 new for S9-08, 13new for S9-09, plus the three-way drift-catcher).
workspace_sandboxmodule: 14 tests passing (6 original + 8 new forthe sensitive-path predicate, including misconfigured-
workspace_rootdefense-in-depth assertions).
file_readagainst<openfang_home>/config.tomlwas denied withreason="config"fromopenfang_runtime::sandbox, while aworkspace-internal
file_readsucceeded. A second probe via athrowaway agent exercised the structured deny error surfaced to the
caller and confirmed the predicate fired with the correct reason in the
daemon log.
Security
Notes:
numbered audit finding (table above); each ships with unit tests
asserting the deny path.
AgentIdbinding,
TokenIssuerconstructed pre-boot so autostart agents aregated from t=0.
and before workspace-confinement, with auditable
tracing::warn!(target = "openfang_runtime::sandbox", ...)on every denial(user_path, resolved path, reason).
sender_id/sender_name(S6-04).shell_execargv parsing recurses through wrapper binaries(
env/sudo/nice/nohup/timeout) and hard-deniesxargs/find/strace/gdb/chroot/unshare/setsid/stdbuf/flock/time;inline-interpreter
-c/-eflags denied; pwsh-EncodedCommanddecodedand recursed.
apply_patchconfined to agent workspace.agent_spawn/killand similar) excludedfrom the bridge default allowlist; explicit per-agent grant required.
cc @jaberjaber for security review of the Stage 9 hardening commits.