Code execution in MCPProxy: JS sandbox for orchestrating multiple MCP servers #627
Dumbris started this conversation in Show and tell
What would you like to share?
I’ve been experimenting with the "code execution with MCP" pattern Anthropic described: presenting MCP servers as code APIs instead of exposing every tool directly, so agents can write code to orchestrate tools and keep most data processing out of the context window.
In the current ecosystem, most MCP clients still load all tool definitions up front into context, exposing them via direct tool-calling syntax; this is exactly the behaviour Anthropic calls out as a source of context bloat.
The spec also allows a higher-level pattern where a search_tools tool is added so the model can look up relevant definitions instead of eagerly loading everything.
I’ve implemented a related approach in MCPProxy: in addition to search_tools, there is a new code_execution tool that lets the LLM orchestrate multiple upstream MCP servers using sandboxed JavaScript. The client sees a single "run JS" tool; inside the sandbox, the agent gets a call_tool built-in for invoking tools on any connected upstream server.
The heavy JSON payloads and intermediate steps stay in the JS VM; the model only sees the short script and the final output.
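A minimal sketch of what such a script might look like. The call_tool signature (server, tool, args) is an assumption, and the stub implementation below only stands in for MCPProxy's real built-in so the example is self-contained:

```javascript
// Illustrative stub: in the real sandbox, call_tool is provided by MCPProxy
// and proxies the request to the named upstream MCP server. Its exact
// signature is an assumption for this sketch.
function call_tool(server, tool, args) {
  if (tool === "list_issues") {
    // Pretend the upstream server returned a large JSON payload.
    return [
      { id: 1, title: "Bug A", state: "open", body: "..." },
      { id: 2, title: "Bug B", state: "closed", body: "..." }
    ];
  }
  throw new Error("unknown tool: " + tool);
}

// The agent's script: fetch a heavy payload, then filter and aggregate
// inside the JS VM (ES5-style code, matching the sandbox's built-ins).
var issues = call_tool("github", "list_issues", { repo: "example/repo" });
var open = issues.filter(function (i) { return i.state === "open"; });

// Only this small summary object ever reaches the model's context.
var result = {
  openCount: open.length,
  titles: open.map(function (i) { return i.title; })
};
console.log(JSON.stringify(result));
```

The large issue bodies never leave the VM; the model only sees the final one-line summary.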
Sandbox & security:
The JS VM is heavily restricted: no filesystem, no network, no Node.js modules, no timers, no environment variables—just ES5.1+ built-ins and call_tool. There are configurable timeouts, optional call-count limits, server allowlists, and integration with MCPProxy’s quarantine system so you can strictly control which MCP servers are reachable from code.
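To illustrate the shape of those limits, a configuration fragment might look roughly like the following. Every key name here is hypothetical, not MCPProxy's actual schema; it only sketches the timeout, call-count, and allowlist controls described above:

```yaml
# Hypothetical sketch only; see the MCPProxy docs for real option names.
code_execution:
  timeout_ms: 30000      # wall-clock limit per script
  max_tool_calls: 25     # optional cap on nested call_tool invocations
  allowed_servers:       # allowlist of reachable upstream MCP servers
    - github
    - jira
```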
Web UI & observability:
All code_execution calls show up in MCPProxy’s Web UI: the JS that was executed, every nested upstream tool call, timing, and basic token metrics. I’ll attach a screenshot in this discussion so you can see how the call stack looks.
Conceptually, this is meant to sit alongside patterns like search_tools: instead of loading many tools into context, or searching them all the time, you expose a narrow "code execution + discovery" surface and let the agent import only what it needs and process data locally before sending a compact result back.
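A hedged sketch of that combined "discovery + execution" surface, again with a stand-in call_tool; the dispatch logic, server names, and tool names are invented for illustration:

```javascript
// Stand-in for the sandbox's call_tool built-in; the signature and the
// tools it dispatches to are assumptions made up for this example.
function call_tool(server, tool, args) {
  if (tool === "search_tools") {
    // Pretend discovery returned one matching tool definition.
    return [{ server: "weather", name: "get_forecast" }];
  }
  if (tool === "get_forecast") {
    return { city: args.city, tempC: 21, detail: "..." };
  }
  throw new Error("unknown tool: " + tool);
}

// 1. Discover only the tool that is needed, instead of loading every
//    tool definition into the model's context up front.
var hits = call_tool("mcpproxy", "search_tools", { query: "weather forecast" });
var hit = hits[0];

// 2. Call the discovered tool and reduce its payload locally before
//    returning a compact result to the model.
var forecast = call_tool(hit.server, hit.name, { city: "Berlin" });
var result = hit.name + ": " + forecast.tempC + "C in " + forecast.city;
console.log(result);
```

The point of the pattern: discovery and execution both flow through the same narrow surface, so neither the full tool list nor the full payload ever occupies the context window.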
Give it a try, feedback welcome!
Relevant Links