Add persistent Python REPL tool for Templates by datvo06 · Pull Request #687 · BasisResearch/effectful

datvo06 · 2026-06-11T21:38:28Z

Closes #678.

Adds a generic code-execution tool that a Template's LLM can call to run Python before producing a final answer. Three desiderata:
(1) linked to the Template's lexical context;
(2) persistent state across tool calls;
(3) redirected output streams.

Example.

from effectful.handlers.llm import Template, LiteLLMProvider
from effectful.handlers.llm.completions import PythonRepl
from effectful.handlers.llm.evaluation import UnsafeEvalProvider
from effectful.ops.semantics import handler
from effectful.ops.types import NotHandled

readings = [12, 19, 23, 31, 8, 27]

@Template.define
def outlier_count() -> int:
    """Use the `exec_code` tool to compute how many values in `readings`
    lie more than one population stdev from the mean; return that count."""
    raise NotHandled

with handler(LiteLLMProvider()), handler(UnsafeEvalProvider()), handler(PythonRepl()):
    n = outlier_count()

The LLM runs code across rounds with state persisting:

exec_code("m = ...; s = ...")   ->  ""        # readings came from lexical scope
exec_code("print(...using m, s...)")  ->  "2\n"   # m, s persisted from the prior call
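
The two elided snippets in the trace might look roughly like the sketch below. This is only an illustration of the statistic the docstring asks for, not the code the model actually produced; the use of `statistics.mean` and `statistics.pstdev` and the name `outliers` are assumptions.

```python
from statistics import mean, pstdev

readings = [12, 19, 23, 31, 8, 27]

# Round 1 (hypothetical): compute and keep the summary statistics.
m = mean(readings)    # arithmetic mean of the readings
s = pstdev(readings)  # population standard deviation

# Round 2 (hypothetical): reuse m and s from the earlier call.
outliers = [x for x in readings if abs(x - m) > s]
print(len(outliers))
```

With these readings the mean is 20 and the population standard deviation is about 8.04, so only 31 and 8 qualify, which is consistent with the "2\n" shown in the trace.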

ReplSession (evaluation.py): a plain class seeded from a lexical context. Each run(source) executes one complete snippet in exec mode through the parse/compile/exec effect operations (so the installed eval-provider owns sandboxing), captures stdout/stderr into a per-call buffer, and persists bindings in self.locals across calls. Per-snippet filenames keep each cell's source in linecache so cross-snippet tracebacks format correctly; the traceback formatter, _format_user_traceback, trims the effect-machinery frames so the LLM sees only its own code.
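
The mechanics described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's ReplSession: the class name `MiniSession` and the `<snippet-N>` filename scheme are assumptions, it calls the builtin `compile`/`exec` rather than the effect operations, and it does not trim frames from the traceback.

```python
import contextlib
import io
import linecache
import traceback


class MiniSession:
    """Hypothetical stand-in for a REPL session; not effectful's ReplSession."""

    def __init__(self, context):
        self.locals = dict(context)  # seeded from the lexical context
        self._count = 0

    def run(self, source: str) -> str:
        self._count += 1
        filename = f"<snippet-{self._count}>"
        # Register the source under a unique filename so that tracebacks
        # raised by later snippets can still display this snippet's lines.
        linecache.cache[filename] = (
            len(source), None, source.splitlines(keepends=True), filename
        )
        buffer = io.StringIO()  # fresh buffer per call
        with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
            try:
                exec(compile(source, filename, "exec"), self.locals)
            except Exception:
                # Report the error as text instead of raising to the caller.
                traceback.print_exc(file=buffer)
        return buffer.getvalue()


session = MiniSession({"readings": [12, 19, 23, 31, 8, 27]})
session.run("total = sum(readings)")
print(session.run("print(total)"), end="")
```

Catching `Exception` rather than `BaseException` lets `KeyboardInterrupt` propagate, which matches the behaviour the test list in this description names.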

PythonRepl (completions.py): a collect_tools handler exposing an exec_code Tool bound to a session that persists for the Template invocation. Off by default. Sessions are keyed by id(env) and pruned via weakref.finalize (no leak, no id-reuse), created lazily on first use.
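
One way to read "keyed by id(env) and pruned via weakref.finalize" is sketched below. The names `Env`, `SessionRegistry`, and `session_for` are hypothetical and the session is reduced to a plain dict; the PR's actual handler may be structured differently.

```python
import weakref


class Env(dict):
    """Hypothetical environment type; a dict subclass supports weak references."""


class SessionRegistry:
    def __init__(self):
        self._sessions = {}

    def session_for(self, env):
        key = id(env)
        if key not in self._sessions:
            # Created lazily, on the first request for this env.
            self._sessions[key] = {"locals": dict(env)}
            # Remove the entry when env is garbage collected, so a later
            # object that happens to reuse the same id gets a fresh session.
            weakref.finalize(env, self._sessions.pop, key, None)
        return self._sessions[key]


registry = SessionRegistry()
env = Env(readings=[12, 19, 23])
first = registry.session_for(env)
print(registry.session_for(env) is first)
```

The session stores a copy of the environment rather than the environment itself; holding a strong reference to `env` inside the session would keep it alive and the finalizer would never fire.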

The exec op docstring now states its binding-effects contract: after exec(bytecode, env), env reflects all top-level bindings — new and rebound alike.
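
The same contract can be observed with Python's builtin `exec`, which is used here only as an analogy for the effectful `exec` op; the variable names are illustrative.

```python
env = {"x": 1}
bytecode = compile("x = x + 1\ny = x * 10", "<snippet>", "exec")
exec(bytecode, env)

# The rebound name x and the new name y are both visible in env afterwards.
print(env["x"], env["y"])
```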

Requires UnsafeEvalProvider for now (#685). RestrictedEvalProvider drops rebindings of seeded names and lacks RestrictedPython's print wiring. The two restricted REPL tests are marked xfail(strict) against #685 and will flip to xpass when it lands.

Tests. 12 ReplSession laws and 7 PythonRepl handler tests, deterministic under UnsafeEvalProvider (seed, persistence, rebind, multi-statement, print, syntax-error, exception isolation, KeyboardInterrupt-propagation, traceback trim, cross-snippet traceback, reentrancy, effect-routing; off-by-default, exposes-tool, name-collision, composes-with-LexicalReaders, lazy creation, same-session-across-rounds, distinct-env). Plus one replayed gpt-4o-mini integration test where the model genuinely uses exec_code across rounds to compute a statistic.

Decoupled from LexicalReaders by default; coupling REPL-created symbols into the readers is noted as a follow-up.


datvo06 · 2026-06-11T22:05:41Z

This doesn't close #685 yet, as it stands. I'm still deciding how persistent the state should be across calls (shared within one agent, or across all agents and all templates). There might also be a new problem with OpenAI trimming headers. Investigating.

eb8680 · 2026-06-12T15:19:39Z

"how persistent the state between different calls really is (shared between one agent, or all agents and all templates)"

I think you only want it to persist for the duration of one Template call, not to share across multiple calls or multiple agents.

datvo06 · 2026-06-12T18:52:36Z

I see, that simplifies things! On it now.

datvo06 · 2026-06-12T19:13:56Z

Another question that's coming up is what happens with nested Templates:

@Template.define
def foo():
    """Some prompt"""
    raise NotHandled


@Template.define
def bar():
    """Some prompt"""
    raise NotHandled

Will the lexical context/envs be shared between foo and bar if bar() calls exec() and then calls foo() as a tool? My current take is that they will be separate: bar() calling exec() will have no effect on foo() calling exec() further down the call stack.

eb8680 · 2026-06-12T20:45:47Z

Code should be understood to live in the lexical context of the relevant Template body. The body of bar can read the shared lexical context across foo, bar and whatever else is in the same scope, as well as any arguments to bar (none in this example), but any new variables are local to the body of bar and hence invisible to foo.
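
A small sketch of that scoping rule, using the builtin `exec` and a per-body copy of the shared context. The helper `run_in_body` and the variable names are hypothetical, and this is an analogy rather than how the PR implements isolation.

```python
# Lexical context visible to both foo and bar.
shared = {"readings": [12, 19, 23, 31, 8, 27]}


def run_in_body(source, lexical_context):
    # Each Template body gets its own namespace seeded from the shared context,
    # so new bindings never leak back into the context or into sibling bodies.
    scope = dict(lexical_context)
    exec(source, scope)
    return scope


bar_scope = run_in_body("total = sum(readings)", shared)
foo_scope = run_in_body("seen = 'total' in globals()", shared)

print(bar_scope["total"], foo_scope["seen"], "total" in shared)
```

Note that a shallow copy only isolates bindings: a body that mutates a shared object in place (for example, appending to `readings`) would still be visible elsewhere.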

ReplSession now subclasses the stdlib code.InteractiveInterpreter (compile routed through the parse/compile ops for linecache, runcode through the exec op, tracebacks trimmed to the user's own frames) instead of being a plain class; PythonRepl._session_for drops its weakref guard. Adds tests for lexical-scope isolation (new bindings stay local to the body, invisible to siblings) and nested-env session isolation.
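
For readers unfamiliar with `code.InteractiveInterpreter`, a minimal subclass that captures output might look like the sketch below. The class name `CapturingInterpreter` and its `run` method are assumptions; the PR additionally routes compilation and execution through effectful's ops and trims tracebacks, which this sketch does not attempt.

```python
import code
import contextlib
import io


class CapturingInterpreter(code.InteractiveInterpreter):
    """Hypothetical sketch; not the PR's ReplSession."""

    def run(self, source: str) -> str:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
            # symbol="exec" lets one snippet hold several statements.
            # Errors are reported by the interpreter on stderr, not raised.
            self.runsource(source, "<snippet>", symbol="exec")
        return buffer.getvalue()


interp = CapturingInterpreter({"readings": [12, 19, 23, 31, 8, 27]})
interp.run("total = sum(readings)")   # binding persists in interp.locals
print(interp.run("print(total)"), end="")
```

One caveat with this approach: `runsource` returns True without executing anything when it judges the input incomplete, for example a snippet that ends inside an indented block with no trailing newline, so a real implementation would need to handle that case.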

datvo06 marked this pull request as draft June 11, 2026 21:42

datvo06 wants to merge 2 commits into master from dn-678-repl-tool