fix(runtime): gate phantom-action guard on text-reply-is-delivery #1228
Open
benhoverter wants to merge 1 commit into
The phantom-action guard in `run_agent_loop` re-prompts the LLM when it
emits action-shaped text (e.g. "I sent the message to the channel.")
without calling any tool. In channel-reply paths (Discord, Telegram,
etc.), however, the bridge delivers the agent's text response verbatim
back to the originating channel — the text IS the delivery, not a
hallucinated tool call. The guard misfires on these legitimate turns
and either re-prompts the model or surfaces a "claimed action but did
not call any tools" system reminder that the next turn tries to argue
with.
Fix: thread a `text_reply_is_delivery: bool` from each kernel entry
point through `execute_llm_agent` into `run_agent_loop`, and gate the
phantom detector on `!text_reply_is_delivery`. Channel adapters
(`KernelBridgeAdapter`) opt in via new wrappers
`send_message_channel_reply{,_with_blocks}`; cron, peer-to-peer
`agent_send`, and API-direct paths keep the default `false`, so the
detector behaves unchanged for them.
The detector logic itself is unchanged — only the gating condition
becomes more conservative (never more aggressive). The streaming
variant (`run_agent_loop_streaming`) does not reference the phantom
detector, so no change there.
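A minimal sketch of the gating described above, assuming the guard can be reduced to a pure predicate over the reply text, the tool-call count, and the new flag. `looks_like_phantom_action` is a hypothetical stand-in; the PR does not show the real detector heuristic. The checks in `main` spell out the "never more aggressive" claim: whenever the gated guard fires, the ungated one would have fired too, and with the flag set to `false` the two agree exactly.

```rust
/// Hypothetical stand-in for the real phantom-action detector; the actual
/// heuristic used by `run_agent_loop` is not shown in this PR.
fn looks_like_phantom_action(text: &str) -> bool {
    let lower = text.to_lowercase();
    ["i sent", "i posted", "i created"]
        .iter()
        .any(|p| lower.contains(*p))
}

/// Behaviour before this change: fire on action-shaped text with no tool calls.
fn phantom_guard_ungated(text: &str, tool_calls: usize) -> bool {
    tool_calls == 0 && looks_like_phantom_action(text)
}

/// Behaviour after this change: the same detector, consulted only when the
/// text reply is NOT itself the delivery.
fn phantom_guard_gated(text: &str, tool_calls: usize, text_reply_is_delivery: bool) -> bool {
    !text_reply_is_delivery && phantom_guard_ungated(text, tool_calls)
}

fn main() {
    let samples = [
        "I sent the message to the channel.",
        "Here is the summary you asked for.",
    ];
    for text in samples {
        for tool_calls in [0usize, 1] {
            for delivery in [false, true] {
                let gated = phantom_guard_gated(text, tool_calls, delivery);
                // Never more aggressive: gated firing implies ungated firing.
                assert!(!gated || phantom_guard_ungated(text, tool_calls));
                // Flag off: identical to the old behaviour.
                if !delivery {
                    assert_eq!(gated, phantom_guard_ungated(text, tool_calls));
                }
            }
        }
    }
    // The motivating channel-reply turn is no longer flagged.
    assert!(!phantom_guard_gated("I sent the message to the channel.", 0, true));
    println!("gating checks passed");
}
```

Because the flag can only turn the guard off, the change cannot introduce new re-prompts on any path.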
Plumbing:
* `agent_loop::run_agent_loop`: new last param `text_reply_is_delivery`
* `kernel::send_message_with_handle_and_blocks`: new last param
* `kernel::execute_llm_agent`: new last param, forwarded
* `kernel::send_message_channel_reply{,_with_blocks}`: new wrappers
that pass `true`
* `KernelBridgeAdapter::send_message{,_with_blocks}`: switched to
the new channel-reply wrappers
* `kernel::send_message{,_with_blocks,_with_handle}`: pass `false`
* `routes::send_message` API direct call: passes `false`
* Cron path (`send_message_with_handle` at delivery): passes `false`
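The plumbing above amounts to one flag forwarded through every layer, with only the entry point deciding its value. The sketch below is hypothetical and heavily simplified: the real functions take agent ids, messages, handles and content blocks, and the `_with_blocks`/`_with_handle` variants are omitted. Here `run_agent_loop` just reports whether the phantom guard would be armed.

```rust
// Hypothetical, simplified kernel: only the `text_reply_is_delivery` threading
// is modelled. Each call returns whether the phantom guard would be armed.
struct Kernel;

impl Kernel {
    fn run_agent_loop(&self, _message: &str, text_reply_is_delivery: bool) -> bool {
        // The detector is only consulted when the text is NOT the delivery.
        !text_reply_is_delivery
    }

    fn execute_llm_agent(&self, message: &str, text_reply_is_delivery: bool) -> bool {
        // Forwarded unchanged.
        self.run_agent_loop(message, text_reply_is_delivery)
    }

    fn send_message_with_handle_and_blocks(&self, message: &str, text_reply_is_delivery: bool) -> bool {
        self.execute_llm_agent(message, text_reply_is_delivery)
    }

    /// Existing non-channel entry point: passes `false`, guard behaves as before.
    fn send_message(&self, message: &str) -> bool {
        self.send_message_with_handle_and_blocks(message, false)
    }

    /// New wrapper for channel replies: passes `true`.
    fn send_message_channel_reply(&self, message: &str) -> bool {
        self.send_message_with_handle_and_blocks(message, true)
    }
}

// Hypothetical stand-in for `KernelBridgeAdapter`.
struct KernelBridgeAdapter<'a> {
    kernel: &'a Kernel,
}

impl KernelBridgeAdapter<'_> {
    fn send_message(&self, message: &str) -> bool {
        // Switched from `Kernel::send_message` to the channel-reply wrapper.
        self.kernel.send_message_channel_reply(message)
    }
}

fn main() {
    let kernel = Kernel;
    let bridge = KernelBridgeAdapter { kernel: &kernel };
    println!("direct call, guard armed: {}", kernel.send_message("nightly report"));
    println!("channel reply, guard armed: {}", bridge.send_message("hello from Discord"));
}
```

Opting in through a separate wrapper, rather than flipping a default, leaves every caller that was not touched on the old behaviour.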
Tests:
* `phantom_guard_fires_when_text_reply_is_delivery_false` — confirms
unchanged behavior for non-channel callers.
* `phantom_guard_suppressed_when_text_reply_is_delivery_true` —
confirms the bug is fixed for channel-reply callers.
Both use a `PhantomShapedDriver` that returns action-shaped text on
iteration 0 and a distinct marker on iteration 1, so the test asserts
on which turn's output is delivered.
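The shape of the two tests can be pictured with a toy loop. This is a hypothetical sketch: the `Driver` trait, `toy_agent_loop`, the `MARKER` string and the `starts_with` check are illustrative only, not the runtime's actual driver interface or detector. The driver's output depends only on the iteration, so the delivered string reveals whether the guard re-prompted.

```rust
/// Hypothetical driver interface; the runtime's real LLM driver trait is not shown here.
trait Driver {
    fn complete(&mut self) -> String;
}

const PHANTOM_TEXT: &str = "I sent the message to the channel.";
const MARKER: &str = "ITERATION_1_MARKER";

/// Action-shaped text on iteration 0, a distinct marker on iteration 1.
struct PhantomShapedDriver {
    iteration: usize,
}

impl Driver for PhantomShapedDriver {
    fn complete(&mut self) -> String {
        let out = if self.iteration == 0 { PHANTOM_TEXT } else { MARKER };
        self.iteration += 1;
        out.to_string()
    }
}

/// Toy loop: no tool calls are ever made, so a phantom-looking reply triggers
/// one re-prompt unless the text reply is the delivery.
fn toy_agent_loop(driver: &mut dyn Driver, text_reply_is_delivery: bool) -> String {
    let first = driver.complete();
    let phantom = first.starts_with("I sent"); // toy detector
    if phantom && !text_reply_is_delivery {
        driver.complete()
    } else {
        first
    }
}

fn main() {
    // Mirrors phantom_guard_fires_when_text_reply_is_delivery_false.
    let mut d = PhantomShapedDriver { iteration: 0 };
    assert_eq!(toy_agent_loop(&mut d, false), MARKER);

    // Mirrors phantom_guard_suppressed_when_text_reply_is_delivery_true.
    let mut d = PhantomShapedDriver { iteration: 0 };
    assert_eq!(toy_agent_loop(&mut d, true), PHANTOM_TEXT);
    println!("both cases behave as described");
}
```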
Verified:
* `cargo test -p openfang-runtime --lib` → 995/995 pass
* `cargo test -p openfang-kernel --lib` → 289/289 pass
* `cargo test -p openfang-api --lib` → 92/92 pass
* `cargo clippy -p openfang-{runtime,kernel,api} --all-targets
-- -D warnings` → clean
Summary
The phantom-action guard in `run_agent_loop` re-prompts the model when it emits
action-shaped text (e.g. "I sent the message to the channel.") without calling
a tool. In channel-reply paths (Discord, Telegram, …) the bridge delivers the
agent's text response verbatim back to the originating channel: the text is
the delivery, not a hallucinated tool call. The guard misfires on these
legitimate turns: it either re-prompts the model or injects a "claimed action
but did not call any tools" system reminder that the next turn then argues with.
Fix
Thread a `text_reply_is_delivery: bool` from each kernel entry point through
`execute_llm_agent` into `run_agent_loop`, and gate the phantom detector on
`!text_reply_is_delivery`.
* The detector logic itself is unchanged; only the gating condition becomes
  more conservative (never more aggressive). Non-channel callers see identical behavior.
* Channel adapters (`KernelBridgeAdapter`) opt in via new wrappers
  `send_message_channel_reply{,_with_blocks}` (pass `true`).
* Cron, peer-to-peer `agent_send`, and API-direct paths keep the default
  `false`; the detector behaves exactly as before for them.
* The streaming variant (`run_agent_loop_streaming`) does not reference the
  phantom detector, so it is untouched.
Plumbing
* `agent_loop::run_agent_loop`: new trailing param `text_reply_is_delivery`
* `kernel::send_message_with_handle_and_blocks`: new trailing param
* `kernel::execute_llm_agent`: new trailing param, forwarded
* `kernel::send_message_channel_reply{,_with_blocks}`: new wrappers (pass `true`)
* `KernelBridgeAdapter::send_message{,_with_blocks}`: switched to the channel-reply wrappers
* `kernel::send_message{,_with_blocks,_with_handle}`: pass `false`
* `routes::send_message` (API direct): passes `false`
* Cron path (`send_message_with_handle`): passes `false`
Test plan
* `phantom_guard_fires_when_text_reply_is_delivery_false`: confirms unchanged
  behavior for non-channel callers.
* `phantom_guard_suppressed_when_text_reply_is_delivery_true`: confirms the bug
  is fixed for channel-reply callers.
Both use a `PhantomShapedDriver` that returns action-shaped text on iteration 0
and a distinct marker on iteration 1, asserting on which turn's output is delivered.
Verified locally:
* `cargo test -p openfang-runtime --lib` → 995/995 pass
* `cargo test -p openfang-kernel --lib` → 289/289 pass
* `cargo test -p openfang-api --lib` → 92/92 pass
* `cargo clippy -p openfang-{runtime,kernel,api} --all-targets -- -D warnings` → clean