Skip to content

kernel: thread channel callback context through agent loop (replaces global DashMap)#1220

Open
pbranchu wants to merge 2 commits into
RightNow-AI:mainfrom
pbranchu:kernel-context-threading
Open

kernel: thread channel callback context through agent loop (replaces global DashMap)#1220
pbranchu wants to merge 2 commits into
RightNow-AI:mainfrom
pbranchu:kernel-context-threading

Conversation

@pbranchu

Copy link
Copy Markdown
Contributor

Summary

Refactors `KernelHandle::send_message` to plumb `ChannelCallbackContext` as a parameter through the agent loop instead of via a global `DashMap<AgentId, ChannelCallbackContext>`. Eliminates a cross-user/channel callback bleed race that would have surfaced once async dispatch tools land.

This is the architectural foundation PR in a three-PR series that splits the work from the closed #1066:

  1. `a2a-streaming` (a2a: switch synchronous a2a_send to SSE streaming (tasks/sendSubscribe) #1219) — SSE streaming for the existing sync tool
  2. This PR — kernel context threading (additive API, no caller breakage)
  3. `a2a-async-dispatch` — async tools built on top of PRs 1 + 2

The race this fixes

In the closed #1066 design, the kernel held `channel_contexts: DashMap<AgentId, ChannelCallbackContext>` keyed by agent ID. `dispatch_message` would write the context for the dispatched agent before calling `send_message`. Two concurrent dispatches to the same agent from different users/channels would clobber the context — and the async tool, reading the context inside the agent loop, could deliver the callback to the wrong channel.

The proper fix is to thread the context through the call path, not store it globally. Once it's a parameter, the type system enforces isolation.

Design

Strictly additive — does not change existing `send_message` signatures, so all current callers compile unchanged:

  • New trait method `KernelHandle::send_message_with_context(agent_id, text, callback_context: Option)` with a default impl that delegates to `send_message`
  • New kernel method `OpenFangKernel::send_message_with_context`
  • New bridge method `ChannelBridgeHandle::send_message_in_conversation_with_context`
  • Context flows: `dispatch_message` → `send_message_with_context` → `send_message_with_handle_and_blocks_and_context` → `run_agent_loop` → `execute_tool(callback_context: Option<&ChannelCallbackContext>)`
  • Tools that need the callback context receive it as a parameter, not by reading a global
  • Adds a stub `KernelHandle::inject_async_callback` trait method (default returns `Err`) — real impl lives in the async dispatch PR

The `_callback_context` parameter on `execute_tool` is unused in this PR (prefixed with underscore to silence warnings). It is consumed by the third PR. This is a deliberate stacked-PR pattern for reviewability.

Retry path in `dispatch_message` rebuilds the context with the new agent ID when re-resolution succeeds, so the post-retry agent doesn't inherit the dead agent's identity.

Tests

  • New `test_concurrent_send_message_with_context_no_crossbleed` confirms the type-level isolation
  • 939 runtime + 504 channels tests pass on two consecutive runs (no flakes)
  • `cargo clippy --workspace --tests --all-targets -- -D warnings` clean

Cargo.lock: bumps `lettre` to 0.11.22 (RUSTSEC-2026-0141).

Notes for reviewer

  • This PR adds plumbing that has no caller in this PR. The motivation is to land the architectural change before the async tools that consume it.
  • The chosen additive design (parallel `*_with_context` methods rather than breaking signatures) keeps blast radius small. Old `send_message` still works; new `send_message_with_context` is for the channel-callback path.

Test plan

  • CI passes
  • Local: `cargo test -p openfang-runtime -p openfang-channels --lib` passes

Philippe Branchu and others added 2 commits May 31, 2026 20:35
The previous design proposal stored a per-agent ChannelCallbackContext in a
global DashMap on the kernel and read it back from tools. That structure
caused cross-user callback bleed whenever the same agent was messaged by
two channels concurrently — DashMap::insert is atomic but offers no
isolation between the write and the later read inside the agent loop.

The proper fix is to never store the context globally: pass it as an owned
parameter from the bridge dispatch site, down through the agent loop, to
the tool runner. Each concurrent invocation owns its own context value,
so isolation is enforced by the type system rather than by ad-hoc locks.

- Add `openfang_types::ChannelCallbackContext` (channel, recipient,
  display name, thread_id, agent_id).
- Add `ChannelBridgeHandle::send_message_with_context` with a default
  impl that ignores the context and falls back to `send_message`.
- `KernelBridgeAdapter` overrides it to call the new
  `OpenFangKernel::send_message_with_context` →
  `send_message_with_handle_and_blocks_and_context` →
  `execute_llm_agent` plumbing path.
- `run_agent_loop` / `run_agent_loop_streaming` gain a
  `callback_context: Option<ChannelCallbackContext>` parameter that is
  forwarded to `tool_runner::execute_tool` as `Option<&...>`.
- `execute_tool` accepts the context as a new (currently-unused) param;
  the async-dispatch PR will wire it into `tool_a2a_send_async`.
- `dispatch_message` (channels bridge) builds the context from the
  incoming message and calls `send_message_with_context`.
- `KernelHandle::inject_async_callback` declared here with a default
  Err — production implementation lands with the async-dispatch PR.
- New test `test_concurrent_send_message_with_context_no_crossbleed`:
  two concurrent same-agent dispatches with distinct contexts each
  receive their own context back, confirming ownership-based isolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant