feat: introduce UntrustedContext layer with channel topic support #578

@chaodu-agent

Description

Introduce an UntrustedContext layer to openab so that external, user-editable metadata — starting with Discord channel topics — can be safely injected into downstream ACP agent prompts.

Today, openab's SenderContext carries only trusted, structured identity fields (sender_id, sender_name, channel_id, etc.). There is no mechanism to pass channel-level metadata like the channel topic or description. This means agents running on openab have no awareness of the channel's stated purpose — information that other platforms already provide.

Use Cases

  1. Channel-scoped behavior — A channel topic like "Production alerts only — respond in English" tells the agent the expected tone, language, and scope without per-channel config.
  2. Project context — Teams using Discord forum channels set topics like "Planning and coordination for Project X". The agent can use this to stay on-topic.
  3. Self-documenting channels — Operators can steer agent behavior by editing the channel topic in Discord UI, no config file changes needed.
  4. Security — Channel topics are user-editable by anyone with MANAGE_CHANNELS permission. They must not be treated as trusted system instructions, or they become a prompt injection vector.

Prior Art Investigation

We investigated how OpenClaw and Hermes Agent handle channel topics:

OpenClaw (openclaw/openclaw, commit 7e13f3f)

OpenClaw has a mature UntrustedContext system with sanitization + isolation:

  1. Read topic from Discord channel object (channel-access.ts L49-51): resolveDiscordChannelTopicSafe() safely reads channel.topic.

  2. Pass through message handler (message-handler.process.ts L310): channelTopic: channelInfo?.topic.

  3. Wrap as untrusted content (inbound-context.ts L59-66): uses buildUntrustedChannelMetadata() to wrap the topic with security boundaries.

  4. Sanitization pipeline (channel-metadata.ts and external-content.ts):

    • Wraps content in <<<EXTERNAL_UNTRUSTED_CONTENT id="random">>> markers with random IDs to prevent spoofing
    • Strips LLM special tokens (<|im_start|>, [INST], etc. — 20+ patterns)
    • Neutralizes Unicode homoglyph attacks (fullwidth chars, angle bracket variants)
    • Detects prompt injection patterns (ignore previous instructions, you are now a..., etc.)
    • Truncates to 400 chars/entry, 800 chars total
  5. Isolation — The untrusted context is placed in a separate UntrustedContext[] array and is never mixed into GroupSystemPrompt. The tests in inbound-context.test.ts and message-handler.inbound-context.test.ts explicitly verify that a topic like "Ignore system instructions" stays in the untrusted layer.

  6. Default on, no opt-out — Channel topic injection is always active for guild channels; DMs skip it automatically (isGuild: false).

Hermes Agent (NousResearch/hermes-agent, commit 192e7eb)

Hermes Agent also supports channel topics, but with a simpler (less secure) approach:

  1. SessionSource.chat_topic (session.py L86): chat_topic: Optional[str] = None.

  2. Discord reads topic (discord.py L3031-3038): _get_effective_topic() reads channel.topic and falls back to the parent forum topic for threads.

  3. Injected as plain text in system prompt (session.py L282-283): **Channel Topic:** {topic} goes directly into the system prompt with no untrusted wrapping or sanitization.
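
The thread-to-parent fallback in step 2 can be sketched in Rust, the language of openab's source files. This only illustrates the idea and is not a port of Hermes Agent's Python code: the function name effective_topic and the rule that a blank topic counts as missing are assumptions made for the example.

```rust
/// Returns true when a topic has visible content (assumed rule: blank means missing).
fn non_blank(topic: &&str) -> bool {
    !topic.trim().is_empty()
}

/// Hypothetical helper: prefer the channel's own topic and fall back to the
/// parent forum's topic when the thread has none.
fn effective_topic<'a>(own: Option<&'a str>, parent: Option<&'a str>) -> Option<&'a str> {
    own.filter(non_blank).or_else(|| parent.filter(non_blank))
}
```

A thread with no topic inside a forum titled "Forum" would resolve to "Forum", while a channel with neither topic yields None and no untrusted block would be emitted.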

Key Differences

| Aspect | OpenClaw | Hermes Agent | openab (current) |
|---|---|---|---|
| Reads channel topic | ✅ | ✅ | ❌ |
| Untrusted context layer | ✅ (sanitize + isolate) | ❌ (plain text in system prompt) | ❌ (no layer exists) |
| Prompt injection protection | ✅ (boundary markers, token stripping, homoglyph defense) | ❌ | N/A |
| Forum thread fallback | ✅ (via resolveDiscordChannelInfoSafe) | ✅ (via _get_effective_topic) | N/A |

Proposed Design for openab

Architecture

                    openab — Proposed UntrustedContext Layer
 ═══════════════════════════════════════════════════════════════════

  Discord Message
       │
       ▼
 ┌───────────────────────────────────────────────────────────────┐
 │  discord.rs                                                    │
 │                                                                │
 │  msg.channel_id.to_channel(&ctx.http).await                    │
 │       │                                                        │
 │       ▼                                                        │
 │  GuildChannel {                                                │
 │    topic: Some("Production alerts — respond in English"),      │
 │    name: "ops-alerts",                                         │
 │    thread_metadata, parent_id, owner_id, ...                   │
 │  }                                                             │
 │       │                                                        │
 │       │  gc.topic already available at line ~387               │
 │       │  (currently only thread_metadata/parent_id used)       │
 │       ▼                                                        │
 │  ┌─────────────────────────────────────────────────────┐      │
 │  │  UntrustedContext::wrap(source, label, content)     │      │
 │  │                                                     │      │
 │  │  1. Truncate (400 char/entry, 800 total)           │      │
 │  │  2. Strip LLM special tokens                       │      │
 │  │  3. Neutralize boundary marker spoofing            │      │
 │  │  4. Wrap in random-ID boundary markers             │      │
 │  └──────────────────────┬──────────────────────────────┘      │
 │                         │                                      │
 └─────────────────────────┼──────────────────────────────────────┘
                           │
                           ▼
 ┌───────────────────────────────────────────────────────────────┐
 │  ACP Prompt Assembly (adapter.rs)                              │
 │                                                                │
 │  content_blocks: [                                             │
 │    Text {                                                      │
 │      "<sender_context>{...}</sender_context>"     ← trusted   │
 │                                                                │
 │      "<<<EXTERNAL_UNTRUSTED_CONTENT id=\"a1b2\">>>"           │
 │      "Source: Channel metadata"                                │
 │      "---"                                                     │
 │      "UNTRUSTED channel metadata (discord)"                   │
 │      "Discord channel topic:"                                  │
 │      "Production alerts — respond in English"                  │
 │      "<<<END_EXTERNAL_UNTRUSTED_CONTENT id=\"a1b2\">>>"       │
 │                                                    ← isolated │
 │      "user's actual message"                                   │
 │    },                                                          │
 │    Image { ... },   ← if image attached                       │
 │  ]                                                             │
 └──────────────────────┬────────────────────────────────────────┘
                        │
                        ▼
 ┌───────────────────────────────────────────────────────────────┐
 │  Downstream ACP Agent (Kiro CLI / Claude Code / etc)          │
 │                                                                │
 │  Agent sees channel topic as reference context,                │
 │  but knows it's untrusted external content.                    │
 └───────────────────────────────────────────────────────────────┘
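
The ordering shown in the diagram (trusted sender context first, then the wrapped untrusted blocks, then the user's message) could be assembled roughly as follows. This is a minimal sketch, not the actual adapter.rs code: the function name assemble_prompt_text, the blank-line separator, and passing the blocks as Option<&[String]> rather than the proposed Option<Vec<String>> field are all assumptions.

```rust
/// Hypothetical assembly of the single Text block from the diagram.
/// `untrusted_context` is expected to hold blocks that are already wrapped
/// in boundary markers; this function only decides where they go.
fn assemble_prompt_text(
    sender_context_json: &str,
    untrusted_context: Option<&[String]>,
    user_message: &str,
) -> String {
    let mut parts: Vec<String> = Vec::new();

    // Trusted, structured identity fields come first.
    parts.push(format!(
        "<sender_context>{}</sender_context>",
        sender_context_json
    ));

    // Untrusted metadata sits between the trusted context and the message,
    // never inside <sender_context>.
    if let Some(blocks) = untrusted_context {
        parts.extend(blocks.iter().cloned());
    }

    // The user's actual message goes last.
    parts.push(user_message.to_string());

    // Separator is an assumption; the proposal does not specify one.
    parts.join("\n\n")
}
```

When untrusted_context is None (for example in a DM with no channel topic), the output is just the sender context followed by the message, so existing prompts would be unchanged.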

Protection Layers

The UntrustedContext wrapper combines sanitization and isolation as defense in depth:

| Layer | What it does | Why |
|---|---|---|
| Truncation | 400 chars/entry, 800 chars total | Limit attack surface |
| LLM token stripping | Remove <\|im_start\|>, [INST], <s>, etc. | Prevent chat-template escape |
| Homoglyph folding | Normalize fullwidth chars, Unicode angle brackets | Prevent visual spoofing of markers |
| Marker sanitization | Replace spoofed <<<EXTERNAL_UNTRUSTED_CONTENT>>> | Prevent boundary escape |
| Random boundary IDs | <<<EXTERNAL_UNTRUSTED_CONTENT id="random_hex">>> | Attacker can't predict end marker |
| Isolation | Separate from system prompt / sender_context | Semantic separation for LLM |
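
To make the layers concrete, here is a minimal standard-library-only sketch of what the sanitize-and-wrap step might look like. It is not the proposed src/untrusted.rs. The token list is a small illustrative subset, replacing <<< with [[[ is one hypothetical way to neutralize spoofed markers, and random_id uses the standard library's randomly seeded hasher only to avoid a dependency (it is not cryptographically secure, so a real implementation would likely want a proper CSPRNG). All function and constant names are assumptions.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Per-entry cap from the proposal.
const MAX_ENTRY_CHARS: usize = 400;
/// Illustrative subset only; the proposal mentions 20+ patterns in OpenClaw.
const LLM_TOKENS: [&str; 5] = ["<|im_start|>", "<|im_end|>", "[INST]", "[/INST]", "<s>"];

/// Fold fullwidth ASCII forms and a few Unicode angle brackets to plain ASCII.
fn fold_homoglyphs(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            // Fullwidth U+FF01..=U+FF5E map onto ASCII U+0021..=U+007E.
            '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
            '\u{3008}' | '\u{27E8}' | '\u{2329}' => '<',
            '\u{3009}' | '\u{27E9}' | '\u{232A}' => '>',
            _ => c,
        })
        .collect()
}

/// Truncate, fold homoglyphs, then strip tokens and neutralize marker
/// brackets until the text stops changing, so that removing one pattern
/// cannot leave another behind.
fn sanitize(raw: &str) -> String {
    let truncated: String = raw.chars().take(MAX_ENTRY_CHARS).collect();
    let mut text = fold_homoglyphs(&truncated);
    loop {
        let mut next = text.clone();
        for token in LLM_TOKENS.iter() {
            next = next.replace(*token, "");
        }
        next = next.replace("<<<", "[[[").replace(">>>", "]]]");
        if next == text {
            return text;
        }
        text = next;
    }
}

/// Not cryptographically secure; stands in for a real random source.
fn random_id() -> String {
    format!("{:016x}", RandomState::new().build_hasher().finish())
}

/// Wrap sanitized content in boundary markers. `source` and `label` are
/// assumed to be trusted constants chosen by openab, so only `content`
/// is sanitized.
fn wrap(source: &str, label: &str, content: &str, id: &str) -> String {
    [
        format!("<<<EXTERNAL_UNTRUSTED_CONTENT id=\"{}\">>>", id),
        "Source: Channel metadata".to_string(),
        "---".to_string(),
        format!("UNTRUSTED channel metadata ({})", source),
        format!("{}:", label),
        sanitize(content),
        format!("<<<END_EXTERNAL_UNTRUSTED_CONTENT id=\"{}\">>>", id),
    ]
    .join("\n")
}
```

A caller would typically write wrap("discord", "Discord channel topic", &topic, &random_id()). Taking the ID as a parameter keeps wrap deterministic for tests. The fixed-point loop matters because a single pass can be defeated by nesting, for instance a topic that hides one token inside another.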

Files Changed

| File | Change | Description |
|---|---|---|
| src/untrusted.rs | NEW | UntrustedContext wrapper: sanitize, truncate, wrap with boundary markers (~100 lines) |
| src/adapter.rs | MOD | Add untrusted_context: Option<Vec<String>> to prompt assembly |
| src/discord.rs | MOD | Read gc.topic and gc.name from the existing to_channel() call at L387, wrap as untrusted |
| src/gateway.rs | MOD | Add topic: Option<String> and name: Option<String> to ChannelInfo for Telegram/LINE |
| gateway/src/main.rs | MOD | Pass topic/name in gateway ChannelInfo |
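
As an illustration of the ChannelInfo change, the sketch below shows the two new optional fields and one way the 400-per-entry and 800-total limits could be applied when turning them into untrusted entries. The existing fields of ChannelInfo are not shown in this issue, so the struct here contains only the proposed additions; budgeted_entries, channel_entries, and the entry labels are hypothetical names.

```rust
/// Only the two fields proposed in this issue; the real struct has others.
#[derive(Debug, Default)]
struct ChannelInfo {
    topic: Option<String>,
    name: Option<String>,
}

const MAX_ENTRY_CHARS: usize = 400;
const MAX_TOTAL_CHARS: usize = 800;

/// Clip each present entry to the per-entry cap and stop adding entries
/// once the shared budget is used up. Missing or empty values are skipped.
fn budgeted_entries(entries: &[(&str, Option<&str>)]) -> Vec<(String, String)> {
    let mut remaining = MAX_TOTAL_CHARS;
    let mut out = Vec::new();
    for (label, value) in entries {
        if let Some(text) = value {
            let limit = MAX_ENTRY_CHARS.min(remaining);
            let clipped: String = text.chars().take(limit).collect();
            if clipped.is_empty() {
                continue;
            }
            remaining -= clipped.chars().count();
            out.push(((*label).to_string(), clipped));
        }
    }
    out
}

/// Collect the untrusted metadata carried by a ChannelInfo.
fn channel_entries(info: &ChannelInfo) -> Vec<(String, String)> {
    budgeted_entries(&[
        ("Channel name", info.name.as_deref()),
        ("Channel topic", info.topic.as_deref()),
    ])
}
```

Counting in chars rather than bytes keeps truncation from splitting a multi-byte character. Each returned entry would still need to pass through the sanitize-and-wrap step before reaching the prompt.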

Future Extensions

Once the UntrustedContext layer exists, it can carry more than just channel topics:

  • Channel name — useful for agents to know where they are
  • Slack channel purpose/description
  • Telegram group description
  • Webhook payloads (email bodies, CI notifications)
  • Forum thread tags

References

Discord: https://discord.com/channels/1491295327620169908/1497578757333057577/1497930272635748473
