feat(openclaw-plugin): add SCCS tool-output compression integration #1547

Open
HaotianChen616 wants to merge 1 commit into volcengine:main from HaotianChen616:main
Conversation

@HaotianChen616 (Contributor) commented Apr 17, 2026

Description

Adds SCCS (Shared Context Caching System) to the OpenViking OpenClaw plugin. During the context engine assemble phase it automatically compresses overlong tool outputs, replacing them with a compact summary plus a REF_ID placeholder; the agent can fetch the original data on demand via the fetch_original_data tool.

Related Issue

Resolves #1548

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

New SCCS module (examples/openclaw-plugin/sccs/)

  • compressor.ts — compression engine; walks the tool outputs in messages and compresses those above the threshold into REF_ID placeholders; includes whitelist exemption logic for OpenClaw config files
  • summarizer.ts — smart summary extractor; auto-detects 6 content types (JSON / Markdown / table / log / code / plain text) and applies a type-specific summarization strategy
  • storage.ts — reference storage: MemoryStore (in-memory, FIFO eviction) plus DiskBackedStore (disk persistence with a memory cache, supporting TTL expiration)
  • ref-tool.ts — fetch_original_data tool definition; the agent retrieves the original data by REF_ID
  • integration.ts — integration entry point; createSccsIntegration() returns the wrapContextEngine decorator and the tool definition
  • utils.ts — utility functions (md5Hex, hasRefId, normalizeRefId, extractTextContent, estimateTokens, etc.)

Integration into the OpenViking plugin

  • config.ts — adds 7 SCCS config options with defaults and minimum-value constraints
  • openclaw.plugin.json — adds the corresponding UI hints and JSON Schema definitions
  • index.ts — calls createSccsIntegration, hooks into the context engine via sccs.wrapContextEngine(baseEngine), and registers the fetch_original_data tool

Design Overview

Core Idea

When an agent interacts with tools, the tool output often carries far more information than the agent actually needs (full file contents, large log dumps, long JSON arrays, and so on), and this redundancy consumes a large share of the context-window token budget. SCCS's core idea is reference substitution: offload large tool outputs out of the context window while keeping them fully retrievable on demand. A long output is replaced with a REF_ID reference plus a compact summary, the original data is persisted, and the agent can fetch the details at any time via fetch_original_data.

Architecture: Hooking into the Context Engine via the Decorator Pattern

SCCS wraps the OpenViking context engine with a decorator that intercepts only the assemble() method:

```mermaid
graph TB
    subgraph OpenClaw Framework
        A[Agent Loop]
        B[Tool Execution]
    end

    subgraph OpenViking Plugin
        C[Context Engine]
        D[SCCS Layer]
        E[fetch_original_data Tool]
    end

    A -->|assemble| C
    C -->|wrap| D
    D -->|compressed messages| A
    A -->|tool call| B
    B -->|tool output| C
    A -->|need detail| E
    E -->|REF_ID lookup| D

    style D fill:#e1f5fe,stroke:#0288d1
    style E fill:#e1f5fe,stroke:#0288d1
```

All other methods (ingest, compact, afterTurn) pass straight through, so the integration is non-invasive. When sccsEnabled=false, wrapContextEngine is the identity function, adding no overhead at all.
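A minimal sketch of this decorator wiring, assuming a simplified ContextEngine interface (the real OpenViking engine has more methods, which would pass through the same way; this is an illustration, not the PR's code):

```typescript
// Illustrative sketch of the Decorator pattern described above.
// The ContextEngine shape and Compressor type are assumptions for the
// example, not the actual OpenViking interfaces.
interface Message { role: string; content: string }

interface ContextEngine {
  assemble(): Message[];
  ingest(m: Message): void;
}

type Compressor = (msgs: Message[]) => Message[];

function wrapContextEngine(
  base: ContextEngine,
  compress: Compressor,
  enabled: boolean,
): ContextEngine {
  // With sccsEnabled=false the wrapper is the identity function: zero overhead.
  if (!enabled) return base;
  return {
    ingest: (m) => base.ingest(m),              // pass-through, non-invasive
    assemble: () => compress(base.assemble()),  // only assemble() is wrapped
  };
}
```

Because the wrapper delegates everything except assemble(), the base engine's behavior is otherwise unchanged.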

Compression Flow

  1. After assemble() returns the raw messages, SCCS scans them one by one
  2. Messages that are not tool role, fall below the threshold, already contain a REF_ID, or match the config-file whitelist are skipped
  3. For each message that needs compression, SummaryExtractor generates a smart summary
  4. The MD5 hash of the original text becomes the REF_ID, and the original is stored in DiskBackedStore
  5. The message content is replaced with a [REF_ID: xxx] (Summary: ...) placeholder
  6. If any compression occurred, REF_ID_INSTRUCTION is injected into systemPromptAddition, guiding the agent on when to fetch the original data
```mermaid
flowchart LR
    A[assemble returns<br>raw messages] --> B{for each message}
    B --> C{tool role?}
    C -->|No| B
    C -->|Yes| D{length > threshold?<br>not a config file?<br>no REF_ID?}
    D -->|No| B
    D -->|Yes| E[smart summary]
    E --> F[store original in RefStore]
    F --> G[replace with<br>REF_ID + Summary]
    G --> B
    B -->|done| H{any compressed?}
    H -->|Yes| I[inject REF_ID_INSTRUCTION<br>into systemPromptAddition]
    H -->|No| J[return unchanged]
```
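The flow above can be sketched as a single pass over the messages. Names such as compressToolMessages, and the Map standing in for the ref store, are assumptions for illustration, not the PR's actual module:

```typescript
import { createHash } from "node:crypto";

// Sketch of the per-message compression pass described above.
interface Msg { role: string; content: string }

const hasRefId = (s: string) => /\[REF_ID: [0-9a-f]{32}\]/.test(s);
const md5Hex = (s: string) => createHash("md5").update(s).digest("hex");

function compressToolMessages(
  msgs: Msg[],
  threshold: number,
  store: Map<string, string>,             // stands in for DiskBackedStore
  summarize: (s: string) => string,
): { msgs: Msg[]; compressed: boolean } {
  let compressed = false;
  const out = msgs.map((m) => {
    if (m.role !== "tool") return m;              // skip non-tool messages
    if (m.content.length <= threshold) return m;  // below threshold
    if (hasRefId(m.content)) return m;            // already compressed (idempotent)
    const refId = md5Hex(m.content);              // MD5 of original as REF_ID
    store.set(refId, m.content);                  // persist original
    compressed = true;
    return { ...m, content: `[REF_ID: ${refId}] (Summary: ${summarize(m.content)})` };
  });
  // Caller injects REF_ID_INSTRUCTION into systemPromptAddition when compressed.
  return { msgs: out, compressed };
}
```

The hasRefId() guard is what makes a second pass over already-compressed messages a no-op.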

Smart Summaries (6 content types)

SummaryExtractor auto-detects the content type and applies a type-specific strategy:

| Type | Detection | Summary strategy |
| --- | --- | --- |
| JSON | JSON.parse succeeds | array length and type distribution; object key list and field sampling |
| Markdown | headings plus list/code-block/link features | heading list, code-block languages, list and link counts |
| Table | first line contains separators, consistent column count | row count, column count, header |
| Log | log-level keywords plus timestamp patterns | counts per level, time range, error types, key lines |
| Code | braces/semicolons plus keyword density | function name list, import statements |
| Plain text | none of the above match | length/line/error/path counts, keywords, first line, key lines |

All summaries are finally truncated to summaryMaxChars (default 300 characters).
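The dispatch can be sketched as a chain of cheap checks; the heuristics below are simplified assumptions for illustration, not the actual SummaryExtractor rules:

```typescript
// Simplified content-type dispatch mirroring the table above.
type ContentType = "json" | "markdown" | "table" | "log" | "code" | "text";

function detectContentType(s: string): ContentType {
  try { JSON.parse(s); return "json"; } catch { /* not JSON, keep probing */ }
  const lines = s.split("\n");
  if (/^#{1,6} /m.test(s)) return "markdown";                       // headings
  if (lines.length >= 2 && lines[0].includes("|")
      && /^[\s|:-]+$/.test(lines[1])) return "table";               // separator row
  if (/\b(ERROR|WARN|INFO|DEBUG)\b/.test(s)
      && /\d{2}:\d{2}:\d{2}/.test(s)) return "log";                 // levels + timestamps
  if (/[{};]/.test(s)
      && /\b(function|import|return|const)\b/.test(s)) return "code";
  return "text";                                                    // fallback
}

// Every summary is finally truncated to summaryMaxChars (default 300).
const truncate = (s: string, max = 300) =>
  s.length <= max ? s : s.slice(0, max - 1) + "…";
```

Ordering matters: JSON.parse is tried first because valid JSON can superficially look like code or a log.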

Reference Storage: Two Tiers (memory + disk)

```
┌──────────────────────────────────┐
│          DiskBackedStore         │
│  ┌────────────┐  ┌─────────────┐ │
│  │ MemoryStore│  │   Disk fs   │ │
│  │ (FIFO,     │  │ refs/*.json │ │
│  │ maxEntries)│  │ (TTL expiry)│ │
│  └─────┬──────┘  └──────┬──────┘ │
│        │                │        │
│  get: memory first; on a miss,   │
│       read from disk             │
│  set: write memory, then write   │
│       disk asynchronously        │
└──────────────────────────────────┘
```

  • MemoryStore: FIFO eviction, bounded by maxEntries (default 10000)
  • DiskBackedStore: on a memory miss, recovers the entry from disk; set() writes memory first, then writes disk asynchronously
  • TTL expiration: get() checks expiresAt; an expired entry returns null and its disk file is deleted
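A compact sketch of the two-tier lookup, with a Map standing in for the disk layer (the real DiskBackedStore reads and writes refs/*.json files; class names here are illustrative):

```typescript
// Sketch of the memory-first, disk-fallback lookup with TTL expiry.
interface Entry { content: string; expiresAt: number }

class MemoryStore {
  private map = new Map<string, Entry>();
  constructor(private maxEntries = 10_000) {}
  set(refId: string, e: Entry): void {
    if (this.map.size >= this.maxEntries) {
      const oldest = this.map.keys().next().value;  // FIFO eviction
      if (oldest !== undefined) this.map.delete(oldest);
    }
    this.map.set(refId, e);
  }
  get(refId: string): Entry | undefined { return this.map.get(refId); }
}

class TwoTierStore {
  constructor(private memory: MemoryStore, private disk: Map<string, Entry>) {}
  set(refId: string, content: string, ttlSeconds: number): void {
    const e = { content, expiresAt: Date.now() + ttlSeconds * 1000 };
    this.memory.set(refId, e);   // write memory first
    this.disk.set(refId, e);     // then persist (asynchronous in the real store)
  }
  get(refId: string): string | null {
    const e = this.memory.get(refId) ?? this.disk.get(refId); // memory, then disk
    if (!e) return null;
    if (Date.now() >= e.expiresAt) {   // TTL: drop expired entries
      this.disk.delete(refId);
      return null;
    }
    return e.content;
  }
}
```

The disk fallback is what lets a freshly restarted process resolve REF_IDs issued before the restart.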

Safeguards

  • Idempotency: hasRefId() detects already-compressed messages and prevents double compression
  • Config-file exemption: OpenClaw config files such as SOUL.md, MEMORY.md, and USER.md are skipped via a first-line whitelist check
  • Minimum-value constraints: every numeric config option is floored with Math.max to guard against unreasonable settings
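The first-line whitelist check might look like the following; the marker list and matching rule are assumptions for illustration, not the PR's exact logic:

```typescript
// Sketch of the config-file exemption: if the first line of a tool output
// mentions one of the protected OpenClaw config files, skip compression.
const CONFIG_FILE_MARKERS = ["SOUL.md", "MEMORY.md", "USER.md"];

function isWhitelistedConfigFile(content: string): boolean {
  const firstLine = content.split("\n", 1)[0];
  return CONFIG_FILE_MARKERS.some((name) => firstLine.includes(name));
}
```

Checking only the first line keeps the exemption O(1) regardless of output size.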

Configuration Options

| Option | Type | Default | Minimum | Description |
| --- | --- | --- | --- | --- |
| sccsEnabled | boolean | false | — | master switch |
| sccsCompressThreshold | number | 3000 | 2000 | compression threshold (characters) |
| sccsSummaryMaxChars | number | 300 | 50 | maximum summary length (characters) |
| sccsEnableSmartSummary | boolean | true | — | enable smart summaries |
| sccsStorageTtlSeconds | number | 86400 | 600 | storage time-to-live (seconds) |
| sccsStorageDir | string | ~/.openclaw/sccs | — | disk storage directory |
| sccsMaxEntries | number | 10000 | 1000 | maximum in-memory entries |
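The minimum-value floors can be applied with a small clamp helper. The floors below mirror the table above, but the resolution logic itself is an assumption, not the PR's config.ts:

```typescript
// Sketch of the Math.max floor protection on the numeric SCCS options.
interface SccsNumericConfig {
  sccsCompressThreshold: number;
  sccsSummaryMaxChars: number;
  sccsStorageTtlSeconds: number;
  sccsMaxEntries: number;
}

const FLOORS: SccsNumericConfig = {
  sccsCompressThreshold: 2000,
  sccsSummaryMaxChars: 50,
  sccsStorageTtlSeconds: 600,
  sccsMaxEntries: 1000,
};

function clampConfig(cfg: SccsNumericConfig): SccsNumericConfig {
  return {
    sccsCompressThreshold: Math.max(FLOORS.sccsCompressThreshold, cfg.sccsCompressThreshold),
    sccsSummaryMaxChars: Math.max(FLOORS.sccsSummaryMaxChars, cfg.sccsSummaryMaxChars),
    sccsStorageTtlSeconds: Math.max(FLOORS.sccsStorageTtlSeconds, cfg.sccsStorageTtlSeconds),
    sccsMaxEntries: Math.max(FLOORS.sccsMaxEntries, cfg.sccsMaxEntries),
  };
}
```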

Testing

Token savings (GLM-4.7 model)

SCCS's effect on the token count returned by assemble was measured across several scenarios:

| Case | Baseline avg (tokens) | SCCS avg (tokens) | Savings |
| --- | --- | --- | --- |
| test1 | 208,015 | 96,856 | -53.44% |
| test2 | 203,010 | 84,893 | -58.18% |
| test3 | 89,063 | 60,299 | -32.30% |
| test4 | 166,669 | 82,872 | -50.28% |
| test5 | 96,891 | 81,312 | -16.08% |

Each case was run 3 times and averaged. Baseline has SCCS disabled; SCCS has it enabled.

Conclusion: SCCS delivers significant token savings, averaging 42.06%. Large tasks with heavy tool output (test1/test2/test4) save 50% or more; even a task that triggered dereferencing via fetch_original_data (test5) still saw a 16% reduction.

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Disabled by default

SCCS ships disabled (sccsEnabled: false), so existing users are unaffected. It must be explicitly enabled in the plugin configuration.

Configuration example

```json
{
  "sccsEnabled": true,
  "sccsCompressThreshold": 3000,
  "sccsSummaryMaxChars": 300,
  "sccsEnableSmartSummary": true,
  "sccsStorageTtlSeconds": 86400,
  "sccsStorageDir": "~/.openclaw/sccs",
  "sccsMaxEntries": 10000
}
```

Add Shared Context Caching System (SCCS) to the OpenViking context
engine plugin. SCCS compresses large tool outputs into compact summaries
with REF_ID placeholders, allowing agents to fetch original data on
demand via the fetch_original_data tool.

Key components:
- compressor: detects and compresses oversized tool outputs
- summarizer: smart content-type-aware summary extraction
  (JSON, markdown, tables, logs, code, plain text)
- storage: disk-backed ref store with TTL expiration
- integration: wraps context engine assemble() with compression layer
- ref-tool: fetch_original_data tool for retrieving original outputs

Configuration:
- sccsEnabled (default: false), sccsCompressThreshold (default: 3000)
- sccsSummaryMaxChars (default: 300), sccsEnableSmartSummary (default: true)
- sccsStorageTtlSeconds (default: 86400), sccsStorageDir, sccsMaxEntries

Also includes:
- Whitelist for OpenClaw config files (SOUL.md, MEMORY.md, etc.)
  to prevent compression of critical agent context
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ❌

0288 - Not compliant

Non-compliant requirements:

  • Fix issue 1: Restore viking:// URI prefix in read operations
  • Fix issue 2: Replace deprecated is_leaf field with level=2
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Potential Data Loss

DiskBackedStore.set() uses fire-and-forget disk writes without waiting for completion. If the process exits immediately after compression, REF_ID entries may not be persisted to disk, leading to fetch_original_data failures.

```typescript
async set(refId: string, content: string, ttlSeconds: number): Promise<void> {
  await this.memory.set(refId, content, ttlSeconds);
  const expiresAt = Date.now() + Math.max(1, ttlSeconds) * 1000;
  const path = this.pathFor(refId);
  void (async () => {
    try {
      await mkdir(join(this.dir, "refs"), { recursive: true });
      await writeFile(path, JSON.stringify({ content, expiresAt }), "utf8");
    } catch {
      // best-effort
    }
  })();
}
```
Possible Content Loss

setTextContent() always replaces message content with an array of a single text block, discarding any non-text content blocks (e.g., images) that may exist in the original tool output.

```typescript
export function setTextContent(message: MessageLike, text: string): MessageLike {
  return { ...message, content: [{ type: "text", text }] };
}
```

@github-actions

PR Code Suggestions ✨

No code suggestions found for the PR.

@Mijamind719
Collaborator

I found two functional issues in the SCCS integration.

  1. fetch_original_data currently allows path traversal outside the SCCS refs directory.
    • normalizeRefId() accepts any trimmed string when the input is not a [REF_ID: ...] token (examples/openclaw-plugin/sccs/utils.ts).
    • DiskBackedStore.pathFor() then does join(this.dir, "refs", `${refId}.json`) with that unchecked value (examples/openclaw-plugin/sccs/storage.ts).
    • So a call like ref_ids: ["../../outside"] resolves to a file outside .../refs/, and get() will read it if it looks like { content, expiresAt } JSON.

I reproduced this locally with a temporary store directory and a sibling outside.json; fetch_original_data returned the content from that escaped path.

Impact: the new tool can read (and, on expired entries, delete) JSON files outside the intended SCCS storage area.

  2. DiskBackedStore.set() returns before the disk write completes, so a fresh store / restarted process can immediately lose access to a just-issued REF.
    • set() awaits the in-memory write, but the file write is fire-and-forget inside void (async () => ...)() (examples/openclaw-plugin/sccs/storage.ts).
    • That means compressToolMessages() can replace a tool output with a REF_ID before durable storage exists.
    • A later fetch_original_data in a new store instance (or after a quick restart) can legitimately return <not found or expired> even though compression already happened.

I reproduced this locally by delaying writeFile(): set() resolved, a fresh DiskBackedStore immediately returned null, and only succeeded after the delayed write finished.

Impact: SCCS can hand the model a REF_ID that is not actually recoverable yet.

I did not find any SCCS-specific tests in this PR, so both issues appear to be uncovered right now. I’d recommend:

  • validating ref_ids to a strict hash format before building the path, and
  • making disk persistence part of the awaited set() path (or explicitly documenting / handling the non-durable window).
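The first recommendation, validating ref_ids against a strict hash format before building any path, could be sketched as follows. Since REF_IDs are MD5 hex digests per the PR description, a 32-hex-character pattern suffices; the function name is illustrative:

```typescript
// Sketch: reject anything that is not a bare 32-char lowercase hex digest
// (optionally wrapped in the [REF_ID: ...] token) before it can reach
// join()/the filesystem, closing the path-traversal hole.
const REF_ID_RE = /^[0-9a-f]{32}$/;

function normalizeRefIdStrict(input: string): string | null {
  const m = input.trim().match(/^\[REF_ID:\s*([0-9a-f]{32})\]$/);
  const candidate = m ? m[1] : input.trim();
  return REF_ID_RE.test(candidate) ? candidate : null;
}
```

Because the whitelist is a fixed-length hex alphabet, no path separators, dots, or encodings can survive validation, so pathFor() can safely interpolate the result.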


Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

[Feature]: Shared Context Caching System (SCCS) — offload large tool outputs from context window

2 participants