
test(sessions): add replay consistency harness for session memory summaries #120

Open: YAO-001 wants to merge 9 commits into trpc-group:main from YAO-001:codex/issue-89-replay-consistency


YAO-001 commented Jul 4, 2026

Closes #89


Summary

This PR adds a replay consistency harness for Session / Memory / Summary backends.

It replays the same normalized Session / Memory / Summary trajectories across multiple backends, compares the persisted snapshots, and emits structured diff reports that pinpoint backend-specific drift. The default lightweight matrix runs InMemory against a temporary SQLite database. Optional SQL and Redis integration runs are enabled through TRPC_AGENT_REPLAY_SQL_URL and TRPC_AGENT_REPLAY_REDIS_URL; when those environment variables are absent, the external backends are recorded as skipped instead of failing local or CI runs.
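A minimal sketch of the replay-and-compare loop, assuming a toy event store: `InMemoryStore`, `SQLiteStore`, `replay`, and `diff` are hypothetical names for illustration, not the harness's actual API. Each backend receives the same trajectory, and `diff` walks both snapshots and yields a field path plus left/right values for every mismatch.

```python
import json
import sqlite3
from typing import Any, Iterator

MISSING = "<missing>"  # placeholder for a key/index present on only one side


class InMemoryStore:
    """Hypothetical toy backend: events kept per session in a dict."""

    def __init__(self) -> None:
        self._events: dict[str, list[dict]] = {}

    def append_event(self, session_id: str, event: dict) -> None:
        self._events.setdefault(session_id, []).append(event)

    def snapshot(self) -> dict[str, list[dict]]:
        return {sid: list(events) for sid, events in self._events.items()}


class SQLiteStore:
    """Hypothetical toy backend: events persisted as JSON rows in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE events (seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT NOT NULL, body TEXT NOT NULL)"
        )

    def append_event(self, session_id: str, event: dict) -> None:
        self._conn.execute(
            "INSERT INTO events (session_id, body) VALUES (?, ?)",
            (session_id, json.dumps(event)),
        )
        self._conn.commit()

    def snapshot(self) -> dict[str, list[dict]]:
        out: dict[str, list[dict]] = {}
        rows = self._conn.execute("SELECT session_id, body FROM events ORDER BY seq")
        for session_id, body in rows:
            out.setdefault(session_id, []).append(json.loads(body))
        return out


def replay(store, trajectory: list[dict]) -> dict:
    """Apply every step of a trajectory to one backend and return its snapshot."""
    for step in trajectory:
        store.append_event(step["session_id"], step["event"])
    return store.snapshot()


def diff(left: Any, right: Any, path: str = "$") -> Iterator[tuple[str, Any, Any]]:
    """Yield (field_path, left_value, right_value) for every mismatch."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            yield from diff(left.get(key, MISSING), right.get(key, MISSING), f"{path}.{key}")
    elif isinstance(left, list) and isinstance(right, list):
        for i in range(max(len(left), len(right))):
            lv = left[i] if i < len(left) else MISSING
            rv = right[i] if i < len(right) else MISSING
            yield from diff(lv, rv, f"{path}[{i}]")
    elif left != right:
        yield (path, left, right)
```

Replaying the same trajectory into both stores and calling `list(diff(...))` on the two snapshots should give an empty list; a drifted field would surface as a tuple such as `("$.s1[0].text", "hi", "bye")`.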

What changed

  • Added a reusable replay consistency test framework under tests/sessions/replay_consistency/.
  • Added 14 public replay cases, including single-turn text conversation, multi-turn append/order preservation, tool call and tool response round trip, state overwrite behavior, memory write/read behavior, memory isolation, summary generation, summary update/overwrite, summary with event truncation, duplicate/error recovery, nested serialization order normalization, session list consistency, temporary state filtering, and truncation preserving recent context.
  • Added a JSONL replay manifest at tests/sessions/replay_cases/session_memory_summary_replay_cases.jsonl.
  • Added normalization for backend-unstable fields such as timestamps, generated identifiers, storage metadata, and serialization order.
  • Kept business fields strict, including event order/content, tool arguments, tool responses, state values, memory text/scope, summary text, summary session ownership, and summary overwrite semantics.
  • Added structured report generation. Reports include the backend pair and its status, case-level diff counts, a false-positive summary, a mutation detection summary, and, for each diff, the session id, event index / memory index / summary id, field path, and left/right values.
  • Added mutation checks against real replay snapshots, not only synthetic comparator checks. The mutation test injects faults into normalized replay snapshots and verifies that each fault is detected with precise diff context.
  • Covered summary-specific regressions: dropped summary, stale/incorrect summary overwrite, and summary attached to the wrong session.
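The normalization rule (canonicalize backend-unstable fields, keep business fields strict) might look roughly like the sketch below. The key sets `VOLATILE_KEYS` and `GENERATED_ID_KEYS` and the function `normalize` are hypothetical placeholders; the harness presumably maintains its own field lists.

```python
from typing import Any, Optional

# Hypothetical field names; the real harness defines its own lists.
VOLATILE_KEYS = {"timestamp", "created_at", "updated_at", "storage_meta"}
GENERATED_ID_KEYS = {"id", "event_id", "invocation_id"}


def normalize(value: Any, id_map: Optional[dict] = None) -> Any:
    """Return a backend-neutral copy of a snapshot fragment."""
    if id_map is None:
        id_map = {}
    if isinstance(value, dict):
        out = {}
        for key in sorted(value):  # canonical key order for stable serialization
            if key in VOLATILE_KEYS:
                continue  # drop fields whose values legitimately differ per backend
            item = value[key]
            if key in GENERATED_ID_KEYS and isinstance(item, str):
                # Replace generated ids with placeholders assigned in first-seen
                # order, so repeated references to one id still have to line up.
                out[key] = id_map.setdefault(item, f"<id-{len(id_map)}>")
            else:
                out[key] = normalize(item, id_map)
        return out
    if isinstance(value, list):
        # List order is business data (event order), so it is never sorted.
        return [normalize(item, id_map) for item in value]
    return value
```

Replacing generated ids with placeholders, rather than deleting them, keeps the comparison strict about which record a field refers to while ignoring the concrete id string each backend generates.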

Reports

Checked-in examples:

  • session_memory_summary_diff_report.json
    • normal replay report
    • 14 replay cases
    • default pair: InMemory vs SQLite
    • 0 unexpected normal-case diffs
  • tests/sessions/replay_consistency/session_memory_summary_mutation_report.json
    • mutation replay example
    • detected mutations are reported with field paths and left/right values
    • includes summary wrong-session and tool response drift examples
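The mutation checks can be illustrated with a small sketch that copies a normalized snapshot, injects one fault at a time, and confirms that the comparator reports it with a field path and left/right values. The snapshot layout (`events`, `summaries`), the mutation names, and `mutation_report` are assumptions modeled on the summary and tool-response regressions this PR lists, not the harness's actual schema.

```python
import copy
from typing import Any, Callable, Iterator

MISSING = "<missing>"


def diff(left: Any, right: Any, path: str = "$") -> Iterator[tuple[str, Any, Any]]:
    """Yield (field_path, left_value, right_value) for every mismatch."""
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            yield from diff(left.get(key, MISSING), right.get(key, MISSING), f"{path}.{key}")
    elif isinstance(left, list) and isinstance(right, list):
        for i in range(max(len(left), len(right))):
            lv = left[i] if i < len(left) else MISSING
            rv = right[i] if i < len(right) else MISSING
            yield from diff(lv, rv, f"{path}[{i}]")
    elif left != right:
        yield (path, left, right)


# Hypothetical fault injectors; each mutates a snapshot copy in place.
def drop_summary(snap: dict) -> None:
    snap["summaries"].pop(0)


def stale_summary_overwrite(snap: dict) -> None:
    snap["summaries"][0]["text"] = "<stale summary>"


def summary_wrong_session(snap: dict) -> None:
    snap["summaries"][0]["session_id"] = "other-session"


def tool_response_drift(snap: dict) -> None:
    snap["events"][1]["tool_response"]["result"] = "tampered"


MUTATIONS: dict[str, Callable[[dict], None]] = {
    "drop_summary": drop_summary,
    "stale_summary_overwrite": stale_summary_overwrite,
    "summary_wrong_session": summary_wrong_session,
    "tool_response_drift": tool_response_drift,
}


def mutation_report(baseline: dict) -> list[dict]:
    """Apply each mutation to a deep copy and record whether diff() caught it."""
    entries = []
    for name, mutate in MUTATIONS.items():
        mutated = copy.deepcopy(baseline)  # never touch the baseline itself
        mutate(mutated)
        diffs = [
            {"field_path": p, "left": lv, "right": rv}
            for p, lv, rv in diff(baseline, mutated)
        ]
        entries.append({"mutation": name, "detected": bool(diffs), "diffs": diffs})
    return entries
```

An undetected mutation would show up as `"detected": False` with an empty `diffs` list, which is the signal that the comparator or the normalization is too lenient.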

Backend behavior

Default mode:

  • InMemory
  • temporary SQLite

Optional integration mode:

  • TRPC_AGENT_REPLAY_SQL_URL
  • TRPC_AGENT_REPLAY_REDIS_URL

External SQL / Redis are not required for default local runs or CI. If the environment variables are absent, the report records them as skipped.
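One way to express this skip-instead-of-fail behavior is to build the backend matrix from the environment and record a skip reason for every optional backend whose variable is unset. Only the two environment variable names come from this PR; `BackendSpec`, `backend_matrix`, `enabled_pairs`, and the choice of InMemory as the reference side of each pair are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Environment variables named in this PR; the backend labels are hypothetical.
OPTIONAL_BACKENDS = {
    "sql": "TRPC_AGENT_REPLAY_SQL_URL",
    "redis": "TRPC_AGENT_REPLAY_REDIS_URL",
}


@dataclass(frozen=True)
class BackendSpec:
    name: str
    status: str  # "enabled" or "skipped"
    url: Optional[str] = None
    reason: str = ""


def backend_matrix(env: Optional[Mapping[str, str]] = None) -> list[BackendSpec]:
    """Default backends always run; optional ones are skipped if their URL is unset."""
    env = os.environ if env is None else env
    specs = [BackendSpec("inmemory", "enabled"), BackendSpec("sqlite_temp", "enabled")]
    for name, var in OPTIONAL_BACKENDS.items():
        url = env.get(var, "").strip()
        if url:
            specs.append(BackendSpec(name, "enabled", url=url))
        else:
            # Record the skip in the report instead of failing the run.
            specs.append(BackendSpec(name, "skipped", reason=f"{var} not set"))
    return specs


def enabled_pairs(specs: list[BackendSpec]) -> list[tuple[str, str]]:
    """Pair the first (reference) backend with every other enabled backend."""
    reference = specs[0]
    return [(reference.name, s.name) for s in specs[1:] if s.status == "enabled"]
```

With neither variable set, `enabled_pairs(backend_matrix())` returns only `("inmemory", "sqlite_temp")`, matching the default InMemory vs SQLite pair, while the SQL and Redis entries carry a `skipped` status and a reason that a report can surface.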

Validation

Validated locally:

python -m json.tool session_memory_summary_diff_report.json
python -m json.tool tests/sessions/replay_consistency/session_memory_summary_mutation_report.json
git diff --check
python -m pytest tests/sessions/test_replay_consistency.py -q
python -m pytest tests/sessions -q

YAO-001 (Author) commented Jul 4, 2026

I have read the CLA Document and I hereby sign the CLA

codecov Bot commented Jul 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ No coverage report was uploaded for BASE (main@73655ab), so the main column below shows "?".
@@            Coverage Diff             @@
##             main        #120   +/-   ##
==========================================
  Coverage        ?   87.53323%           
==========================================
  Files           ?         467           
  Lines           ?       44005           
  Branches        ?           0           
==========================================
  Hits            ?       38519           
  Misses          ?        5486           
  Partials        ?           0           


Development

Successfully merging this pull request may close these issues:

Build a multi-backend replay consistency test framework for Session / Memory