feat(examples): add evaluation optimization closed-loop example by YAO-001 · Pull Request #119 · trpc-group/trpc-agent-python

YAO-001 · 2026-07-04T14:42:58Z

English

This PR implements issue #91 with a reproducible Evaluation + Optimization closed-loop example.

Highlights:

Adds examples/optimization/eval_optimize_loop with train/validation evalsets, a baseline prompt, an optimizer config, README, DESIGN, and example reports.
Provides a deterministic fake mode that runs the full pipeline without API keys, covering baseline train/validation evaluation, failure attribution, candidate optimization, validation regression checks, per-case deltas, a configurable gate, and audit artifacts.
The gate rejects overfit candidates that improve train results but regress validation or protected cases.
Adds an SDK mode that integrates the real AgentOptimizer and TargetPrompt. SDK mode maps OptimizeResult aggregate metrics, cost, duration, token usage, best prompts, and round summaries into the same audit report.
Keeps the SDK wrapper's gate config separate from the SDK OptimizeConfigFile via --gate-config.
Supports multiple TargetPrompt fields without requiring system_prompt.
Writes strict JSON and append-only audit artifacts containing input hashes, prompt hashes, per-field prompt snapshots, diffs, case results, gate reasons, and a reproducibility command.
Source prompt write-back is disabled by default; it requires passing --update-source explicitly.
Adds tests for fake hidden-sample generalization, gate rejection paths, failure attribution, report schema, SDK adapter wiring, the SDK aggregate gate, audit safety, run-id validation, strict JSON, and rejection of non-finite numbers.
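The gate described above compares a candidate against the baseline on both splits and rejects it when validation or a protected case regresses, even if train improves. The sketch below shows one way such a gate could work; it is not the PR's code. GateConfig, GateDecision, gate, and the threshold names are hypothetical, scores are assumed to be per-case floats keyed by case id, and the real --gate-config schema may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

Scores = Dict[str, float]  # hypothetical: {case_id: score}


@dataclass
class GateConfig:
    # Hypothetical thresholds; the PR's real --gate-config schema may differ.
    min_train_gain: float = 0.0
    max_validation_drop: float = 0.0
    protected_cases: FrozenSet[str] = frozenset()


@dataclass
class GateDecision:
    accepted: bool
    reasons: List[str] = field(default_factory=list)
    per_case_delta: Dict[str, float] = field(default_factory=dict)


def mean(scores: Scores) -> float:
    return sum(scores.values()) / len(scores) if scores else 0.0


def gate(base_train: Scores, cand_train: Scores,
         base_val: Scores, cand_val: Scores, cfg: GateConfig) -> GateDecision:
    # Per-case validation delta (candidate minus baseline); assumes both runs
    # scored the same case ids.
    delta = {cid: cand_val[cid] - base_val[cid] for cid in sorted(base_val)}
    reasons: List[str] = []

    train_gain = mean(cand_train) - mean(base_train)
    if train_gain < cfg.min_train_gain:
        reasons.append(f"train gain {train_gain:+.3f} below {cfg.min_train_gain}")

    # A train improvement alone is not enough: validation must not regress.
    val_change = mean(cand_val) - mean(base_val)
    if val_change < -cfg.max_validation_drop:
        reasons.append(f"validation mean changed {val_change:+.3f}")

    # Protected cases may not regress at all, whatever the aggregate says.
    for cid in sorted(cfg.protected_cases):
        if delta.get(cid, 0.0) < 0:
            reasons.append(f"protected case {cid} regressed")

    return GateDecision(accepted=not reasons, reasons=reasons, per_case_delta=delta)


# Usage: an overfit candidate that aces train but breaks a protected case.
cfg = GateConfig(protected_cases=frozenset({"v2"}))
decision = gate({"t1": 0.5, "t2": 0.5}, {"t1": 1.0, "t2": 1.0},
                {"v1": 1.0, "v2": 1.0}, {"v1": 1.0, "v2": 0.0}, cfg)
print(decision.accepted, decision.reasons)
# False ['validation mean changed -0.500', 'protected case v2 regressed']
```

Collecting every failed check into reasons, rather than stopping at the first one, is what lets a report list all gate reasons for a rejected candidate.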
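For the strict JSON and non-finite rejection items, Python's standard json module already does most of the work: json.dumps(..., allow_nan=False) raises ValueError instead of writing NaN or Infinity, which are not valid JSON. A minimal sketch, assuming reports are plain dicts; to_strict_json and write_report are illustrative names, not functions from the PR.

```python
import json
import math
from pathlib import Path
from typing import Any


def to_strict_json(payload: Any) -> str:
    # allow_nan=False makes the encoder raise ValueError on NaN, Infinity and
    # -Infinity anywhere in the payload instead of emitting non-standard tokens.
    # ensure_ascii=False keeps non-ASCII prompt text readable in the report.
    return json.dumps(payload, allow_nan=False, sort_keys=True,
                      ensure_ascii=False, indent=2)


def write_report(path: Path, payload: Any) -> None:
    # Serialize first so a bad value fails before any file is created.
    text = to_strict_json(payload)
    path.write_text(text + "\n", encoding="utf-8")


if __name__ == "__main__":
    print(to_strict_json({"pass_rate": 0.75, "cases": 4}))
    try:
        to_strict_json({"metrics": {"cost": math.nan}})
    except ValueError as exc:
        print("rejected:", exc)
```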
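The audit items (prompt hashes, per-field snapshots, diffs, run-id validation, append-only output) could be combined roughly as follows. This is a hypothetical sketch using only the standard library: the run-id pattern, the audit.jsonl file name, and the record layout are assumptions, and prompts are modeled as a plain {field_name: text} dict rather than the SDK's TargetPrompt.

```python
import difflib
import hashlib
import json
import re
from pathlib import Path
from typing import Dict

# Hypothetical run-id rule: a short slug that starts with a letter or digit,
# so ".", "..", "" and anything containing "/" are rejected.
RUN_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,63}")


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def validate_run_id(run_id: str) -> str:
    if not RUN_ID_RE.fullmatch(run_id):
        raise ValueError(f"invalid run id: {run_id!r}")
    return run_id


def prompt_audit(baseline: Dict[str, str], candidate: Dict[str, str]) -> Dict[str, dict]:
    """Snapshot, hash and diff every prompt field; no field (e.g. system_prompt) is required."""
    fields = {}
    for name in sorted(set(baseline) | set(candidate)):
        before, after = baseline.get(name, ""), candidate.get(name, "")
        diff = difflib.unified_diff(before.splitlines(), after.splitlines(),
                                    fromfile=f"baseline/{name}",
                                    tofile=f"candidate/{name}", lineterm="")
        fields[name] = {
            "baseline_sha256": sha256_text(before),
            "candidate_sha256": sha256_text(after),
            "candidate": after,
            "diff": "\n".join(diff),
        }
    return fields


def append_audit(output_dir: Path, run_id: str, record: dict) -> Path:
    # Validate before joining so the run id cannot escape output_dir.
    run_dir = output_dir / validate_run_id(run_id)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "audit.jsonl"
    # Append mode: earlier records are never rewritten, only added to.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, allow_nan=False, sort_keys=True) + "\n")
    return path
```

Validating the run id before it is joined onto the output directory is what keeps values such as ../x from writing outside it, and one JSON object per line keeps each appended record independently parseable.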

Validation:

python -m compileall examples/optimization/eval_optimize_loop
python -m pytest examples/optimization/eval_optimize_loop/tests
python examples/optimization/eval_optimize_loop/run_pipeline.py --mode fake --trace --output-dir /tmp/eval-optimize-loop-fake
python examples/optimization/eval_optimize_loop/run_pipeline.py --fake-model --fake-judge --trace --output-dir /tmp/eval-optimize-loop-legacy
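None of the validation commands above pass --update-source, so they leave the source prompt untouched. A hypothetical argparse sketch of how that opt-in could be wired: the flag names are taken from this PR, but the --mode choices (including "sdk"), defaults, help text, and the maybe_write_back helper are assumptions, not run_pipeline.py's actual code.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of run_pipeline.py's CLI. The flag names appear in
    # this PR; the choices, defaults and help strings are assumptions.
    parser = argparse.ArgumentParser(description="eval + optimize loop (sketch)")
    parser.add_argument("--mode", choices=["fake", "sdk"], default="fake")
    parser.add_argument("--output-dir", type=Path, required=True)
    parser.add_argument("--gate-config", type=Path, default=None,
                        help="wrapper gate thresholds, kept apart from the SDK optimizer config")
    parser.add_argument("--trace", action="store_true")
    # store_true defaults to False, so write-back stays off unless requested.
    parser.add_argument("--update-source", action="store_true",
                        help="write an accepted prompt back to its source file")
    return parser


def maybe_write_back(args: argparse.Namespace, accepted: bool,
                     source: Path, new_prompt: str) -> bool:
    # Touch the source only when the gate accepted AND the user opted in.
    if not (accepted and args.update_source):
        return False
    source.write_text(new_prompt, encoding="utf-8")
    return True


args = build_parser().parse_args(["--mode", "fake", "--trace",
                                  "--output-dir", "/tmp/eval-optimize-loop-fake"])
print(args.update_source, maybe_write_back(args, True, Path("prompt.txt"), "..."))
# False False
```

Requiring both the gate decision and an explicit flag means a rejected or merely exploratory run cannot modify the source prompt.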

github-actions · 2026-07-04T14:43:08Z

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

codecov · 2026-07-04T14:46:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@73655ab).

Additional details and impacted files

@@            Coverage Diff             @@
##             main        #119   +/-   ##
==========================================
  Coverage        ?   87.51506%           
==========================================
  Files           ?         467           
  Lines           ?       44005           
  Branches        ?           0           
==========================================
  Hits            ?       38511           
  Misses          ?        5494           
  Partials        ?           0

YAO-001 · 2026-07-04T14:50:36Z

I have read the CLA Document and I hereby sign the CLA

YAO-001 added 9 commits July 4, 2026 18:07

Add eval optimize loop example · 88baed0
Strengthen eval optimize loop report checks · 2646e37
Harden eval optimize loop example · be2401d
Polish eval optimize loop SDK path · 843de99
Map SDK optimize aggregates into eval report · 680369c
Decouple SDK wrapper gate config · d6c22e9
Polish SDK prompt audit handling · 46b0774
Harden SDK audit reproducibility · 227827c
Harden eval optimize audit inputs · 270f9a4

Rook1ex added a commit to trpc-group/cla-database that referenced this pull request Jul 4, 2026

@YAO-001 has signed the CLA in trpc-group/trpc-agent-python#119

85985ae

YAO-001 wants to merge 9 commits into trpc-group:main from YAO-001:codex/issue-91-eval-optimize-loop
