
v137.21.2 — Deterministic Latency & Throughput Budget Ledger#291

Merged
EmergentMonk merged 3 commits into main from codex/implement-latency-and-throughput-budget-ledger
Apr 15, 2026

Conversation

@EmergentMonk

Motivation

  • Provide a narrow, additive side-band ledger that deterministically accounts for latency and throughput budgets for replay-safe validation while preserving the decoder boundary and logical outputs.
  • Guarantee canonical JSON serialization, stable SHA-256 receipts, bounded metrics, deterministic ordering, and a validator that never raises on malformed inputs.
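A minimal sketch of what canonical JSON plus a stable SHA-256 digest can look like, using only the standard library. The function names below echo the method names in the description, but the bodies are an assumption and not the code from this PR.

```python
import hashlib
import json
from typing import Any, Mapping


def to_canonical_json(payload: Mapping[str, Any]) -> str:
    # Sorted keys and fixed separators give byte-identical text for equal content.
    # allow_nan=False makes json.dumps raise ValueError on NaN/Inf instead of
    # emitting tokens that are not valid JSON.
    return json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=True,
        allow_nan=False,
    )


def stable_hash(payload: Mapping[str, Any]) -> str:
    # Hash the canonical text, never the in-memory object.
    return hashlib.sha256(to_canonical_json(payload).encode("utf-8")).hexdigest()


# Key insertion order does not affect the serialized form or the digest.
a = {"latency_budget_ms": 5.0, "advisory_state": "budget_pressure"}
b = {"advisory_state": "budget_pressure", "latency_budget_ms": 5.0}
assert to_canonical_json(a) == to_canonical_json(b)
assert stable_hash(a) == stable_hash(b)
```

Rejecting NaN/Inf at serialization time only works if those values are normalized earlier, which is why the description pairs canonical JSON with deterministic normalization.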

Description

  • Add src/qec/orchestration/latency_throughput_budget_ledger.py implementing frozen dataclasses LatencyThroughputSample, BudgetMetric, BudgetReceipt, and LatencyThroughputBudgetLedger with to_dict(), to_canonical_json(), and stable_hash() methods.
  • Implement required APIs: build_latency_throughput_scenario, run_latency_throughput_budget_ledger, validate_latency_throughput_budget_ledger, build_budget_receipt, compare_budget_replay, and summarize_latency_throughput_budget, including deterministic normalization for missing/malformed/NaN/Inf values, stable metric ordering, advisory-state ladder, and replay-safe receipts.
  • Add tests at tests/test_latency_throughput_budget_ledger.py exercising determinism, canonical JSON round-trip, stable hashing, validator safety, malformed input handling (including NaN/Inf sentinels), metric bounds and ordering, all four advisory bands, replay comparison stability, no input mutation, decoder-untouched confirmation, receipt determinism, summary content, empty-sample behavior, and deterministic ordering of samples.
  • Exact files changed: src/qec/orchestration/latency_throughput_budget_ledger.py, tests/test_latency_throughput_budget_ledger.py.
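As a rough illustration of the frozen-dataclass pattern described above, the sketch below defines a hypothetical `BudgetMetricSketch` with the three named methods. The field names `metric_name` and `metric_value` appear in the review snippets further down; everything else is an assumption about how such a class might be written.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class BudgetMetricSketch:
    # Hypothetical stand-in for BudgetMetric, not the class from the PR.
    metric_name: str
    metric_value: float

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    def to_canonical_json(self) -> str:
        # Sorted keys and compact separators keep the text stable across runs.
        return json.dumps(
            self.to_dict(), sort_keys=True, separators=(",", ":"), allow_nan=False
        )

    def stable_hash(self) -> str:
        return hashlib.sha256(self.to_canonical_json().encode("utf-8")).hexdigest()


metric = BudgetMetricSketch("latency_budget_compliance", 0.75)

# Equal content yields an equal digest.
assert metric.stable_hash() == BudgetMetricSketch("latency_budget_compliance", 0.75).stable_hash()

# The canonical JSON round-trips back to the same dictionary.
assert json.loads(metric.to_canonical_json()) == metric.to_dict()

# Frozen dataclasses reject attribute assignment after construction.
try:
    metric.metric_value = 1.0  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
assert not mutated
```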

Testing

  • Ran pytest -q tests/test_latency_throughput_budget_ledger.py and observed all tests passing: 19 passed, 0 failed.
  • Test coverage includes deterministic repeated runs, stable hash reproducibility, canonical JSON round-trip, validator never raising, malformed input normalization, metric bounds + fixed order, all advisory bands, replay comparison, and no input mutation.
  • Merge readiness: ready to merge as an additive orchestration-side feature that preserves decoder integrity, enforces determinism, and is fully exercised by the included tests.
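The malformed-input normalization that these tests exercise could be sketched as follows. `normalize_unit_float` is a hypothetical helper: returning a `(value, was_valid)` pair and clamping to [0, 1] are both assumptions, chosen to match the validator's bounds check on metric values.

```python
import math
from typing import Any, Tuple


def normalize_unit_float(value: Any, fallback: float = 0.0) -> Tuple[float, bool]:
    """Return (value clamped to [0, 1], whether the input was usable)."""
    # bool is a subclass of int; accepting it would silently hide malformed input.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return fallback, False
    try:
        number = float(value)
    except OverflowError:
        # Very large ints cannot be represented as a float.
        return fallback, False
    if not math.isfinite(number):
        # NaN and +/-Inf map to the fallback so serialization stays deterministic.
        return fallback, False
    return min(1.0, max(0.0, number)), True


assert normalize_unit_float(0.25) == (0.25, True)
assert normalize_unit_float(1.7) == (1.0, True)
assert normalize_unit_float(float("nan")) == (0.0, False)
assert normalize_unit_float(float("inf")) == (0.0, False)
assert normalize_unit_float("0.5") == (0.0, False)
assert normalize_unit_float(None, fallback=0.5) == (0.5, False)
```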

Codex Task


@sourcery-ai sourcery-ai bot left a comment

Sorry @EmergentMonk, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e052d4455

Comment thread src/qec/orchestration/latency_throughput_budget_ledger.py Outdated
Comment thread src/qec/orchestration/latency_throughput_budget_ledger.py Outdated

Copilot AI left a comment

Pull request overview

Adds an additive orchestration-side “latency & throughput budget ledger” intended to produce deterministic, replay-safe JSON + hashes + receipts for timing/throughput budget accounting and validation.

Changes:

  • Introduces LatencyThroughputBudgetLedger and related frozen dataclasses with canonical JSON + SHA-256 hashing helpers.
  • Implements scenario normalization, metric computation, receipt building, validation, replay comparison, and summary helpers.
  • Adds a dedicated pytest suite covering determinism, normalization behavior, hashing, and validator safety.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/qec/orchestration/latency_throughput_budget_ledger.py | New ledger implementation: normalization, metric calculation, canonical serialization, receipts, validation, replay compare, and summary. |
| tests/test_latency_throughput_budget_ledger.py | New tests for determinism, normalization, canonical JSON round-trip, hashing, validator behavior, and replay comparisons. |

Comment on lines +358 to +367
def build_budget_receipt(ledger: LedgerLike) -> BudgetReceipt:
    normalized = ledger if isinstance(ledger, LatencyThroughputBudgetLedger) else run_latency_throughput_budget_ledger(**build_latency_throughput_scenario(ledger))
    timing_exceeded = normalized.advisory_state == "budget_violation"
    receipt_body = {
        "ledger_version": normalized.ledger_version,
        "advisory_state": normalized.advisory_state,
        "logical_replay_identity": "logical-replay-unchanged",
        "logical_outputs_valid": True,
        "timing_budget_exceeded": timing_exceeded,
        "composite_budget_pressure": normalized.composite_budget_pressure,
Copilot AI Apr 15, 2026

build_budget_receipt() sets timing_budget_exceeded based on advisory_state == "budget_violation". This can be false even when the timing budget is actually exceeded (e.g., a single latency sample > latency_budget_ms but composite_pressure remains in "budget_pressure"). If this field is intended to represent actual timing budget exceedance, derive it directly from the normalized timing_series vs latency_budget_ms (or from latency_budget_compliance < 1.0) rather than from the advisory band.
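To make the gap concrete, here is a small sketch that derives the flag from the samples themselves. The definition of `latency_budget_compliance` as the fraction of samples at or under the budget is an assumption, as are both function bodies; the PR's actual metric may be computed differently.

```python
from typing import Sequence


def latency_budget_compliance(
    timing_series_ms: Sequence[float], latency_budget_ms: float
) -> float:
    # Assumed definition: share of samples at or under the budget.
    # An empty series is treated as fully compliant.
    if not timing_series_ms:
        return 1.0
    within = sum(1 for sample in timing_series_ms if sample <= latency_budget_ms)
    return within / len(timing_series_ms)


def timing_budget_exceeded(
    timing_series_ms: Sequence[float], latency_budget_ms: float
) -> bool:
    # True as soon as any single sample is over budget, whatever the advisory band.
    return latency_budget_compliance(timing_series_ms, latency_budget_ms) < 1.0


# One overrun out of four samples: compliance drops below 1.0 and the flag is set,
# even though an aggregate pressure score might not reach "budget_violation".
series = [4.0, 4.5, 9.0, 4.2]
assert latency_budget_compliance(series, 5.0) == 0.75
assert timing_budget_exceeded(series, 5.0)
assert not timing_budget_exceeded([4.0, 4.5], 5.0)
```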

Comment on lines +490 to +507
def validate_latency_throughput_budget_ledger(ledger: LedgerLike) -> Dict[str, Any]:
    violations = []
    try:
        normalized = _as_ledger(ledger)
        if normalized.ledger_version != LEDGER_VERSION:
            violations.append("ledger version drift")
        if tuple(metric.metric_name for metric in normalized.budget_analysis) != METRIC_ORDER:
            violations.append("metric ordering drift")
        for metric in normalized.budget_analysis:
            if not (0.0 <= metric.metric_value <= 1.0):
                violations.append(f"metric out of bounds: {metric.metric_name}")
        if normalized.advisory_state not in ADVISORY_STATES:
            violations.append("invalid advisory state")
        if normalized.budget_receipt.ledger_hash != normalized.ledger_hash:
            violations.append("receipt ledger hash mismatch")
        if normalized.budget_receipt.receipt_hash != build_budget_receipt(normalized).receipt_hash:
            violations.append("receipt hash drift")
    except Exception as exc:  # nosec - validator must never raise
Copilot AI Apr 15, 2026

validate_latency_throughput_budget_ledger() never recomputes the expected ledger_hash from the normalized ledger body, so a mapping input can supply a forged ledger_hash (and matching forged receipt_hash) and still pass validation. To make the validator actually replay-safe, recompute the canonical ledger hash from the normalized content (similar to validate_bounded_autonomy_ledger) and report drift when provided ledger_hash != expected_hash (and likewise base receipt_hash drift on the expected ledger hash).
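A minimal sketch of the recompute-and-compare idea, operating on plain mappings rather than the PR's dataclasses. The helper names, the `valid` key in the result, and the choice to hash every field except the stored `ledger_hash` are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Mapping


def compute_ledger_hash(body: Mapping[str, Any]) -> str:
    # Hash the content only; the stored hash field must not feed into itself.
    content = {key: value for key, value in body.items() if key != "ledger_hash"}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), allow_nan=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(ledger: Any) -> Dict[str, Any]:
    violations = []
    try:
        expected = compute_ledger_hash(ledger)
        if ledger.get("ledger_hash") != expected:
            violations.append("ledger hash drift")
    except Exception as exc:  # the validator reports problems instead of raising
        violations.append(f"validation failure: {type(exc).__name__}")
    return {"valid": not violations, "violations": tuple(violations)}


honest = {"advisory_state": "budget_pressure", "composite_budget_pressure": 0.6}
honest["ledger_hash"] = compute_ledger_hash(honest)
assert validate(honest)["valid"]

# Changing the content while keeping the old stored hash is detected.
tampered = dict(honest, composite_budget_pressure=0.1)
assert validate(tampered)["violations"] == ("ledger hash drift",)

# Malformed input yields a violation rather than an exception.
assert not validate(None)["valid"]
```

Because the expected hash is derived from content, a caller cannot pass validation by supplying a matching pair of forged hash fields.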

Comment on lines +521 to +540
def compare_budget_replay(left: LedgerLike, right: LedgerLike) -> Dict[str, Any]:
    try:
        left_ledger = _as_ledger(left)
        right_ledger = _as_ledger(right)
    except Exception as exc:
        return {
            "replay_stable": False,
            "violations": (f"normalization failure: {type(exc).__name__}:{exc}",),
            "left_hash": None,
            "right_hash": None,
        }

    violations = []
    if left_ledger.ledger_hash != right_ledger.ledger_hash:
        violations.append("ledger hash mismatch")
    if left_ledger.budget_receipt.receipt_hash != right_ledger.budget_receipt.receipt_hash:
        violations.append("receipt hash mismatch")
    if left_ledger.advisory_state != right_ledger.advisory_state:
        violations.append("advisory mismatch")

Copilot AI Apr 15, 2026

compare_budget_replay() compares the stored ledger_hash/receipt_hash fields, but because _as_ledger() accepts these hashes from mapping inputs without verifying they match the content, a tampered mapping can be made to look replay-stable by forging hashes. Compare should be based on recomputed hashes from canonical content (or ensure _as_ledger/validation recomputes and overwrites hashes) so replay stability reflects the actual series/requirements/metrics.
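A content-based comparison along these lines could be sketched as below, again on plain mappings. `content_hash`, `compare_replay`, and the set of ignored hash fields are hypothetical; the result keys mirror the failure-path dictionary shown in the snippet above, and the success-path shape is an assumption.

```python
import hashlib
import json
from typing import Any, Dict, Mapping

# Stored hash fields are ignored so that forged values cannot influence the result.
STORED_HASH_FIELDS = ("ledger_hash", "receipt_hash")


def content_hash(ledger: Mapping[str, Any]) -> str:
    content = {key: value for key, value in ledger.items() if key not in STORED_HASH_FIELDS}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), allow_nan=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_replay(left: Any, right: Any) -> Dict[str, Any]:
    try:
        left_hash = content_hash(left)
        right_hash = content_hash(right)
    except Exception as exc:  # report unusable input instead of raising
        return {
            "replay_stable": False,
            "violations": (f"normalization failure: {type(exc).__name__}",),
            "left_hash": None,
            "right_hash": None,
        }
    violations = () if left_hash == right_hash else ("ledger hash mismatch",)
    return {
        "replay_stable": not violations,
        "violations": violations,
        "left_hash": left_hash,
        "right_hash": right_hash,
    }


original = {"timing_series": [4.0, 4.5], "latency_budget_ms": 5.0, "ledger_hash": "abc"}
# Same stored hash, different samples: a stored-hash comparison would miss this.
tampered = {"timing_series": [4.0, 9.0], "latency_budget_ms": 5.0, "ledger_hash": "abc"}

assert compare_replay(original, dict(original))["replay_stable"]
assert not compare_replay(original, tampered)["replay_stable"]
```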

Comment on lines +321 to +326
    ledger_version = _normalize_text(raw.get("ledger_version"), fallback.ledger_version)
    advisory_state = _normalize_text(raw.get("advisory_state"), fallback.advisory_state)
    logical_replay_identity = _normalize_text(raw.get("logical_replay_identity"), fallback.logical_replay_identity)
    logical_outputs_valid = bool(raw.get("logical_outputs_valid", fallback.logical_outputs_valid))
    timing_budget_exceeded = bool(raw.get("timing_budget_exceeded", fallback.timing_budget_exceeded))
    composite_budget_pressure, _ = _normalize_float(raw.get("composite_budget_pressure"), fallback.composite_budget_pressure)
Copilot AI Apr 15, 2026

_deserialize_budget_receipt() uses bool(raw.get(...)) for logical_outputs_valid and timing_budget_exceeded. This will treat non-empty strings like "False" as True, which can silently flip semantics on malformed inputs. Consider only accepting actual bool values (isinstance(x, bool)) and otherwise falling back, or normalizing common string/int representations explicitly.
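The pitfall and one possible explicit normalizer can be sketched as follows. `normalize_bool` here is a hypothetical helper, not the `_normalize_bool` from the PR; the accepted string forms follow those named in the follow-up reply in this thread, and restricting ints to 0 and 1 is an extra assumption.

```python
from typing import Any

_TRUE_STRINGS = frozenset({"true", "yes", "1"})
_FALSE_STRINGS = frozenset({"false", "no", "0"})


def normalize_bool(value: Any, fallback: bool) -> bool:
    # Real booleans pass through unchanged.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        # Assumption: only 0 and 1 are meaningful; any other int falls back.
        return bool(value) if value in (0, 1) else fallback
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
    # Anything unrecognized keeps the caller's default.
    return fallback


# The built-in conversion treats every non-empty string as True.
assert bool("False") is True

# The explicit normalizer reads the string's meaning instead.
assert normalize_bool("False", fallback=True) is False
assert normalize_bool(" yes ", fallback=False) is True
assert normalize_bool("maybe", fallback=True) is True
assert normalize_bool(None, fallback=False) is False
```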

@EmergentMonk
Author

@copilot apply changes based on the comments in this thread

…, hash integrity in validator and compare

Agent-Logs-Url: https://github.com/QSOLKCB/QEC/sessions/d4ff5d3d-2b6c-4def-8da7-5dd8dc13f4c5

Co-authored-by: EmergentMonk <73952179+EmergentMonk@users.noreply.github.com>

Copilot AI commented Apr 15, 2026

@copilot apply changes based on the comments in this thread

Applied all four changes in ec62413:

  1. _deserialize_budget_receipt() bool handling — added _normalize_bool helper that accepts bool, int, and common string representations ("true"/"false", "yes"/"no", "1"/"0"), falling back to the default for anything else. "False" strings no longer silently flip to True.

  2. build_budget_receipt() timing_budget_exceeded — now derived from latency_budget_compliance < 1.0 (true if any timing sample exceeded the budget), rather than advisory_state == "budget_violation". Cases like budget_pressure with partial latency overruns are now correctly flagged.

  3. validate_latency_throughput_budget_ledger() hash recomputation — added _compute_ledger_hash helper; validator now recomputes the expected hash from normalised content and checks: ledger hash drift (stored hash ≠ expected), receipt ledger hash drift (receipt's ledger_hash ≠ expected), and receipt hash drift (based on expected hash, not the potentially forged stored hash). Forged-hash bypass is no longer possible.

  4. compare_budget_replay() content-based comparison — added _compute_receipt_hash helper; both sides now compared via recomputed hashes from canonical content, so replay stability reflects actual series/requirements/metrics rather than stored hash fields. Tests updated to verify content-change detection.

@EmergentMonk EmergentMonk merged commit ab43c54 into main Apr 15, 2026
1 check passed
@EmergentMonk EmergentMonk deleted the codex/implement-latency-and-throughput-budget-ledger branch April 15, 2026 15:31
3 participants