v137.21.2 — Deterministic Latency & Throughput Budget Ledger#291
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e052d4455
Pull request overview
Adds an additive orchestration-side “latency & throughput budget ledger” intended to produce deterministic, replay-safe JSON + hashes + receipts for timing/throughput budget accounting and validation.
Changes:
- Introduces `LatencyThroughputBudgetLedger` and related frozen dataclasses with canonical JSON + SHA-256 hashing helpers.
- Implements scenario normalization, metric computation, receipt building, validation, replay comparison, and summary helpers.
- Adds a dedicated pytest suite covering determinism, normalization behavior, hashing, and validator safety.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/qec/orchestration/latency_throughput_budget_ledger.py` | New ledger implementation: normalization, metric calculation, canonical serialization, receipts, validation, replay compare, and summary. |
| `tests/test_latency_throughput_budget_ledger.py` | New tests for determinism, normalization, canonical JSON round-trip, hashing, validator behavior, and replay comparisons. |
    def build_budget_receipt(ledger: LedgerLike) -> BudgetReceipt:
        normalized = ledger if isinstance(ledger, LatencyThroughputBudgetLedger) else run_latency_throughput_budget_ledger(**build_latency_throughput_scenario(ledger))
        timing_exceeded = normalized.advisory_state == "budget_violation"
        receipt_body = {
            "ledger_version": normalized.ledger_version,
            "advisory_state": normalized.advisory_state,
            "logical_replay_identity": "logical-replay-unchanged",
            "logical_outputs_valid": True,
            "timing_budget_exceeded": timing_exceeded,
            "composite_budget_pressure": normalized.composite_budget_pressure,
`build_budget_receipt()` sets `timing_budget_exceeded` based on `advisory_state == "budget_violation"`. This can be false even when the timing budget is actually exceeded (e.g., a single latency sample > `latency_budget_ms` while `composite_pressure` remains in `"budget_pressure"`). If this field is intended to represent actual timing budget exceedance, derive it directly from the normalized `timing_series` vs `latency_budget_ms` (or from `latency_budget_compliance < 1.0`) rather than from the advisory band.
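A minimal sketch of the suggested derivation, assuming the timing series is a flat sequence of latencies in milliseconds. The helper names and the definition of compliance as "fraction of samples within budget" are illustrative assumptions, not the ledger's actual API.

```python
from typing import Sequence


def timing_budget_exceeded(latencies_ms: Sequence[float], latency_budget_ms: float) -> bool:
    """Hypothetical helper: True if any single sample exceeds the budget."""
    return any(sample > latency_budget_ms for sample in latencies_ms)


def latency_budget_compliance(latencies_ms: Sequence[float], latency_budget_ms: float) -> float:
    """Hypothetical helper: fraction of samples at or under the budget.

    An empty series is treated as fully compliant (an assumption).
    """
    if not latencies_ms:
        return 1.0
    within = sum(1 for sample in latencies_ms if sample <= latency_budget_ms)
    return within / len(latencies_ms)


# One slow sample is enough to flag exceedance, regardless of advisory band.
samples = [10.0, 12.0, 55.0]
print(timing_budget_exceeded(samples, 50.0))
print(latency_budget_compliance(samples, 50.0) < 1.0)
```

Under these assumptions the two checks agree: any sample over budget pushes compliance below 1.0, so either one could back the receipt field.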
    def validate_latency_throughput_budget_ledger(ledger: LedgerLike) -> Dict[str, Any]:
        violations = []
        try:
            normalized = _as_ledger(ledger)
            if normalized.ledger_version != LEDGER_VERSION:
                violations.append("ledger version drift")
            if tuple(metric.metric_name for metric in normalized.budget_analysis) != METRIC_ORDER:
                violations.append("metric ordering drift")
            for metric in normalized.budget_analysis:
                if not (0.0 <= metric.metric_value <= 1.0):
                    violations.append(f"metric out of bounds: {metric.metric_name}")
            if normalized.advisory_state not in ADVISORY_STATES:
                violations.append("invalid advisory state")
            if normalized.budget_receipt.ledger_hash != normalized.ledger_hash:
                violations.append("receipt ledger hash mismatch")
            if normalized.budget_receipt.receipt_hash != build_budget_receipt(normalized).receipt_hash:
                violations.append("receipt hash drift")
        except Exception as exc:  # nosec - validator must never raise
`validate_latency_throughput_budget_ledger()` never recomputes the expected `ledger_hash` from the normalized ledger body, so a mapping input can supply a forged `ledger_hash` (and matching forged `receipt_hash`) and still pass validation. To make the validator actually replay-safe, recompute the canonical ledger hash from the normalized content (similar to `validate_bounded_autonomy_ledger`) and report drift when the provided `ledger_hash != expected_hash` (and likewise base `receipt_hash` drift on the expected ledger hash).
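The recomputation could look roughly like the sketch below. It assumes the ledger hash is the SHA-256 of a sorted-key, compact JSON encoding of every field except `ledger_hash` itself; the actual canonical form used by the ledger may differ, and `canonical_hash` and `check_ledger_hash` are hypothetical names.

```python
import hashlib
import json
from typing import Any, List, Mapping


def canonical_hash(body: Mapping[str, Any]) -> str:
    """Hypothetical helper: SHA-256 over sorted-key, compact JSON.

    allow_nan=False makes json.dumps raise on NaN/Inf instead of emitting
    non-standard tokens, so such values must be normalized beforehand.
    """
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"), allow_nan=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_ledger_hash(raw: Mapping[str, Any]) -> List[str]:
    """Report drift when the supplied hash does not match the content."""
    # Assumption: the hash covers every field except the hash itself.
    body = {key: value for key, value in raw.items() if key != "ledger_hash"}
    expected_hash = canonical_hash(body)
    violations = []
    if raw.get("ledger_hash") != expected_hash:
        violations.append("ledger hash drift")
    return violations


body = {"ledger_version": "v1", "advisory_state": "nominal", "timing_series": [1.0, 2.0]}
honest = dict(body, ledger_hash=canonical_hash(body))
forged = dict(honest, timing_series=[1.0, 99.0])  # content changed, hash kept
print(check_ledger_hash(honest))
print(check_ledger_hash(forged))
```

The key design point is that the supplied hash is only ever compared against a freshly computed value and never trusted as an input.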
    def compare_budget_replay(left: LedgerLike, right: LedgerLike) -> Dict[str, Any]:
        try:
            left_ledger = _as_ledger(left)
            right_ledger = _as_ledger(right)
        except Exception as exc:
            return {
                "replay_stable": False,
                "violations": (f"normalization failure: {type(exc).__name__}:{exc}",),
                "left_hash": None,
                "right_hash": None,
            }

        violations = []
        if left_ledger.ledger_hash != right_ledger.ledger_hash:
            violations.append("ledger hash mismatch")
        if left_ledger.budget_receipt.receipt_hash != right_ledger.budget_receipt.receipt_hash:
            violations.append("receipt hash mismatch")
        if left_ledger.advisory_state != right_ledger.advisory_state:
            violations.append("advisory mismatch")
`compare_budget_replay()` compares the stored `ledger_hash`/`receipt_hash` fields, but because `_as_ledger()` accepts these hashes from mapping inputs without verifying they match the content, a tampered mapping can be made to look replay-stable by forging hashes. The comparison should be based on recomputed hashes from canonical content (or `_as_ledger`/validation should recompute and overwrite hashes) so replay stability reflects the actual series/requirements/metrics.
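As an illustration of comparing on content instead of stored fields, the sketch below ignores any supplied hash fields and hashes the remaining content of each side. The field names in `HASH_FIELDS`, the JSON canonical form, and both function names are assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, Mapping

# Assumed names of stored hash fields that must not be trusted as inputs.
HASH_FIELDS = ("ledger_hash", "receipt_hash")


def content_hash(raw: Mapping[str, Any]) -> str:
    """Hash only the content, dropping any caller-supplied hash fields."""
    body = {key: value for key, value in raw.items() if key not in HASH_FIELDS}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_replay(left: Mapping[str, Any], right: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical comparison driven purely by recomputed hashes."""
    left_hash, right_hash = content_hash(left), content_hash(right)
    violations = []
    if left_hash != right_hash:
        violations.append("ledger hash mismatch")
    return {
        "replay_stable": not violations,
        "violations": tuple(violations),
        "left_hash": left_hash,
        "right_hash": right_hash,
    }


# Same forged hash on both sides, but the timing series differs.
left = {"timing_series": [1.0, 2.0], "ledger_hash": "abc"}
right = {"timing_series": [1.0, 9.0], "ledger_hash": "abc"}
print(compare_replay(left, right)["replay_stable"])
```

With this shape, a forged stored hash cannot influence the result; whether a mismatch between stored and recomputed hashes should also be reported is left to the validator.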
        ledger_version = _normalize_text(raw.get("ledger_version"), fallback.ledger_version)
        advisory_state = _normalize_text(raw.get("advisory_state"), fallback.advisory_state)
        logical_replay_identity = _normalize_text(raw.get("logical_replay_identity"), fallback.logical_replay_identity)
        logical_outputs_valid = bool(raw.get("logical_outputs_valid", fallback.logical_outputs_valid))
        timing_budget_exceeded = bool(raw.get("timing_budget_exceeded", fallback.timing_budget_exceeded))
        composite_budget_pressure, _ = _normalize_float(raw.get("composite_budget_pressure"), fallback.composite_budget_pressure)
`_deserialize_budget_receipt()` uses `bool(raw.get(...))` for `logical_outputs_valid` and `timing_budget_exceeded`. This will treat non-empty strings like `"False"` as `True`, which can silently flip semantics on malformed inputs. Consider only accepting actual bool values (`isinstance(x, bool)`) and otherwise falling back, or normalizing common string/int representations explicitly.
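One possible shape for such a normalizer is sketched below. `normalize_bool` is a hypothetical helper, and the decision to accept only real booleans plus the strings "true"/"false" (case-insensitive) is an assumption about what counts as a well-formed input.

```python
from typing import Any


def normalize_bool(value: Any, fallback: bool) -> bool:
    """Hypothetical strict normalizer: never relies on Python truthiness."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    # Anything else (None, ints, arbitrary strings, containers) is treated
    # as malformed and replaced by the deterministic fallback.
    return fallback


# bool("False") is True because the string is non-empty; the strict
# normalizer reads the string's meaning instead.
print(bool("False"), normalize_bool("False", fallback=True))
```

Falling back on unrecognized values keeps the deserializer deterministic and avoids raising on malformed input.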
@copilot apply changes based on the comments in this thread
…, hash integrity in validator and compare
Agent-Logs-Url: https://github.com/QSOLKCB/QEC/sessions/d4ff5d3d-2b6c-4def-8da7-5dd8dc13f4c5
Co-authored-by: EmergentMonk <73952179+EmergentMonk@users.noreply.github.com>
Applied all four changes in ec62413:
Motivation
Description
- `src/qec/orchestration/latency_throughput_budget_ledger.py` implementing frozen dataclasses `LatencyThroughputSample`, `BudgetMetric`, `BudgetReceipt`, and `LatencyThroughputBudgetLedger` with `to_dict()`, `to_canonical_json()`, and `stable_hash()` methods.
- `build_latency_throughput_scenario`, `run_latency_throughput_budget_ledger`, `validate_latency_throughput_budget_ledger`, `build_budget_receipt`, `compare_budget_replay`, and `summarize_latency_throughput_budget`, including deterministic normalization for missing/malformed/NaN/Inf values, stable metric ordering, advisory-state ladder, and replay-safe receipts.
- `tests/test_latency_throughput_budget_ledger.py` exercising determinism, canonical JSON round-trip, stable hashing, validator safety, malformed input handling (including NaN/Inf sentinels), metric bounds and ordering, all four advisory bands, replay comparison stability, no input mutation, decoder-untouched confirmation, receipt determinism, summary content, empty-sample behavior, and deterministic ordering of samples.
- `src/qec/orchestration/latency_throughput_budget_ledger.py`, `tests/test_latency_throughput_budget_ledger.py`.
Testing
- Ran `pytest -q tests/test_latency_throughput_budget_ledger.py` and observed all tests passing: `19 passed, 0 failed`.