feat(service): ADR-0059 — async query rebuild with persistent checkpoints (#650)#656
Draft
NickSeagull wants to merge 4 commits into
Proposes a layered solution for #650: persistent QueryObjectStore.Postgres, per-query checkpoint table updated transactionally with object writes, async rebuild with /healthz + /readyz, chunked event reads, KnownHash-driven per-query replay on schema change, and observability counters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
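To make the per-query resume rule and the chunked reads concrete, here is a minimal sketch in plain Haskell (base only). Every name in it (`Checkpoint`, `replayStart`, `pagesOf`, `replayPaged`) is hypothetical and not taken from the ADR or the codebase, and an in-memory list stands in for the event stream. It only illustrates the decision "no checkpoint or hash mismatch means replay from 0, otherwise resume after the checkpoint" and a page-by-page fold.

```haskell
import Data.List (foldl')

-- Hypothetical names throughout; none of these come from the PR itself.
type Position = Int
type SchemaHash = String

-- What a persisted query is assumed to remember between restarts.
data Checkpoint = Checkpoint
  { lastApplied :: Position   -- position of the last event folded into the state
  , storedHash  :: SchemaHash -- schema hash that was current when the state was written
  }

-- Decide where replay starts for a single query.
--   * no checkpoint (new query)        -> replay from 0
--   * stored hash differs from current -> full replay from 0
--   * otherwise                        -> resume just after the checkpoint
replayStart :: SchemaHash -> Maybe Checkpoint -> Position
replayStart _ Nothing = 0
replayStart currentHash (Just cp)
  | storedHash cp /= currentHash = 0
  | otherwise = lastApplied cp + 1

-- Split a list of events into pages of at most `size` elements.
pagesOf :: Int -> [a] -> [[a]]
pagesOf size xs
  | size <= 0 = error "pagesOf: page size must be positive"
  | null xs = []
  | otherwise =
      let (page, rest) = splitAt size xs
       in page : pagesOf size rest

-- Fold events into a state one page at a time, returning the final state
-- and the number of pages consumed (a stand-in for progress logging).
replayPaged :: Int -> (s -> e -> s) -> s -> [e] -> (s, Int)
replayPaged size step initial events =
  foldl' applyPage (initial, 0) (pagesOf size events)
  where
    applyPage (st, pages) page = (foldl' step st page, pages + 1)
```

For instance, `replayStart "v2" (Just (Checkpoint 41 "v2"))` evaluates to `42`, while the same checkpoint with a stored hash of `"v1"` evaluates to `0`. A real implementation would read each page from the event store rather than from a list.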
…no z), add perf + concurrency test sections

Maintainer feedback (#656):
- position lives inside query_object_store row (ADR-0006 Snapshot pattern); drop separate checkpoint table
- use /health + /ready (no z suffix) to match ADR-0025 convention; per-query degraded mode via X-Query-Status header
- add §Performance testing with 8 named suites matching EventStore.Postgres rigor
- add §Concurrency & correctness testing with hazards H1-H9 and named test counterparts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
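A rough sketch of why keeping the position inside the same row helps: state and position change in one atomic step, so a reader never observes one without the other. The code below is an in-memory analogue only. It uses an `IORef` where the real store would use a Postgres row, and all names (`StoredQuery`, `emptyRow`, `applyEvent`, `applyEventAtomically`) are hypothetical. Skipping events at or below the stored position is an assumption made here for illustration, not something the ADR text above states.

```haskell
import Data.IORef (IORef, atomicModifyIORef')

-- Hypothetical stand-in for one query_object_store row: the projected state
-- and the position of the last applied event live in the same value.
data StoredQuery s = StoredQuery
  { queryState :: s
  , position   :: Int
  } deriving (Eq, Show)

-- A row that has seen no events yet; -1 means the next expected event is 0.
emptyRow :: s -> StoredQuery s
emptyRow initial = StoredQuery initial (-1)

-- Apply one event at a given stream position. Events at or below the stored
-- position are treated as redeliveries and skipped (an assumption of this
-- sketch), so applying the same event twice leaves the row unchanged.
applyEvent :: (s -> e -> s) -> Int -> e -> StoredQuery s -> StoredQuery s
applyEvent step eventPos event row
  | eventPos <= position row = row
  | otherwise = StoredQuery (step (queryState row) event) eventPos

-- In-memory analogue of a transactional row update: state and position
-- are replaced together in a single atomic modification.
applyEventAtomically
  :: IORef (StoredQuery s) -> (s -> e -> s) -> Int -> e -> IO ()
applyEventAtomically ref step eventPos event =
  atomicModifyIORef' ref (\row -> (applyEvent step eventPos event row, ()))
```

With a separate checkpoint table the same guarantee needs a transaction that spans two tables. With the position in the row, a single-row write is enough.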
…toEncoding)

- §1: data-classification paragraph clarifies state_json inherits the data-classification of source events (same JSONB surface as eventstore.events under ADR-0004); encryption-at-rest is a Postgres deployment concern, not an application-schema one. (resolves sec-04-001)
- §1: persisted query state must implement toEncoding (compile-time QueryStateSerializable constraint); toJSON-only rejected. Avoids materialising an intermediate Value tree on the per-event hot path. (resolves perf-05-004)
- §3: Readiness gains 'Failed Text' constructor; per-query rebuild timeout (default 5 min, configurable via RebuildOptions). On timeout or updater exception, /ready reports failure distinctly so the orchestrator stops flapping; per-query endpoint returns 503 with X-Query-Status: failed. Updater exceptions logged at WARN. (resolves sec-04-002)
- Endpoint contract block updated for the new failed-state shape.

perf-05-007 (profiling baselines) deferred to phase 13: declaring assertion shapes is the ADR's job; concrete numbers require code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
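The failed-state behaviour in §3 could look roughly like the sketch below. It is an illustration under assumptions, not the ADR's actual types: only the `Failed` constructor and the `rebuilding` / `failed` header values come from the text above, while `Rebuilding`, `Ready`, `queryResponse`, `runRebuild`, and `defaultRebuildTimeoutMicros` are hypothetical names, and `String` is used in place of `Text` to stay within base. The timeout uses `System.Timeout.timeout`, which takes microseconds.

```haskell
import Control.Exception (SomeException, try)
import System.Timeout (timeout)

-- Hypothetical readiness type; the ADR's constructor is `Failed Text`,
-- String is used here only to avoid a dependency on the text package.
data Readiness
  = Rebuilding
  | Ready
  | Failed String
  deriving (Eq, Show)

-- Default per-query rebuild timeout of 5 minutes, in microseconds.
defaultRebuildTimeoutMicros :: Int
defaultRebuildTimeoutMicros = 5 * 60 * 1000000

-- HTTP status code and optional X-Query-Status header value for a
-- per-query endpoint, following the contract described above.
queryResponse :: Readiness -> (Int, Maybe String)
queryResponse Ready      = (200, Nothing)
queryResponse Rebuilding = (503, Just "rebuilding")
queryResponse (Failed _) = (503, Just "failed")

-- Run a rebuild action under a time limit. A timeout or any exception
-- from the action ends in a terminal Failed state instead of leaving
-- the query stuck in Rebuilding.
runRebuild :: Int -> IO () -> IO Readiness
runRebuild limitMicros rebuild = do
  outcome <- try (timeout limitMicros rebuild)
  pure $ case outcome of
    Left err        -> Failed ("updater exception: " ++ show (err :: SomeException))
    Right Nothing   -> Failed "rebuild timed out"
    Right (Just ()) -> Ready
```

Making `Failed` distinct from `Rebuilding` is what lets an orchestrator tell "still catching up" apart from "will not recover without intervention".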
…query rebuild

Adds stub implementations and hspec test files for ADR-0059 (async query rebuild with persistent checkpoints). All 101 new tests compile and fail against `panic "not implemented"` sentinels, ready for phase 10 implementation.

New stubs:
- Service.QueryObjectStore.Postgres: QueryObjectStoreConfig instance, typed sentinel
- Service.Transport.Web.Readiness: /ready endpoint handler stub
- Service.Query.Subscriber: rebuildFrom, rebuildAllAsync, readinessOf, readinessOfQuery stubs
- Service.Application: useQueryObjectStore, useReadinessEndpoint, withoutReadinessEndpoint stubs

New test modules (all red):
- Service.QueryObjectStore.PostgresSpec
- Service.Query.Subscriber.ReadinessSpec
- Service.Transport.Web.ReadinessSpec
- Service.Application.ReadinessBuilderSpec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Draft PR for review of ADR-0059 (Proposed) addressing #650: query rebuild currently blocks HTTP readiness on every restart because `Subscriber.rebuildAll` runs synchronously before transports bind, against an in-memory `QueryObjectStore` with no persistent checkpoint.

This PR currently contains the ADR only. Implementation will land in follow-up commits on this branch as the feature pipeline advances through phases 4–17.
Chosen design (Option 1)
- `QueryObjectStore.Postgres`: mirrors `SnapshotCache.Postgres` precedent (ADR-0006).
- […] `atomicUpdate`. Per-query, not global: new queries replay from 0; existing queries resume from `checkpoint + 1`.
- `AsyncTask.run` spawns rebuild; `/healthz` is 200 immediately, `/readyz` flips when subscriber readiness flips. Per-query readiness lets individual query endpoints return 503 + `X-Query-Status: rebuilding` while still catching up.
- Replace `Limit 9223372036854775807` with paged reads (default 1000/page) for memory bound + progress logging.
- `KnownHash`-driven schema-evolution path: hash mismatch forces full replay for the affected query only.
- Observability counters: `query.rebuild.events_replayed`, `lag_from_head`, `duration_seconds` per `queryName`.

Tradeoffs to weigh during review
- `Application.useQueryCheckpoint` is opt-in and defaults to InMemory. Should it auto-promote to Postgres whenever `useQueryObjectStore` is Postgres? ADR currently chooses explicit-over-implicit.
- `KnownHash` mismatch silently clears the affected query's checkpoint and re-replays. An operator might prefer an explicit migration gate.

Test plan
- […] `pipeline.py approve 3`)
- `/healthz` is 200 quickly regardless of event count; per-query endpoint returns 503 + documented header during rebuild

🤖 Generated with Claude Code