feat(service): ADR-0059 — async query rebuild with persistent checkpoints (#650)#656

Draft
NickSeagull wants to merge 4 commits into main from feature/issue-650-async-query-rebuild

Conversation

@NickSeagull
Member

Summary

Draft PR for review of ADR-0059 (Proposed), which addresses #650: query rebuild currently blocks HTTP readiness on every restart, because Subscriber.rebuildAll runs synchronously before the transports bind and replays against an in-memory QueryObjectStore that has no persistent checkpoint.

This PR currently contains the ADR only. Implementation will land in follow-up commits on this branch as the feature pipeline advances through phases 4–17.

Chosen design (Option 1)

  1. Persistent QueryObjectStore.Postgres — mirrors SnapshotCache.Postgres precedent (ADR-0006).
  2. Per-query checkpoint table — updated transactionally with the object write inside atomicUpdate. Per-query, not global: new queries replay from 0; existing queries resume from checkpoint + 1.
  3. Async rebuild + readiness gate: AsyncTask.run spawns rebuild; /healthz is 200 immediately, /readyz flips when subscriber readiness flips. Per-query readiness lets individual query endpoints return 503 + X-Query-Status: rebuilding while still catching up.
  4. Chunked reads — replace Limit 9223372036854775807 with paged reads (default 1000/page) for memory bound + progress logging.
  5. KnownHash-driven schema-evolution path — hash mismatch forces full replay for the affected query only.
  6. Observability: query.rebuild.events_replayed, lag_from_head, duration_seconds per queryName.
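
The resume rule (item 2), the paged read loop (item 4), and the hash-mismatch reset (item 5) can be sketched together. This is a minimal, pure sketch under stated assumptions, not the ADR's implementation: `Checkpoint`, `startPosition`, `readPage`, and `replay` are hypothetical names, the event log is a plain list sorted by position, positions start at 0, and the query state is just an `Int` sum.

```haskell
import Data.List (foldl')

-- Hypothetical types: none of these names come from the ADR.
type Position = Int

data Event = Event { eventPosition :: Position, eventPayload :: Int }
  deriving (Show, Eq)

-- What is persisted per query: the last applied position plus the schema
-- hash that was current when the state was written.
data Checkpoint = Checkpoint { lastPosition :: Position, schemaHash :: String }
  deriving (Show, Eq)

-- Decide where a query resumes:
--   * no checkpoint (new query)      -> replay from 0
--   * hash mismatch (schema changed) -> full replay from 0
--   * otherwise                      -> resume from checkpoint + 1
startPosition :: String -> Maybe Checkpoint -> Position
startPosition _ Nothing = 0
startPosition currentHash (Just cp)
  | schemaHash cp /= currentHash = 0
  | otherwise = lastPosition cp + 1

-- Stand-in for a paged store read: up to pageSize events at or after `from`.
-- Assumes the log is sorted by position.
readPage :: [Event] -> Int -> Position -> [Event]
readPage evLog pageSize from =
  take pageSize (dropWhile (\e -> eventPosition e < from) evLog)

-- Replay page by page, folding each event into the state. Returns the final
-- state and the number of pages read (a stand-in for progress logging).
replay :: [Event] -> Int -> Position -> Int -> (Int, Int)
replay evLog pageSize from initial = go from initial 0
  where
    go pos st pages =
      case readPage evLog pageSize pos of
        [] -> (st, pages)
        page ->
          let st' = foldl' (\acc e -> acc + eventPayload e) st page
              next = eventPosition (last page) + 1
          in go next st' (pages + 1)
```

Only one page is held in memory at a time, which is the point of replacing the unbounded `Limit`. When `startPosition` returns 0 because of a hash mismatch, the caller is assumed to reset the stored state as well before replaying.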

Tradeoffs to weigh during review

  • Application.useQueryCheckpoint is opt-in and defaults to InMemory. Should it auto-promote to Postgres whenever useQueryObjectStore is Postgres? ADR currently chooses explicit-over-implicit.
  • KnownHash mismatch silently clears the affected query's checkpoint and re-replays. An operator might prefer an explicit migration gate.
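
To make the first tradeoff concrete, here is a small sketch of the two resolution rules side by side. All names (`Backend`, `AppConfig`, `explicitCheckpoint`, `autoPromotedCheckpoint`) are hypothetical and only illustrate the choice; the description above confirms nothing beyond "opt-in, defaults to InMemory".

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical names throughout.
data Backend = InMemory | Postgres
  deriving (Show, Eq)

data AppConfig = AppConfig
  { objectStore :: Backend
  , checkpointOverride :: Maybe Backend -- Nothing = caller did not opt in
  }

-- Explicit-over-implicit (the ADR's current choice): the checkpoint backend
-- stays in memory unless it is set explicitly, whatever the object store is.
explicitCheckpoint :: AppConfig -> Backend
explicitCheckpoint cfg = fromMaybe InMemory (checkpointOverride cfg)

-- Auto-promotion (the alternative raised for review): follow the object
-- store when nothing is set explicitly.
autoPromotedCheckpoint :: AppConfig -> Backend
autoPromotedCheckpoint cfg = fromMaybe (objectStore cfg) (checkpointOverride cfg)
```

The two rules differ only when the object store is Postgres and no checkpoint backend was chosen, which is exactly the case the review question is about.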

Test plan

  • Maintainer reviews ADR and approves phase 3 (pipeline.py approve 3)
  • Phases 4–8 (security/perf/devex/architecture/test-spec reviews) land their artefacts on this branch
  • Phases 9–11 (tests, implementation, build loop) land green
  • Acceptance tests from #650 (query rebuild blocks HTTP readiness on every restart) pass: cold start skips replayed events; warm restart after schema change rebuilds only the affected query; /healthz is 200 quickly regardless of event count; per-query endpoint returns 503 + documented header during rebuild

🤖 Generated with Claude Code

Proposes a layered solution for #650: persistent QueryObjectStore.Postgres,
per-query checkpoint table updated transactionally with object writes,
async rebuild with /healthz + /readyz, chunked event reads, KnownHash-
driven per-query replay on schema change, and observability counters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 27, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bdadacab-1677-418f-9a1d-850648c6321f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


NickSeagullBot and others added 3 commits May 27, 2026 13:58
…no z), add perf + concurrency test sections

Maintainer feedback (#656):
- position lives inside query_object_store row (ADR-0006 Snapshot pattern); drop separate checkpoint table
- use /health + /ready (no z suffix) to match ADR-0025 convention; per-query degraded mode via X-Query-Status header
- add §Performance testing with 8 named suites matching EventStore.Postgres rigor
- add §Concurrency & correctness testing with hazards H1-H9 and named test counterparts
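
One way to read the first feedback item: if the position lives in the same row as the state, a single write updates both, so they cannot drift apart. A minimal in-memory sketch of that idea, with an `IORef` standing in for the query_object_store row. `StoredQuery`, `applyEvent`, and `demo` are hypothetical names, and -1 is an assumed marker for "no event applied yet".

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical row shape: the replay position travels with the state.
data StoredQuery s = StoredQuery
  { position :: Int -- position of the last event folded into `state`
  , state :: s
  } deriving (Show, Eq)

-- Apply one event at `pos`. State and position change in a single atomic
-- step. Events at or below the stored position are skipped, so redelivery
-- after a restart does not double-apply. Returns True if the event applied.
applyEvent :: IORef (StoredQuery s) -> Int -> (s -> s) -> IO Bool
applyEvent ref pos update =
  atomicModifyIORef' ref $ \row ->
    if pos <= position row
      then (row, False)
      else (StoredQuery { position = pos, state = update (state row) }, True)

-- Usage: deliver event 0 twice; only the first delivery changes the row.
demo :: IO (Bool, Bool, StoredQuery Int)
demo = do
  ref <- newIORef (StoredQuery { position = -1, state = 0 })
  first <- applyEvent ref 0 (+ 1)
  second <- applyEvent ref 0 (+ 1)
  row <- readIORef ref
  pure (first, second, row)
```

In Postgres the same property would presumably come from updating both columns in one statement or transaction; the sketch only shows the invariant, not the SQL.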

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…toEncoding)

- §1: data-classification paragraph clarifies state_json inherits the
  data-classification of source events (same JSONB surface as
  eventstore.events under ADR-0004); encryption-at-rest is a Postgres
  deployment concern, not an application-schema one. (resolves sec-04-001)
- §1: persisted query state must implement toEncoding (compile-time
  QueryStateSerializable constraint); toJSON-only rejected. Avoids
  materialising an intermediate Value tree on the per-event hot path.
  (resolves perf-05-004)
- §3: Readiness gains 'Failed Text' constructor; per-query rebuild
  timeout (default 5 min, configurable via RebuildOptions). On timeout
  or updater exception, /ready reports failure distinctly so the
  orchestrator stops flapping; per-query endpoint returns 503 with
  X-Query-Status: failed. Updater exceptions logged at WARN.
  (resolves sec-04-002)
- Endpoint contract block updated for the new failed-state shape.
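
The failed-state behaviour in §3 can be sketched as follows. This is an illustration under assumptions, not the branch's code: the commit message only confirms a 'Failed Text' constructor, so the other constructors, `runRebuild`, `queryResponse`, and `demo` are hypothetical, `String` stands in for `Text` to keep the sketch base-only, and returning 200 with no header for a ready query is assumed.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, try)
import System.Timeout (timeout)

-- Hypothetical readiness shape for one query.
data Readiness = Rebuilding | Ready | Failed String
  deriving (Show, Eq)

-- Run a rebuild action under a timeout given in microseconds. A timeout or
-- an exception becomes Failed instead of leaving the query Rebuilding.
-- Catching SomeException is deliberately coarse for this sketch.
runRebuild :: Int -> IO () -> IO Readiness
runRebuild micros rebuild = do
  outcome <- try (timeout micros rebuild)
  pure $ case outcome of
    Left err -> Failed ("updater exception: " ++ show (err :: SomeException))
    Right Nothing -> Failed "rebuild timed out"
    Right (Just ()) -> Ready

-- Per-query response: status code plus optional X-Query-Status value.
queryResponse :: Readiness -> (Int, Maybe String)
queryResponse Ready = (200, Nothing)
queryResponse Rebuilding = (503, Just "rebuilding")
queryResponse (Failed _) = (503, Just "failed")

-- Usage: one rebuild that finishes, one that exceeds a 10 ms timeout.
demo :: IO [Readiness]
demo =
  sequence
    [ runRebuild 1000000 (pure ())
    , runRebuild 10000 (threadDelay 1000000)
    ]
```

Keeping `Failed` distinct from `Rebuilding` is what lets an orchestrator tell "still catching up" from "will not recover without intervention".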

perf-05-007 (profiling baselines) deferred to phase 13 — declaring
assertion shapes is the ADR's job; concrete numbers require code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…query rebuild

Adds stub implementations and hspec test files for ADR-0059 (async query
rebuild with persistent checkpoints). All 101 new tests compile and fail
against `panic "not implemented"` sentinels, ready for phase 10 implementation.

New stubs:
- Service.QueryObjectStore.Postgres — QueryObjectStoreConfig instance, typed sentinel
- Service.Transport.Web.Readiness — /ready endpoint handler stub
- Service.Query.Subscriber — rebuildFrom, rebuildAllAsync, readinessOf, readinessOfQuery stubs
- Service.Application — useQueryObjectStore, useReadinessEndpoint, withoutReadinessEndpoint stubs

New test modules (all red):
- Service.QueryObjectStore.PostgresSpec
- Service.Query.Subscriber.ReadinessSpec
- Service.Transport.Web.ReadinessSpec
- Service.Application.ReadinessBuilderSpec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>