spec(phase-8): deployment requirements + design #6

Merged
Nether403 merged 2 commits into main from spec/phase-8-deployment on May 12, 2026
@Nether403 Nether403 commented May 12, 2026

Owner

Summary

Phase 8 deployment spec: the requirements and design phases of a requirements-first Kiro spec, grounded in ADR 003. No code changes — this PR is spec + design only. Tasks phase follows in a separate PR.

Spec files:

  • .kiro/specs/phase-8-deployment/requirements.md
  • .kiro/specs/phase-8-deployment/design.md
  • .kiro/specs/phase-8-deployment/.config.kiro

Requirements (14 total)

Each requirement is EARS-formatted with numbered acceptance criteria. Highlights:

  • R1 Split Railway services — web static + API Node 20, separate redeploys, CLI-driven
  • R2 Neon Postgres — forward-only migrations, two-phase for destructive changes, single-deploy for additions
  • R3 Better Auth + GitHub OAuth: SameSite=None; Secure; Domain=.stackfast.app cookies, production OAuth app, GitHub callback
  • R4 Upstash Redis rate limiter — 30/min generation, 100/min reads, fail-open on any Upstash error, Retry-After only on 429
  • R5–R6 Health check + post-deploy smoke — 31-request 429 proof, 101-request read burst, restart-survival check
  • R7 Sentry feature flag — no-op when SENTRY_DSN is falsy, idempotent, scrub idea/constraints, Git SHA release
  • R8 Admin key enforcement — 401 immediately without auth headers, Authorization: Bearer fallback, 401 if ADMIN_API_KEY unset
  • R9 DNS + custom domains — HTTP→HTTPS redirect via Railway edge
  • R10 CORS — locked to https://stackfast.app, never wildcard in prod
  • R11 Auth fails closed in production — 503 whenever auth subsystem isn't ready, regardless of ALLOW_AUTH_BYPASS
  • R12 Rollback — Railway CLI per-service, schema-compatibility block on incompatible single-step rollback, manual-reconciliation allowance for multi-generation rollback
  • R13 Staging isolation — separate Neon branch, OAuth app, secrets, ALLOW_AUTH_BYPASS=false
  • R14 README — production env table, Railway CLI commands, rollback procedure, migration command
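
To make the R4 fail-open rule concrete, here is a minimal sketch of a limiter check that never turns a store failure into a 429. The `CounterStore` interface, the `checkRateLimit` name, and the choice to report the full window as the Retry-After value are assumptions for illustration, not the spec's actual module API.

```typescript
// Hypothetical store interface: increments the counter for `key` and returns
// the new count for the current window. An Upstash-backed store would sit here.
interface CounterStore {
  increment(key: string, windowSeconds: number): Promise<number>;
}

type RateLimitDecision =
  | { allowed: true }
  | { allowed: false; retryAfterSeconds: number };

// Fail-open check: any store error is treated as "allowed", never as a 429.
async function checkRateLimit(
  store: CounterStore,
  key: string,
  limit: number,
  windowSeconds = 60,
): Promise<RateLimitDecision> {
  let count: number;
  try {
    count = await store.increment(key, windowSeconds);
  } catch {
    // Store unreachable or erroring: let the request through (R4 fail-open).
    return { allowed: true };
  }
  if (count > limit) {
    // Assumption: the whole window is reported as a conservative Retry-After.
    return { allowed: false, retryAfterSeconds: windowSeconds };
  }
  return { allowed: true };
}
```

With `limit` set to 30, the 31st call in a window is the first one denied, which is the behavior the R6 smoke test probes.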

Non-goals are explicit: preview envs, blue/green, multi-region, custom CDN, APM, status-page tooling.

Design highlights

  • Additive and reversible. New apps/api/src/rate-limit/ module, new Sentry modules, new Railway manifests. Only two surgical edits to existing code (app.ts rate-limit swap + auth.ts two-line guard for R11).
  • RATE_LIMIT_BACKEND=memory|upstash feature flag so the migration ships, flips, and rolls back independently of Upstash provisioning.
  • Five PBT properties via fast-check for the correctness invariants: fail-open rate limiter, idempotent+no-op Sentry init, CORS never wildcard in prod, admin-key gating, production fail-closed auth. Every property has a generator sketch, oracle, and shrinking targets documented. These tests will run with the property-testing warning flagged by the harness.
  • Cross-origin cookie round trip walked through end-to-end, from "Sign in" click to authenticated XHR, with the exact enforcement points for R3.3, R3.4, R10.2 called out.
  • Failure-mode matrix — every dependency (Neon, Upstash, Azure OpenAI, Gemini, Sentry, GitHub OAuth, Better Auth) has a row with behavior + requirement ID.
  • Runbook scripts: scripts/deploy/migrate.ts, scripts/deploy/smoke.ts, scripts/deploy/rollback.md codify the operator steps.
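
One way the RATE_LIMIT_BACKEND flag could be resolved is sketched below. The function name and the fallback to "memory" for unset or unrecognized values are assumptions; the design doc may choose a different default.

```typescript
type RateLimitBackend = "memory" | "upstash";

// Assumption: anything other than an explicit "upstash" selects the in-memory
// backend, so a missing or mistyped value cannot enable Upstash by accident.
function resolveRateLimitBackend(
  env: Record<string, string | undefined>,
): RateLimitBackend {
  const raw = (env["RATE_LIMIT_BACKEND"] ?? "").trim().toLowerCase();
  return raw === "upstash" ? "upstash" : "memory";
}
```

Under this reading, rolling back the migration is a matter of setting the variable back to `memory` and redeploying the API service.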

Open questions

All resolved before merge (see Design § 11):

  1. Staging DNS: staging.stackfast.app + api.staging.stackfast.app (shared cookie domain)
  2. Migration tool: drizzle-kit push for MVP; promote to migrate when the first real migrations folder appears
  3. Retry-After on exempt routes: /health and /openapi.json never reach the rate-limit middleware, so R4.8 is trivially satisfied
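
The "shared cookie domain" answer to question 1 follows from how cookie domain matching works: a host matches a cookie's Domain attribute when it equals the domain or is a subdomain of it. The helper below is a simplified sketch of that rule; the function name is hypothetical and IP-address hosts are ignored.

```typescript
// Simplified cookie domain matching: a leading dot on the Domain attribute is
// ignored, and the host must equal the domain or end with "." + domain.
function cookieDomainCovers(cookieDomain: string, host: string): boolean {
  const domain = cookieDomain.replace(/^\./, "").toLowerCase();
  const h = host.toLowerCase();
  return h === domain || h.endsWith("." + domain);
}

// Hosts named in this PR, checked against the R3 cookie domain.
const hostsInSpec = [
  "stackfast.app",
  "api.stackfast.app",
  "staging.stackfast.app",
  "api.staging.stackfast.app",
];
const allCovered = hostsInSpec.every((h) =>
  cookieDomainCovers(".stackfast.app", h),
);
```

Because all four hosts match `.stackfast.app`, a single cookie Domain setting serves both production and staging without a second branch in the auth configuration.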

Review focus

The design doc is 263 lines; the sections most worth reviewing are:

  • Module boundaries (Design § 3) — the rate-limit module's public API. Any concerns with the drop-in signature?
  • Migration plan (Design § 9) — the four staged steps for the rate-limiter swap
  • PBT properties (Design § 8) — are the five invariants the right set, or are there others we want property coverage on?

Next

If this lands, the Tasks phase will translate Design § 2's file layout and § 9's migration plan into a numbered checklist for implementation.



Summary by cubic

Adds Phase 8 deployment requirements and design, translating ADR 003 into a Kiro spec for production rollout of stackfast.app and api.stackfast.app.
No code changes; defines R1–R14 covering split Railway services, Neon migrations, Upstash-backed rate limiting (fail-open), cross-origin Better Auth cookies, CORS lock-down, Sentry behind SENTRY_DSN, health/smoke checks, admin key enforcement, rollback, and staging isolation.

Written for commit 88fce53. Summary will update on new commits.

Nether403 added 2 commits May 12, 2026 20:00
Translates ADR 003 into 14 EARS-format requirements that Phase 8
tasks can execute against. No architecture decisions are reopened
here — the spec references ADR 003, ADR 001, and ADR 002 as the
decision record.

Coverage
- R1 split Railway services (web static + API Node 20)
- R2 Neon Postgres production branch, forward-only migrations,
  two-phase destructive column changes, single-deploy
  non-breaking additions
- R3 Better Auth + GitHub OAuth with SameSite=None; Secure;
  Domain=.stackfast.app cookies
- R4 Upstash Redis rate limiter (30/min generation, 100/min
  reads, fail-open on any Upstash error, Retry-After only on 429)
- R5 production health check
- R6 post-deploy rate-limit smoke test
- R7 Sentry behind SENTRY_DSN feature flag (no-op when absent,
  idempotent init, scrub idea/constraints, Git SHA release)
- R8 admin API key enforcement on /admin/* and /internal/*,
  401 immediately when no auth headers present, Authorization:
  Bearer fallback supported
- R9 DNS + custom domains with HTTP→HTTPS redirect
- R10 CORS locked to https://stackfast.app, never wildcard in prod
- R11 auth fails closed in production regardless of
  ALLOW_AUTH_BYPASS; non-prod honors the bypass only when it is not set to false
- R12 Railway CLI rollback per service; block automatic rollback
  when schema would be incompatible; multi-generation rollbacks
  allowed with manual reconciliation
- R13 staging environment isolation (separate Neon branch, OAuth
  app, secrets, ALLOW_AUTH_BYPASS=false)
- R14 README deployment documentation
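
A rough sketch of the R10 origin lock follows. The function name and the non-production allow-list parameter are assumptions for illustration; the property that matters is that no code path can return "*".

```typescript
const PRODUCTION_ORIGIN = "https://stackfast.app";

// Returns the value to echo in Access-Control-Allow-Origin, or null to omit
// the header. No branch returns "*", so a wildcard is structurally impossible.
function resolveAllowedOrigin(
  requestOrigin: string | undefined,
  isProduction: boolean,
  devOrigins: readonly string[] = [], // hypothetical non-production allow-list
): string | null {
  const allowList = isProduction ? [PRODUCTION_ORIGIN] : devOrigins;
  if (requestOrigin !== undefined && allowList.includes(requestOrigin)) {
    return requestOrigin;
  }
  return null;
}
```

Matching against an explicit list, rather than echoing whatever Origin the request carries, is what keeps credentialed requests from other sites out.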

Non-goals section explicitly rules out preview envs, blue/green,
multi-region, custom CDN, APM, and status-page tooling.

Correctness properties surfaced for PBT in the design phase
- Fail-open rate limiter (R4.5)
- Idempotent Sentry init + no-op on falsy DSN (R7.3, R7.4)
- CORS never wildcard in prod (R10.3, R10.4)
- Admin paths 401 without valid key (R8.1, R8.3, R8.5)
- Fail-closed auth in prod (R11.2, R11.3, R11.4)
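
The admin-key property (R8) could be checked against a pure function like the sketch below. The `x-admin-key` header name is hypothetical, since the PR only names the Authorization: Bearer fallback, and header names are assumed to be lowercased already. The comparison loop avoids early exit; production code would more likely use Node's crypto.timingSafeEqual.

```typescript
// Compares two strings without returning early on the first mismatch.
function equalsWithoutEarlyExit(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Returns true only when a configured key exists and the request presents it.
// The caller is expected to respond 401 whenever this returns false.
function isAdminRequestAuthorized(
  headers: Record<string, string | undefined>,
  configuredKey: string | undefined, // value of ADMIN_API_KEY
): boolean {
  if (!configuredKey) return false; // ADMIN_API_KEY unset: always 401
  const direct = headers["x-admin-key"]; // hypothetical primary header
  const auth = headers["authorization"];
  const bearer =
    auth !== undefined && auth.startsWith("Bearer ") ? auth.slice(7) : undefined;
  const presented = direct ?? bearer;
  if (presented === undefined) return false; // no auth headers: 401
  return equalsWithoutEarlyExit(presented, configuredKey);
}
```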

Ran analyze_requirements and folded in 13 clarifications across
requirements 2, 4, 8, 11, and 12. Design and tasks phases follow
as separate PRs.

Maps the 14 approved Phase 8 requirements onto concrete
implementation modules, file edits, Railway manifests, and a
testing strategy spanning unit, contract, property-based, and
post-deploy smoke layers.
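
As an illustration of the post-deploy smoke layer, the 31-request check from R6 might look like the sketch below. The `StatusFetcher` type and `proveRateLimit` name are hypothetical; the fetcher is injected so the logic can be exercised without a network, and a real run would also need to stay within a single rate-limit window.

```typescript
// Hypothetical fetcher: performs one request and resolves to its HTTP status.
type StatusFetcher = (url: string) => Promise<number>;

// Sends limit + 1 sequential requests and reports whether the first 429
// appears exactly on the last one (request 31 for a limit of 30).
async function proveRateLimit(
  fetchStatus: StatusFetcher,
  url: string,
  limit: number,
): Promise<boolean> {
  const statuses: number[] = [];
  for (let i = 0; i < limit + 1; i++) {
    statuses.push(await fetchStatus(url));
  }
  return statuses.indexOf(429) === limit;
}
```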

Shape
- Additive and reversible: new rate-limit module, new Sentry
  modules, new Railway manifests, two small auth/app.ts edits.
- RATE_LIMIT_BACKEND=memory|upstash feature flag lets the
  rate-limiter migration ship, flip, and roll back independently
  of Upstash provisioning.
- Sentry is idempotent and a no-op when SENTRY_DSN is falsy; the
  same DSN-gated branch is used on both API and web.
- Auth fails closed in production regardless of ALLOW_AUTH_BYPASS,
  implemented as a two-line guard in apps/api/src/middleware/auth.ts.
- No shared package (registry, rules-engine, exporter, ai,
  schemas, shared) is modified.
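
A minimal sketch of the DSN-gated, idempotent Sentry init described above. The SDK's init call is injected as `sdkInit` so the sketch does not depend on any Sentry package, and the `GIT_SHA` variable name for the release is an assumption.

```typescript
type SentryInit = (options: { dsn: string; release: string | undefined }) => void;

// Wraps an SDK init function so that it runs at most once and only when
// SENTRY_DSN is truthy. The returned function reports whether it called the SDK.
function createSentryInitializer(sdkInit: SentryInit) {
  let initialized = false;
  return function initSentry(env: Record<string, string | undefined>): boolean {
    const dsn = env["SENTRY_DSN"];
    if (!dsn) return false; // falsy DSN: no-op
    if (initialized) return false; // repeat call: no-op (idempotent)
    sdkInit({ dsn, release: env["GIT_SHA"] }); // GIT_SHA is a hypothetical name
    initialized = true;
    return true;
  };
}
```

Because the gate lives in one small wrapper, the same branch can be reused by the API and the web bundle with different `sdkInit` functions.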

PBT properties
- Upstash failures never produce a 429 (fail-open invariant).
- Sentry init is idempotent and a no-op without a DSN.
- CORS never echoes Access-Control-Allow-Origin: * with
  credentials in production.
- Admin key gating precedes every other middleware and returns
  401 on any missing/mismatched key variant.
- Production auth fails closed whenever the auth subsystem is
  not ready, regardless of ALLOW_AUTH_BYPASS.
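
The fail-closed property can be stated as a small decision function, sketched here under assumptions: production is detected via NODE_ENV, the bypass is honored in non-production whenever ALLOW_AUTH_BYPASS is anything other than "false", and a disabled bypass outside production also yields 503. The real guard in auth.ts may differ.

```typescript
interface AuthGuardInput {
  nodeEnv: string | undefined; // assumed production signal
  allowAuthBypass: string | undefined; // raw ALLOW_AUTH_BYPASS value
  authReady: boolean; // whether the auth subsystem initialized
}

type AuthGuardResult = "proceed" | "bypass" | 503;

function authGuard(input: AuthGuardInput): AuthGuardResult {
  if (input.authReady) return "proceed";
  // Production fails closed no matter what ALLOW_AUTH_BYPASS says.
  if (input.nodeEnv === "production") return 503;
  // Non-production: honor the bypass unless it is explicitly "false".
  return input.allowAuthBypass !== "false" ? "bypass" : 503;
}
```

A property test over arbitrary `allowAuthBypass` strings would assert that the production, not-ready case always returns 503.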

Open questions resolved before committing
- Staging DNS: staging.stackfast.app + api.staging.stackfast.app
  (shared cookie domain, no second branch in Better Auth config).
- Migration command: drizzle-kit push for MVP; promote to migrate
  once a real migrations folder appears.
- Retry-After on exempt routes: /health and /openapi.json never
  reach the rate-limit middleware, so the constraint is trivially
  satisfied.

Tasks phase follows in a separate PR.

@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 3 files

@Nether403 Nether403 merged commit 3f41232 into main May 12, 2026
2 checks passed
@Nether403 Nether403 deleted the spec/phase-8-deployment branch May 12, 2026 19:59