docs: tighten OPS.md sections 2/3 against first M0-5/M1-6 deploy #75
Open
Augustas11 wants to merge 1 commit into
…observations
Resolves the two "TBD after first M0-5/M1-6 deploy" callouts in OPS.md
against what was actually observed during the v1.3.0-24-g87b3a6b -> v1.3.1-5-gba04cd4
deploy on 2026-06-11.
- Section 2 (coordinator restart): the post-restart /healthz check is a single
GET after a 2s sleep, not a poll loop. Total window from restart command
to provenance assert is ~5s; /healthz responded immediately.
- Section 3 (gateway restart): confirmed the single-file .prev layout for both
services at /opt/macprovider/{coordinator,gateway}.prev (owned
macprovider:macprovider, mode 0755). Also documented the coordinator deploy
script's timestamped /opt/macprovider/coordinator.yaml.bak-<UTC> backups
(gateway script does not touch gateway.yaml so no equivalent there).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
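
As a rough illustration of the Section 2 observation, a single-shot check might look like the sketch below. Only the shape (one 2s sleep, then one GET, no poll loop) comes from the notes above; the function name `check_healthz_once`, the 5s curl timeout, and any URL passed to it are assumptions for illustration and are not taken from the deploy script.

```shell
# Sketch only: single-shot post-restart health check (no poll loop).
# check_healthz_once is a hypothetical helper name; the real script may differ.
check_healthz_once() {
  healthz_url="$1"
  settle_seconds="${2:-2}"   # 2s settle delay, as observed in the deploy

  # Give the restarted service a moment to bind its listener.
  sleep "$settle_seconds"

  # One GET only. -f fails on HTTP >= 400, -sS stays quiet but prints errors,
  # --max-time bounds the wait (the 5s value is an assumption).
  curl -fsS --max-time 5 "$healthz_url" >/dev/null
}
```

A call such as `check_healthz_once "http://127.0.0.1:8080/healthz" || exit 1` (the port is hypothetical) would stop the deploy if that one GET fails. Since /healthz responded immediately in this deploy, a single check after the sleep appears to be sufficient.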
Augustas11 added a commit that referenced this pull request on Jun 12, 2026
…#76)

Two drift items that the 2026-06-11 deploy hit on Pearl and patched live were left unfixed in the repo (and would re-bite the next deploy):

1. phase4-coordinator/dist/nginx-coordinator.streamvc.live.conf re-declared the `ws_provider_rate` and `ws_provider_conn` zones that the api.streamvc.live vhost already declares. Two vhosts on the same nginx instance cannot redeclare the same http-context zone; `nginx -t` fails with "limit_conn_zone is already bound." Removed the duplicate declarations and left a comment explaining the cross-vhost sharing and the restore step if the coordinator vhost is ever deployed standalone.
2. phase5-gateway/dist/deploy-pearl-vps.sh was missing the ssl_certificate sed-uncomment block that the coordinator script has. nginx-api.streamvc.live.conf ships with those lines commented for first-deploy ACME ordering; without the sed, post-cert deploys fail `nginx -t` with `no "ssl_certificate" is defined for the "listen ... ssl" directive`. Added the same idempotent sed pair the coordinator script uses at its step 6b.

Both surfaced in PR #75's "companion findings" block. The deploy session worked around #1 by editing the nginx config in place on Pearl and #2 by switching to a binary-only swap (skipping the script's nginx step). This commit closes the drift in source.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
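
To make the "idempotent sed pair" idea concrete, here is a minimal sketch of how such an uncomment step could be written. It is not the coordinator script's actual step 6b, whose exact expressions are not shown here; the function name `uncomment_ssl_lines` and both regular expressions are assumptions for illustration.

```shell
# Sketch only: uncomment ssl_certificate / ssl_certificate_key lines in an
# nginx site config. uncomment_ssl_lines is a hypothetical helper name.
uncomment_ssl_lines() {
  conf_path="$1"

  # Each expression strips a leading "#" (plus optional spaces) in front of
  # one directive while preserving the original indentation. A line that is
  # already uncommented has no "#" to match, so a second run changes nothing.
  sed -i.tmp -E \
    -e 's|^([[:space:]]*)#[[:space:]]*(ssl_certificate[[:space:]])|\1\2|' \
    -e 's|^([[:space:]]*)#[[:space:]]*(ssl_certificate_key[[:space:]])|\1\2|' \
    "$conf_path" && rm -f "$conf_path.tmp"
}
```

Running the helper before `nginx -t` on a config that ships with those lines commented would leave them active, and running it again on a later deploy would be a no-op. The attached `-i.tmp` suffix is used because it is accepted by both GNU and BSD sed.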
Summary
- Resolves the two **TBD after first M0-5/M1-6 deploy** callouts in OPS.md §2 and §3 against observations from the first end-to-end M0-5/M1-6 production deploy (2026-06-11, v1.3.0-24-g87b3a6b → v1.3.1-5-gba04cd4 on Pearl).
- Confirms the single-file `.prev` layout at `/opt/macprovider/{coordinator,gateway}.prev` (owned `macprovider:macprovider`, mode `0755`), plus the coordinator-side timestamped `coordinator.yaml.bak-<UTC>` accumulating backups.

Companion findings worth a follow-up (not in this PR)
During the deploy, two on-disk surprises caused the coordinator deploy script to fail at step 6b and would have caused the gateway deploy script to fail at step 4 (mitigated by switching to a binary-only swap for the gateway). These are not OPS.md content but should be tracked separately:
1. `phase4-coordinator/dist/nginx-coordinator.streamvc.live.conf` still declares `limit_req_zone ws_provider_rate` and `limit_conn_zone ws_provider_conn`, both already declared by the `api.streamvc.live` site on Pearl. Pearl's live coordinator site had been dedup'd earlier on 2026-06-11 (a `.bak-pre-dedup-20260611T135903Z` artifact survives), but the local file was never updated. Step 6b overwrote the dedup'd live file with the un-dedup'd local one; `nginx -t` failed with "limit_conn_zone is already bound." Fixed in place on Pearl; the local file still drifts.
2. The gateway deploy script (`phase5-gateway/dist/deploy-pearl-vps.sh`) lacks the `sed`-uncomment step the coordinator script has for `ssl_certificate` lines. The local `nginx-api.streamvc.live.conf` ships those lines commented; if the gateway script's step 4 ran end-to-end, it would install the commented config and `nginx -t` would fail with "no ssl_certificate defined for SSL listener." Avoided here by skipping the script's nginx step. Either the gateway script needs the same `sed` step, or the local config needs to ship uncommented.
3. A provider (`air5`) that connected during the deploy gap under the old binary cannot reconnect under v1.3.1-5 with `auth.require_provider_tokens=false`. Coordinator log: `tokenless connect refused; an active token already exists for this provider_id`. Operator will revoke the stored token or run a TOFU bypass; flagged for the decision log.

Test plan
- `coordinator.yaml.bak-<UTC>` accumulates as documented

🤖 Generated with Claude Code