Rework read-only mode: use Postgres native DefaultXactReadOnly#376
Rework read-only mode: use Postgres native DefaultXactReadOnly#376
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRenamed exported enum member Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/spock.c (1)
116-120: Consider adding alocalGUC literal and keepinguseras a hidden compatibility alias.The enum was renamed to
READONLY_LOCAL, but the exposed value is still"user". Accepting"local"improves clarity and keeps backward compatibility.♻️ Suggested adjustment
static const struct config_enum_entry readonly_options[] = { {"off", READONLY_OFF, false}, - {"user", READONLY_LOCAL, false}, + {"local", READONLY_LOCAL, false}, + {"user", READONLY_LOCAL, true}, {"all", READONLY_ALL, false}, {NULL, 0, false} };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock.c` around lines 116 - 120, The readonly_options enum currently exposes "user" while the internal enum is READONLY_LOCAL; update the GUC literals to add "local" as the primary visible option and keep "user" as a hidden/compatibility alias in the readonly_options array that is returned by the config system; specifically modify the readonly_options initializer (refer to readonly_options and READONLY_LOCAL) to include an entry for "local" mapping to READONLY_LOCAL and mark the existing "user" entry as hidden/alias so both parse to READONLY_LOCAL while preserving backward compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 2887-2899: In the READONLY_ALL branch, replacing the inner
"continue" (when xact_had_exception is true) with a break ensures the worker
exits the loop to run the outer post-loop cleanup/exception/subscription
handling; update the code in spock_apply.c so that after
maybe_send_feedback(...) you do "if (xact_had_exception) break;" (or set a local
flag and break) instead of "continue", preserving any per-iteration cleanup and
allowing the outer post-loop handling to execute.
In `@src/spock_readonly.c`:
- Around line 65-68: spock_ropost_parse_analyze() and spock_roExecutorStart()
currently unconditionally reset DefaultXactReadOnly to false; instead, before
you change DefaultXactReadOnly save its current value into a local variable
(e.g., prev_DefaultXactReadOnly) and record whether Spock actually enforced
readonly (e.g., did_force_readonly = (spock_readonly >= READONLY_LOCAL &&
!superuser())); only set DefaultXactReadOnly when did_force_readonly is true,
and on exit restore DefaultXactReadOnly from prev_DefaultXactReadOnly only if
did_force_readonly was true; apply this save/restore pattern in both
spock_ropost_parse_analyze() and spock_roExecutorStart(), referencing the
DefaultXactReadOnly, spock_readonly, READONLY_LOCAL, and superuser() symbols to
locate the places to change.
---
Nitpick comments:
In `@src/spock.c`:
- Around line 116-120: The readonly_options enum currently exposes "user" while
the internal enum is READONLY_LOCAL; update the GUC literals to add "local" as
the primary visible option and keep "user" as a hidden/compatibility alias in
the readonly_options array that is returned by the config system; specifically
modify the readonly_options initializer (refer to readonly_options and
READONLY_LOCAL) to include an entry for "local" mapping to READONLY_LOCAL and
mark the existing "user" entry as hidden/alias so both parse to READONLY_LOCAL
while preserving backward compatibility.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e0080300-bc1e-4d74-ade7-87ba2b6fbdb5
📒 Files selected for processing (4)
include/spock_readonly.hsrc/spock.csrc/spock_apply.csrc/spock_readonly.c
5f69140 to
e60b508
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/spock_apply.c (1)
2901-2910:⚠️ Potential issue | 🟠 MajorREADONLY_ALL exception path can still spin and misses postmaster-death handling.
In this branch,
WaitLatchOrSocket(... WL_SOCKET_READABLE ...)+continue(Line 2903 onward) can return immediately on a perpetually readable socket, causing tight-loop churn; it also skips checkingWL_POSTMASTER_DEATHin this path. Prefer a latch/timeout wait here and preserve the emergency bailout.💡 Suggested patch
if (xact_had_exception) { - rc = WaitLatchOrSocket(&MyProc->procLatch, - WL_SOCKET_READABLE | WL_LATCH_SET | - WL_TIMEOUT | WL_POSTMASTER_DEATH, - fd, 1000L); - - ResetLatch(&MyProc->procLatch); + rc = WaitLatch(&MyProc->procLatch, + WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, + 1000L, + WAIT_EVENT_LOGICAL_APPLY_MAIN); + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); + ResetLatch(&MyProc->procLatch); continue; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2901 - 2910, In the xact_had_exception branch (around the WaitLatchOrSocket call), don't call WaitLatchOrSocket with WL_SOCKET_READABLE and then continue because a perpetually readable socket can cause a tight spin and skip postmaster-death handling; replace that call with WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, timeout) (i.e., use WaitLatch instead of WaitLatchOrSocket), keep ResetLatch(&MyProc->procLatch) after the wait, and explicitly handle the WL_POSTMASTER_DEATH case (or call proc_exit/emergency bailout) so the emergency bailout path is preserved instead of looping continuously. Ensure you reference the xact_had_exception branch, MyProc->procLatch, WaitLatchOrSocket (old) and WaitLatch (new), ResetLatch, and WL_POSTMASTER_DEATH in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock.c`:
- Around line 116-119: The readonly_options array currently only maps the old
name "user" to READONLY_LOCAL, so setting spock.readonly = 'local' is rejected;
update the readonly_options entries to include "local" as the canonical string
mapping to READONLY_LOCAL and keep "user" as a backward-compatible alias (both
mapping to READONLY_LOCAL) so the parser accepts either name; modify the static
const struct config_enum_entry readonly_options[] (and any related
parsing/validation that references READONLY_LOCAL) accordingly.
---
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2901-2910: In the xact_had_exception branch (around the
WaitLatchOrSocket call), don't call WaitLatchOrSocket with WL_SOCKET_READABLE
and then continue because a perpetually readable socket can cause a tight spin
and skip postmaster-death handling; replace that call with
WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
timeout) (i.e., use WaitLatch instead of WaitLatchOrSocket), keep
ResetLatch(&MyProc->procLatch) after the wait, and explicitly handle the
WL_POSTMASTER_DEATH case (or call proc_exit/emergency bailout) so the emergency
bailout path is preserved instead of looping continuously. Ensure you reference
the xact_had_exception branch, MyProc->procLatch, WaitLatchOrSocket (old) and
WaitLatch (new), ResetLatch, and WL_POSTMASTER_DEATH in your changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5595b87b-7ef4-4477-90de-6e1d4fe06adb
📒 Files selected for processing (4)
include/spock_readonly.hsrc/spock.csrc/spock_apply.csrc/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spock_readonly.c
e60b508 to
fdb7dd1
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/spock_apply.c (1)
2887-2912:⚠️ Potential issue | 🔴 CriticalREADONLY_ALL exception path can trap the worker in the inner loop and skip finalization flow.
With
xact_had_exception == true,continueat Line 2909 keeps control inside the inner loop, so the post-loop exception/subscription handling block is never reached. This can stall apply progress indefinitely while in READONLY_ALL.🐛 Minimal fix
- if (spock_readonly == READONLY_ALL) + if (spock_readonly == READONLY_ALL) { /* * Send feedback to keep walsender alive - we may avoid it * with introduction of TCP keepalive approach. */ maybe_send_feedback(applyconn, last_received, &last_receive_timestamp); - - /* - * In case of an exception we can't break out of the loop - * because exception processing code may also modify the - * database. Wait briefly and continue to the next iteration. - */ - if (xact_had_exception) - { - rc = WaitLatchOrSocket(&MyProc->procLatch, - WL_SOCKET_READABLE | WL_LATCH_SET | - WL_TIMEOUT | WL_POSTMASTER_DEATH, - fd, 1000L); - - ResetLatch(&MyProc->procLatch); - continue; - } break; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_apply.c` around lines 2887 - 2912, The READONLY_ALL branch currently does a continue when xact_had_exception is true, which keeps execution inside the inner loop and skips the post-loop exception/subscription finalization; change the control flow so that after maybe_send_feedback and the WaitLatchOrSocket call the code breaks out of the inner loop (not continue), ensuring the subsequent post-loop exception/subscription handling runs (update the block around spock_readonly == READONLY_ALL, referencing xact_had_exception, maybe_send_feedback and WaitLatchOrSocket to use break instead of continue).
🧹 Nitpick comments (1)
src/spock_readonly.c (1)
68-89: Extract duplicated read-only sync logic into one helper.
spock_ropost_parse_analyze()andspock_roExecutorStart()now carry identical state-sync logic. A shared static helper would reduce drift risk between hooks.♻️ Suggested refactor
+static inline void +spock_sync_xact_readonly(void) +{ + if (spock_readonly >= READONLY_LOCAL && !superuser()) + XactReadOnly = true; + else if (XactReadOnly) + { + const char *value = GetConfigOption("transaction_read_only", false, false); + if (strcmp(value, "off") == 0) + XactReadOnly = false; + } +} + void spock_ropost_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) { - ... + spock_sync_xact_readonly(); } void spock_roExecutorStart(QueryDesc *queryDesc, int eflags) { - ... + spock_sync_xact_readonly(); }Also applies to: 95-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/spock_readonly.c` around lines 68 - 89, Both spock_ropost_parse_analyze and spock_roExecutorStart contain duplicated logic that syncs spock_readonly with the Postgres XactReadOnly state; extract that logic into a single static helper (e.g., sync_spock_readonly_state) and have both functions call it. The helper should encapsulate the current conditional flow that checks spock_readonly >= READONLY_LOCAL, superuser(), sets XactReadOnly, and restores it by reading transaction_read_only via GetConfigOption and strcmp(value, "off") to detect and revert Spock-imposed changes. Replace the duplicated blocks in spock_ropost_parse_analyze and spock_roExecutorStart with a call to the new static function to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2887-2912: The READONLY_ALL branch currently does a continue when
xact_had_exception is true, which keeps execution inside the inner loop and
skips the post-loop exception/subscription finalization; change the control flow
so that after maybe_send_feedback and the WaitLatchOrSocket call the code breaks
out of the inner loop (not continue), ensuring the subsequent post-loop
exception/subscription handling runs (update the block around spock_readonly ==
READONLY_ALL, referencing xact_had_exception, maybe_send_feedback and
WaitLatchOrSocket to use break instead of continue).
---
Nitpick comments:
In `@src/spock_readonly.c`:
- Around line 68-89: Both spock_ropost_parse_analyze and spock_roExecutorStart
contain duplicated logic that syncs spock_readonly with the Postgres
XactReadOnly state; extract that logic into a single static helper (e.g.,
sync_spock_readonly_state) and have both functions call it. The helper should
encapsulate the current conditional flow that checks spock_readonly >=
READONLY_LOCAL, superuser(), sets XactReadOnly, and restores it by reading
transaction_read_only via GetConfigOption and strcmp(value, "off") to detect and
revert Spock-imposed changes. Replace the duplicated blocks in
spock_ropost_parse_analyze and spock_roExecutorStart with a call to the new
static function to avoid drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b1824a4f-54be-4418-93b9-88195988dc05
📒 Files selected for processing (4)
include/spock_readonly.hsrc/spock.csrc/spock_apply.csrc/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (2)
- src/spock.c
- include/spock_readonly.h
I think it is not good or necessary change. So, it seems we can keep it as is. |
| * execution (remember SPI plan for example). So, additional control makes | ||
| * sense here. | ||
| */ | ||
| if (spock_readonly >= READONLY_LOCAL && !superuser()) |
There was a problem hiding this comment.
When spock.readonly = 'all', a superuser can still execute DDL/DML, Which shouldn't be the case right?
There was a problem hiding this comment.
No, the refactoring precisely clarifies the case: LOCAL means only backends are forbidden to write; ALL means LR processes too. Superusers can write at any time they want. Disallowing that is too dangerous: in case of a bug, the only way to change mode might be SIGKILL. So, this refactoring attempts to make this feature safer.
Instead of manually filtering command types in post_parse_analyze and ExecutorStart hooks, set XactReadOnly directly for non-superuser sessions. This leverages Postgres's built-in read-only transaction enforcement instead of reimplementing it. To distinguish Spock-imposed read-only from user/GUC-imposed read-only, compare XactReadOnly against the transaction_read_only GUC value. Rename READONLY_USER to READONLY_LOCAL to better reflect that the mode restricts local (non-replication) connections. In the apply worker, move CHECK_FOR_INTERRUPTS and config reload to the top of the replay loop so read-only mode changes are detected before applying the next transaction. When READONLY_ALL is active, the worker sends keepalive feedback but skips applying new transactions. Escalate the READONLY_ALL violation in replication_handler from ERROR to PANIC.
fdb7dd1 to
4eba633
Compare
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 2 |
TIP This summary will be updated as you push new changes. Give us feedback
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_apply.c`:
- Around line 2909-2933: In the xact_had_exception branch where spock_readonly
== READONLY_ALL you must check the WaitLatchOrSocket return (rc) from
WaitLatchOrSocket(&MyProc->procLatch, ...) for WL_POSTMASTER_DEATH and
WL_SOCKET_READABLE before issuing continue so postmaster death and socket events
are handled; update the branch in spock_apply.c to inspect rc and handle
WL_POSTMASTER_DEATH (exit/cleanup) and WL_SOCKET_READABLE (fall through so outer
loop will call PQconsumeInput()/PQgetCopyData) or alternatively move the wait
out of the readonly-inner-branch so PQconsumeInput and PQgetCopyData aren’t
starved — adjust logic around WaitLatchOrSocket, rc, WL_POSTMASTER_DEATH,
WL_SOCKET_READABLE, MyProc->procLatch, PQconsumeInput and PQgetCopyData
accordingly to avoid dropping signals and buffering COPY data.
- Around line 2436-2438: The elog call in spock_apply.c uses PANIC when
spock_readonly == READONLY_ALL which is wrong; change the error level to FATAL
so only the apply worker exits instead of triggering a postmaster restart. Edit
the branch that checks spock_readonly (the condition referencing READONLY_ALL
and MySubscription->name) and replace the PANIC error level passed to elog with
FATAL, leaving the message and MySubscription->name intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 74f765b3-51d4-49c1-9ab8-1653a249d2df
📒 Files selected for processing (4)
include/spock_readonly.hsrc/spock.csrc/spock_apply.csrc/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (2)
- include/spock_readonly.h
- src/spock_readonly.c
| if (spock_readonly == READONLY_ALL) | ||
| elog(ERROR, "SPOCK %s: cluster is in read-only mode, not performing replication", | ||
| elog(PANIC, "SPOCK %s: cluster is in read-only mode, not performing replication", | ||
| MySubscription->name); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Let me check the current directory structure
ls -la
git status 2>/dev/null || echo "Not a git repo"Repository: pgEdge/spock
Length of output: 1569
🏁 Script executed:
# Try to locate the file without assuming path
find . -type f -name "spock_apply.c" 2>/dev/nullRepository: pgEdge/spock
Length of output: 75
🏁 Script executed:
# If not found, check git files
git ls-files 2>/dev/null | grep spock_applyRepository: pgEdge/spock
Length of output: 145
🏁 Script executed:
# First, verify the exact content at lines 2436-2438
sed -n '2436,2438p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 224
🏁 Script executed:
# Check the context around that code block
sed -n '2430,2450p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 788
🏁 Script executed:
# Search for other READONLY_ALL usage to understand its nature
rg "READONLY_ALL" src/spock_apply.c -B2 -A2Repository: pgEdge/spock
Length of output: 329
🏁 Script executed:
# Check if there are any comments or definitions related to READONLY_ALL
rg "READONLY_ALL|read.only" src/ --type c | head -30Repository: pgEdge/spock
Length of output: 1254
🏁 Script executed:
# Check for elog PANIC usage in this file to understand current patterns
rg "elog\(PANIC" src/spock_apply.c -B2 -A2Repository: pgEdge/spock
Length of output: 205
🏁 Script executed:
# Look for elog FATAL usage in this file for comparison
rg "elog\(FATAL" src/spock_apply.c -B2 -A2Repository: pgEdge/spock
Length of output: 171
🏁 Script executed:
# Check if there's a git history of this change
git log --oneline -n 20 -- src/spock_apply.c 2>/dev/null || echo "Git history not available"Repository: pgEdge/spock
Length of output: 86
🏁 Script executed:
# Search for READONLY state definitions
rg "enum.*READONLY|#define.*READONLY" --type c -B1 -A3Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Check the second READONLY_ALL guard around line 2909
sed -n '2900,2940p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 1412
🏁 Script executed:
# Check for other PANIC usage in the broader codebase to understand patterns
rg "elog\(PANIC" src/ --type c -B1 -A1 | head -50Repository: pgEdge/spock
Length of output: 386
🏁 Script executed:
# Search for ERROR level usage in similar contexts
rg "READONLY" src/ --type c -A3 -B1 | grep -E "elog|errmsg" -A1 -B1Repository: pgEdge/spock
Length of output: 257
🏁 Script executed:
# Check PostgreSQL headers or comments for error level semantics
rg "PANIC|FATAL|ERROR" include/ --type c -B1 -A1 | head -40Repository: pgEdge/spock
Length of output: 415
🏁 Script executed:
# Look for any comments in the code explaining error level choice
rg "PANIC|error level|ErrorLevel" src/spock_apply.c -B2 -A2Repository: pgEdge/spock
Length of output: 205
🏁 Script executed:
# Check git history of the problematic line
git log -p --follow -S "READONLY_ALL" -- src/spock_apply.c | head -100Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Check the context of the second READONLY_ALL handling - lines 2900-2940
sed -n '2890,2950p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 1949
🏁 Script executed:
# Look for how other apply workers or background workers handle operator-settable conditions
rg "ConfigReloadPending|ProcessConfigFile" src/ --type c -B2 -A2 | head -50Repository: pgEdge/spock
Length of output: 1679
🏁 Script executed:
# Check if there are comments explaining the PANIC choice or comparing error levels
git log --all --grep="READONLY\|PANIC\|read-only" --oneline 2>/dev/null | head -20Repository: pgEdge/spock
Length of output: 466
🏁 Script executed:
# Search for any discussion of PANIC vs FATAL in comments
rg "PANIC|FATAL" src/spock_apply.c -B3 -A3Repository: pgEdge/spock
Length of output: 527
🏁 Script executed:
# Check what happens after a FATAL in background worker context
rg "FATAL" src/ --type c -B2 -A2 | head -60Repository: pgEdge/spock
Length of output: 1838
🏁 Script executed:
# Look for the function signature and context of replication_handler
sed -n '2420,2445p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 863
🏁 Script executed:
# Check if there's a difference in how the two READONLY_ALL guards are logically placed
# The first is at function entry, the second is in the main loop
sed -n '2430,2440p' src/spock_apply.c | cat -n
sed -n '2900,2920p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 1201
🏁 Script executed:
# Check the most relevant commit for context
git show a14acf5 --stat 2>/dev/null | head -30Repository: pgEdge/spock
Length of output: 651
🏁 Script executed:
# Check what changed in that commit for the PANIC line
git show a14acf5 -- src/spock_apply.c 2>/dev/null | grep -A5 -B5 "PANIC\|READONLY_ALL" | head -50Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Get more detail on the commit that changed READONLY_ALL handling
git show a14acf5 src/spock_apply.c 2>/dev/null | head -100Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Check if there's a comment or explanation in the commit message
git show a14acf5 2>/dev/null | head -50Repository: pgEdge/spock
Length of output: 1394
🏁 Script executed:
# Look at the full context of line 2437 - what function is it in and what's the exit strategy?
sed -n '2420,2480p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 1656
🏁 Script executed:
# Confirm whether this is really in replication_handler which is a message processor
grep -n "replication_handler" src/spock_apply.c | head -5Repository: pgEdge/spock
Length of output: 130
🏁 Script executed:
# Final verification: check if replication_handler is called in error-recovery context or normal message processing
sed -n '3050,3065p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 546
Change PANIC to FATAL.
READONLY_ALL is an operator-settable GUC state, not a cluster corruption invariant. This is evidenced by the graceful handling already present in the main replay loop (line ~2909), which allows the worker to continue polling while read-only. Using PANIC causes an unnecessary postmaster restart; FATAL terminates just the apply worker, consistent with PostgreSQL error semantics and the existing guard pattern.
Minimal patch
- elog(PANIC, "SPOCK %s: cluster is in read-only mode, not performing replication",
+ elog(FATAL, "SPOCK %s: cluster is in read-only mode, not performing replication",
MySubscription->name);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/spock_apply.c` around lines 2436 - 2438, The elog call in spock_apply.c
uses PANIC when spock_readonly == READONLY_ALL which is wrong; change the error
level to FATAL so only the apply worker exits instead of triggering a postmaster
restart. Edit the branch that checks spock_readonly (the condition referencing
READONLY_ALL and MySubscription->name) and replace the PANIC error level passed
to elog with FATAL, leaving the message and MySubscription->name intact.
✅ Confirmed as addressed by @mason-sharp
There was a problem hiding this comment.
Makes sense, there is no corruption of data. FATAL will just restart the worker.
There was a problem hiding this comment.
@mason-sharp, glad that aligns! Here's the fix:
- Change
elog(PANIC, ...)toelog(FATAL, ...)inreplication_handler()insrc/spock_apply.c
🧠 Learnings used
Learnt from: danolivo
Repo: pgEdge/spock PR: 400
File: src/spock_apply.c:1292-1314
Timestamp: 2026-03-31T11:37:20.925Z
Learning: In pgEdge/spock (src/spock_apply.c), when TRANSDISCARD or SUB_DISABLE exception behaviour is active and a queued DDL/SQL message is processed, two entries are intentionally written to spock.exception_log: one from the queue-table INSERT path (handle_insert) and one from handle_sql_or_exception. This is by design to capture as much information as possible about the failed operation chain. Do not flag this double-logging as a bug.
Learnt from: mason-sharp
Repo: pgEdge/spock PR: 378
File: src/spock.c:331-335
Timestamp: 2026-03-05T19:22:53.103Z
Learning: In `src/spock.c` (pgEdge/spock), the `suffix` parameter of `spock_connect_base()` (and its public wrappers `spock_connect()` / `spock_connect_replica()`) is always called with short hardcoded string literals ("copy", "create", "sync", "copy_slot"). There is no dynamic or user-supplied suffix. Therefore, flagging the arithmetic `NAMEDATALEN - strlen(suffix) - 2` as an underflow risk is a theoretical concern and the maintainers consider it unnecessary to add a runtime guard.
Learnt from: danolivo
Repo: pgEdge/spock PR: 402
File: tests/regress/sql/spill_transaction.sql:115-117
Timestamp: 2026-03-31T10:03:18.941Z
Learning: In `tests/regress/sql/spill_transaction.sql` (pgEdge/spock PR `#402`), `spock.resolutions` is expected to remain empty throughout the spill_transaction regression test by design. None of the three `exception_behaviour` modes tested (`transdiscard`, `discard`, `sub_disable`) write entries to `spock.resolutions` in this test's conflict scenarios. Do not flag cross-scenario pollution of `spock.resolutions` as a concern for this test.
Learnt from: mason-sharp
Repo: pgEdge/spock PR: 372
File: docs/configuring.md:207-211
Timestamp: 2026-03-05T23:04:27.732Z
Learning: In the spock codebase, replication origin value `0` means the row was changed locally (not via replication). This is correct behavior and should be documented as such. The `NULL` value in `spock.resolutions.local_origin` and "unknown" in PostgreSQL logs are a separate distinct case where the local origin is genuinely unavailable (e.g., for pre-existing data after a pg_restore). Do not conflate origin=0 (local) with NULL/unknown (unavailable origin).
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
Failed to handle agent chat message. Please try again.
| if (spock_readonly == READONLY_ALL) | ||
| { | ||
| /* | ||
| * Send feedback to keep walsender alive - we may avoid it | ||
| * with introduction of TCP keepalive approach. | ||
| */ | ||
| maybe_send_feedback(applyconn, last_received, | ||
| &last_receive_timestamp); | ||
|
|
||
| /* | ||
| * In case of an exception we can't break out of the loop | ||
| * because exception processing code may also modify the | ||
| * database. Wait briefly and continue to the next iteration. | ||
| */ | ||
| if (xact_had_exception) | ||
| { | ||
| rc = WaitLatchOrSocket(&MyProc->procLatch, | ||
| WL_SOCKET_READABLE | WL_LATCH_SET | | ||
| WL_TIMEOUT | WL_POSTMASTER_DEATH, | ||
| fd, 1000L); | ||
|
|
||
| ResetLatch(&MyProc->procLatch); | ||
| continue; | ||
| } | ||
| break; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the exact file and context around the specified lines
wc -l src/spock_apply.cRepository: pgEdge/spock
Length of output: 78
🏁 Script executed:
# Read the context around lines 2909-2933, including surrounding code
sed -n '2850,2980p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 4267
🏁 Script executed:
# Search for the outer loop structure and PQconsumeInput() usage
rg -n 'PQconsumeInput|PQgetCopyData' src/spock_apply.cRepository: pgEdge/spock
Length of output: 140
🏁 Script executed:
# Look for WaitLatchOrSocket and its usage patterns in this file
rg -n -A 5 'WaitLatchOrSocket' src/spock_apply.c | head -80Repository: pgEdge/spock
Length of output: 518
🏁 Script executed:
# Get a broader view of the function containing this code - search for the function definition
rg -n 'apply_work\|process.*connection' src/spock_apply.c | head -20Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Get context for the outer loop structure - lines before 2850
sed -n '2800,2860p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 2009
🏁 Script executed:
# Get the end of inner loop and what happens after - lines after 2930
sed -n '2930,3000p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 2156
🏁 Script executed:
# Search for where this code is - what's the function name?
rg -B 100 'if \(spock_readonly == READONLY_ALL\)' src/spock_apply.c | grep -E '^[0-9]+-.*\(' | tail -5Repository: pgEdge/spock
Length of output: 38
🏁 Script executed:
# Let's see the complete nested loop structure more clearly - search around line 2908-2960
sed -n '2900,2960p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 2046
🏁 Script executed:
# Check what happens at the end of the inner for(;;) loop to understand full flow
sed -n '2960,3050p' src/spock_apply.c | cat -nRepository: pgEdge/spock
Length of output: 3160
🏁 Script executed:
# Verify: when we break from inner loop at line 2932, do we go back to outer while?
# Also check if there's any other handling of WL_POSTMASTER_DEATH in the inner loop
rg -n 'WL_POSTMASTER_DEATH' src/spock_apply.cRepository: pgEdge/spock
Length of output: 193
Check WaitLatchOrSocket result in the xact_had_exception branch before continuing.
The inner WaitLatchOrSocket() call at lines 2925-2928 returns status in rc but this result is discarded before continue at line 2932. This means WL_POSTMASTER_DEATH signals are missed—the outer loop checks this at line 2847, but the inner branch doesn't. Additionally, when READONLY_ALL is true, PQgetCopyData() is suppressed entirely (unreached at line 2941), yet the outer loop still calls PQconsumeInput() at line 2853, allowing libpq to buffer incoming COPY data indefinitely during extended readonly windows.
Either inspect the rc result for WL_POSTMASTER_DEATH and WL_SOCKET_READABLE in this branch, or move the wait to before the inner loop enters to avoid signal loss and buffering during readonly mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/spock_apply.c` around lines 2909 - 2933, In the xact_had_exception branch
where spock_readonly == READONLY_ALL you must check the WaitLatchOrSocket return
(rc) from WaitLatchOrSocket(&MyProc->procLatch, ...) for WL_POSTMASTER_DEATH and
WL_SOCKET_READABLE before issuing continue so postmaster death and socket events
are handled; update the branch in spock_apply.c to inspect rc and handle
WL_POSTMASTER_DEATH (exit/cleanup) and WL_SOCKET_READABLE (fall through so outer
loop will call PQconsumeInput()/PQgetCopyData) or alternatively move the wait
out of the readonly-inner-branch so PQconsumeInput and PQgetCopyData aren’t
starved — adjust logic around WaitLatchOrSocket, rc, WL_POSTMASTER_DEATH,
WL_SOCKET_READABLE, MyProc->procLatch, PQconsumeInput and PQgetCopyData
accordingly to avoid dropping signals and buffering COPY data.
|
read-only-doc.patch |
post_parse_analyzeandExecutorStarthooks with settingXactReadOnlydirectly for non-superusersessions, leveraging Postgres's built-in read-only transaction enforcement
comparing
XactReadOnlyagainst thetransaction_read_onlyGUC valueREADONLY_USERtoREADONLY_LOCALto better reflect that the moderestricts local (non-replication) connections
CHECK_FOR_INTERRUPTSand config reload to thetop of the replay loop so read-only mode changes are detected before applying
the next transaction
READONLY_ALLis active, the apply worker sends keepalive feedback butskips applying new transactions
READONLY_ALLviolation inreplication_handlerfromERRORto
PANIC