Skip to content

Rework read-only mode: use Postgres native DefaultXactReadOnly#376

Open
danolivo wants to merge 2 commits intomainfrom
readonly-refactoring
Open

Rework read-only mode: use Postgres native DefaultXactReadOnly#376
danolivo wants to merge 2 commits intomainfrom
readonly-refactoring

Conversation

@danolivo
Copy link
Copy Markdown
Contributor

@danolivo danolivo commented Mar 4, 2026

  • Replace manual command-type filtering in post_parse_analyze and
    ExecutorStart hooks with setting XactReadOnly directly for non-superuser
    sessions, leveraging Postgres's built-in read-only transaction enforcement
  • Distinguish Spock-imposed read-only from user/GUC-imposed read-only by
    comparing XactReadOnly against the transaction_read_only GUC value
  • Rename READONLY_USER to READONLY_LOCAL to better reflect that the mode
    restricts local (non-replication) connections
  • In the apply worker, move CHECK_FOR_INTERRUPTS and config reload to the
    top of the replay loop so read-only mode changes are detected before applying
    the next transaction
  • When READONLY_ALL is active, the apply worker sends keepalive feedback but
    skips applying new transactions
  • Escalate the READONLY_ALL violation in replication_handler from ERROR
    to PANIC

@danolivo danolivo requested a review from ibrarahmad March 4, 2026 11:49
@danolivo danolivo added the enhancement New feature or request label Mar 4, 2026
@danolivo danolivo self-assigned this Mar 4, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 14a1cdb1-c32b-41ac-8811-91e6a3ed6e2d

📥 Commits

Reviewing files that changed from the base of the PR and between 4eba633 and 5e7d410.

📒 Files selected for processing (1)
  • include/spock_readonly.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • include/spock_readonly.h

📝 Walkthrough

Walkthrough

Renamed exported enum member READONLY_USER to READONLY_LOCAL and updated option mappings; moved interrupt/config checks earlier in the apply loop, added feedback-only behavior for READONLY_ALL, changed replication read-only error to PANIC, and simplified read-only enforcement to set/clear XactReadOnly based on mode, superuser status, and the transaction_read_only GUC.

Changes

Cohort / File(s) Summary
Enum and config mapping
include/spock_readonly.h, src/spock.c
Renamed READONLY_USERREADONLY_LOCAL; added explicit local option mapping to READONLY_LOCAL; retained user as a backward-compatible alias; adjusted option-to-enum mapping logic.
Apply loop and replication handling
src/spock_apply.c
Moved CHECK_FOR_INTERRUPTS() and ConfigReloadPending handling earlier in the per-entry loop; added feedback-only path when spock_readonly == READONLY_ALL to keep walsender alive; changed replication read-only enforcement from ERROR to PANIC; adjusted loop control for prior exceptions.
Read-only enforcement logic
src/spock_readonly.c
Replaced per-statement/command read-only classification with global handling: when spock_readonly >= READONLY_LOCAL and non-superuser, set XactReadOnly = true; otherwise consult transaction_read_only GUC via GetConfigOption(...) to clear XactReadOnly; removed prior command-specific blocking and related checks; updated documentation.

Poem

🐰 I rename a flag and hop through code tonight,
Local now guards where User once held tight.
The applier listens, keeps the beat but skips the dance,
Panics trumpet warnings while feedback gets a chance,
A rabbit nods: rest transactions—spring’s gentle light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main refactoring: using Postgres's native transaction read-only mechanism instead of manual filtering.
Description check ✅ Passed The description is directly related to the changeset, providing specific details about the read-only mode rework and the six key changes implemented.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch readonly-refactoring

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/spock.c (1)

116-120: Consider adding a local GUC literal and keeping user as a hidden compatibility alias.

The enum was renamed to READONLY_LOCAL, but the exposed value is still "user". Accepting "local" improves clarity and keeps backward compatibility.

♻️ Suggested adjustment
 static const struct config_enum_entry readonly_options[] = {
 	{"off", READONLY_OFF, false},
-	{"user", READONLY_LOCAL, false},
+	{"local", READONLY_LOCAL, false},
+	{"user", READONLY_LOCAL, true},
 	{"all", READONLY_ALL, false},
 	{NULL, 0, false}
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock.c` around lines 116 - 120, The readonly_options enum currently
exposes "user" while the internal enum is READONLY_LOCAL; update the GUC
literals to add "local" as the primary visible option and keep "user" as a
hidden/compatibility alias in the readonly_options array that is returned by the
config system; specifically modify the readonly_options initializer (refer to
readonly_options and READONLY_LOCAL) to include an entry for "local" mapping to
READONLY_LOCAL and mark the existing "user" entry as hidden/alias so both parse
to READONLY_LOCAL while preserving backward compatibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 2887-2899: In the READONLY_ALL branch, replacing the inner
"continue" (when xact_had_exception is true) with a break ensures the worker
exits the loop to run the outer post-loop cleanup/exception/subscription
handling; update the code in spock_apply.c so that after
maybe_send_feedback(...) you do "if (xact_had_exception) break;" (or set a local
flag and break) instead of "continue", preserving any per-iteration cleanup and
allowing the outer post-loop handling to execute.

In `@src/spock_readonly.c`:
- Around line 65-68: spock_ropost_parse_analyze() and spock_roExecutorStart()
currently unconditionally reset DefaultXactReadOnly to false; instead, before
you change DefaultXactReadOnly save its current value into a local variable
(e.g., prev_DefaultXactReadOnly) and record whether Spock actually enforced
readonly (e.g., did_force_readonly = (spock_readonly >= READONLY_LOCAL &&
!superuser())); only set DefaultXactReadOnly when did_force_readonly is true,
and on exit restore DefaultXactReadOnly from prev_DefaultXactReadOnly only if
did_force_readonly was true; apply this save/restore pattern in both
spock_ropost_parse_analyze() and spock_roExecutorStart(), referencing the
DefaultXactReadOnly, spock_readonly, READONLY_LOCAL, and superuser() symbols to
locate the places to change.

---

Nitpick comments:
In `@src/spock.c`:
- Around line 116-120: The readonly_options enum currently exposes "user" while
the internal enum is READONLY_LOCAL; update the GUC literals to add "local" as
the primary visible option and keep "user" as a hidden/compatibility alias in
the readonly_options array that is returned by the config system; specifically
modify the readonly_options initializer (refer to readonly_options and
READONLY_LOCAL) to include an entry for "local" mapping to READONLY_LOCAL and
mark the existing "user" entry as hidden/alias so both parse to READONLY_LOCAL
while preserving backward compatibility.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e0080300-bc1e-4d74-ade7-87ba2b6fbdb5

📥 Commits

Reviewing files that changed from the base of the PR and between 50764f4 and 5f69140.

📒 Files selected for processing (4)
  • include/spock_readonly.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_readonly.c

@danolivo danolivo force-pushed the readonly-refactoring branch from 5f69140 to e60b508 Compare March 4, 2026 13:36
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/spock_apply.c (1)

2901-2910: ⚠️ Potential issue | 🟠 Major

READONLY_ALL exception path can still spin and misses postmaster-death handling.

In this branch, WaitLatchOrSocket(... WL_SOCKET_READABLE ...) + continue (Line 2903 onward) can return immediately on a perpetually readable socket, causing tight-loop churn; it also skips checking WL_POSTMASTER_DEATH in this path. Prefer a latch/timeout wait here and preserve the emergency bailout.

💡 Suggested patch
 					if (xact_had_exception)
 					{
-						rc = WaitLatchOrSocket(&MyProc->procLatch,
-								   WL_SOCKET_READABLE | WL_LATCH_SET |
-								   WL_TIMEOUT | WL_POSTMASTER_DEATH,
-								   fd, 1000L);
-
-						ResetLatch(&MyProc->procLatch);
+						rc = WaitLatch(&MyProc->procLatch,
+									   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+									   1000L,
+									   WAIT_EVENT_LOGICAL_APPLY_MAIN);
+						if (rc & WL_POSTMASTER_DEATH)
+							proc_exit(1);
+						ResetLatch(&MyProc->procLatch);
 						continue;
 					}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2901 - 2910, In the xact_had_exception branch
(around the WaitLatchOrSocket call), don't call WaitLatchOrSocket with
WL_SOCKET_READABLE and then continue because a perpetually readable socket can
cause a tight spin and skip postmaster-death handling; replace that call with
WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
timeout) (i.e., use WaitLatch instead of WaitLatchOrSocket), keep
ResetLatch(&MyProc->procLatch) after the wait, and explicitly handle the
WL_POSTMASTER_DEATH case (or call proc_exit/emergency bailout) so the emergency
bailout path is preserved instead of looping continuously. Ensure you reference
the xact_had_exception branch, MyProc->procLatch, WaitLatchOrSocket (old) and
WaitLatch (new), ResetLatch, and WL_POSTMASTER_DEATH in your changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock.c`:
- Around line 116-119: The readonly_options array currently only maps the old
name "user" to READONLY_LOCAL, so setting spock.readonly = 'local' is rejected;
update the readonly_options entries to include "local" as the canonical string
mapping to READONLY_LOCAL and keep "user" as a backward-compatible alias (both
mapping to READONLY_LOCAL) so the parser accepts either name; modify the static
const struct config_enum_entry readonly_options[] (and any related
parsing/validation that references READONLY_LOCAL) accordingly.

---

Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2901-2910: In the xact_had_exception branch (around the
WaitLatchOrSocket call), don't call WaitLatchOrSocket with WL_SOCKET_READABLE
and then continue because a perpetually readable socket can cause a tight spin
and skip postmaster-death handling; replace that call with
WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
timeout) (i.e., use WaitLatch instead of WaitLatchOrSocket), keep
ResetLatch(&MyProc->procLatch) after the wait, and explicitly handle the
WL_POSTMASTER_DEATH case (or call proc_exit/emergency bailout) so the emergency
bailout path is preserved instead of looping continuously. Ensure you reference
the xact_had_exception branch, MyProc->procLatch, WaitLatchOrSocket (old) and
WaitLatch (new), ResetLatch, and WL_POSTMASTER_DEATH in your changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5595b87b-7ef4-4477-90de-6e1d4fe06adb

📥 Commits

Reviewing files that changed from the base of the PR and between 5f69140 and e60b508.

📒 Files selected for processing (4)
  • include/spock_readonly.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spock_readonly.c

@danolivo danolivo force-pushed the readonly-refactoring branch from e60b508 to fdb7dd1 Compare March 4, 2026 14:36
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/spock_apply.c (1)

2887-2912: ⚠️ Potential issue | 🔴 Critical

READONLY_ALL exception path can trap the worker in the inner loop and skip finalization flow.

With xact_had_exception == true, continue at Line 2909 keeps control inside the inner loop, so the post-loop exception/subscription handling block is never reached. This can stall apply progress indefinitely while in READONLY_ALL.

🐛 Minimal fix
-				if (spock_readonly == READONLY_ALL)
+				if (spock_readonly == READONLY_ALL)
 				{
 					/*
 					 * Send feedback to keep walsender alive - we may avoid it
 					 * with introduction of TCP keepalive approach.
 					 */
 					maybe_send_feedback(applyconn, last_received,
 										&last_receive_timestamp);
-
-					/*
-					 * In case of an exception we can't break out of the loop
-					 * because exception processing code may also modify the
-					 * database. Wait briefly and continue to the next iteration.
-					 */
-					if (xact_had_exception)
-					{
-						rc = WaitLatchOrSocket(&MyProc->procLatch,
-								   WL_SOCKET_READABLE | WL_LATCH_SET |
-								   WL_TIMEOUT | WL_POSTMASTER_DEATH,
-								   fd, 1000L);
-
-						ResetLatch(&MyProc->procLatch);
-						continue;
-					}
 					break;
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2887 - 2912, The READONLY_ALL branch
currently does a continue when xact_had_exception is true, which keeps execution
inside the inner loop and skips the post-loop exception/subscription
finalization; change the control flow so that after maybe_send_feedback and the
WaitLatchOrSocket call the code breaks out of the inner loop (not continue),
ensuring the subsequent post-loop exception/subscription handling runs (update
the block around spock_readonly == READONLY_ALL, referencing xact_had_exception,
maybe_send_feedback and WaitLatchOrSocket to use break instead of continue).
🧹 Nitpick comments (1)
src/spock_readonly.c (1)

68-89: Extract duplicated read-only sync logic into one helper.

spock_ropost_parse_analyze() and spock_roExecutorStart() now carry identical state-sync logic. A shared static helper would reduce drift risk between hooks.

♻️ Suggested refactor
+static inline void
+spock_sync_xact_readonly(void)
+{
+	if (spock_readonly >= READONLY_LOCAL && !superuser())
+		XactReadOnly = true;
+	else if (XactReadOnly)
+	{
+		const char *value = GetConfigOption("transaction_read_only", false, false);
+		if (strcmp(value, "off") == 0)
+			XactReadOnly = false;
+	}
+}
+
 void
 spock_ropost_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
 {
-	...
+	spock_sync_xact_readonly();
 }
 
 void
 spock_roExecutorStart(QueryDesc *queryDesc, int eflags)
 {
-	...
+	spock_sync_xact_readonly();
 }

Also applies to: 95-116

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_readonly.c` around lines 68 - 89, Both spock_ropost_parse_analyze
and spock_roExecutorStart contain duplicated logic that syncs spock_readonly
with the Postgres XactReadOnly state; extract that logic into a single static
helper (e.g., sync_spock_readonly_state) and have both functions call it. The
helper should encapsulate the current conditional flow that checks
spock_readonly >= READONLY_LOCAL, superuser(), sets XactReadOnly, and restores
it by reading transaction_read_only via GetConfigOption and strcmp(value, "off")
to detect and revert Spock-imposed changes. Replace the duplicated blocks in
spock_ropost_parse_analyze and spock_roExecutorStart with a call to the new
static function to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/spock_apply.c`:
- Around line 2887-2912: The READONLY_ALL branch currently does a continue when
xact_had_exception is true, which keeps execution inside the inner loop and
skips the post-loop exception/subscription finalization; change the control flow
so that after maybe_send_feedback and the WaitLatchOrSocket call the code breaks
out of the inner loop (not continue), ensuring the subsequent post-loop
exception/subscription handling runs (update the block around spock_readonly ==
READONLY_ALL, referencing xact_had_exception, maybe_send_feedback and
WaitLatchOrSocket to use break instead of continue).

---

Nitpick comments:
In `@src/spock_readonly.c`:
- Around line 68-89: Both spock_ropost_parse_analyze and spock_roExecutorStart
contain duplicated logic that syncs spock_readonly with the Postgres
XactReadOnly state; extract that logic into a single static helper (e.g.,
sync_spock_readonly_state) and have both functions call it. The helper should
encapsulate the current conditional flow that checks spock_readonly >=
READONLY_LOCAL, superuser(), sets XactReadOnly, and restores it by reading
transaction_read_only via GetConfigOption and strcmp(value, "off") to detect and
revert Spock-imposed changes. Replace the duplicated blocks in
spock_ropost_parse_analyze and spock_roExecutorStart with a call to the new
static function to avoid drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b1824a4f-54be-4418-93b9-88195988dc05

📥 Commits

Reviewing files that changed from the base of the PR and between e60b508 and fdb7dd1.

📒 Files selected for processing (4)
  • include/spock_readonly.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/spock.c
  • include/spock_readonly.h

@danolivo
Copy link
Copy Markdown
Contributor Author

danolivo commented Mar 4, 2026

Actionable comments posted: 1

♻️ Duplicate comments (1)
🤖 Prompt for all review comments with AI agents
ℹ️ Review info

I think it is not good or necessary change. So, it seems we can keep it as is.

* execution (remember SPI plan for example). So, additional control makes
* sense here.
*/
if (spock_readonly >= READONLY_LOCAL && !superuser())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When spock.readonly = 'all', a superuser can still execute DDL/DML, Which shouldn't be the case right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the refactoring precisely clarifies the case: LOCAL means only backends are forbidden to write; ALL means LR processes too. Superusers can write at any time they want. Disallowing that is too dangerous: in case of a bug, the only way to change mode might be SIGKILL. So, this refactoring attempts to make this feature safer.

Instead of manually filtering command types in post_parse_analyze and
ExecutorStart hooks, set XactReadOnly directly for non-superuser
sessions. This leverages Postgres's built-in read-only transaction
enforcement instead of reimplementing it. To distinguish Spock-imposed
read-only from user/GUC-imposed read-only, compare XactReadOnly against
the transaction_read_only GUC value.

Rename READONLY_USER to READONLY_LOCAL to better reflect that the mode
restricts local (non-replication) connections.

In the apply worker, move CHECK_FOR_INTERRUPTS and config reload to the
top of the replay loop so read-only mode changes are detected before
applying the next transaction. When READONLY_ALL is active, the worker
sends keepalive feedback but skips applying new transactions. Escalate
the READONLY_ALL violation in replication_handler from ERROR to PANIC.
@danolivo danolivo force-pushed the readonly-refactoring branch from fdb7dd1 to 4eba633 Compare March 31, 2026 11:23
@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Mar 31, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 2 duplication

Metric Results
Duplication 2

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/spock_apply.c`:
- Around line 2909-2933: In the xact_had_exception branch where spock_readonly
== READONLY_ALL you must check the WaitLatchOrSocket return (rc) from
WaitLatchOrSocket(&MyProc->procLatch, ...) for WL_POSTMASTER_DEATH and
WL_SOCKET_READABLE before issuing continue so postmaster death and socket events
are handled; update the branch in spock_apply.c to inspect rc and handle
WL_POSTMASTER_DEATH (exit/cleanup) and WL_SOCKET_READABLE (fall through so outer
loop will call PQconsumeInput()/PQgetCopyData) or alternatively move the wait
out of the readonly-inner-branch so PQconsumeInput and PQgetCopyData aren’t
starved — adjust logic around WaitLatchOrSocket, rc, WL_POSTMASTER_DEATH,
WL_SOCKET_READABLE, MyProc->procLatch, PQconsumeInput and PQgetCopyData
accordingly to avoid dropping signals and buffering COPY data.
- Around line 2436-2438: The elog call in spock_apply.c uses PANIC when
spock_readonly == READONLY_ALL which is wrong; change the error level to FATAL
so only the apply worker exits instead of triggering a postmaster restart. Edit
the branch that checks spock_readonly (the condition referencing READONLY_ALL
and MySubscription->name) and replace the PANIC error level passed to elog with
FATAL, leaving the message and MySubscription->name intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74f765b3-51d4-49c1-9ab8-1653a249d2df

📥 Commits

Reviewing files that changed from the base of the PR and between fdb7dd1 and 4eba633.

📒 Files selected for processing (4)
  • include/spock_readonly.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_readonly.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • include/spock_readonly.h
  • src/spock_readonly.c

Comment on lines 2436 to 2438
if (spock_readonly == READONLY_ALL)
elog(ERROR, "SPOCK %s: cluster is in read-only mode, not performing replication",
elog(PANIC, "SPOCK %s: cluster is in read-only mode, not performing replication",
MySubscription->name);
Copy link
Copy Markdown

@coderabbitai coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Let me check the current directory structure
ls -la
git status 2>/dev/null || echo "Not a git repo"

Repository: pgEdge/spock

Length of output: 1569


🏁 Script executed:

# Try to locate the file without assuming path
find . -type f -name "spock_apply.c" 2>/dev/null

Repository: pgEdge/spock

Length of output: 75


🏁 Script executed:

# If not found, check git files
git ls-files 2>/dev/null | grep spock_apply

Repository: pgEdge/spock

Length of output: 145


🏁 Script executed:

# First, verify the exact content at lines 2436-2438
sed -n '2436,2438p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 224


🏁 Script executed:

# Check the context around that code block
sed -n '2430,2450p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 788


🏁 Script executed:

# Search for other READONLY_ALL usage to understand its nature
rg "READONLY_ALL" src/spock_apply.c -B2 -A2

Repository: pgEdge/spock

Length of output: 329


🏁 Script executed:

# Check if there are any comments or definitions related to READONLY_ALL
rg "READONLY_ALL|read.only" src/ --type c | head -30

Repository: pgEdge/spock

Length of output: 1254


🏁 Script executed:

# Check for elog PANIC usage in this file to understand current patterns
rg "elog\(PANIC" src/spock_apply.c -B2 -A2

Repository: pgEdge/spock

Length of output: 205


🏁 Script executed:

# Look for elog FATAL usage in this file for comparison
rg "elog\(FATAL" src/spock_apply.c -B2 -A2

Repository: pgEdge/spock

Length of output: 171


🏁 Script executed:

# Check if there's a git history of this change
git log --oneline -n 20 -- src/spock_apply.c 2>/dev/null || echo "Git history not available"

Repository: pgEdge/spock

Length of output: 86


🏁 Script executed:

# Search for READONLY state definitions
rg "enum.*READONLY|#define.*READONLY" --type c -B1 -A3

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Check the second READONLY_ALL guard around line 2909
sed -n '2900,2940p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 1412


🏁 Script executed:

# Check for other PANIC usage in the broader codebase to understand patterns
rg "elog\(PANIC" src/ --type c -B1 -A1 | head -50

Repository: pgEdge/spock

Length of output: 386


🏁 Script executed:

# Search for ERROR level usage in similar contexts
rg "READONLY" src/ --type c -A3 -B1 | grep -E "elog|errmsg" -A1 -B1

Repository: pgEdge/spock

Length of output: 257


🏁 Script executed:

# Check PostgreSQL headers or comments for error level semantics
rg "PANIC|FATAL|ERROR" include/ --type c -B1 -A1 | head -40

Repository: pgEdge/spock

Length of output: 415


🏁 Script executed:

# Look for any comments in the code explaining error level choice
rg "PANIC|error level|ErrorLevel" src/spock_apply.c -B2 -A2

Repository: pgEdge/spock

Length of output: 205


🏁 Script executed:

# Check git history of the problematic line
git log -p --follow -S "READONLY_ALL" -- src/spock_apply.c | head -100

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Check the context of the second READONLY_ALL handling - lines 2900-2940
sed -n '2890,2950p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 1949


🏁 Script executed:

# Look for how other apply workers or background workers handle operator-settable conditions
rg "ConfigReloadPending|ProcessConfigFile" src/ --type c -B2 -A2 | head -50

Repository: pgEdge/spock

Length of output: 1679


🏁 Script executed:

# Check if there are comments explaining the PANIC choice or comparing error levels
git log --all --grep="READONLY\|PANIC\|read-only" --oneline 2>/dev/null | head -20

Repository: pgEdge/spock

Length of output: 466


🏁 Script executed:

# Search for any discussion of PANIC vs FATAL in comments
rg "PANIC|FATAL" src/spock_apply.c -B3 -A3

Repository: pgEdge/spock

Length of output: 527


🏁 Script executed:

# Check what happens after a FATAL in background worker context
rg "FATAL" src/ --type c -B2 -A2 | head -60

Repository: pgEdge/spock

Length of output: 1838


🏁 Script executed:

# Look for the function signature and context of replication_handler
sed -n '2420,2445p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 863


🏁 Script executed:

# Check if there's a difference in how the two READONLY_ALL guards are logically placed
# The first is at function entry, the second is in the main loop
sed -n '2430,2440p' src/spock_apply.c | cat -n
sed -n '2900,2920p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 1201


🏁 Script executed:

# Check the most relevant commit for context
git show a14acf5 --stat 2>/dev/null | head -30

Repository: pgEdge/spock

Length of output: 651


🏁 Script executed:

# Check what changed in that commit for the PANIC line
git show a14acf5 -- src/spock_apply.c 2>/dev/null | grep -A5 -B5 "PANIC\|READONLY_ALL" | head -50

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Get more detail on the commit that changed READONLY_ALL handling
git show a14acf5 src/spock_apply.c 2>/dev/null | head -100

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Check if there's a comment or explanation in the commit message
git show a14acf5 2>/dev/null | head -50

Repository: pgEdge/spock

Length of output: 1394


🏁 Script executed:

# Look at the full context of line 2437 - what function is it in and what's the exit strategy?
sed -n '2420,2480p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 1656


🏁 Script executed:

# Confirm whether this is really in replication_handler which is a message processor
grep -n "replication_handler" src/spock_apply.c | head -5

Repository: pgEdge/spock

Length of output: 130


🏁 Script executed:

# Final verification: check if replication_handler is called in error-recovery context or normal message processing
sed -n '3050,3065p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 546


Change PANIC to FATAL.

READONLY_ALL is an operator-settable GUC state, not a cluster corruption invariant. This is evidenced by the graceful handling already present in the main replay loop (line ~2909), which allows the worker to continue polling while read-only. Using PANIC causes an unnecessary postmaster restart; FATAL terminates just the apply worker, consistent with PostgreSQL error semantics and the existing guard pattern.

Minimal patch
-		elog(PANIC, "SPOCK %s: cluster is in read-only mode, not performing replication",
+		elog(FATAL, "SPOCK %s: cluster is in read-only mode, not performing replication",
 			 MySubscription->name);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2436 - 2438, The elog call in spock_apply.c
uses PANIC when spock_readonly == READONLY_ALL which is wrong; change the error
level to FATAL so only the apply worker exits instead of triggering a postmaster
restart. Edit the branch that checks spock_readonly (the condition referencing
READONLY_ALL and MySubscription->name) and replace the PANIC error level passed
to elog with FATAL, leaving the message and MySubscription->name intact.

✅ Confirmed as addressed by @mason-sharp

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, there is no corruption of data. FATAL will just restart the worker.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mason-sharp, glad that aligns! Here's the fix:

  • Change elog(PANIC, ...) to elog(FATAL, ...) in replication_handler() in src/spock_apply.c

🧠 Learnings used
Learnt from: danolivo
Repo: pgEdge/spock PR: 400
File: src/spock_apply.c:1292-1314
Timestamp: 2026-03-31T11:37:20.925Z
Learning: In pgEdge/spock (src/spock_apply.c), when TRANSDISCARD or SUB_DISABLE exception behaviour is active and a queued DDL/SQL message is processed, two entries are intentionally written to spock.exception_log: one from the queue-table INSERT path (handle_insert) and one from handle_sql_or_exception. This is by design to capture as much information as possible about the failed operation chain. Do not flag this double-logging as a bug.

Learnt from: mason-sharp
Repo: pgEdge/spock PR: 378
File: src/spock.c:331-335
Timestamp: 2026-03-05T19:22:53.103Z
Learning: In `src/spock.c` (pgEdge/spock), the `suffix` parameter of `spock_connect_base()` (and its public wrappers `spock_connect()` / `spock_connect_replica()`) is always called with short hardcoded string literals ("copy", "create", "sync", "copy_slot"). There is no dynamic or user-supplied suffix. Therefore, flagging the arithmetic `NAMEDATALEN - strlen(suffix) - 2` as an underflow risk is a theoretical concern and the maintainers consider it unnecessary to add a runtime guard.

Learnt from: danolivo
Repo: pgEdge/spock PR: 402
File: tests/regress/sql/spill_transaction.sql:115-117
Timestamp: 2026-03-31T10:03:18.941Z
Learning: In `tests/regress/sql/spill_transaction.sql` (pgEdge/spock PR `#402`), `spock.resolutions` is expected to remain empty throughout the spill_transaction regression test by design. None of the three `exception_behaviour` modes tested (`transdiscard`, `discard`, `sub_disable`) write entries to `spock.resolutions` in this test's conflict scenarios. Do not flag cross-scenario pollution of `spock.resolutions` as a concern for this test.

Learnt from: mason-sharp
Repo: pgEdge/spock PR: 372
File: docs/configuring.md:207-211
Timestamp: 2026-03-05T23:04:27.732Z
Learning: In the spock codebase, replication origin value `0` means the row was changed locally (not via replication). This is correct behavior and should be documented as such. The `NULL` value in `spock.resolutions.local_origin` and "unknown" in PostgreSQL logs are a separate distinct case where the local origin is genuinely unavailable (e.g., for pre-existing data after a pg_restore). Do not conflate origin=0 (local) with NULL/unknown (unavailable origin).

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!


Failed to handle agent chat message. Please try again.

Comment on lines +2909 to +2933
if (spock_readonly == READONLY_ALL)
{
/*
* Send feedback to keep walsender alive - we may avoid it
* with introduction of TCP keepalive approach.
*/
maybe_send_feedback(applyconn, last_received,
&last_receive_timestamp);

/*
* In case of an exception we can't break out of the loop
* because exception processing code may also modify the
* database. Wait briefly and continue to the next iteration.
*/
if (xact_had_exception)
{
rc = WaitLatchOrSocket(&MyProc->procLatch,
WL_SOCKET_READABLE | WL_LATCH_SET |
WL_TIMEOUT | WL_POSTMASTER_DEATH,
fd, 1000L);

ResetLatch(&MyProc->procLatch);
continue;
}
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the exact file and context around the specified lines
wc -l src/spock_apply.c

Repository: pgEdge/spock

Length of output: 78


🏁 Script executed:

# Read the context around lines 2909-2933, including surrounding code
sed -n '2850,2980p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 4267


🏁 Script executed:

# Search for the outer loop structure and PQconsumeInput() usage
rg -n 'PQconsumeInput|PQgetCopyData' src/spock_apply.c

Repository: pgEdge/spock

Length of output: 140


🏁 Script executed:

# Look for WaitLatchOrSocket and its usage patterns in this file
rg -n -A 5 'WaitLatchOrSocket' src/spock_apply.c | head -80

Repository: pgEdge/spock

Length of output: 518


🏁 Script executed:

# Get a broader view of the function containing this code - search for the function definition
rg -n 'apply_work\|process.*connection' src/spock_apply.c | head -20

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Get context for the outer loop structure - lines before 2850
sed -n '2800,2860p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 2009


🏁 Script executed:

# Get the end of inner loop and what happens after - lines after 2930
sed -n '2930,3000p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 2156


🏁 Script executed:

# Search for where this code is - what's the function name?
rg -B 100 'if \(spock_readonly == READONLY_ALL\)' src/spock_apply.c | grep -E '^[0-9]+-.*\(' | tail -5

Repository: pgEdge/spock

Length of output: 38


🏁 Script executed:

# Let's see the complete nested loop structure more clearly - search around line 2908-2960
sed -n '2900,2960p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 2046


🏁 Script executed:

# Check what happens at the end of the inner for(;;) loop to understand full flow
sed -n '2960,3050p' src/spock_apply.c | cat -n

Repository: pgEdge/spock

Length of output: 3160


🏁 Script executed:

# Verify: when we break from inner loop at line 2932, do we go back to outer while?
# Also check if there's any other handling of WL_POSTMASTER_DEATH in the inner loop
rg -n 'WL_POSTMASTER_DEATH' src/spock_apply.c

Repository: pgEdge/spock

Length of output: 193


Check WaitLatchOrSocket result in the xact_had_exception branch before continuing.

The inner WaitLatchOrSocket() call at lines 2925-2928 returns status in rc but this result is discarded before continue at line 2932. This means WL_POSTMASTER_DEATH signals are missed—the outer loop checks this at line 2847, but the inner branch doesn't. Additionally, when READONLY_ALL is true, PQgetCopyData() is suppressed entirely (unreached at line 2941), yet the outer loop still calls PQconsumeInput() at line 2853, allowing libpq to buffer incoming COPY data indefinitely during extended readonly windows.

Either inspect the rc result for WL_POSTMASTER_DEATH and WL_SOCKET_READABLE in this branch, or move the wait to before the inner loop enters to avoid signal loss and buffering during readonly mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/spock_apply.c` around lines 2909 - 2933, In the xact_had_exception branch
where spock_readonly == READONLY_ALL you must check the WaitLatchOrSocket return
(rc) from WaitLatchOrSocket(&MyProc->procLatch, ...) for WL_POSTMASTER_DEATH and
WL_SOCKET_READABLE before issuing continue so postmaster death and socket events
are handled; update the branch in spock_apply.c to inspect rc and handle
WL_POSTMASTER_DEATH (exit/cleanup) and WL_SOCKET_READABLE (fall through so outer
loop will call PQconsumeInput()/PQgetCopyData) or alternatively move the wait
out of the readonly-inner-branch so PQconsumeInput and PQgetCopyData aren’t
starved — adjust logic around WaitLatchOrSocket, rc, WL_POSTMASTER_DEATH,
WL_SOCKET_READABLE, MyProc->procLatch, PQconsumeInput and PQgetCopyData
accordingly to avoid dropping signals and buffering COPY data.

@mason-sharp
Copy link
Copy Markdown
Member

read-only-doc.patch
Attaching suggested documentation changes to include.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants