Use non-transactional message for sync_event to eliminate WAL send delay#403
Use non-transactional message for sync_event to eliminate WAL send delay#403ibrarahmad merged 1 commit intomainfrom
Conversation
…AL send delay sync_event() previously used a transactional LogLogicalMessage which required the reorder buffer to process a full BEGIN/COMMIT cycle before delivery to subscribers. This caused the sync event LSN to not appear in pg_replication_origin_status until another unrelated COMMIT advanced the WAL sender. Switch to a non-transactional message with flush=true, which: - Bypasses the reorder buffer for immediate delivery - Explicitly flushes WAL to wake the walsender via WalSndWakeupProcessRequests - Uses (LogLogicalMessage)() to bypass the 4-arg compat macro and pass the 5th flush parameter directly on PG17+; calls XLogFlush on PG15/16 On the output plugin side, send the startup message before the sync event if it hasn't been sent yet. This is necessary because the startup message is normally sent during pg_decode_begin_txn, but a non-transactional message can arrive before any transaction is decoded on a newly-created subscription, causing a protocol version mismatch that crashes the apply worker. On the apply side, accept non-transactional messages and explicitly call replorigin_session_advance() so pg_replication_origin_status reflects the sync position immediately. Diagnostic logging added to trace sync_event flow across provider, output plugin, and subscriber.
📝 WalkthroughWalkthroughThe changes modify sync event message handling to support non-transactional operations. Validation rules are relaxed, sync markers are emitted non-transactionally with earlier WAL flush, and the output plugin treats transaction context as optional for non-transactional messages. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/spock_functions.c`:
- Around line 3296-3312: spock_create_sync_event currently emits a
non-transactional logical message which can be observed before surrounding
transactional changes are committed; add an explicit guard at the start of
spock_create_sync_event (immediately after the existing check_local_node() call)
that rejects being called inside an explicit transaction block or a
subtransaction (use the same check/ereport pattern used in
alter_subscription_disable() and alter_subscription_enable() to detect
IsInTransactionBlock()/IsInSubTransaction() and raise an ERROR with a clear
message), so the function can only be run in a top-level non-transactional
context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2f914511-eb93-486c-9da0-e331b31bac28
📒 Files selected for processing (3)
src/spock_apply.csrc/spock_functions.csrc/spock_output_plugin.c
| /* | ||
| * Use a non-transactional message with flush. This bypasses the reorder | ||
| * buffer so the walsender delivers it immediately, and the explicit WAL | ||
| * flush wakes the walsender via WalSndWakeupProcessRequests. | ||
| * | ||
| * PG17+ has a 5th flush parameter; on PG15/16 we call XLogFlush manually. | ||
| * Parentheses around the function name bypass the 4-arg compat macro on | ||
| * PG17/18 so we can pass the 5th parameter directly. | ||
| */ | ||
| #if PG_VERSION_NUM >= 170000 | ||
| lsn = (LogLogicalMessage)(SPOCK_MESSAGE_PREFIX, (char *) &message, | ||
| sizeof(message), false, true); | ||
| #else | ||
| lsn = LogLogicalMessage(SPOCK_MESSAGE_PREFIX, (char *) &message, | ||
| sizeof(message), true); | ||
| sizeof(message), false); | ||
| XLogFlush(lsn); | ||
| #endif |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Find and examine the spock_create_sync_event function around the flagged lines
fd -t f spock_functions.c | head -5Repository: pgEdge/spock
Length of output: 77
🏁 Script executed:
# Get context around lines 3296-3312
head -3320 src/spock_functions.c | tail -100 | cat -nRepository: pgEdge/spock
Length of output: 4001
🏁 Script executed:
# Search for IsTransactionBlock in the codebase and PostgreSQL
rg -i "IsTransactionBlock|IsSubTransaction" --max-count=20Repository: pgEdge/spock
Length of output: 215
🏁 Script executed:
# Find callers of spock_create_sync_event to understand usage patterns
rg "spock_create_sync_event" -A 3 -B 3Repository: pgEdge/spock
Length of output: 1492
🏁 Script executed:
# Check the function definition and any comments about barrier semantics
ast-grep --pattern $'Datum
spock_create_sync_event($_) {
$$$
}'Repository: pgEdge/spock
Length of output: 2102
🏁 Script executed:
# Find the exact lines with IsTransactionBlock usage
rg -n "IsTransactionBlock|IsSubTransaction" src/spock_functions.cRepository: pgEdge/spock
Length of output: 179
🏁 Script executed:
# Get context around those lines to see the pattern
rg -B 10 -A 5 "IsTransactionBlock" src/spock_functions.cRepository: pgEdge/spock
Length of output: 857
🏁 Script executed:
# Check if there's any documentation or design notes about sync_event in comments
rg -B 5 -A 15 "spock_create_sync_event|sync_event|synchronization event" src/spock_functions.c | head -80Repository: pgEdge/spock
Length of output: 2697
Add transaction-block guard to spock_create_sync_event().
The function documents barrier semantics ("ensuring that all changes up to and including the event have been replicated"), but uses non-transactional LogLogicalMessage() which bypasses the reorder buffer and is decoded immediately. This creates a correctness hole: BEGIN; INSERT ...; SELECT spock_create_sync_event(); COMMIT; exposes the sync marker before the transactional changes are replayed on the subscriber, violating the barrier contract.
Add a guard at the start of the function (lines 3284–3290, after check_local_node()) to reject explicit transaction blocks and subtransactions, matching the pattern already used in alter_subscription_disable() and alter_subscription_enable() at lines 781 and 817:
Suggested guard
Datum
spock_create_sync_event(PG_FUNCTION_ARGS)
{
SpockLocalNode *node;
SpockSyncEventMessage message;
XLogRecPtr lsn;
node = check_local_node(true);
+ if (IsTransactionBlock() || IsSubTransaction())
+ ereport(ERROR,
+ (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
+ errmsg("spock_create_sync_event() cannot be run inside a transaction block")));
+
message.mtype = SPOCK_SYNC_EVENT_MSG;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/spock_functions.c` around lines 3296 - 3312, spock_create_sync_event
currently emits a non-transactional logical message which can be observed before
surrounding transactional changes are committed; add an explicit guard at the
start of spock_create_sync_event (immediately after the existing
check_local_node() call) that rejects being called inside an explicit
transaction block or a subtransaction (use the same check/ereport pattern used
in alter_subscription_disable() and alter_subscription_enable() to detect
IsInTransactionBlock()/IsInSubTransaction() and raise an ERROR with a clear
message), so the function can only be run in a top-level non-transactional
context.
sync_event() previously used a transactional LogLogicalMessage which required the reorder buffer to process a full BEGIN/COMMIT cycle before delivery to subscribers. This caused the sync event LSN to not appear in pg_replication_origin_status until another unrelated COMMIT advanced the WAL sender.
Switch to a non-transactional message with flush=true, which:
On the output plugin side, send the startup message before the sync event if it hasn't been sent yet. This is necessary because the startup message is normally sent during pg_decode_begin_txn, but a non-transactional message can arrive before any transaction is decoded on a newly-created subscription, causing a protocol version mismatch that crashes the apply worker.
On the apply side, accept non-transactional messages and explicitly call replorigin_session_advance() so pg_replication_origin_status reflects the sync position immediately. Diagnostic logging added to trace sync_event flow across provider, output plugin, and subscriber.