@chrismccord
sync_register/sync_join messages from multicast_loop can arrive before ack_sync from the gen_server, since they come from different senders and there is no ordering guarantee between them. When this happened, the message was dropped because the remote node wasn't in nodes_map yet. The ack_sync that arrived just afterwards lacked the raced registrations, so the local node was left with stale data.

Fix: Include RemoteScopePid in broadcasts to allow inline discovery when a sync message arrives before ack_sync. The old message format is still supported for rolling upgrades.
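
For illustration only, here is a minimal sketch of the idea, not the actual diff. The module name, message tuples, and state layout are all assumptions made for this example; only RemoteScopePid, nodes_map, sync_register, and ack_sync come from the description above. A real handler would likely do more on discovery (for example, monitoring the remote scope process).

```erlang
-module(sync_race_sketch).
-export([handle_sync/2, demo/0]).

%% Hypothetical state: #{nodes_map => #{node() => pid()},
%%                       table     => #{Name => Pid}}.

%% Hypothetical new format: the broadcast carries the sender's scope pid,
%% so an unknown node can be discovered inline instead of being dropped.
handle_sync({sync_register, RemoteScopePid, Name, Pid}, State)
  when is_pid(RemoteScopePid) ->
    #{nodes_map := NodesMap} = State,
    RemoteNode = node(RemoteScopePid),
    State1 = case maps:is_key(RemoteNode, NodesMap) of
        true ->
            State;
        false ->
            %% ack_sync has not arrived yet: record the node now.
            State#{nodes_map := NodesMap#{RemoteNode => RemoteScopePid}}
    end,
    apply_register(Name, Pid, State1);

%% Hypothetical old format, kept for rolling upgrades: without a scope pid
%% there is nothing to discover with, so unknown nodes are still dropped.
handle_sync({sync_register, Name, Pid}, State) ->
    #{nodes_map := NodesMap} = State,
    case maps:is_key(node(Pid), NodesMap) of
        true -> apply_register(Name, Pid, State);
        false -> State
    end.

%% Store the registration in the local table.
apply_register(Name, Pid, #{table := Table} = State) ->
    State#{table := Table#{Name => Pid}}.

%% Feed both formats to a state whose nodes_map is still empty,
%% i.e. the situation before ack_sync has been processed.
demo() ->
    Empty = #{nodes_map => #{}, table => #{}},
    Old = handle_sync({sync_register, my_name, self()}, Empty),
    New = handle_sync({sync_register, self(), my_name, self()}, Empty),
    {maps:size(maps:get(table, Old)), maps:size(maps:get(table, New))}.
```

Calling sync_race_sketch:demo() returns {0,1}: the old-format message is dropped because its node is unknown, while the new-format message registers the node and applies the registration.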

Note: I wasn't able to run the multinode tests on any of OTP 25, 26, or 28. ct_slave was failing to connect the nodes for whatever reason.

The alternative to including the scope pid in all broadcasts would be to buffer broadcasts received from nodes whose ack_sync we are still awaiting, then "replay" them once it arrives. That seemed like a more complex change, and it would require cleanup/sweeping to avoid an unbounded buffer if a node failed during the discover/ack handshake. Thanks!
