Fix race condition causing stale pids in syn lookup #87
sync_register/sync_join messages from multicast_loop can arrive before ack_sync from the gen_server, because the two come from different senders and there is no ordering guarantee between them. When this happens, the message is dropped because the remote node isn't in nodes_map yet. The ack_sync that arrives right afterwards lacks the raced registrations, so the local node is left with stale data.
Fix: Include RemoteScopePid in broadcasts so the receiver can discover the remote node inline when a sync message arrives before ack_sync. The old message format is still supported for rolling upgrades.
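To make the idea concrete, here is a minimal sketch of inline discovery, not the actual patch. The module name, the tuple shapes, and the map-based state are all hypothetical stand-ins; only the names sync_register, nodes_map, and RemoteScopePid come from the description above.

```erlang
-module(inline_discovery_sketch).
-export([handle_sync/2, demo/0]).

%% Hypothetical state shape (an assumption, not syn's real record):
%%   #{nodes_map => #{node() => pid()}, registry => #{Name => Pid}}

%% New (hypothetical) format: the broadcast carries the remote scope pid,
%% so an unknown node can be added to nodes_map before the message is applied.
handle_sync({sync_register, RemoteScopePid, Name, Pid}, State)
  when is_pid(RemoteScopePid) ->
    State1 = ensure_node(RemoteScopePid, State),
    store(Name, Pid, State1);
%% Old (hypothetical) format without the scope pid, kept for rolling upgrades:
%% the message is only applied when the sending node is already known.
handle_sync({sync_register, Name, Pid}, #{nodes_map := Nodes} = State) ->
    case maps:is_key(node(Pid), Nodes) of
        true -> store(Name, Pid, State);
        false -> State  %% dropped, as before the fix
    end.

%% Add the remote node if it is missing. A real implementation would
%% presumably also monitor the scope pid; that is omitted here.
ensure_node(ScopePid, #{nodes_map := Nodes} = State) ->
    Node = node(ScopePid),
    case maps:is_key(Node, Nodes) of
        true -> State;
        false -> State#{nodes_map := Nodes#{Node => ScopePid}}
    end.

store(Name, Pid, #{registry := Reg} = State) ->
    State#{registry := Reg#{Name => Pid}}.

%% Both formats arriving before ack_sync, i.e. with an empty nodes_map.
demo() ->
    Empty = #{nodes_map => #{}, registry => #{}},
    Pid = self(),
    Old = handle_sync({sync_register, my_name, Pid}, Empty),
    New = handle_sync({sync_register, Pid, my_name, Pid}, Empty),
    {maps:is_key(my_name, maps:get(registry, Old)),
     maps:is_key(my_name, maps:get(registry, New))}.
```

Calling inline_discovery_sketch:demo() returns {false, true}: the old-format message is dropped while the node is unknown, and the new-format message is applied because the node is discovered inline.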
Note: I wasn't able to run the multinode tests on OTP 25, 26, or 28. In each case ct_slave failed to connect the nodes, for reasons I haven't identified.
The alternative to including the scope pid in every broadcast would be to buffer broadcasts received from nodes whose ack_sync is still pending, then "replay" them once it arrives. That seemed like a more complex change, and it would require cleanup/sweeping to avoid an unbounded buffer if a node failed during the discover/ack handshake. Thanks!
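For comparison, a rough sketch of what the buffer-and-replay alternative might look like, to show where the extra bookkeeping comes from. Everything here is hypothetical (module name, function names, state shape), and the cleanup is reduced to a single node-down handler, where a real version would likely also need a timeout or sweep.

```erlang
-module(buffer_replay_sketch).
-export([new/0, on_broadcast/3, on_ack_sync/2, on_node_down/2, demo/0]).

%% Hypothetical state: known nodes, per-node buffers of early broadcasts,
%% and the list of applied messages (newest first).
new() -> #{known => #{}, pending => #{}, applied => []}.

%% Apply a broadcast if its node is known, otherwise buffer it.
on_broadcast(Node, Msg, #{known := Known, pending := Pending,
                          applied := Applied} = S) ->
    case maps:is_key(Node, Known) of
        true ->
            S#{applied := [Msg | Applied]};
        false ->
            Buffered = maps:get(Node, Pending, []),
            S#{pending := Pending#{Node => [Msg | Buffered]}}
    end.

%% On ack_sync, mark the node as known and replay its buffer in arrival
%% order. A real handler would first apply the snapshot carried by ack_sync.
on_ack_sync(Node, #{known := Known, pending := Pending} = S) ->
    {Buffered, Pending1} =
        case maps:take(Node, Pending) of
            {B, P} -> {B, P};
            error -> {[], Pending}
        end,
    S1 = S#{known := Known#{Node => true}, pending := Pending1},
    lists:foldl(fun(Msg, Acc) -> on_broadcast(Node, Msg, Acc) end,
                S1, lists:reverse(Buffered)).

%% Without this cleanup, a node that dies mid-handshake would leave its
%% buffer behind forever.
on_node_down(Node, #{known := Known, pending := Pending} = S) ->
    S#{known := maps:remove(Node, Known),
       pending := maps:remove(Node, Pending)}.

demo() ->
    S1 = on_broadcast(n1, {sync_register, a}, new()),
    S2 = on_broadcast(n1, {sync_register, b}, S1),
    S3 = on_ack_sync(n1, S2),
    S4 = on_node_down(n2, on_broadcast(n2, {sync_register, c}, S3)),
    {lists:reverse(maps:get(applied, S3)),
     maps:size(maps:get(pending, S4))}.
```

buffer_replay_sketch:demo() returns {[{sync_register,a},{sync_register,b}], 0}: the two early broadcasts are replayed in order after ack_sync, and the buffer for the failed node n2 is discarded. Even in this reduced form it needs an extra piece of state and two extra handlers, which is the complexity the scope-pid approach avoids.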