WIP: batched heartbeats#15
Conversation
| :log_entry, | ||
| :snapshot_transfer | ||
| :snapshot_transfer, | ||
| :batch_id |
There was a problem hiding this comment.
don't love adding this to the messages themselves since it feels like a leaky abstraction that's not relevant to the actual heartbeat, but it felt cleaner than updating all the handlers in Consensus to have a tuple with the batch_id and the message.
| groups = Craft.Application.running_groups() | ||
|
|
||
| for group <- groups do | ||
| :gen_statem.cast(Craft.Application.lookup(group, Craft.Consensus), {:tick, tick_ref}) |
There was a problem hiding this comment.
This approach uses a centralized ticker (this process) rather than each group running its own heartbeat interval, and then we ask all the groups to generate their heartbeat message. This way, we can syncronize the heartbeats and send them all as one, rather than collecting them ad-hoc as they are generated and introducing some kind of wait before sending them out. (we also have a timeout in this method, but in the happy path, if it gets responses from everyone it immediately sends out the batch rather than waiting for the timeout)
| messages = | ||
| state.members | ||
| |> Members.other_nodes() | ||
| |> Map.new(fn to_node -> |
There was a problem hiding this comment.
Doesn't need to be Task.async_stream anymore since we aren't actually doing side-effects, just creating structs. likely faster to just do it as enum.
dcb7e5c to
ffc0a9b
Compare
| if membership_change.action == :remove && membership_change.node == node() do | ||
| data = LeaderState.transfer_leadership(data) | ||
| {data, messages} = heartbeat(data) | ||
| Message.send_heartbeats_now(data, messages) |
There was a problem hiding this comment.
we still sometimes directly trigger sending a heartbeat from a consensus process, but I think these only happen at random times so not as likely to get thundering heart of traffic like we do from a heartbeat that happens all the time for all groups. I'd like some feedback on what situations this might be a problem in (i.e. adding a member to all groups at the same time?)
No description provided.