Skip to content

WIP: batched heartbeats#15

Draft
enewbury wants to merge 1 commit into
chassisframework:mainfrom
enewbury:enewbury/batched-heartbeats
Draft

WIP: batched heartbeats#15
enewbury wants to merge 1 commit into
chassisframework:mainfrom
enewbury:enewbury/batched-heartbeats

Conversation

@enewbury
Copy link
Copy Markdown
Contributor

@enewbury enewbury commented Mar 6, 2026

No description provided.

@enewbury enewbury mentioned this pull request Mar 6, 2026
:log_entry,
:snapshot_transfer
:snapshot_transfer,
:batch_id
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't love adding this to the messages themselves since it feels like a leaky abstraction that's not relevant to the actual heartbeat, but it felt cleaner than updating all the handlers in Consensus to have a tuple with the batch_id and the message.

groups = Craft.Application.running_groups()

for group <- groups do
:gen_statem.cast(Craft.Application.lookup(group, Craft.Consensus), {:tick, tick_ref})
Copy link
Copy Markdown
Contributor Author

@enewbury enewbury Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach uses a centralized ticker (this process) rather than each group running its own heartbeat interval, and then we ask all the groups to generate their heartbeat message. This way, we can syncronize the heartbeats and send them all as one, rather than collecting them ad-hoc as they are generated and introducing some kind of wait before sending them out. (we also have a timeout in this method, but in the happy path, if it gets responses from everyone it immediately sends out the batch rather than waiting for the timeout)

Comment thread lib/craft/consensus.ex
messages =
state.members
|> Members.other_nodes()
|> Map.new(fn to_node ->
Copy link
Copy Markdown
Contributor Author

@enewbury enewbury Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be Task.async_stream anymore since we aren't actually doing side-effects, just creating structs. likely faster to just do it as enum.

@enewbury enewbury force-pushed the enewbury/batched-heartbeats branch from dcb7e5c to ffc0a9b Compare March 6, 2026 19:35
Comment thread lib/craft/consensus.ex
if membership_change.action == :remove && membership_change.node == node() do
data = LeaderState.transfer_leadership(data)
{data, messages} = heartbeat(data)
Message.send_heartbeats_now(data, messages)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still sometimes directly trigger sending a heartbeat from a consensus process, but I think these only happen at random times so not as likely to get thundering heart of traffic like we do from a heartbeat that happens all the time for all groups. I'd like some feedback on what situations this might be a problem in (i.e. adding a member to all groups at the same time?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant