We're trying to better understand the auto-heal functionality in RabbitMQ.
We are running a 3-node RabbitMQ cluster (v4.0.9) in an air-gapped environment that must survive node failures, such as unplanned shutdowns or complete network partitions. After extensive automated failover testing, we've determined that auto-heal mode is the best option for our use case. However, it exhibits one unexpected behavior.
Our main queues are quorum queues, and the application is a web frontend.
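For reference, auto-heal is enabled via the standard partition-handling setting in rabbitmq.conf (a minimal sketch of just the relevant line, not our full config):

```ini
# rabbitmq.conf — recover from network partitions automatically:
# the losing partition's nodes are restarted and rejoin the winner.
cluster_partition_handling = autoheal
```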
Here's the scenario:
1. The VM hosting Node 1 goes down unexpectedly.
   → Users connected to Node 1 are notified of the failover (expected behavior).
2. After a few minutes, Node 1 rejoins the cluster, triggering auto-heal.
3. At this point, those users are now connected to Node 2, which has been running normally alongside Node 3.
4. During auto-heal, RabbitMQ decides that the partition containing Node 3 is the "winning" partition and restarts Node 2.
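Our reading of the partitions guide is that the winner is picked by client-connection count, then partition size; the final tie-break is left unspecified in the docs. Sketched as illustrative Python (not the actual Erlang implementation; the node-name tie-break below is our own stand-in):

```python
def pick_winning_partition(partitions):
    """Pick the 'winning' partition from a list of partitions.

    Each partition is a list of (node_name, client_connection_count)
    pairs. Per our reading of the RabbitMQ partitions guide: the
    partition with the most connected clients wins, ties go to the
    partition with more nodes, and the last tie-break is unspecified
    (node-name sort order here is purely an illustrative stand-in).
    """
    def rank(partition):
        total_connections = sum(conns for _, conns in partition)
        node_count = len(partition)
        lowest_name = min(name for name, _ in partition)
        # Negate so that "more" sorts first under min().
        return (-total_connections, -node_count, lowest_name)

    return min(partitions, key=rank)


# Example: the side with Nodes 2 and 3 holds all the clients, so it
# should win and Node 1's side should be restarted.
winner = pick_winning_partition([
    [("rabbit@node2", 40), ("rabbit@node3", 35)],
    [("rabbit@node1", 0)],
])
```

By this heuristic we would expect Node 2's side to win, which is precisely why the observed restart of Node 2 surprises us.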
This is the part we don't understand:
Why does RabbitMQ restart Node 2, when both Node 2 and Node 3 were operational and in full contact during the outage?
Any insight into the auto-heal partition resolution logic would be greatly appreciated.
Thanks in advance!