Motivation
Currently, Druid can serve partial result sets without the user knowing. This can happen for several reasons:
- Data node crash/failure/unavailability
- Broker missing announcements from historicals
- Other synchronization issues
Proposed changes
The core of the problem is that Brokers automatically remove segments from their timeline when they receive drop notifications from the data nodes. This removes the ability to audit queries and fail them when an incomplete timeline is being served (because parts of it are missing).
Segment-level changes
To solve this, introduce the concept of a "queryable" segment in the Broker timeline. A segment is marked as "queryable" when it is loaded by at least one data node. Once a segment is marked as loaded, it remains as such until it is marked as unused, metadata is reset, the cluster is re-deployed with downtime, or a similar situation occurs. The loaded property is maintained in memory by the coordinator. A diagram of a segment's lifetime is described below:
Broker/Coordinator changes
To fetch the latest used status for segments, the broker will do an initial full sync with the coordinator, followed by periodic delta syncs, to keep its knowledge of which segments are used/queryable up to date.
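As a rough illustration of the full-sync-then-delta-sync pattern, here is a minimal sketch. The class `UsedSegmentView` and its methods are hypothetical names chosen for this example; they are not existing Druid APIs, and segments are identified by plain strings for simplicity.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical broker-side view of the segments the coordinator reports as used.
class UsedSegmentView {
    private final Set<String> usedSegmentIds = new HashSet<>();

    // Initial full sync: replace the whole view with the coordinator's snapshot.
    void applyFullSync(Set<String> snapshot) {
        usedSegmentIds.clear();
        usedSegmentIds.addAll(snapshot);
    }

    // Periodic delta sync: apply only the changes since the previous sync.
    void applyDeltaSync(Set<String> added, Set<String> removed) {
        usedSegmentIds.removeAll(removed);
        usedSegmentIds.addAll(added);
    }

    boolean isUsed(String segmentId) {
        return usedSegmentIds.contains(segmentId);
    }
}
```

A delta sync keeps the payload small between full syncs; how the broker would recover from a missed delta (for example by falling back to a full sync) is left open here.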
Queries that touch segments marked as queryable will fail if no announcing servers are found.
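The failure rule can be sketched as a pre-query check. `QueryableSegmentCheck` and `MissingSegmentException` are hypothetical names for this example, not Druid classes. The sketch also assumes that segments not yet marked queryable are simply skipped by the check, which the proposal does not spell out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical exception raised when a queryable segment has no announcing server.
class MissingSegmentException extends RuntimeException {
    MissingSegmentException(String segmentId) {
        super("No announcing servers for queryable segment " + segmentId);
    }
}

class QueryableSegmentCheck {
    // Fails the query if any touched segment is queryable but served by no one.
    static void verify(
            List<String> segmentsTouched,
            Set<String> queryableSegments,
            Map<String, Set<String>> announcingServers) {
        for (String segmentId : segmentsTouched) {
            if (!queryableSegments.contains(segmentId)) {
                continue; // assumption: non-queryable segments are not checked
            }
            Set<String> servers = announcingServers.getOrDefault(
                    segmentId, Collections.<String>emptySet());
            if (servers.isEmpty()) {
                throw new MissingSegmentException(segmentId);
            }
        }
    }
}
```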
In order to mark a segment as queryable in the timeline, the broker needs to hear from both the data node and the coordinator that the segment has loaded. This extra sync with the coordinator is needed because the broker needs direct confirmation that the segment is loaded as well as that it is currently marked as used in the cluster.
The broker will place the segment in its timeline for any given callback/sync (historical/coordinator) but will ONLY mark the segment as queryable once:
- A data (historical/peon) node has given a loaded callback for the segment
- A sync from the coordinator shows the segment as used and loaded onto a node.
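The two conditions above can be sketched as a small state tracker. All names here (`BrokerTimelineSketch`, `SegmentState`, the callback methods) are hypothetical and only illustrate the rule; they do not correspond to existing Druid classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-segment state kept by the broker.
class SegmentState {
    boolean loadedCallbackSeen;       // a historical/peon reported the segment as loaded
    boolean coordinatorUsedAndLoaded; // coordinator sync shows it as used and loaded
    boolean queryable;
}

class BrokerTimelineSketch {
    private final Map<String, SegmentState> timeline = new HashMap<>();

    // Either callback places the segment in the timeline.
    void onDataNodeLoaded(String segmentId) {
        SegmentState state = timeline.computeIfAbsent(segmentId, k -> new SegmentState());
        state.loadedCallbackSeen = true;
        maybeMarkQueryable(state);
    }

    void onCoordinatorSync(String segmentId, boolean used, boolean loaded) {
        SegmentState state = timeline.computeIfAbsent(segmentId, k -> new SegmentState());
        state.coordinatorUsedAndLoaded = used && loaded;
        maybeMarkQueryable(state);
    }

    // Only both confirmations together make the segment queryable.
    private void maybeMarkQueryable(SegmentState state) {
        if (state.loadedCallbackSeen && state.coordinatorUsedAndLoaded) {
            state.queryable = true;
        }
    }

    boolean inTimeline(String segmentId) {
        return timeline.containsKey(segmentId);
    }

    boolean isQueryable(String segmentId) {
        SegmentState state = timeline.get(segmentId);
        return state != null && state.queryable;
    }
}
```

Because each callback re-evaluates the same condition, the order in which the two confirmations arrive does not matter.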
The following segment movement cases outline the "successful" scenarios.
Segment Creation/Loading
- Newly created segment S loading for the first time on historical server A
  T0: Broker waits until it receives announcement callback from coordinator (interchangeable with T1)
  T1: Broker waits until it receives loaded callback from historical (interchangeable with T0)
  T2: Broker marks S as queryable
- Previously created segment S loading on server A
  T0: Broker waits until it receives announcement callback from coordinator (interchangeable with T1)
  T1: Broker waits until it receives loaded callback from historical (interchangeable with T0)
  T2: Broker marks S as queryable
Segment Deletion/Drop on Data Node
- Server A removed segment because it was dropped (unused)
T0: Broker
Case 2: Server A removed segment because it was moved (used) => keep in timeline (assert there are n > 0 servers serving it, otherwise that's a race and should be fixed).
Case 3: Server removed segment because its replication factor was changed (used) => keep in timeline (assert there are n > 0 servers serving it, otherwise that's a race and should be fixed).
Case 4: Server removed segment because the server died/stopped responding (used) => keep in timeline
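The drop cases can be summarized as a single decision function. This is a sketch only: `DropReason` and `DropHandlingSketch` are hypothetical names, and the outcome for the "dropped (unused)" case is an assumption, since that case's steps are not spelled out above.

```java
// Hypothetical reasons for a data node removing a segment.
enum DropReason { MARKED_UNUSED, MOVED, REPLICATION_CHANGED, SERVER_UNAVAILABLE }

class DropHandlingSketch {
    // Returns true if the segment should stay in the broker timeline.
    static boolean keepInTimeline(DropReason reason, int remainingServers) {
        switch (reason) {
            case MARKED_UNUSED:
                // Assumption: an unused segment leaves the timeline.
                return false;
            case MOVED:
            case REPLICATION_CHANGED:
                // The segment is still used, so another server must be serving it.
                if (remainingServers <= 0) {
                    throw new IllegalStateException(
                            "Race: used segment has no serving servers after " + reason);
                }
                return true;
            case SERVER_UNAVAILABLE:
                // Keep the segment so queries touching it can fail instead of
                // silently returning partial results.
                return true;
            default:
                throw new IllegalArgumentException("Unknown reason: " + reason);
        }
    }
}
```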