You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: en_US/design/durable-storage.md
+16-10Lines changed: 16 additions & 10 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -15,6 +15,10 @@ EMQX provides two embedded backends that do not rely on third-party services:
15
15
16
16
### Data Storage Hierarchy
17
17
18
+
The database storage engine powering EMQX's built-in durability facilities organizes data into a hierarchical structure. The following figure illustrates how the durable storage databases are distributed across an EMQX cluster:
Internally, DS organizes data into a multi-layered hierarchy designed for both horizontal scalability and temporal partitioning. The structure is transparent to applications and ensures efficient data management across distributed EMQX nodes.
19
23
20
24
The complete DS hierarchy can be represented as follows:
@@ -26,7 +30,7 @@ flowchart TB
26
30
SH["Shard"]
27
31
GEN(["Generation (logical, time-based)"])
28
32
SL["Slab (physical container)"]
29
-
ST["Stream"]
33
+
ST["Durable Storage Stream"]
30
34
TTV["Topic–Timestamp–Value (TTV)"]
31
35
32
36
%% Main hierarchy
@@ -46,7 +50,7 @@ flowchart TB
46
50
47
51
#### Database (DB)
48
52
49
-
The top-level logical container for data. Each DS database is independent and manages its own shards, slabs, and streams, and it can be created, managed, and dropped as needed. For instance:
53
+
A database is the top-level logical container for data. Each DS database is independent and manages its own shards, slabs, and streams, and it can be created, managed, and dropped as needed. For instance:
50
54
51
55
-**Sessions DB** stores durable session states.
52
56
-**Messages DB** holds the corresponding MQTT message data.
@@ -55,30 +59,32 @@ A single EMQX cluster can host multiple DS databases.
55
59
56
60
#### Shard
57
61
58
-
A horizontal partition of the database. Data from different MQTT clients is distributed across shards to support parallelism and high availability.
62
+
A shard is the horizontal partition of a durable storage database. Data is distributed across shards based on the publisher's client ID, enabling parallel processing and high availability. Each EMQX node can host one or more shards, and the total number of shards is determined by [n_shards](./managing-replication.md#number-of-shards) configuration parameter during the initial startup of EMQX.
59
63
60
-
Each EMQX node can host one or more shards, and shards can be replicated across nodes for redundancy.
64
+
Shards also serve as the fundamental unit of replication. Each shard is replicated across multiple nodes according to the `durable_storage.messages.replication_factor` setting, ensuring that all replicas maintain identical message sets for redundancy and fault tolerance.
61
65
62
66
#### Generation
63
67
64
-
A logicalpartition of the database by time. Data written at different times may be assigned to different generations. Periodically creating new generations serves several main purposes:
68
+
A generation is a logical, time-based partition of the database. Data written during different time periods is grouped into separate generations. New messages are always written to the current generation, while older generations become immutable and read-only. EMQX periodically creates new generations for several main purposes:
65
69
66
70
1.**Backward compatibility and data migrations:** New data is appended to new generations, possibly with improved encoding, while old generations remain immutable and read-only.
67
-
2.**Time-based data retention:** Since each generation covers a specific time period, old data can be removed by dropping entire generations.
71
+
2.**Time-based data retention:** Because each generation corresponds to a specific time range, expired data can be efficiently removed by dropping entire generations.
72
+
73
+
Although conceptually related to slabs, generations are not physical storage units. Instead, they define temporal boundaries that organize slabs within each shard.
68
74
69
-
Although conceptually related to slabs, generations are not physical containers. They serve as temporal boundaries that organize slabs within each shard.
75
+
Generations may also differ in how they internally structure and store data, depending on the configured storage layout. Currently, DS supports a single layout optimized for high-throughput wildcard and single-topic subscriptions. Future releases will introduce additional layouts designed for different workloads. The layout used for new generations is configured via the `durable_storage.messages.layout` parameter, with each layout engine providing its own set of configuration options.
70
76
71
77
#### Slab
72
78
73
-
A physical partition of data identified by both shard ID and generation ID. Each slab acts as a durable container for one or more streams. All data in a slab shares the same encoding schema, eliminating the need for storing extra metadata. Atomicity and consistency properties are guaranteed within a slab.
79
+
A slab is a physical partition of data identified by both shard ID and generation ID. Each slab acts as a durable container for one or more streams. All data in a slab shares the same encoding schema, eliminating the need for storing extra metadata. Atomicity and consistency properties are guaranteed within a slab.
74
80
75
81
Example: `shard 2, gen 3` represents a distinct slab that stores all streams written during that generation’s time range.
76
82
77
83
#### Stream
78
84
79
-
Logical units of batching and serialization inside each slab. Streams group **Topic–Timestamp–Value (TTV)** triples with similar topics, allowing data to be read in time-ordered, deterministic chunks.
85
+
A durable storage stream is a logical unit of batching and serialization inside each slab. Streams group **Topic–Timestamp–Value (TTV)** triples with similar topic structures, allowing data to be read in time-ordered, deterministic chunks. A single durable storage stream may contain messages from multiple topics, and different storage layouts may apply different strategies for mapping topics into streams.
80
86
81
-
Streams are also the unit of subscription and iteration in DS, enabling efficient handling of wildcard topic filters.
87
+
Durable storage streams are also the fundamental unit of subscription and iteration in Durable Storage, enabling efficient handling of wildcard topic filters and consistent replay of ordered data. [Durable sessions](../durability/durability_introduction.md#durable-sessions) read messages from streams in batches, with the batch size controlled by the `durable_sessions.batch_size` configuration parameter.
Copy file name to clipboardExpand all lines: en_US/durability/durability_introduction.md
+2-49Lines changed: 2 additions & 49 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -159,56 +159,9 @@ Even if durable sessions are not enabled, following steps 2-4 will still retain
159
159
160
160
## Durable Storage Architecture
161
161
162
-
The database engine powering EMQX's built-in durability facilities organizes data into a hierarchical structure. The following figure illustrates how the durable storage databases are distributed across an EMQX cluster:
162
+
Durable Sessions rely on the Durable Storage for persisting session state and messages. To understand how this storage layer is structured and operates, refer to the *Architecture: Backends and Storage Hierarchy* section in [Design for Durable Storage](../design/durable-storage.md).
163
163
164
-

165
-
166
-
### Database (DS)
167
-
168
-
The top-level logical container for data. Each DS database is independent and manages its own shards, slabs, and streams, and it can be created, managed, and dropped as needed. For instance:
169
-
170
-
-**Sessions DB** stores durable session states.
171
-
-**Messages DB** holds the corresponding MQTT message data.
172
-
173
-
A single EMQX cluster can host multiple DS databases.
174
-
175
-
### Shard
176
-
177
-
Messages are segregated by clients and stored in shards based on the publisher's client ID. The number of shards is determined by [n_shards](./managing-replication.md#number-of-shards) configuration parameter during the initial startup of EMQX. A shard is also a unit of replication. Each shard is consistently replicated the number of times specified by `durable_storage.messages.replication_factor` across different nodes, ensuring identical message sets in each replica.
178
-
179
-
### Generation
180
-
181
-
A generation is a logical partition of the database by time. Messages are segmented into generations corresponding to specific time frames. New messages are written to the current generation, while previous generations are read-only. EMQX cleans up old MQTT messages by deleting old generations in their entirety. The retention period for old MQTT messages is determined by the `durable_sessions.message_retention_period` parameter.
182
-
183
-
Generations can organize data differently according to the storage layout specification. Currently, only one layout is supported, optimized for high throughput of wildcard and single-topic subscriptions. Future updates will introduce layouts optimized for different workloads.
184
-
185
-
The storage layout for new generations is configured by the `durable_storage.messages.layout` parameter, with each layout engine defining its own configuration parameters.
186
-
187
-
### Slab
188
-
189
-
A slab is a physical partition of data identified by both shard ID and generation ID. Each slab acts as a durable container for one or more streams. All data in a slab share the same encoding schema, and writes are atomic, eliminating the need for extra metadata.
190
-
191
-
Example: `shard 2, gen 3` represents a distinct slab that stores all streams written during that generation’s time range.
192
-
193
-
### Stream
194
-
195
-
A stream is a logical unit of batching and serialization inside each slab. Streams group **Topic–Timestamp–Value (TTV)** triples with similar topics, allowing data to be read in time-ordered, deterministic chunks.
196
-
197
-
Streams can contain messages from multiple topics. Various storage layouts can employ different strategies for mapping topics into streams.
198
-
199
-
Durable sessions fetch messages in batches from the streams, with batch size adjustable via the `durable_sessions.batch_size` parameter.
200
-
201
-
### Topic–Timestamp–Value
202
-
203
-
A TTV is a minimal storage unit, representing a single MQTT record. Each TTV includes:
204
-
205
-
-**Topic:** Follows MQTT semantics.
206
-
207
-
-**Timestamp:** Write time or logical ordering key.
208
-
209
-
-**Value:** an arbitrary binary blob.
210
-
211
-
## Durable Storages Across Cluster
164
+
## Durable Storage Across Cluster
212
165
213
166
Each node within an EMQX cluster is assigned a unique *Site ID*, which serves as a stable identifier, independent of the Erlang node name (`emqx@...`). Site IDs are persistent, and they are randomly generated at the first startup of the node. This stability maintains the integrity of the data, especially in scenarios where nodes might undergo name modifications or reconfigurations.
0 commit comments