You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: ja_JP/design/durable-storage.md
+29-33Lines changed: 29 additions & 33 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,10 +1,10 @@
1
1
# Design for Durable Storage
2
2
3
-
EMQX 6.0 introduces Optimized Durable Storage (DS), a purpose-built application designed to ensure high reliability and persistence for MQTT message delivery. DS combines the strengths of a streaming service (like Kafka) and a key-value store, providing a robust, highly optimized foundation for storing, replaying, and managing MQTT data.
3
+
EMQX 6.0 introduces Optimized Durable Storage (DS), a purpose-built database abstraction layer designed to ensure high reliability and persistence for MQTT message delivery. DS combines the strengths of a streaming service (like Kafka) and a key-value store, providing a robust, highly optimized foundation for storing, replaying, and managing MQTT data.
4
4
5
5
## Architecture: Backends and Storage Hierarchy
6
6
7
-
Durable Storage is implementation-agnostic, using a backend layer to allow data to be stored across different database management systems.
7
+
Durable Storage is implementation-agnostic, using a backend concept to allow data to be stored across different database management systems.
8
8
9
9
### Embedded Backends
10
10
@@ -15,6 +15,10 @@ EMQX provides two embedded backends that do not rely on third-party services:
15
15
16
16
### Data Storage Hierarchy
17
17
18
+
The database storage engine powering EMQX's built-in durability facilities organizes data into a hierarchical structure. The following figure illustrates how the durable storage databases are distributed across an EMQX cluster:
Internally, DS organizes data into a multi-layered hierarchy designed for both horizontal scalability and temporal partitioning. The structure is transparent to applications and ensures efficient data management across distributed EMQX nodes.
19
23
20
24
The complete DS hierarchy can be represented as follows:
The top-level logical container for data. Each DS database is independent and manages its own shards, slabs, and streams, and it can be created, managed, and dropped as needed. For instance:
53
+
A database is the top-level logical container for data. Each DS database is independent and manages its own shards, slabs, and streams, and it can be created, managed, and dropped as needed. For instance:
50
54
51
55
-**Sessions DB** stores durable session states.
52
56
-**Messages DB** holds the corresponding MQTT message data.
53
57
54
-
A single EMQX cluster may host multiple DS databases.
58
+
A single EMQX cluster can host multiple DS databases.
55
59
56
60
#### Shard
57
61
58
-
A horizontal partition of the database. Data from different MQTT clients or topics is distributed across shards to support parallelism and high availability.
62
+
A shard is the horizontal partition of a durable storage database. Data is distributed across shards based on the publisher's client ID, enabling parallel processing and high availability. Each EMQX node can host one or more shards, and the total number of shards is determined by [n_shards](../durability/managing-replication.md#number-of-shards) configuration parameter during the initial startup of EMQX.
59
63
60
-
Each EMQX node can host one or more shards, and shards can be replicated across nodes for redundancy.
64
+
Shards also serve as the fundamental unit of replication. Each shard is replicated across multiple nodes according to the `durable_storage.messages.replication_factor` setting, ensuring that all replicas maintain identical message sets for redundancy and fault tolerance.
61
65
62
66
#### Generation
63
67
64
-
A logicalpartition of the database by time. Data written at different times may be assigned to different generations. Periodically creating new generations serves several main purposes:
68
+
A generation is a logical, time-based partition of the database. Data written during different time periods is grouped into separate generations. New messages are always written to the current generation, while older generations become immutable and read-only. EMQX periodically creates new generations for several main purposes:
65
69
66
70
1.**Backward compatibility and data migrations:** New data is appended to new generations, possibly with improved encoding, while old generations remain immutable and read-only.
67
-
2.**Time-based data retention:** Since each generation covers a specific time period, old data can be removed by dropping entire generations.
71
+
2.**Time-based data retention:** Because each generation corresponds to a specific time range, expired data can be efficiently removed by dropping entire generations.
72
+
73
+
Although conceptually related to slabs, generations are not physical storage units. Instead, they define temporal boundaries that organize slabs within each shard.
68
74
69
-
Although conceptually related to slabs, generations are not physical containers. They serve as temporal boundaries that organize slabs within each shard.
75
+
Generations may also differ in how they internally structure and store data, depending on the configured storage layout. Currently, DS supports a single layout optimized for high-throughput wildcard and single-topic subscriptions. Future releases will introduce additional layouts designed for different workloads. The layout used for new generations is configured via the `durable_storage.messages.layout` parameter, with each layout engine providing its own set of configuration options.
70
76
71
77
#### Slab
72
78
73
-
A physical partition of data identified by both shard ID and generation ID. Each slab acts as a durable container for one or more streams. All data in a slab share the same encoding schema, and writes are atomic, eliminating the need for extra metadata.
79
+
A slab is a physical partition of data identified by both shard ID and generation ID. Each slab acts as a durable container for one or more durable storage streams. All data in a slab shares the same encoding schema, eliminating the need for storing extra metadata. Atomicity and consistency properties are guaranteed within a slab.
74
80
75
81
Example: `shard 2, gen 3` represents a distinct slab that stores all streams written during that generation’s time range.
76
82
77
83
#### Stream
78
84
79
-
Logical units of batching and serialization inside each slab. Streams group **Topic–Timestamp–Value (TTV)** triples with similar structures, allowing data to be read in time-ordered, deterministic chunks.
85
+
A durable storage stream is a logical unit of batching and serialization inside each slab. Streams group **Topic–Timestamp–Value (TTV)** triples with similar topic structures, allowing data to be read in time-ordered, deterministic chunks. A single durable storage stream may contain messages from multiple topics, and different storage layouts may apply different strategies for mapping topics into streams.
80
86
81
-
Streams are also the unit of subscription and iteration in DS, enabling efficient handling of wildcard topic filters.
87
+
Durable storage streams are also the fundamental unit of subscription and iteration in Durable Storage, enabling efficient handling of wildcard topic filters and consistent replay of ordered data. [Durable sessions](../durability/durability_introduction.md#durable-sessions) read messages from streams in batches, with the batch size controlled by the `durable_sessions.batch_size` configuration parameter.
82
88
83
89
#### Topic–Timestamp–Value
84
90
85
-
The minimal storage unit, representing a single MQTT record. Each TTV includes:
91
+
The minimal storage unit, representing a single MQTT message. Each TTV includes:
86
92
87
93
-**Topic:** Follows MQTT semantics.
88
94
89
95
-**Timestamp:** Write time or logical ordering key.
90
96
91
-
-**Value:**an arbitrary binary blob.
97
+
-**Value:**An arbitrary binary blob.
92
98
93
99
## Write Path
94
100
@@ -154,24 +160,14 @@ Both pools group subscribers by stream and topic, reusing resources to serve mul
## Applications: Durable Sessions and Shared Subscriptions
158
-
159
-
Durable Storage is the backbone for EMQX's advanced reliability features:
160
-
161
-
### Durable Sessions (EMQX 5+)
162
-
163
-
Durable sessions are a parallel session implementation that uses DS for message routing.
164
-
165
-
-**Mechanism:** When a client connects with a session expiry interval greater than zero and subscribes to a topic, the filter is marked as durable. Messages published to matching topics are saved to DS *in addition* to being dispatched.
166
-
-**State:** Durable sessions access saved messages via the DS subscription mechanism. Their state includes a set of iterators for each matching stream, allowing them to precisely track their progress. Only one copy of each message is stored per database replica, regardless of how many durable sessions share it.
167
-
168
-
### Shared Subscriptions (EMQX 6.0)
163
+
## Conclusion: The Foundation of High-Reliability MQTT
169
164
170
-
EMQX 6.0 extended DS to shared subscriptions for enhanced load balancing and reliability.
165
+
The Optimized Durable Storage in EMQX 6.0 is the resilient foundation for high-reliability MQTT messaging. By re-engineering RocksDB and embedding concepts like TTVs and Streams, DS provides a purpose-built, highly available, and persistent internal database. This architecture, coupled with sophisticated features like the LTS algorithm and Raft replication, ensures lossless message delivery and optimal retrieval for complex wildcard and shared subscriptions, solidifying EMQX's position as a leading solution for demanding IoT infrastructure.
171
166
172
-
-**Iterator Management:** The iterator sets for shared subscriptions are managed by a separate entity called the **shared sub leader**.
173
-
-**Replay and Rebalancing:** Sessions subscribing to a shared topic communicate with the leader, which **lends them iterators** for message replay. Updated iterators are reported back. If a client disconnects or the group is rebalanced, the leader **revokes the iterators** and redistributes them to other members, ensuring consumption continuity and load distribution.
167
+
## More Information
174
168
175
-
## Conclusion: The Foundation of High-Reliability MQTT
169
+
Durable Storage serves as the core data foundation for several high-reliability and persistence-related features in EMQX, providing unified storage, replay, and consistency guarantees for upper-layer functionality, including:
176
170
177
-
The Optimized Durable Storage in EMQX 6.0 is the resilient foundation for high-reliability MQTT messaging. By re-engineering RocksDB and embedding concepts like TTVs and Streams, DS provides a purpose-built, highly available, and persistent internal database. This architecture, coupled with sophisticated features like the LTS algorithm and Raft replication, ensures lossless message delivery and optimal retrieval for complex wildcard and shared subscriptions, solidifying EMQX's position as a leading solution for demanding IoT infrastructure.
171
+
-**[MQTT Durable Sessions](../durability/durability_introduction.md)**: A DS-based mechanism for persisting session state and undelivered messages.
172
+
-**[Message Queue](../message-queue/message-queue-concept.md)**: A built-in message queueing feature that provides ordered message delivery, message replay, and high availability across the EMQX cluster.
173
+
-**[Shared Subscription](../messaging/mqtt-shared-subscription.md)**: A load-balancing subscription mechanism that distributes messages among multiple subscribers in the same group.
0 commit comments