Skip to content

Commit 3acb421

Browse files
committed
Sanitize min_write_buffer_number_to_merge to 1 with atomic_flush (#10773)
Summary: With current implementation, within the same RocksDB instance, all column families with non-empty memtables will be scheduled for flush if RocksDB determines that any column family needs to be flushed, e.g. memtable full, write buffer manager, etc., if atomic flush is enabled. Not doing so can lead to data loss and inconsistency when WAL is disabled, which is a common setting when atomic flush is enabled. Therefore, setting a per-column-family knob, min_write_buffer_number_to_merge to a value greater than 1 is not compatible with atomic flush, and should be sanitized during column family creation and db open. Pull Request resolved: #10773 Test Plan: Reproduce: D39993203 has detailed steps. Run the test with and without the fix. Reviewed By: cbi42 Differential Revision: D40077955 Pulled By: cbi42 fbshipit-source-id: 451a9179eb531ac42eaccf40b451b9dec4085240
1 parent 01cd296 commit 3acb421

File tree

3 files changed

+22
-1
lines changed

3 files changed

+22
-1
lines changed

HISTORY.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,9 @@
99
* Fixed an optimistic transaction validation bug caused by DBImpl::GetLatestSequenceForKey() returning non-latest seq for merge (#10724).
1010
* Fixed a bug in iterator refresh which could segfault for DeleteRange users (#10739).
1111

12+
### Behavior Changes
13+
* Sanitize min_write_buffer_number_to_merge to 1 if atomic flush is enabled to prevent unexpected data loss when WAL is disabled in a multi-column-family setting (#10773).
14+
1215
## 7.7.0 (09/18/2022)
1316
### Bug Fixes
1417
* Fixed a hang when an operation such as `GetLiveFiles` or `CreateNewBackup` is asked to trigger and wait for memtable flush on a read-only DB. Such indirect requests for memtable flush are now ignored on a read-only DB.

db/column_family.cc

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -233,6 +233,21 @@ ColumnFamilyOptions SanitizeOptions(const ImmutableDBOptions& db_options,
233233
result.min_write_buffer_number_to_merge = 1;
234234
}
235235

236+
if (db_options.atomic_flush && result.min_write_buffer_number_to_merge > 1) {
237+
ROCKS_LOG_WARN(
238+
db_options.logger,
239+
"Currently, if atomic_flush is true, then triggering flush for any "
240+
"column family internally (non-manual flush) will trigger flushing "
241+
"all column families even if the number of memtables is smaller "
242+
"min_write_buffer_number_to_merge. Therefore, configuring "
243+
"min_write_buffer_number_to_merge > 1 is not compatible and should "
244+
"be satinized to 1. Not doing so will lead to data loss and "
245+
"inconsistent state across multiple column families when WAL is "
246+
"disabled, which is a common setting for atomic flush");
247+
248+
result.min_write_buffer_number_to_merge = 1;
249+
}
250+
236251
if (result.num_levels < 1) {
237252
result.num_levels = 1;
238253
}

include/rocksdb/advanced_options.h

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -271,7 +271,10 @@ struct AdvancedColumnFamilyOptions {
271271
// read amplification because a get request has to check in all of these
272272
// files. Also, an in-memory merge may result in writing lesser
273273
// data to storage if there are duplicate records in each of these
274-
// individual write buffers. Default: 1
274+
// individual write buffers.
275+
// If atomic flush is enabled (options.atomic_flush == true), then this
276+
// option will be sanitized to 1.
277+
// Default: 1
275278
int min_write_buffer_number_to_merge = 1;
276279

277280
// DEPRECATED

0 commit comments

Comments
 (0)