fix txn replay for fuzzy region #1412
Force-pushed from 233a1f0 to a34033c.
Pull Request Overview
This PR fixes transaction replay issues during fuzzy region processing in the AOF (Append-Only File) recovery mechanism. The previous implementation had two problems: it could intermingle operations from different transactions that share a session ID, and it lacked proper locking during replay, which could expose partial transaction results to readers.
Key changes:
- Introduced three new classes (TransactionGroup, AofReplayContext, AofReplayCoordinator) to properly track and coordinate transaction replay operations
- Refactored AOF recovery logic to use the new coordinator pattern with proper transaction isolation
- Updated test infrastructure to support testing both main store and object store replication scenarios
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| libs/server/AOF/ReplayCoordinator/TransactionGroup.cs | New class to track operations for individual transactions |
| libs/server/AOF/ReplayCoordinator/AofReplayContext.cs | New class to maintain replay state including fuzzy region buffers and active transactions |
| libs/server/AOF/ReplayCoordinator/AofReplayCoordinator.cs | New coordinator class implementing transaction tracking, buffering, and replay logic with proper locking |
| libs/server/AOF/AofProcessor.cs | Refactored to use the new coordinator pattern; simplified recovery logic and removed inline transaction handling |
| test/Garnet.test.cluster/ClusterTestContext.cs | Added helper methods SimplePopulateDB and SimpleValidateDB to streamline test database operations |
| test/Garnet.test.cluster/ReplicationTests/ClusterReplicationBaseTests.cs | Updated tests to use new helper methods and support both object store and main store validation |
This PR fixes a couple of issues:
In that situation, our current solution might intermingle operations between two different transactions from the same session.
In this PR, I am adding the following classes:
- TransactionGroup: Keeps track of the list of transaction operations.
- AofReplayContext: Keeps track of the replay context and associated objects/buffers used to replay individual operations and transactions.
- AofReplayCoordinator: Implements the mechanism to keep track of active transactions, gather transaction operations and coordinate transaction replay.
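To illustrate the idea behind the coordinator, here is a minimal, language-neutral sketch in Python. It is not the C# code from this PR. The entry kinds (`start`, `op`, `commit`, `abort`), the `on_entry` method, and the dictionary-backed store are all assumptions made for illustration. It only shows two points from the description: each transaction start opens a fresh group, so two transactions from the same session cannot intermingle, and a committed group is applied under a lock, so readers taking the same lock never see a partial transaction.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class TransactionGroup:
    """Operations gathered for one in-flight transaction (hypothetical shape)."""
    operations: list = field(default_factory=list)


class ReplayCoordinator:
    """Sketch of a replay coordinator; names and entry kinds are assumptions."""

    def __init__(self, store):
        self.store = store      # plain dict standing in for the real store
        self.lock = Lock()      # stands in for whatever locking the real replay uses
        self.active = {}        # session_id -> TransactionGroup being gathered

    def on_entry(self, kind, session_id, key=None, value=None):
        if kind == "start":
            # A start always opens a fresh group, so a later transaction from
            # the same session never appends to an earlier one.
            self.active[session_id] = TransactionGroup()
        elif kind == "op":
            group = self.active.get(session_id)
            if group is None:
                # Not inside a transaction: apply the write directly.
                with self.lock:
                    self.store[key] = value
            else:
                # Inside a transaction: buffer until commit.
                group.operations.append((key, value))
        elif kind == "commit":
            group = self.active.pop(session_id, None)
            if group is not None:
                # Apply the whole group while holding the lock, so no reader
                # using the same lock observes a partially applied transaction.
                with self.lock:
                    for k, v in group.operations:
                        self.store[k] = v
        elif kind == "abort":
            # Drop the buffered operations without touching the store.
            self.active.pop(session_id, None)
        else:
            raise ValueError(f"unknown entry kind: {kind}")
```

In this sketch, writes from a session with an open transaction stay invisible until its commit entry arrives, while writes from other sessions are applied immediately.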