Add new force flag to DemotePrimary to force a demotion even when blocked on waiting for semi-sync acks
#18714
Conversation
Add new `force` flag to DemotePrimary to force a failover during `EmergencyReparentShard`. During `EmergencyReparentShard` we don't want to wait for the old primary to get acks for all pending changes, because we're performing a demotion at the same time as unlinking existing replicas from the primary. If all replicas are unlinked while commits are still pending, the primary will never be able to receive the semi-sync acks to be unblocked, causing ERS to never be able to complete.
Codecov Report: overall coverage fell from 69.77% to 69.76% (-0.02%); the 43 added lines are not covered.

```
@@            Coverage Diff             @@
##             main   #18714      +/-   ##
==========================================
- Coverage   69.77%   69.76%   -0.02%
==========================================
  Files        1608     1608
  Lines      214865   214908      +43
==========================================
  Hits       149922   149922
- Misses      64943    64986      +43
```
Title changed to: Add new force flag to DemotePrimary to force a failover during `EmergencyReparentShard`
shlomi-noach left a comment:
Things I've seen:

- When the primary is blocked on semi-sync, you sometimes can't create a new connection. This shouldn't strictly happen, but it does, on my local sandbox.
- When the primary is blocked, setting `rpl_semi_sync_source_enabled = 0` should be fine and has the intended effect.

Another approach would be to actively kill queries/transactions that are waiting on semi-sync, but frankly that in itself could be blocking. My experience is that when things are blocked on semi-sync, the kill flag is not actively probed and the query can't be killed. 😞
Do you mean a new connection through the connection pool, or a new connection against MySQL? The connection pool won't allow opening a new connection if all slots are used up (unlikely, but can happen if other work gets stuck waiting too and we hit the limit). Connections to MySQL should be allowed as long as we don't hit the overall connection limit, no?
A new connection against MySQL. Tested on my local dev host using […]
Yes, I agree! And yet this is the behavior I'm seeing locally. It surprised me, too. I have no recollection of such behavior.
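For reference, here is a standalone sketch of the unblock observation above, using `database/sql`. The DSN, the target table-free probe, and the MySQL 8.0.26+ `rpl_semi_sync_source_*` naming are assumptions (older versions use `rpl_semi_sync_master_*`), and, per the comment above, even opening this connection may fail while the primary is blocked:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

func main() {
	// Placeholder DSN; point it at the blocked primary.
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Sessions stuck waiting for a semi-sync ack show up in this counter.
	var name string
	var waiting int
	err = db.QueryRow(
		"SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_wait_sessions'",
	).Scan(&name, &waiting)
	if err != nil {
		log.Fatal(err) // includes the plugin-not-loaded (no rows) case
	}

	if waiting > 0 {
		// Disabling the source side lets all pending commits complete;
		// note they will not have been acked by any replica.
		if _, err := db.Exec("SET GLOBAL rpl_semi_sync_source_enabled = 0"); err != nil {
			log.Fatal(err)
		}
	}
	fmt.Printf("sessions that were waiting: %d\n", waiting)
}
```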
Title changed from "Add new force flag to DemotePrimary to force a failover during EmergencyReparentShard" to "Add new force flag to DemotePrimary to force a demotion even when blocked on waiting for semi-sync acks"
```diff
 var primaryStatus *replicationdatapb.PrimaryStatus
 ...
-primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet)
+primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet, durability.HasSemiSync() /* force */)
```
Just noting that this impacts PRS as well as ERS.
```diff
 }

+func (mysqld *Mysqld) IsSemiSyncBlocked(ctx context.Context) (bool, error) {
+	conn, err := getPoolReconnect(ctx, mysqld.dbaPool)
```
That's fair. I don't think we should use DBA everywhere here, but it does line up with the other uses. And using a single connection for all of the work might be a good idea here too. Not related though so we can investigate this more separately.
```diff
 	return conn.Conn.SemiSyncExtensionLoaded()
 }

+func (mysqld *Mysqld) IsSemiSyncBlocked(ctx context.Context) (bool, error) {
```
I don't like that we have another method to check this when the monitor exposes methods for this as well.
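For context, a minimal sketch of what such a check might look like, reusing the pool helpers visible in the diff above and assuming it keys off the `Rpl_semi_sync_source_wait_sessions` status counter; the PR's actual implementation may differ:

```go
// IsSemiSyncBlocked reports whether any session is currently stuck waiting
// for a semi-sync ack. Sketch only: assumes the check reads the
// Rpl_semi_sync_source_wait_sessions status counter.
func (mysqld *Mysqld) IsSemiSyncBlocked(ctx context.Context) (bool, error) {
	conn, err := getPoolReconnect(ctx, mysqld.dbaPool)
	if err != nil {
		return false, err
	}
	defer conn.Recycle()

	qr, err := conn.Conn.ExecuteFetch(
		"SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_wait_sessions'", 1, false)
	if err != nil {
		return false, err
	}
	if len(qr.Rows) != 1 {
		// Semi-sync extension not loaded: nothing can be blocked on it.
		return false, nil
	}
	waiting, err := qr.Rows[0][1].ToInt64()
	if err != nil {
		return false, err
	}
	return waiting > 0, nil
}
```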
Setting this to be based on the durability policy might actually end up being wrong, because that reflects what the durability policy should be _after_ the failover, but `DemotePrimary` needs to operate based on the _current_ state.
mattlord left a comment:
LGTM! Just the one minor comment. Thank you for working on this @arthurschreiber and @timvaillancourt ! ❤️
```diff
 var primaryStatus *replicationdatapb.PrimaryStatus
 ...
-primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet)
+primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet, true /* force */)
```
Today, `stopReplicationAndBuildStatusMaps` appears to be used only for ERS. The function does not appear to be specific to ERS, though. So IMO it would be better to make this explicitly tied to ERS. Meaning:

- This function takes a new bool for `forceDemotion`
- It then passes that on to `DemotePrimary`

It's not required, but might potentially prevent future unintended behavior. A sketch of that shape follows below.
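A sketch of the suggested shape (signature heavily reduced and the package placement hypothetical; the real function takes more parameters and does much more work):

```go
package reparentutil

import (
	"context"

	replicationdatapb "vitess.io/vitess/go/vt/proto/replicationdata"
	"vitess.io/vitess/go/vt/topo"
	"vitess.io/vitess/go/vt/vttablet/tmclient"
)

// stopReplicationAndBuildStatusMaps, reduced to the relevant part: the
// caller decides whether the demotion may be forced, and the flag is
// threaded through to DemotePrimary.
func stopReplicationAndBuildStatusMaps(
	ctx context.Context,
	tmc tmclient.TabletManagerClient,
	tabletInfo *topo.TabletInfo,
	forceDemotion bool,
) (*replicationdatapb.PrimaryStatus, error) {
	// ERS would pass forceDemotion=true; PRS would keep passing false.
	return tmc.DemotePrimary(ctx, tabletInfo.Tablet, forceDemotion)
}
```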
Alternatively, we could move this function from replication.go to emergency_reparenter.go, as that's the only place it's used from.
I'll do that in a follow-up PR, just to make sure the diffs are easier to read (and so this doesn't require another review round on this PR).
Description

During `EmergencyReparentShard` we don't want to wait for the old primary to get acks for all pending changes, because we're performing a demotion at the same time as unlinking existing replicas from the primary. If all replicas are unlinked while commits are still pending, the primary will never be able to receive the semi-sync acks to be unblocked, causing the demoted host to stick around as a `PRIMARY`.

The old primary won't directly cause any issues, as vtgates will notice that its `PrimaryTermStartTime` is lower than the newly elected primary's, so traffic will be cut off.

But `vtorc` will notice that an unhealthy primary is still around and will continuously try to demote it. The tablet itself will also notice that there's a different tablet that has assumed the `PRIMARY` role, and will try to demote itself as well.

Both of these attempts will continuously fail if there are any pending writes stuck waiting for a semi-sync ack, as without any replicas, the old primary won't receive any acks. These pending writes will cause other operations, like setting the primary to read-only, to fail (because the read-only change requires a lock that can't be granted while semi-sync acks are pending).
This PR tries to resolve this behavior by adding a new option to the `DemotePrimaryRequest` gRPC call that allows "forcing" a demotion: we detect whether the old primary is blocked waiting for semi-sync acks, and if it is, we simply disable the "source" side of semi-sync.

Disabling the "source" side of semi-sync allows all pending commits to be written, unblocking other operations. This will very likely cause errant GTIDs on the demoted primary, as those commits won't have been acknowledged by any of the replicas.
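In outline, the forced path could look like this sketch. The helper names follow the diffs in this PR, but the surrounding code and the reduced `MysqlDaemon` interface are illustrative, not the PR's exact implementation:

```go
package demo

import (
	"context"
	"fmt"
)

// MysqlDaemon is reduced to the two calls this sketch needs; the real
// interface in Vitess is much larger.
type MysqlDaemon interface {
	IsSemiSyncBlocked(ctx context.Context) (bool, error)
	SetSemiSyncEnabled(ctx context.Context, source, replica bool) error
}

// demotePrimaryForced shows only the force branch of the demotion.
func demotePrimaryForced(ctx context.Context, mysqld MysqlDaemon, force bool) error {
	blocked, err := mysqld.IsSemiSyncBlocked(ctx)
	if err != nil {
		return err
	}
	if blocked {
		if !force {
			// Without force, a blocked primary cannot complete demotion:
			// setting it read-only needs a lock held by the stuck commits.
			return fmt.Errorf("primary is blocked waiting for semi-sync acks")
		}
		// Forced: drop the source side of semi-sync so the pending commits
		// flush. This very likely creates errant GTIDs on this old primary.
		if err := mysqld.SetSemiSyncEnabled(ctx, false /* source */, true /* replica */); err != nil {
			return err
		}
	}
	// ...continue with the regular demotion (super_read_only, etc.).
	return nil
}
```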
These errant GTIDs will be noticed by `vtorc`, and the demoted primary will eventually be moved to the `DRAINED` type.

Because of an unrelated issue (#18763), this won't directly help with the initial call to ERS, but it should allow old primary tablets to properly demote themselves once they notice that a different tablet has been promoted.
Details

Here's my understanding of how a semi-sync blocked primary will behave when `DemotePrimary` gets called with these new changes:

1. The primary is set to non-serving. This operation can take up to `--shutdown-grace-period`. Queries/transactions that haven't finished by the time `--shutdown-grace-period` has passed should return an error to clients (even if at that point they might still be stuck on the MySQL side waiting for a semi-sync ack). This is important because it means we never acknowledge these writes to the client that tried to perform them. At the same time, no new reads can happen.
2. We check if a query/transaction is blocked on semi-sync. If it is, and we're forcing a demotion, we disable source-side semi-sync. This unblocks all of the blocked queries/transactions, and will very likely leave the demoted primary with one or more errant GTIDs.
3. `super_read_only` is enabled.
4. The latest GTID position of the demoted primary is returned.

Steps 3 and 4 map onto plain MySQL operations, as sketched below.
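A minimal sketch of those last two steps using `database/sql`; the function name and connection handling are assumptions, not Vitess code:

```go
package demo

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// finalizeDemotion mirrors steps 3 and 4 above: make the instance
// unwritable, then report its final executed GTID set.
func finalizeDemotion(db *sql.DB) (string, error) {
	// super_read_only also blocks writes from privileged accounts.
	if _, err := db.Exec("SET GLOBAL super_read_only = 1"); err != nil {
		return "", err
	}
	var gtidExecuted string
	if err := db.QueryRow("SELECT @@global.gtid_executed").Scan(&gtidExecuted); err != nil {
		return "", err
	}
	return gtidExecuted, nil
}
```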
Reproduction

It took me a while to come up with a test case that demonstrates the issue and shows that this fix works. Here's one way to reproduce the problem:

- `EmergencyReparentShard` gets called to elect a new primary and fix the situation.
- Wait until `--wait-replicas-timeout` has passed. This is important, as there seems to be a separate bug where a leftover goroutine tries to execute `SetReplicationSource` on the old primary even after `EmergencyReparentShard` has finished executing.
- The old primary notices the promoted tablet and tries to demote itself via `DemotePrimary`.
- The old primary stays in `PRIMARY` mode (but is no longer serving).
- If `vtorc` is set up, it will continue to try and demote the primary via `DemotePrimary` (which will continue failing).

A generic sketch for getting a primary into the blocked semi-sync state follows below.
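One generic way to produce the blocked state on a bare MySQL primary, assuming the semi-sync plugin is loaded, an existing `test.t` table, and the 8.0.26+ variable names (`rpl_semi_sync_master_*` on older versions). This is an illustration, not the author's exact recipe:

```go
package demo

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// blockPrimary drives a MySQL primary into the semi-sync blocked state:
// give acks a very long timeout, require them even with no replicas
// connected, then issue a write after stopping every replica
// (STOP REPLICA on each one). The final Exec blocks until something
// disables semi-sync or the timeout fires.
func blockPrimary(primary *sql.DB) error {
	setup := []string{
		"SET GLOBAL rpl_semi_sync_source_enabled = 1",
		"SET GLOBAL rpl_semi_sync_source_timeout = 3600000", // 1 hour, in ms
		"SET GLOBAL rpl_semi_sync_source_wait_no_replica = 1",
	}
	for _, stmt := range setup {
		if _, err := primary.Exec(stmt); err != nil {
			return err
		}
	}
	// With all replicas stopped, this commit waits for an ack that
	// will never arrive.
	_, err := primary.Exec("INSERT INTO test.t (id) VALUES (1)")
	return err
}
```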
REPLICA. It will fail re-attaching to the cluster, because of errant GTIDS, and will stay inNOT_SERVINGmode untilvtorcnotices the errant GTIDs and puts it intoDRAINED.Related Issue(s)