
Conversation


@arthurschreiber arthurschreiber commented Oct 2, 2025

Description

During EmergencyReparentShard we don't want to wait for the old primary to get acks for all pending changes, because we're performing a demotion at the same time as unlinking the existing replicas from the primary. If all replicas are unlinked while commits are still pending, the primary will never receive the semi-sync acks it needs to unblock, causing the demoted host to stick around as a PRIMARY.

The old primary won't directly cause any issues, as vtgates will notice that its PrimaryTermStartTime is lower than the newly elected primary's, so traffic will be cut off.

But vtorc will notice that an unhealthy primary is still around and will continuously try to demote it.
The tablet itself will also notice that there's a different tablet that has assumed the PRIMARY role, and will try to demote itself as well.

Both of these attempts will continuously fail if there are any pending writes stuck waiting for a semi-sync ack: without any replicas, the old primary won't receive any acks. These pending writes also cause other operations to fail, like setting the primary to read-only, because the read-only change requires a lock that can't be granted while semi-sync acks are pending.

This PR addresses this behavior by adding a new option to the DemotePrimaryRequest gRPC call that allows "forcing" a demotion: it detects whether the old primary is blocked waiting for semi-sync writes, and if it is, simply disables the "source" side of semi-sync.

Disabling the "source" side of semi-sync allows all pending commits to complete, unblocking other operations. This will very likely cause errant GTIDs on the demoted primary, as those commits won't have been acknowledged by any of the replicas.
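
Mechanically, disabling the source side is a single global variable change; pending commits stop waiting for acks as soon as it takes effect. A minimal sketch of the idea (the plugin-dependent variable names are an assumption about the MySQL environment, not code taken from this PR):

```go
package main

import "fmt"

// disableSourceSemiSync returns the statement that turns off the source
// (primary) side of semi-sync. The variable name depends on which plugin
// generation is loaded: rpl_semi_sync_source_enabled for the newer MySQL
// plugin, rpl_semi_sync_master_enabled for the legacy one. This is a
// sketch of the mechanism, not the code the PR adds.
func disableSourceSemiSync(legacyPlugin bool) string {
	if legacyPlugin {
		return "SET GLOBAL rpl_semi_sync_master_enabled = 0"
	}
	return "SET GLOBAL rpl_semi_sync_source_enabled = 0"
}

func main() {
	fmt.Println(disableSourceSemiSync(false))
}
```

Once this statement executes, every session blocked in a semi-sync wait commits immediately, without any replica having acknowledged the transaction, which is exactly where the errant GTIDs come from.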

These errant GTIDs will be noticed by vtorc, and the demoted primary will eventually be moved to the DRAINED type.

Because of an unrelated issue (#18763), this won't directly help with the initial call to ERS, but it should allow old primary tablets to properly demote themselves once they notice that a different tablet has been promoted.

Details

Here's my understanding of how a semi-sync-blocked primary will behave when DemotePrimary gets called with these new changes:

  • The primary is set to non-serving. This operation can take up to --shutdown-grace-period. Queries/transactions that haven't finished by the time --shutdown-grace-period has passed return an error to clients (even if at that point they might still be stuck on the MySQL side waiting for a semi-sync ack). This is important because it means we never acknowledge these writes to the clients that issued them. At the same time, no new reads can happen.

  • We check if a query / transaction is blocked on semi-sync. If it is, and we're forcing a demotion, we disable source-side semi-sync. This unblocks all of the blocked queries / transactions, and very likely leads to the demoted primary having one or more errant GTIDs.

  • super_read_only is enabled.

  • the latest GTID of the demoted primary is returned.
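
The steps above can be sketched as a single flow. The interface and method names below are illustrative stand-ins for the vitess mysqlctl/tabletmanager APIs, not the PR's actual signatures:

```go
package main

import (
	"errors"
	"fmt"
)

// mysqlCtl abstracts the MySQL-level calls used during demotion.
// The method names are illustrative, not the real vitess mysqlctl API.
type mysqlCtl interface {
	IsSemiSyncBlocked() (bool, error)
	DisableSourceSemiSync() error
	SetSuperReadOnly() error
	PrimaryPosition() (string, error)
}

// demotePrimary sketches the flow described above: if pending commits are
// blocked waiting for semi-sync acks and force is set, source-side
// semi-sync is disabled so those commits can complete before
// super_read_only is enabled and the final GTID position is returned.
func demotePrimary(m mysqlCtl, force bool) (string, error) {
	blocked, err := m.IsSemiSyncBlocked()
	if err != nil {
		return "", err
	}
	if blocked {
		if !force {
			return "", errors.New("primary is blocked on semi-sync acks; demotion would hang")
		}
		if err := m.DisableSourceSemiSync(); err != nil {
			return "", err
		}
	}
	if err := m.SetSuperReadOnly(); err != nil {
		return "", err
	}
	return m.PrimaryPosition()
}

// fakeMysql simulates a primary stuck waiting for semi-sync acks.
type fakeMysql struct {
	semiSyncEnabled bool
}

func (f *fakeMysql) IsSemiSyncBlocked() (bool, error) { return f.semiSyncEnabled, nil }
func (f *fakeMysql) DisableSourceSemiSync() error     { f.semiSyncEnabled = false; return nil }
func (f *fakeMysql) SetSuperReadOnly() error          { return nil }
func (f *fakeMysql) PrimaryPosition() (string, error) { return "source_id:1-100", nil }

func main() {
	m := &fakeMysql{semiSyncEnabled: true}
	if _, err := demotePrimary(m, false); err != nil {
		fmt.Println("without force:", err)
	}
	pos, _ := demotePrimary(m, true)
	fmt.Println("with force, demoted at position:", pos)
}
```

The key property is that without force the demotion fails fast instead of hanging, and with force the semi-sync disable happens before super_read_only, so the pending commits drain first.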

Reproduction

It took me a while to come up with a test case that demonstrates the issue, and shows that this fix works.

Here's one way to reproduce the problem:

  • The primary of a cluster loses connection to all semi-sync-acking replicas. New and in-flight writes will start blocking.
  • EmergencyReparentShard gets called to elect a new primary and fix the situation.
  • The old primary does not come back online before --wait-replicas-timeout has passed. This is important as there seems to be a separate bug where a leftover goroutine tries to execute SetReplicationSource on the old primary even after EmergencyReparentShard has finished executing.
  • Once the old primary notices that it's no longer supposed to be a primary, it will try to demote itself via DemotePrimary.
  • This is where things go sideways. The old primary can't be demoted, because the semi-sync unblock wait will time out: there are no replicas attached to the old primary that could unblock semi-sync. This makes the demotion fail or time out, and the old primary stays in PRIMARY mode (but is no longer serving).
  • If vtorc is set up, it will continue trying to demote the primary via DemotePrimary (which will continue failing).

With the new force option, the self-demotion of the old primary skips the semi-sync unblock step and instead disables the primary side of semi-sync to unblock any commits waiting for an ack. The primary will demote itself and switch to a REPLICA. It will fail to re-attach to the cluster because of errant GTIDs, and will stay in NOT_SERVING mode until vtorc notices the errant GTIDs and puts it into DRAINED.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

…mergencyReparentShard`.

During `EmergencyReparentShard` we don't want to wait for the old primary to get acks for all pending changes, because we're performing a demotion at the same time as unlinking existing replicas from the primary. If all replicas are unlinked while commits are still pending, the primary will never receive the semi-sync acks it needs to unblock, causing ERS to never complete.

Signed-off-by: Arthur Schreiber <[email protected]>

vitess-bot bot commented Oct 2, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Oct 2, 2025
@github-actions github-actions bot added this to the v23.0.0 milestone Oct 2, 2025

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 51.78571% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.76%. Comparing base (8230b8a) to head (3a4c5a7).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/mysqlctl/replication.go 0.00% 15 Missing ⚠️
go/vt/vttablet/tabletmanager/rpc_replication.go 37.50% 10 Missing ⚠️
go/vt/vtcombo/tablet_map.go 0.00% 1 Missing ⚠️
go/vt/vttablet/faketmclient/fake_client.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18714      +/-   ##
==========================================
- Coverage   69.77%   69.76%   -0.02%     
==========================================
  Files        1608     1608              
  Lines      214865   214908      +43     
==========================================
  Hits       149922   149922              
- Misses      64943    64986      +43     


@arthurschreiber arthurschreiber added Component: TabletManager Type: Enhancement Logical improvement (somewhere between a bug and feature) and removed NeedsWebsiteDocsUpdate What it says NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request labels Oct 3, 2025
@arthurschreiber arthurschreiber changed the title Add new force flag to DemotePrimary to force a failover during `E… Add new force flag to DemotePrimary to force a failover during EmergencyReparentShard Oct 3, 2025
@systay systay modified the milestones: v23.0.0, v24.0.0 Oct 8, 2025

@shlomi-noach shlomi-noach left a comment


Things I've seen:

  • When primary is blocked on semi-sync, you sometimes can't create a new connection. This shouldn't strictly happen, but it does, on my local sandbox
  • When primary is blocked, setting rpl_semi_sync_source_enabled = 0 should be fine and has the intended effect.

Another approach would be to actively kill queries/transactions that are waiting on semi-sync, but frankly that in itself could be blocking.


arthurschreiber commented Oct 19, 2025

Another approach would be to actively kill queries/transactions that are waiting on semi-sync, but frankly that in itself could be blocking.

My experience is that when things are blocked on semi-sync, the kill flag is not actively probed and the query can't be killed. 😞

@arthurschreiber
Member Author

  • When primary is blocked on semi-sync, you sometimes can't create a new connection. This shouldn't strictly happen, but it does, on my local sandbox

Do you mean a new connection through the connection pool? Or a new connection against MySQL?

The connection pool won't allow opening a new connection if all slots are used up (unlikely, but can happen if other work gets stuck waiting too and we hit the limit).

Connections to MySQL should be allowed as long as we don't hit the overall connection limit, no?

@shlomi-noach
Contributor

Do you mean a new connection through the connection pool? Or a new connection against MySQL?

A new connection against MySQL. Tested on my local dev host using dbdeployer and a mysql80 replication setup. Steps:

  • install plugin on primary
  • configure semi sync on primary (enable, timeout, number of acknowledgements)
  • create table t1(id int) - hangs
  • Try to open new connection to the primary - hangs.
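
The steps above boil down to enabling semi-sync on a primary that has no replica able to ack. A sketch of the statements involved, printed from Go (system variable and plugin names follow MySQL's newer rpl_semi_sync_source plugin; the legacy plugin uses "master"/"slave" in place of "source"/"replica" -- these are assumptions about the sandbox setup, not commands taken from this thread):

```go
package main

import "fmt"

// repro lists statements that put a primary into the blocked state
// described above: semi-sync is enabled and requires one ack, but no
// replica is attached to send acks, so the CREATE TABLE blocks until
// rpl_semi_sync_source_timeout expires.
var repro = []string{
	"INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so'",
	"SET GLOBAL rpl_semi_sync_source_enabled = 1",
	"SET GLOBAL rpl_semi_sync_source_timeout = 3600000", // 1h, so the stall is observable
	"SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 1",
	"CREATE TABLE t1 (id INT)", // hangs waiting for an ack that never arrives
}

func main() {
	for _, stmt := range repro {
		fmt.Println(stmt + ";")
	}
}
```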

@shlomi-noach
Contributor

Connections to MySQL should be allowed as long as we don't hit the overall connection limit, no?

Yes, I agree! And yet this is the behavior I'm seeing locally. It surprised me, too. I don't have recollection of such behavior.

Signed-off-by: Arthur Schreiber <[email protected]>
Signed-off-by: Arthur Schreiber <[email protected]>
@arthurschreiber arthurschreiber changed the title Add new force flag to DemotePrimary to force a failover during EmergencyReparentShard Add new force flag to DemotePrimary to force a demotion even when blocked on waiting for semi-sync acks Oct 22, 2025
Signed-off-by: Arthur Schreiber <[email protected]>
@arthurschreiber arthurschreiber marked this pull request as ready for review October 22, 2025 14:09
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
This reverts commit f2622c4.

Signed-off-by: Tim Vaillancourt <[email protected]>
@timvaillancourt timvaillancourt force-pushed the arthur/add-forced-demotion branch from 714a20a to a805c4a on November 17, 2025 13:25
@timvaillancourt timvaillancourt self-requested a review November 17, 2025 13:50
@timvaillancourt timvaillancourt linked an issue Nov 18, 2025 that may be closed by this pull request
var primaryStatus *replicationdatapb.PrimaryStatus

primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet)
primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet, durability.HasSemiSync() /* force */)

Just noting that this impacts PRS as well as ERS.

}

func (mysqld *Mysqld) IsSemiSyncBlocked(ctx context.Context) (bool, error) {
conn, err := getPoolReconnect(ctx, mysqld.dbaPool)

That's fair. I don't think we should use DBA everywhere here, but it does line up with the other uses. And using a single connection for all of the work might be a good idea here too. Not related though so we can investigate this more separately.

return conn.Conn.SemiSyncExtensionLoaded()
}

func (mysqld *Mysqld) IsSemiSyncBlocked(ctx context.Context) (bool, error) {

I don't like that we have another method to check this when the monitor exposes methods for this as well.
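
For context, a blocked-detection check like IsSemiSyncBlocked can be sketched as follows. The wait_sessions status counter (Rpl_semi_sync_source_wait_sessions, or Rpl_semi_sync_master_wait_sessions for the legacy plugin) reports how many sessions are currently waiting for an ack; the injected query function and this particular heuristic are assumptions, not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// isSemiSyncBlocked reports whether any session is stalled in a semi-sync
// wait, based on the plugin's wait_sessions status counter. showStatus
// stands in for running SHOW GLOBAL STATUS LIKE <pattern> and returning
// the matching variable names and values.
func isSemiSyncBlocked(showStatus func(pattern string) (map[string]string, error)) (bool, error) {
	status, err := showStatus("Rpl_semi_sync_%_wait_sessions")
	if err != nil {
		return false, err
	}
	for _, v := range status {
		n, err := strconv.Atoi(v)
		if err != nil {
			return false, fmt.Errorf("unexpected status value %q: %w", v, err)
		}
		if n > 0 {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Fake SHOW GLOBAL STATUS result for a primary with two stalled commits.
	stalled := func(pattern string) (map[string]string, error) {
		return map[string]string{"Rpl_semi_sync_source_wait_sessions": "2"}, nil
	}
	blocked, _ := isSemiSyncBlocked(stalled)
	fmt.Println("blocked:", blocked)
}
```

A nice property of a status-counter check is that it works even when the blocked sessions themselves can't be killed or queried.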

arthurschreiber and others added 5 commits November 19, 2025 17:19
Signed-off-by: Tim Vaillancourt <[email protected]>
Setting this to be based on the durability policy might actually end up being wrong, because that reflects what the durability policy should be _after_ the failover, but `DemotePrimary` needs to operate based on the _current_ state.

Signed-off-by: Arthur Schreiber <[email protected]>
…/add-forced-demotion

Signed-off-by: Arthur Schreiber <[email protected]>
@mattlord mattlord left a comment

LGTM! Just the one minor comment. Thank you for working on this @arthurschreiber and @timvaillancourt ! ❤️

var primaryStatus *replicationdatapb.PrimaryStatus

primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet)
primaryStatus, err = tmc.DemotePrimary(groupCtx, tabletInfo.Tablet, true /* force */)

Today, stopReplicationAndBuildStatusMaps appears to be used only for ERS. The function does not appear to be specific to ERS though. So IMO it would be better to make this explicitly tied to ERS. Meaning:

  1. This function takes a new bool for forceDemotion
  2. It then passes that on to DemotePrimary

It's not required, but might potentially prevent future unintended behavior.

Member Author

Alternatively, we could move this function from replication.go to emergency_reparenter.go, as that's the only place it's used from.

Member Author

I'll do that in a follow up PR, just to make sure the diffs are easier to read (and so this doesn't require another review round on this PR).

…/add-forced-demotion

Signed-off-by: Arthur Schreiber <[email protected]>
@arthurschreiber arthurschreiber enabled auto-merge (squash) November 26, 2025 15:52
@arthurschreiber arthurschreiber merged commit 1f49de4 into main Nov 26, 2025
106 of 111 checks passed
@arthurschreiber arthurschreiber deleted the arthur/add-forced-demotion branch November 26, 2025 17:14
siddharth16396 pushed a commit to siddharth16396/postpone-complete that referenced this pull request Dec 3, 2025
… blocked on waiting for semi-sync acks (vitessio#18714)

Signed-off-by: Arthur Schreiber <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Tim Vaillancourt <[email protected]>
Signed-off-by: siddharth16396 <[email protected]>

Labels

Component: TabletManager Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug Report: DemotePrimary always fails during ERS

6 participants