Skip to content

Conversation

@serathius
Copy link
Member

@serathius serathius commented Sep 21, 2025

Continuing @henrybear327 work on #20663, using main...serathius:etcd:robustness-reproduable I managed to find a new watch issue.

tldr; Watch opened on rev -1 will break watch synchronization and cause unsynchronized watchers to skip events, breaking reliable/resumable.

Found that opening a watch on revision -1 will be accepted and cause resync to break. There is no validation for negative revision in the watch code, instead revision is just check against >=compactRev. This is usually good enough, however if there was no compaction executed, etcd uses compactRev equal to -1. That means that request with revision -2 and lower are rejected, but revision -1 passes. The problem is that the watch with minRev equal -1 will be passed to rangeEvents function that doesn't support negative revisions. If negative revision is passed to rangeEvents it will be converted into incorrect range, and return empty result. So all watches during resynchronization when there is watch with rev -1 will not receive any events still will be marked as synchronized.

This is a short term fix to mitigate issue. Long term I expect we should do something better than using -1 to mark etcd uncompacted. Using special values is never a good idea.

/cc @ahrtr @fuweid @siyuanfoundation

@codecov
Copy link

codecov bot commented Sep 21, 2025

Codecov Report

❌ Patch coverage is 91.30435% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.24%. Comparing base (b7420c5) to head (afeab20).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
server/etcdserver/api/v3rpc/watch.go 83.33% 2 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
server/proxy/grpcproxy/watch.go 93.92% <100.00%> (+1.43%) ⬆️
server/storage/mvcc/watchable_store.go 93.25% <100.00%> (+0.37%) ⬆️
server/etcdserver/api/v3rpc/watch.go 85.41% <83.33%> (-0.08%) ⬇️

... and 19 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20693      +/-   ##
==========================================
+ Coverage   69.20%   69.24%   +0.04%     
==========================================
  Files         420      420              
  Lines       34794    34817      +23     
==========================================
+ Hits        24078    24109      +31     
+ Misses       9320     9317       -3     
+ Partials     1396     1391       -5     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7420c5...afeab20. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines 450 to 453
if minRev < 0 {
minRev = 0
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

This change makes sense.

@serathius serathius force-pushed the watch-negative-revision branch from cf9b6a5 to 9391a71 Compare September 21, 2025 18:19
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 21, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 21, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
@henrybear327
Copy link
Contributor

Just made #20663 ready to review! This was the robustness test case that Marek was referring to.

henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 21, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 21, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 22, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 22, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
@serathius serathius force-pushed the watch-negative-revision branch from 9391a71 to 0c9169d Compare September 22, 2025 08:33
henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 22, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
@serathius serathius force-pushed the watch-negative-revision branch 2 times, most recently from 7127584 to 5d8d14c Compare September 23, 2025 10:12
@serathius
Copy link
Member Author

Added validation into grpc_proxy to fix it's test.

@serathius serathius force-pushed the watch-negative-revision branch from 5d8d14c to afeab20 Compare September 23, 2025 10:25
@serathius
Copy link
Member Author

/retest

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fuweid, serathius, siyuanfoundation

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [fuweid,serathius,siyuanfoundation]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius serathius merged commit c9d4602 into etcd-io:main Sep 23, 2025
31 checks passed
@serathius
Copy link
Member Author

/cherry-pick release-3.6

@k8s-infra-cherrypick-robot

@serathius: new pull request created: #20707

In response to this:

/cherry-pick release-3.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius
Copy link
Member Author

/cherry-pick release-3.5

@k8s-infra-cherrypick-robot

@serathius: #20693 failed to apply on top of branch "release-3.5":

Applying: Reject watch request with -1 revision and make rangeEvents safe against negative revision
Using index info to reconstruct a base tree...
M	server/etcdserver/api/v3rpc/watch.go
M	server/proxy/grpcproxy/watch.go
A	server/storage/mvcc/watchable_store.go
A	server/storage/mvcc/watchable_store_test.go
M	tests/integration/clientv3/watch_test.go
Falling back to patching base and 3-way merge...
Auto-merging tests/integration/clientv3/watch_test.go
CONFLICT (content): Merge conflict in tests/integration/clientv3/watch_test.go
Auto-merging server/proxy/grpcproxy/watch.go
Auto-merging server/mvcc/watchable_store_test.go
CONFLICT (content): Merge conflict in server/mvcc/watchable_store_test.go
Auto-merging server/mvcc/watchable_store.go
CONFLICT (content): Merge conflict in server/mvcc/watchable_store.go
Auto-merging server/etcdserver/api/v3rpc/watch.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Reject watch request with -1 revision and make rangeEvents safe against negative revision

In response to this:

/cherry-pick release-3.5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

henrybear327 added a commit to henrybear327/etcd that referenced this pull request Sep 26, 2025
Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

7 participants