fix(cluster): During the service provider's release period, concurrent read routes from consumers were rejected #15883
base: 3.3
Conversation
…rantReadWriteLock avoids concurrency issues, and using invokerRefreshReadLock avoids lock blocking during high concurrency reads apache#15881
Codecov Report
@@            Coverage Diff             @@
##                 3.3    #15883   +/-  ##
==========================================
+ Coverage      60.74%    60.75%   +0.01%
  Complexity     11702     11702
==========================================
  Files           1938      1938
  Lines          88694     88710   +16
  Branches       13387     13389   +2
==========================================
+ Hits           53879     53900   +21
+ Misses         29291     29278   -13
- Partials        5524      5532   +8
Pull request overview
This pull request addresses a concurrency issue during service provider release periods by upgrading the locking mechanism from ReentrantLock to ReentrantReadWriteLock. This change allows multiple consumer threads to concurrently read routes without blocking each other, while still maintaining exclusive access for write operations.
Key Changes:
- Replaced invokerRefreshLock (a ReentrantLock) with a ReentrantReadWriteLock and extracted separate read-lock and write-lock references
- Modified the list() method to use the read lock for concurrent access to invoker lists
- Updated all write operations (add/remove invokers, refresh, etc.) to use the write lock (see the sketch below)
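To make the pattern concrete, here is a minimal, hypothetical sketch of the locking scheme described above. The class, field, and method names (AbstractDirectorySketch, doList, refreshInvokers) are simplified stand-ins, not the actual AbstractDirectory code from this PR.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public abstract class AbstractDirectorySketch<T> {
    // Replaces the former ReentrantLock: many readers may route concurrently,
    // while refresh operations still get exclusive access.
    private final ReentrantReadWriteLock invokerRefreshLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock.ReadLock invokerRefreshReadLock = invokerRefreshLock.readLock();
    private final ReentrantReadWriteLock.WriteLock invokerRefreshWriteLock = invokerRefreshLock.writeLock();

    private volatile List<T> invokers;

    // Consumer-side routing: shared read lock, so concurrent list() calls do not block each other.
    public List<T> list() {
        invokerRefreshReadLock.lock();
        try {
            return doList(invokers);
        } finally {
            invokerRefreshReadLock.unlock();
        }
    }

    // Provider release / registry notification: exclusive write lock while the invoker list changes.
    protected void refreshInvokers(List<T> newInvokers) {
        invokerRefreshWriteLock.lock();
        try {
            this.invokers = newInvokers;
        } finally {
            invokerRefreshWriteLock.unlock();
        }
    }

    protected abstract List<T> doList(List<T> invokers);
}
```

In the actual change, the read lock is taken on the routing path and the write lock in the methods that mutate the invoker list, as listed in the bullet points above.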
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
EarthChen left a comment:
LGTM
I think that if we accept this PR, invokers will not be refreshed while routing is in progress. If QPS is high, this may cause issues such as dead nodes remaining valid for an extended period. Additionally, I don't understand why #10925 added this restriction. We need to discuss this further. @AlbumenJ
Hi @RainYuY: So you haven't encountered the situation where invokers are refreshed late? From my understanding of your code, if a request is being routed, the invoker list cannot be refreshed, so the refresh is blocked until routing completes. Moreover, if routing is ongoing continuously (e.g., a read lock is held persistently), the write lock will take much longer to acquire.
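This concern can be reproduced with a small standalone demo (illustrative only, not Dubbo code): while any thread holds the read lock, a writeLock().lock() call blocks, so an invoker refresh has to wait for in-flight routing to finish.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteLockDelayDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Stand-in for an in-flight routing call that holds the read lock.
        Thread routing = new Thread(() -> {
            lock.readLock().lock();
            try {
                TimeUnit.MILLISECONDS.sleep(500); // routing still in progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        });
        routing.start();

        TimeUnit.MILLISECONDS.sleep(50); // let the routing thread acquire the read lock first

        // Stand-in for an invoker refresh: it must wait for all current readers to release.
        long start = System.nanoTime();
        lock.writeLock().lock();
        try {
            long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println("Refresh waited ~" + waitedMs + " ms for routing to finish");
        } finally {
            lock.writeLock().unlock();
        }
        routing.join();
    }
}
```

The write lock has to wait for every reader that is already inside, so under heavy routing traffic a refresh can be delayed well beyond the single 500 ms shown here, which is the delay being discussed.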
You are right. I found that during a service provider's release period, the overall Dubbo call time increased because of lock blocking until the provider finished its release. So I wonder whether there is a better way to solve this problem; for now, all I can think of is to take the lock first so that calls can still proceed normally instead of failing outright.
I think a solution more oriented toward AP would be to remove the validation between the new and old invoker lists to ensure availability. However, that validation was added in a separate PR by another PMC member, so we need to confirm the intention behind the change.
I don't have a better solution yet and I'm still thinking about it. But I'm wondering why this restriction exists, so I'm waiting for Kevin to give me an answer LOL. If I don't get a reply, I'll call him this Friday ^v^. @AlbumenJ
During the service provider's release period, concurrent read routes from consumers were rejected #15881
What is the purpose of the change?
Changing invokerRefreshLock from ReentrantLock to ReentrantReadWriteLock avoids concurrency issues, and using invokerRefreshReadLock avoids lock blocking during high-concurrency reads.
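For reference, a small standalone demo (illustrative only, not part of this PR's diff) shows the property the change relies on: several threads can hold the read lock of a ReentrantReadWriteLock at the same time, whereas a plain ReentrantLock would serialize them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ConcurrentReadDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        int readers = 4;
        CountDownLatch allHoldingReadLock = new CountDownLatch(readers);

        // Each thread stands in for a consumer routing a request: all of them
        // hold the read lock at the same time, so none waits on the others.
        for (int i = 0; i < readers; i++) {
            new Thread(() -> {
                lock.readLock().lock();
                try {
                    allHoldingReadLock.countDown();
                    allHoldingReadLock.await(); // returns only once all readers are inside concurrently
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.readLock().unlock();
                }
            }).start();
        }

        allHoldingReadLock.await();
        System.out.println("All " + readers + " readers held the lock concurrently");
    }
}
```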
Checklist