-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix/triple shared channel unavailable issue #1520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/triple shared channel unavailable issue #1520
Conversation
WalkthroughReplaces ProviderInfo-keyed channel caching with ClientTransportConfig-keyed caching in TripleClientTransport; adds URL_CONNECTION_MAP and clientTransportConfig; updates ReferenceCountManagedChannel.shutdown() to call grpcChannel.shutdown(), await up to 5 seconds and log on interruption; adds tests validating distinct config-keyed entries and cleanup. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TCT as TripleClientTransport
participant Map as URL_CONNECTION_MAP
participant RCM as ReferenceCountManagedChannel
participant GC as GrpcChannel
Client->>TCT: connect(transportConfig)
TCT->>Map: get(ClientTransportConfig)
alt found
Map-->>TCT: existing RCM (increment ref)
else not found
TCT->>RCM: initChannel(config.getProviderInfo())
RCM-->>Map: put(ClientTransportConfig, RCM)
Map-->>TCT: new RCM
end
Client->>TCT: disconnect()
TCT->>RCM: release() / decrement ref
alt ref == 0
RCM->>GC: shutdown()
RCM->>GC: awaitTermination(5s)
alt interrupted
Note right of RCM: LOGGER.warn(...) on interruption
end
RCM-->>TCT: return shutdown handle
TCT->>Map: remove(ClientTransportConfig)
else ref > 0
RCM-->>TCT: return channel (no shutdown)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java (1)
57-57: Consider logging when termination times out.The
awaitTerminationmethod returnsfalseif the timeout elapses before termination completes. This return value is currently ignored, which means the channel might still be terminating but the caller has no indication of this condition.Apply this diff to log timeout conditions:
try { - shutdown.awaitTermination(5, TimeUnit.SECONDS); + if (!shutdown.awaitTermination(5, TimeUnit.SECONDS)) { + LOGGER.warn("Triple channel did not terminate within 5 seconds."); + } } catch (InterruptedException e) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java(3 hunks)remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java(7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: EvenLjj
Repo: sofastack/sofa-rpc PR: 1488
File: remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/tracer/sofatracer/TripleTracerAdapter.java:290-303
Timestamp: 2025-05-08T06:34:38.670Z
Learning: In SOFA-RPC's Triple protocol adapter, the server detects invocation type (streaming/sync) from gRPC method descriptors but intentionally doesn't propagate this to request properties or headers. This is by design to maintain compatibility with clients of different versions, as the server should rely on client-provided values rather than overriding them.
🧬 Code graph analysis (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java (1)
core/api/src/main/java/com/alipay/sofa/rpc/log/LoggerFactory.java (1)
LoggerFactory(29-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: sca
🔇 Additional comments (3)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java (3)
19-20: LGTM!The logging imports are correct and follow SOFA-RPC's standard logging patterns.
35-35: LGTM!The logger declaration follows standard Java logging patterns with proper use of static final and SOFA-RPC's LoggerFactory.
77-79: Verify the incomplete shutdownNow() implementation.The TODO comment indicates that
shutdownNow()is not properly implemented. Per the gRPCManagedChannelcontract,shutdownNow()should forcefully terminate connections immediately, but the current implementation delegates toshutdown()which performs a graceful shutdown instead.Please confirm whether forceful immediate termination is needed. If so, the implementation should call
grpcChannel.shutdownNow()and potentially skip the await:@Override public ManagedChannel shutdownNow() { - // TODO - return shutdown(); + if (referenceCount.decrementAndGet() <= 0) { + return grpcChannel.shutdownNow(); + } + return grpcChannel; }
...-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java
Show resolved
Hide resolved
| if (channel != null) { | ||
| try { | ||
| channel.shutdown().awaitTermination(5, TimeUnit.SECONDS); | ||
| } catch (InterruptedException e) { | ||
| LOGGER.warn("Triple channel shut down interrupted."); | ||
| } | ||
| channel.shutdown(); | ||
| if (channel.isShutdown()) { | ||
| channel = null; | ||
| channelMap.remove(providerInfo.toString()); | ||
| URL_CONNECTION_MAP.remove(clientTransportConfig); | ||
| } | ||
| } else { | ||
| channelMap.remove(providerInfo.toString()); | ||
| URL_CONNECTION_MAP.remove(clientTransportConfig); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid removing a freshly replaced channel from the shared cache
Because URL_CONNECTION_MAP is static, another transport using the same ClientTransportConfig can repopulate the entry with a brand-new ReferenceCountManagedChannel between the time this transport shuts its instance down and this remove executes. Removing by key alone would tear down the new channel and drop it from the cache, forcing an unexpected reconnect for the other transport. Guard the removal with the current channel instance so we only clear the entry if it still matches the one we just shut down.
- URL_CONNECTION_MAP.remove(clientTransportConfig);
+ URL_CONNECTION_MAP.remove(clientTransportConfig, (ReferenceCountManagedChannel) channel);Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java
around lines 124-132, the code unconditionally removes the map entry by key
which can evict a freshly replaced channel from the static URL_CONNECTION_MAP;
change the removal to a conditional removal that only removes the entry if the
current map value equals the channel we just shut down (e.g., use
URL_CONNECTION_MAP.remove(clientTransportConfig, channel) or check equality
before remove) so that a concurrently replaced channel is not torn down; keep
the existing null/ shutdown checks and only perform the conditional remove after
you set channel to null.
f29ce63 to
d951b0e
Compare
d951b0e to
b575ff4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
258-280: Thread safety issue confirmed: synchronizedisconnect()on sharedlock.The race condition identified in the review is valid.
disconnect()removes fromURL_CONNECTION_MAP(lines 128, 131) without holding thelock, whilegetSharedChannel()protects its put operation (line 275) within a synchronized block. This creates a window where:
- Thread A calls
getSharedChannel(), reads the map at line 259 (unsynced)- Thread B concurrently calls
disconnect(), removes the entry (lines 128/131, unsynced)- Thread A's subsequent operations at line 275 may resurrect or overwrite that removal
Fix: Wrap the
disconnect()map removals (lines 128, 131) insynchronized(lock)to coordinate withgetSharedChannel(), ensuring atomic compound operations (check-then-act pattern).
♻️ Duplicate comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
123-133: Race condition: unconditional map removal can evict a newly created channel.The disconnect logic has a critical race condition previously flagged in review:
- Thread A shuts down
channel(line 125)- Thread B concurrently calls
getSharedChannel, detects the shut-down channel as unavailable, and replaces the map entry with a freshReferenceCountManagedChannel- Thread A removes the map entry by key alone (line 128 or 131), evicting Thread B's newly created channel
- Thread B's transport now has an orphaned channel not tracked in the map
Apply conditional removal to only clear the entry if it still holds the channel we shut down:
@Override public void disconnect() { if (channel != null) { + ManagedChannel channelToRemove = channel; channel.shutdown(); - if (channel.isShutdown()) { + if (channelToRemove.isShutdown()) { channel = null; - URL_CONNECTION_MAP.remove(clientTransportConfig); + URL_CONNECTION_MAP.remove(clientTransportConfig, (ReferenceCountManagedChannel) channelToRemove); } } else { - URL_CONNECTION_MAP.remove(clientTransportConfig); + // No channel to remove; another thread may have already cleared it } }
🧹 Nitpick comments (1)
remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java (1)
40-67: Consider English comment and enhance test coverage.Minor issues:
- Line 42: The Chinese comment should be translated to English for consistency across the codebase.
- The test validates map presence but doesn't verify the channels are actually functional or that the reference counting works correctly. Consider adding assertions to check
channel.isAvailable()or reference count values.Translation suggestion for line 42:
- //模拟两个 reference 去消费同一份推送数据 + // Simulate two references consuming the same provider data
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java(3 hunks)remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java(7 hunks)remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java
🧰 Additional context used
🧬 Code graph analysis (1)
remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java (4)
core/api/src/main/java/com/alipay/sofa/rpc/client/ProviderInfo.java (1)
ProviderInfo(35-528)core/api/src/main/java/com/alipay/sofa/rpc/config/ConsumerConfig.java (1)
ConsumerConfig(71-1044)core/api/src/main/java/com/alipay/sofa/rpc/transport/ClientTransportConfig.java (1)
ClientTransportConfig(41-299)core/api/src/main/java/com/alipay/sofa/rpc/transport/ClientTransportFactory.java (1)
ClientTransportFactory(36-189)
🪛 GitHub Actions: build
remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java
[error] 1-1: Please commit your change before run this shell, un commit files: M remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: sca
🔇 Additional comments (5)
remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java (2)
19-23: LGTM!The new imports support the test validating config-keyed channel caching behavior.
69-79: LGTM!The helper method correctly maps ConsumerConfig and ProviderInfo fields to ClientTransportConfig, following the fluent builder pattern.
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (3)
64-87: LGTM!The migration from String-keyed
channelMapto ClientTransportConfig-keyedURL_CONNECTION_MAPis correctly implemented. Since ClientTransportConfig lacks customequals/hashCode, the map uses identity-based keying, ensuring each consumer config instance gets its own dedicated channel. This design prevents the shared-channel unavailability issues the PR aims to fix.
100-107: LGTM!Storing
transportConfigin theclientTransportConfigfield enables proper cleanup indisconnect()by providing the map key.
109-116: LGTM!The updated call to
getSharedChannel(transportConfig)correctly uses the config-based channel retrieval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java (1)
56-60: Restore the interrupted status in shutdown waitCatching
InterruptedExceptionwithout resetting the thread’s interrupted flag breaks cooperative cancellation. Please addThread.currentThread().interrupt();after the log so upstream logic can detect the interrupt. (seaxiang.com)} catch (InterruptedException e) { - LOGGER.warn("Triple channel shut down interrupted."); + LOGGER.warn("Triple channel shut down interrupted.", e); + Thread.currentThread().interrupt(); }remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
124-132: Guard shared-map removal against concurrent replacement
URL_CONNECTION_MAP.remove(clientTransportConfig)blindly clears whichever channel is registered under that key. If another transport refreshed the shared channel between your shutdown and this remove, you’ll evict the new channel and force an unexpected reconnect. Remove the entry conditionally using the channel instance you just closed (and drop the unconditionalelsebranch). (seaxiang.com)- if (channel != null) { - channel.shutdown(); - if (channel.isShutdown()) { - channel = null; - URL_CONNECTION_MAP.remove(clientTransportConfig); - } - } else { - URL_CONNECTION_MAP.remove(clientTransportConfig); - } + if (channel != null) { + ReferenceCountManagedChannel refChannel = (ReferenceCountManagedChannel) channel; + refChannel.shutdown(); + if (refChannel.isShutdown()) { + channel = null; + URL_CONNECTION_MAP.remove(clientTransportConfig, refChannel); + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java(3 hunks)remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java(7 hunks)remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- remoting/remoting-triple/src/test/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransportTest.java
🧰 Additional context used
🧬 Code graph analysis (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/ReferenceCountManagedChannel.java (1)
core/api/src/main/java/com/alipay/sofa/rpc/log/LoggerFactory.java (1)
LoggerFactory(29-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: sca
...emoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
258-281: Critical: Per-instance lock does not protect the shared static map.The
lockobject is an instance field (line 89), so differentTripleClientTransportinstances use different locks when accessing the shared staticURL_CONNECTION_MAP. This allows race conditions when multiple transports concurrently access the sameClientTransportConfig:Race scenario 1 (duplicate channel creation):
- Transport A:
synchronized(lockA)→ checks map, config not found- Transport B:
synchronized(lockB)→ checks map, config not found (no mutual exclusion)- Both create new channels; one overwrites the other in the map → channel leak and incorrect refcounting
Race scenario 2 (refcount increment order):
- Transport A: creates channel with refcount=0, calls
map.put(config, channel)- Transport B: gets channel from map (still refcount=0), increments to 1
- Transport A: increments to 2
- The final refcount is correct, but Transport B briefly held a channel with refcount=0
Use a static lock or a per-config lock (e.g., via
ConcurrentHashMap+computeIfAbsent):Solution 1: Static lock (simple but coarse-grained):
- protected final Object lock = new Object(); + protected static final Object lock = new Object();Solution 2: Per-config fine-grained locking (preferred, avoids contention across different configs):
private ReferenceCountManagedChannel getSharedChannel(ClientTransportConfig config) { - ReferenceCountManagedChannel channel = URL_CONNECTION_MAP.get(config); - - if (channelAvailable(channel)) { - channel.incrementAndGetCount(); - return channel; - } else if (channel != null) { - channel.shutdownNow(); - } - - synchronized (lock) { - channel = URL_CONNECTION_MAP.get(config); - // double check - if (channelAvailable(channel)) { - channel.incrementAndGetCount(); - } else { - channel = new ReferenceCountManagedChannel(initChannel(config.getProviderInfo())); - URL_CONNECTION_MAP.put(config, channel); - channel.incrementAndGetCount(); - } - } - - return channel; + return URL_CONNECTION_MAP.compute(config, (k, existing) -> { + if (channelAvailable(existing)) { + existing.incrementAndGetCount(); + return existing; + } else if (existing != null) { + existing.shutdownNow(); + } + ReferenceCountManagedChannel newChannel = new ReferenceCountManagedChannel(initChannel(config.getProviderInfo())); + newChannel.incrementAndGetCount(); + return newChannel; + }); }
♻️ Duplicate comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
124-132: Use conditional removal to avoid evicting a concurrently replaced channel.The static
URL_CONNECTION_MAPis shared across all transport instances. Between checkingchannel.isShutdown()and callingremove(), another thread might replace the map entry with a new channel for the sameclientTransportConfig. The unconditionalremove(key)would then evict the new channel and break active connections.Apply this diff to use conditional removal:
public void disconnect() { if (channel != null) { channel.shutdown(); if (channel.isShutdown()) { + URL_CONNECTION_MAP.remove(clientTransportConfig, (ReferenceCountManagedChannel) channel); channel = null; - URL_CONNECTION_MAP.remove(clientTransportConfig); } } else { - URL_CONNECTION_MAP.remove(clientTransportConfig); + // No local channel reference; cannot safely remove from shared map } }Based on learnings
🧹 Nitpick comments (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (1)
64-70: Consider removing the redundantclientTransportConfigfield.The
clientTransportConfigfield duplicatestransportConfig, which is already available from the parentClientTransportclass. You can usetransportConfigdirectly throughout the code (e.g., at lines 128, 131, 259, 269, 275) instead of maintaining a separate field.Apply this diff to remove the redundancy:
- protected ClientTransportConfig clientTransportConfig; - protected ManagedChannel channel;Then update references:
- Line 103: Remove
clientTransportConfig = transportConfig;- Lines 128, 131: Use
transportConfiginstead ofclientTransportConfig- Lines 259, 269, 275: Already use
configparameter, no change needed
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java(7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: sca
🔇 Additional comments (3)
remoting/remoting-triple/src/main/java/com/alipay/sofa/rpc/transport/triple/TripleClientTransport.java (3)
81-87: LGTM! Config-based caching enables proper connection distinction.The migration from
channelMap(keyed by address String) toURL_CONNECTION_MAP(keyed byClientTransportConfig) correctly enables multiple connections to the same endpoint with different configurations.
114-114: LGTM! Correctly uses config-based channel retrieval.The change to
getSharedChannel(transportConfig)properly aligns with the refactored caching mechanism.
274-276: Refcount increment now present, addressing prior review.The code correctly increments the reference count after creating a new channel (line 276), which addresses the concern from the previous review. However, this section remains vulnerable to concurrency issues due to the per-instance lock (see comment on lines 258-281).
Fix/triple shared channel unavailable issue
Summary by CodeRabbit
Bug Fixes
Chores
Tests