
Commit 256b873

Add mcp-resilience4j module with transport-level resilience
Adds a new optional mcp-resilience4j module that wraps any McpClientTransport with configurable Resilience4j policies, making MCP tool calls resilient to transient failures, slow servers, and traffic spikes.

ResilientMcpClientTransport implements McpClientTransport and applies up to five policies in the standard recommended order:

Retry -> CircuitBreaker -> RateLimiter -> TimeLimiter -> Bulkhead

All policies are optional. sendMessage() applies all five. connect() applies only CircuitBreaker and Retry; session establishment is not throttled or timed out.

McpResilienceConfig provides a high-level fluent facade over the builder for the common configuration case. Includes fail-fast null validation on all setters and a WARN log when a registry name collision would silently discard a supplied config.

Circuit breaker state transitions and retry events are logged automatically via Resilience4j event publishers at construction time.

Also includes 13 unit tests and a README covering usage, policy ordering rationale, registry guidance, and observability.

Bumps project version to 2.1.1-SNAPSHOT.
1 parent d1ef187 commit 256b873

16 files changed

Lines changed: 1615 additions & 20 deletions


conformance-tests/client-jdk-http-client/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>conformance-tests</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>client-jdk-http-client</artifactId>
<packaging>jar</packaging>
@@ -28,7 +28,7 @@
<dependency>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</dependency>

<!-- Logging -->

conformance-tests/client-spring-http-client/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>conformance-tests</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>client-spring-http-client</artifactId>
<packaging>jar</packaging>

conformance-tests/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-parent</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>conformance-tests</artifactId>
<packaging>pom</packaging>

conformance-tests/server-servlet/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>conformance-tests</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>server-servlet</artifactId>
<packaging>jar</packaging>
@@ -28,7 +28,7 @@
<dependency>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</dependency>

<dependency>

mcp-bom/pom.xml

Lines changed: 8 additions & 1 deletion
@@ -7,7 +7,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-parent</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>

<artifactId>mcp-bom</artifactId>
@@ -61,6 +61,13 @@
<version>${project.version}</version>
</dependency>

+<!-- MCP Resilience4j -->
+<dependency>
+<groupId>io.modelcontextprotocol.sdk</groupId>
+<artifactId>mcp-resilience4j</artifactId>
+<version>${project.version}</version>
+</dependency>
+
</dependencies>
</dependencyManagement>

mcp-core/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-parent</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>mcp-core</artifactId>
<packaging>jar</packaging>

mcp-json-jackson2/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-parent</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>mcp-json-jackson2</artifactId>
<packaging>jar</packaging>
@@ -74,7 +74,7 @@
<dependency>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-core</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.networknt</groupId>

mcp-json-jackson3/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@
<parent>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-parent</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</parent>
<artifactId>mcp-json-jackson3</artifactId>
<packaging>jar</packaging>
@@ -68,7 +68,7 @@
<dependency>
<groupId>io.modelcontextprotocol.sdk</groupId>
<artifactId>mcp-core</artifactId>
-<version>2.0.1-SNAPSHOT</version>
+<version>2.1.1-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>tools.jackson.core</groupId>

mcp-resilience4j/README.md

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
# mcp-resilience4j

Resilience4j integration for the Java MCP SDK. Wraps any `McpClientTransport` with configurable circuit breaking, retry, rate limiting, time limiting, and bulkhead policies to make MCP tool calls resilient to transient failures, slow servers, and traffic spikes.

## Overview

MCP tool calls cross a network. Without resilience:

- A slow server blocks a thread indefinitely
- A flaky server causes cascading failures upstream
- A burst of parallel agent calls can overwhelm a rate-limited endpoint
- A failing server keeps receiving calls even though it cannot recover on its own

`mcp-resilience4j` addresses all of these at the **transport level**, the single integration point exposed by the MCP SDK and frameworks like Google ADK. Because one transport wraps one MCP server connection, the policies are effectively per-server and composable across multiple clients.

## Maven Dependency

```xml
<dependency>
    <groupId>io.modelcontextprotocol.sdk</groupId>
    <artifactId>mcp-resilience4j</artifactId>
    <version>2.1.1-SNAPSHOT</version>
</dependency>
```

Or via the BOM:

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.modelcontextprotocol.sdk</groupId>
            <artifactId>mcp-bom</artifactId>
            <version>2.1.1-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```

## Quick Start

### High-level facade: `McpResilienceConfig`

```java
import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;

// rawTransport is any existing McpClientTransport created elsewhere.
McpResilienceConfig config = McpResilienceConfig.builder()
    .transportCircuitBreaker(CircuitBreakerConfig.custom()
        .slidingWindowSize(10)
        .failureRateThreshold(50)
        .waitDurationInOpenState(Duration.ofSeconds(30))
        .build())
    .transportRetry(RetryConfig.custom()
        .maxAttempts(3)
        .waitDuration(Duration.ofMillis(500))
        .build())
    .transportTimeLimiter(TimeLimiterConfig.custom()
        .timeoutDuration(Duration.ofSeconds(8))
        .build())
    .build();

// Wrap the raw transport, then build the client on top of the wrapped one.
McpClientTransport resilientTransport = config.wrapTransport(rawTransport);
McpAsyncClient client = McpClient.async(resilientTransport).build();
```

### Direct builder: `ResilientMcpClientTransport`

For full control over all five policies:

```java
McpClientTransport resilientTransport = ResilientMcpClientTransport.builder(rawTransport)
    .circuitBreakerConfig(CircuitBreakerConfig.custom()
        .slidingWindowSize(10)
        .failureRateThreshold(50)
        .waitDurationInOpenState(Duration.ofSeconds(30))
        .build())
    .retryConfig(RetryConfig.custom()
        .maxAttempts(3)
        .waitDuration(Duration.ofMillis(500))
        .build())
    .timeLimiterConfig(TimeLimiterConfig.custom()
        .timeoutDuration(Duration.ofSeconds(8))
        .build())
    .rateLimiterConfig(RateLimiterConfig.custom()
        .limitForPeriod(20)
        .limitRefreshPeriod(Duration.ofSeconds(1))
        .build())
    .bulkheadConfig(BulkheadConfig.custom()
        .maxConcurrentCalls(10)
        .build())
    .build();
```

Policies are optional: configure only what you need.

## Policy Reference

| Policy | Guards against | Exception thrown | Applied on |
|---|---|---|---|
| **CircuitBreaker** | Persistent server failures | `CallNotPermittedException` | `sendMessage`, `connect` |
| **Retry** | Transient failures | Last exception / `MaxRetriesExceededException` | `sendMessage`, `connect` |
| **TimeLimiter** | Slow servers exceeding deadline | `TimeoutException` | `sendMessage` only |
| **RateLimiter** | Request rate exceeding a threshold | `RequestNotPermitted` | `sendMessage` only |
| **Bulkhead** | Too many concurrent in-flight requests | `BulkheadFullException` | `sendMessage` only |

`connect()` uses only CircuitBreaker and Retry. Session establishment is not throttled or timed out, as it can legitimately take longer than a single request.

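As a rough illustration of the TimeLimiter row, the sketch below uses only the JDK (`CompletableFuture.orTimeout`, Java 9+) to show a slow call surfacing as a `TimeoutException`. The class and method names are hypothetical and this is not how the module or Resilience4j implements its time limiter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

/** JDK-only illustration of a per-call deadline; names are hypothetical, not module APIs. */
final class DeadlineSketch {

    /** Completes with the simulated reply, or fails with TimeoutException once the deadline passes. */
    static CompletableFuture<String> callWithDeadline(long serverDelayMs, long timeoutMs) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(serverDelayMs); // simulated server latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "response";
        }).orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Returns the reply, or the simple name of the failure cause. */
    static String outcome(long serverDelayMs, long timeoutMs) {
        try {
            return callWithDeadline(serverDelayMs, timeoutMs).join();
        } catch (CompletionException e) {
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(10, 2000));  // fast server, generous deadline
        System.out.println(outcome(2000, 50));  // slow server, short deadline
    }
}
```

Note that `orTimeout` only fails the returned future; it does not interrupt the underlying task, which is one reason a real time limiter needs more care than this sketch.
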
## Policy Ordering

Policies are applied in the following order (outermost to innermost):

```
Retry → CircuitBreaker → RateLimiter → TimeLimiter → Bulkhead → MCP Server
```

This is the standard Resilience4j recommended hierarchy. Each position is deliberate:

- **Retry is outermost** so it orchestrates the entire inner chain per attempt. If Retry were inside CircuitBreaker, the CB would only see the outcome of the entire retry loop, delaying its failure detection by `maxAttempts × timeout`.
- **Bulkhead is innermost** so concurrency slots are held only during actual execution, not during Retry's backoff sleep. If Bulkhead were outermost, a failing request would clog a slot for the full backoff duration, blocking healthy concurrent callers.
- **RateLimiter is inside Retry** so each retry attempt consumes a rate token. This ensures your local token count matches the actual number of requests the server receives.
- **TimeLimiter is per-attempt** so each retry gets a fresh timeout window rather than sharing one budget across all attempts.

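The ordering argument above can be sketched with plain JDK decorators. This is a simplified model under stated assumptions: `PolicyOrderSketch` and its methods are hypothetical names, the bulkhead is a `Semaphore`, the rate limiter only counts tokens, and backoff sleeping is omitted. It is not the module's or Resilience4j's implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** JDK-only model of the ordering; names are illustrative, not Resilience4j or module APIs. */
final class PolicyOrderSketch {

    final AtomicInteger tokensUsed = new AtomicInteger();         // rate-limiter tokens consumed
    final AtomicInteger slotsHeldInBackoff = new AtomicInteger(); // permits still held between attempts
    final Semaphore bulkhead = new Semaphore(1);

    /** Innermost: a permit is held only while the call itself runs. */
    <T> Supplier<T> withBulkhead(Supplier<T> call) {
        return () -> {
            if (!bulkhead.tryAcquire()) throw new IllegalStateException("bulkhead full");
            try {
                return call.get();
            } finally {
                bulkhead.release();
            }
        };
    }

    /** Inside Retry: every attempt consumes one token. */
    <T> Supplier<T> withRateLimiter(Supplier<T> call) {
        return () -> {
            tokensUsed.incrementAndGet();
            return call.get();
        };
    }

    /** Outermost: re-runs the entire inner chain on each attempt. */
    <T> Supplier<T> withRetry(int maxAttempts, Supplier<T> call) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.get();
                } catch (RuntimeException e) {
                    last = e;
                    // A real retry would back off here; count any permit still held at this point.
                    if (bulkhead.availablePermits() == 0) slotsHeldInBackoff.incrementAndGet();
                }
            }
            throw last;
        };
    }

    /** Simulated server that fails twice and then succeeds. */
    String callFlakyServer() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> server = () -> {
            if (calls.incrementAndGet() < 3) throw new RuntimeException("transient failure");
            return "ok after " + calls.get() + " calls";
        };
        return withRetry(3, withRateLimiter(withBulkhead(server))).get();
    }

    public static void main(String[] args) {
        PolicyOrderSketch sketch = new PolicyOrderSketch();
        System.out.println(sketch.callFlakyServer());
        System.out.println("tokens used: " + sketch.tokensUsed.get());
        System.out.println("slots held during backoff: " + sketch.slotsHeldInBackoff.get());
    }
}
```

In this model each of the three attempts consumes a token, and no bulkhead permit is held between attempts, which is the behaviour the bullets above argue for.
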
## Using Shared Registries

Register instances by name to observe policy state via Micrometer or the Resilience4j admin endpoints:

```java
CircuitBreakerRegistry registry = CircuitBreakerRegistry.ofDefaults();

// Each transport gets a unique name, which is essential when sharing a registry
McpClientTransport weatherTransport = ResilientMcpClientTransport.builder(rawWeatherTransport)
    .circuitBreakerName("mcp-weather")
    .circuitBreakerRegistry(registry)
    .circuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
    .build();

McpClientTransport searchTransport = ResilientMcpClientTransport.builder(rawSearchTransport)
    .circuitBreakerName("mcp-search")
    .circuitBreakerRegistry(registry)
    .circuitBreakerConfig(CircuitBreakerConfig.ofDefaults())
    .build();
```

> **Note:** Resilience4j registries cache instances by name. If you supply a registry and a config but the name already exists in the registry, the existing instance is returned and your config is **silently ignored** by the registry. `ResilientMcpClientTransport` logs a `WARN` when this happens. Always use unique names when creating multiple transports from a shared registry.

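The name-collision behaviour in the note can be pictured with a small JDK-only model. Everything here is hypothetical (`NameKeyedRegistrySketch`, `Breaker`, the single `failureRateThreshold` field standing in for a full config); it only mirrors the "first registration wins" caching described above, not Resilience4j's actual registry code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of a name-keyed registry; not Resilience4j's implementation. */
final class NameKeyedRegistrySketch {

    /** Stand-in for a policy instance that remembers the config it was created with. */
    static final class Breaker {
        final String name;
        final int failureRateThreshold;

        Breaker(String name, int failureRateThreshold) {
            this.name = name;
            this.failureRateThreshold = failureRateThreshold;
        }
    }

    private final Map<String, Breaker> instances = new ConcurrentHashMap<>();

    /** Returns the cached instance for the name; the supplied config is used only on first creation. */
    Breaker getOrCreate(String name, int failureRateThreshold) {
        Breaker existing = instances.get(name);
        if (existing != null) {
            if (existing.failureRateThreshold != failureRateThreshold) {
                // Comparable in spirit to the WARN described in the note above.
                System.err.println("WARN: '" + name + "' already registered; supplied config ignored");
            }
            return existing;
        }
        return instances.computeIfAbsent(name, n -> new Breaker(n, failureRateThreshold));
    }

    public static void main(String[] args) {
        NameKeyedRegistrySketch registry = new NameKeyedRegistrySketch();
        Breaker first = registry.getOrCreate("mcp-weather", 50);
        Breaker second = registry.getOrCreate("mcp-weather", 80); // same name: config ignored
        System.out.println(first == second);
        System.out.println(second.failureRateThreshold);
    }
}
```
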
## Observability

Circuit breaker state transitions and retry events are logged automatically, with no metrics system required. The transport registers its listeners on the Resilience4j event publishers at construction time.

| Event | Level | Example |
|---|---|---|
| Circuit breaker state change | `INFO` | `MCP circuit breaker 'mcp-weather': CLOSED -> OPEN` |
| Call rejected (circuit open) | `WARN` | `MCP circuit breaker 'mcp-weather' is OPEN, call rejected` |
| Retry attempt | `DEBUG` | `MCP retry 'mcp-weather': attempt #2` |
| Retry exhausted | `WARN` | `MCP retry 'mcp-weather' exhausted after 3 attempt(s)` |

For Micrometer-based metrics (Prometheus, Datadog, etc.), add `resilience4j-micrometer` to your dependencies and bind the registry to your `MeterRegistry`:

```java
TaggedCircuitBreakerMetrics.ofCircuitBreakerRegistry(registry)
    .bindTo(meterRegistry);
```

## Integration with Google ADK

When using [Google ADK](https://google.github.io/adk-docs/), the `McpTransportBuilder` interface is the only injection point for custom transport behaviour. Implement it to wrap the raw transport transparently:

```java
public class ResilientMcpTransportBuilder implements McpTransportBuilder {

    private final McpTransportBuilder delegate;
    private final CircuitBreakerRegistry cbRegistry;
    // Assumed for this example: the per-server suffix used in the circuit breaker name.
    private final String endpointName;

    public ResilientMcpTransportBuilder(McpTransportBuilder delegate,
            CircuitBreakerRegistry cbRegistry, String endpointName) {
        this.delegate = delegate;
        this.cbRegistry = cbRegistry;
        this.endpointName = endpointName;
    }

    @Override
    public McpClientTransport build(Object serverParameters) {
        // Build the raw transport first, then wrap it with the resilience policies.
        McpClientTransport raw = delegate.build(serverParameters);
        return ResilientMcpClientTransport.builder(raw)
            .circuitBreakerName("mcp-" + endpointName)
            .circuitBreakerRegistry(cbRegistry)
            .circuitBreakerConfig(CircuitBreakerConfig.custom()
                .slidingWindowSize(10)
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build())
            .retryConfig(RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofMillis(500))
                .build())
            .timeLimiterConfig(TimeLimiterConfig.custom()
                .timeoutDuration(Duration.ofSeconds(8))
                .build())
            .build();
    }
}
```

Pass a `ResilientMcpTransportBuilder` instance wherever ADK expects an `McpTransportBuilder`. `McpSessionManager` calls `build()` lazily on first use, so each server connection gets its own named policy instance.

## Sample Request Flow

A normal `sendMessage()` call with all five policies configured:

```
caller
 └─ Retry (attempt 1)
     └─ CircuitBreaker  [CLOSED: passes through, records attempt]
         └─ RateLimiter  [token available: acquires, passes through]
             └─ TimeLimiter  [8s countdown starts]
                 └─ Bulkhead  [slot available: acquires]
                     └─ MCP Server ──► response in 200ms
                 Bulkhead slot released
             TimeLimiter cancelled
     CircuitBreaker records SUCCESS
 Retry: no failure, done
caller receives response
```

On a transient failure with retry:

```
Retry (attempt 1) → server fails → CB records failure #1
Retry waits 500ms (Bulkhead slot already released)
Retry (attempt 2) → server succeeds → CB records success #1
caller receives response
```

On persistent failures, the CircuitBreaker opens once the sliding window holds enough recorded calls and the failure rate reaches `failureRateThreshold` (10 calls and 50% in the examples above). Subsequent calls are then rejected immediately without a network round-trip.

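To make the opening condition concrete, here is a JDK-only sketch of a count-based sliding window. It is a simplified model with hypothetical names: it omits the half-open state, `waitDurationInOpenState`, and slow-call tracking, and it assumes the failure rate is evaluated only once the window is full.

```java
import java.util.function.Supplier;

/** JDK-only model of a count-based sliding-window breaker; simplified, not Resilience4j code. */
final class SlidingWindowBreakerSketch {

    private final boolean[] window;            // ring buffer of recent outcomes, true = failure
    private final double failureRateThreshold; // percentage, e.g. 50.0
    private int recorded;                      // number of outcomes currently in the window
    private int next;                          // index of the slot to overwrite next
    private int failures;                      // failures currently in the window
    private boolean open;

    SlidingWindowBreakerSketch(int windowSize, double failureRateThreshold) {
        this.window = new boolean[windowSize];
        this.failureRateThreshold = failureRateThreshold;
    }

    boolean isOpen() {
        return open;
    }

    /** Records one outcome and opens the breaker when the full window meets the threshold. */
    void record(boolean failed) {
        if (recorded == window.length) {
            if (window[next]) failures--;      // evict the oldest outcome
        } else {
            recorded++;
        }
        window[next] = failed;
        if (failed) failures++;
        next = (next + 1) % window.length;
        if (recorded == window.length && 100.0 * failures / recorded >= failureRateThreshold) {
            open = true;
        }
    }

    /** Rejects immediately while open, so the server is never contacted. */
    String call(Supplier<String> server) {
        if (open) throw new IllegalStateException("circuit open, call rejected");
        try {
            String response = server.get();
            record(false);
            return response;
        } catch (RuntimeException e) {
            record(true);
            throw e;
        }
    }
}
```

With a window of 10 and a threshold of 50, five successes followed by five failures open this model on the tenth recorded call, after which `call` fails fast.
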
## Building from Source

```bash
./mvnw clean install -pl mcp-resilience4j -am -DskipTests
```

To run the module's tests:

```bash
./mvnw test -pl mcp-resilience4j
```
