Description
I’m using Lettuce as the Redis client in a Spring Boot application.
Our Redis setup is an AWS-managed Redis Cluster with 3 nodes.
In our system, we frequently use the MGET command to fetch up to 150 keys at once.
These keys are typically spread across multiple hash slots, not all located in the same slot.
Roughly every 3–4 weeks, we observe a short period (about 10–15 seconds) during which all Redis commands time out.
This affects not only MGET but also simple commands like GET.
After this period, everything recovers automatically without any manual intervention.
While the timeouts were occurring, CPU utilization and network bandwidth on the Redis server were both within normal ranges.
I suspect this might be related to how Lettuce handles multi-slot MGET operations in cluster mode.
When MGET involves keys from different slots, Lettuce decomposes the command into multiple per-slot operations and executes them concurrently.
If the decomposition or result merging takes too long, it might temporarily block one of Lettuce’s underlying Netty channels, causing subsequent commands on the same channel to time out.
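To make the per-slot decomposition concrete, here is a minimal, stdlib-only sketch of how keys map to hash slots and how a multi-key request fans out into per-slot groups. It follows the slot rule from the Redis Cluster specification (CRC16/XMODEM modulo 16384, with hash-tag handling). The class and method names (`HashSlotSketch`, `groupBySlot`) are made up for illustration; this is not Lettuce's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Lettuce or Spring Data Redis.
final class HashSlotSketch {

    static final int SLOT_COUNT = 16384;

    /** CRC16/XMODEM: polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Hash slot of a key; only a non-empty {hash tag} is hashed when one is present. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    /** Groups keys by slot; each group corresponds to one single-slot MGET. */
    static Map<Integer, List<String>> groupBySlot(List<String> keys) {
        Map<Integer, List<String>> bySlot = new TreeMap<>();
        for (String key : keys) {
            bySlot.computeIfAbsent(slot(key), s -> new ArrayList<>()).add(key);
        }
        return bySlot;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("foo", "bar", "{user1000}.following", "{user1000}.followers");
        groupBySlot(keys).forEach((slot, group) -> System.out.println(slot + " -> " + group));
    }
}
```

Running something like this over a real 150-key batch shows how many per-slot sub-commands a single MGET turns into. Keys that share a hash tag land in the same slot and therefore in the same sub-command.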
Questions
- Could this kind of global timeout be caused by executing large MGET operations across multiple slots?
- Does Lettuce use a shared Netty I/O channel per connection that could become blocked during such operations?
- Would you recommend limiting the number of keys per MGET or splitting them into smaller parallel requests to avoid blocking?
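For reference, the "limit the number of keys per MGET" option from the last question could look roughly like the sketch below. It is stdlib-only and issues the batches sequentially rather than in parallel. The class name, the batch size of 50, and the `multiGet` function parameter are assumptions for illustration, and whether batching has any effect on the timeouts is exactly what is being asked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only.
final class BatchedMultiGetSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /**
     * Fetches values one batch at a time and concatenates the results in key order.
     * multiGet is a stand-in for the real Redis call and must return one value per key.
     */
    static <V> List<V> fetchInBatches(List<String> keys, int batchSize,
                                      Function<List<String>, List<V>> multiGet) {
        List<V> values = new ArrayList<>(keys.size());
        for (List<String> batch : partition(keys, batchSize)) {
            values.addAll(multiGet.apply(batch));
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 150; i++) {
            keys.add("key:" + i);
        }
        // Fake backend that echoes each key as its "value"; no Redis involved.
        Function<List<String>, List<String>> fakeMultiGet = batch -> new ArrayList<>(batch);
        List<String> values = fetchInBatches(keys, 50, fakeMultiGet);
        System.out.println(partition(keys, 50).size() + " batches, " + values.size() + " values");
    }
}
```

In the application, the function argument would presumably be something like `batch -> redisTemplate.opsForValue().multiGet(batch)`, which is the same call that appears in the stack trace.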
Error Stack
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:274)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:800)
    at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.mGet(LettuceStringCommands.java:123)
    at org.springframework.data.redis.connection.DefaultedRedisConnection.mGet(DefaultedRedisConnection.java:281)
    at org.springframework.data.redis.connection.DefaultStringRedisConnection.mGet(DefaultStringRedisConnection.java:777)
    at org.springframework.data.redis.core.DefaultValueOperations.lambda$multiGet$7(DefaultValueOperations.java:180)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:222)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:189)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
    at org.springframework.data.redis.core.DefaultValueOperations.multiGet(DefaultValueOperations.java:180)
Environment
- spring-data-redis: 2.4.15
- Lettuce version: 6.0.8
- Redis version: AWS Redis Cluster (3 nodes)
- Java version: 1.8
- Spring Boot version: 2.4.13
Redis & Lettuce Configuration
@Bean
@Primary
public LettuceConnectionFactory lettuceConnectionFactory() {
    RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration(Collections.singletonList(nodes));
    if (StrUtil.isNotBlank(password)) {
        redisClusterConfiguration.setPassword(password);
    }
    return getLettuceConnectionFactory(redisClusterConfiguration, null);
}
private LettuceConnectionFactory getLettuceConnectionFactory(RedisClusterConfiguration redisClusterConfiguration, RedisStandaloneConfiguration redisStandaloneConfiguration) {
    ClientOptions.Builder builder = initializeClientOptionsBuilder();
    GenericObjectPoolConfig<Object> poolConfig = buildObjectGenericObjectPoolConfig();
    LettucePoolingClientConfiguration.LettucePoolingClientConfigurationBuilder lettuceClientConfigurationBuilder = LettucePoolingClientConfiguration.builder()
            .clientOptions(builder.build()).poolConfig(poolConfig).commandTimeout(Duration.ofMillis(timeoutMillis));
    lettuceClientConfigurationBuilder.useSsl().disablePeerVerification();
    LettuceClientConfiguration lettuceClientConfiguration;
    lettuceClientConfiguration = lettuceClientConfigurationBuilder.readFrom(ReadFrom.REPLICA_PREFERRED).build();
    if (Objects.nonNull(redisClusterConfiguration)) {
        LettuceConnectionFactory lettuceConnectionFactory = new LettuceConnectionFactory(redisClusterConfiguration, lettuceClientConfiguration);
        lettuceConnectionFactory.setShareNativeConnection(false);
        lettuceConnectionFactory.afterPropertiesSet();
        return lettuceConnectionFactory;
    }
    if (Objects.nonNull(redisStandaloneConfiguration)) {
        LettuceConnectionFactory lettuceConnectionFactory = new LettuceConnectionFactory(redisStandaloneConfiguration, lettuceClientConfiguration);
        lettuceConnectionFactory.setShareNativeConnection(false);
        lettuceConnectionFactory.afterPropertiesSet();
        return lettuceConnectionFactory;
    }
    throw new RuntimeException("redis configuration error");
}
private ClientOptions.Builder initializeClientOptionsBuilder() {
    ClusterClientOptions.Builder builder = ClusterClientOptions.builder();
    builder.validateClusterNodeMembership(false);
    ClusterTopologyRefreshOptions.Builder refreshBuilder = ClusterTopologyRefreshOptions.builder()
            .dynamicRefreshSources(true);
    refreshBuilder.enablePeriodicRefresh(Duration.ofMillis(30000));
    refreshBuilder.enableAllAdaptiveRefreshTriggers();
    return builder.topologyRefreshOptions(refreshBuilder.build());
}
private GenericObjectPoolConfig<Object> buildObjectGenericObjectPoolConfig() {
    GenericObjectPoolConfig<Object> poolConfig = new GenericObjectPoolConfig<>();
    poolConfig.setMinIdle(64);
    poolConfig.setMaxIdle(128);
    poolConfig.setMaxTotal(128);
    poolConfig.setMaxWaitMillis(5000);
    poolConfig.setTestOnBorrow(true);
    poolConfig.setTestWhileIdle(true);
    poolConfig.setTimeBetweenEvictionRunsMillis(30000);
    poolConfig.setMinEvictableIdleTimeMillis(600000);
    return poolConfig;
}
@Bean
@Primary
public RedisTemplate<String, Object> redisTemplate(LettuceConnectionFactory lettuceConnectionFactory) {
    RedisTemplate<String, Object> redisTemplate = new RedisTemplate<>();
    redisTemplate.setConnectionFactory(lettuceConnectionFactory);
    Jackson2JsonRedisSerializer<Object> jackson2JsonRedisSerializer = new Jackson2JsonRedisSerializer<>(Object.class);
    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
    objectMapper.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL);
    jackson2JsonRedisSerializer.setObjectMapper(objectMapper);
    StringRedisSerializer stringSerializer = new StringRedisSerializer(StandardCharsets.UTF_8);
    redisTemplate.setKeySerializer(stringSerializer);
    redisTemplate.setValueSerializer(jackson2JsonRedisSerializer);
    redisTemplate.setHashKeySerializer(stringSerializer);
    redisTemplate.setHashValueSerializer(jackson2JsonRedisSerializer);
    redisTemplate.afterPropertiesSet();
    return redisTemplate;
}
@Bean
public StringRedisTemplate stringRedisTemplate(LettuceConnectionFactory lettuceConnectionFactory) {
    StringRedisTemplate stringRedisTemplate = new StringRedisTemplate();
    stringRedisTemplate.setConnectionFactory(lettuceConnectionFactory);
    return stringRedisTemplate;
}
Additional context
This timeout only occurs occasionally (once every several weeks), but when it happens, all Redis commands fail for several seconds.
We’re trying to determine whether large cross-slot MGET operations could trigger a temporary stall or event loop blocking inside Lettuce.
Thank you very much for your time and for maintaining this excellent library!