perf(redis): resilience, Redis-backed rate limiter, and health metrics (#278) by Abhushan187 · Pull Request #296 · geturbackend/urBackend

Abhushan187 · 2026-06-09T06:15:02Z

Closes #278

Changes by file

File	Change
`packages/common/src/config/redis.js`	Exponential+jitter retry strategy, connection timeouts, keepalive, offline queue
`apps/public-api/src/middlewares/api_usage.js`	Replace in-memory `projectRateLimiter` with `rate-limit-redis` store; keeps global IP limiter
`apps/public-api/src/controllers/health.controller.js`	Add `getRedisHealth` with `INFO stats/memory/clients` metrics
`apps/public-api/src/routes/health.js`	Mount `GET /api/health/redis`
`apps/public-api/package.json`	Add `rate-limit-redis` dependency, `autocannon` devDependency
`docker-compose.yml`	Redis `maxclients 10000`, `tcp-keepalive 60`, `sysctls` tuning
`apps/public-api/tests/load/rate-limit.load.test.js`	Autocannon load test verifying 429 behavior under concurrency
`apps/public-api/tests/resilience/redis-failure.resilience.test.js`	Jest tests for graceful degradation on Redis disconnect

Technical Notes

rate-limit-redis uses redis.call(...) for pipelining compatibility with ioredis 5.x.
The existing global IP-based limiter is untouched; only the project-scoped rate limiter is migrated to Redis.
Retry strategy caps at 30s base delay + 100ms jitter to avoid thundering herd on Redis restart.
maxRetriesPerRequest: 3 prevents hung requests during transient outages.
Health endpoint parses INFO output with regex to avoid extra dependencies.

Summary by CodeRabbit

New Features
- Redis health monitoring endpoint for system status checks
- Project-specific rate limiting with configurable per-project limits
Improvements
- Enhanced Redis connection resilience with exponential backoff retry strategy and connection pooling
- Updated infrastructure configuration for improved stability and client capacity

coderabbitai · 2026-06-09T06:15:15Z

📝 Walkthrough

Walkthrough

This PR implements Redis connection pooling, health monitoring, and project-scoped rate limiting for the public API. It upgrades Redis client resilience with exponential backoff retry logic, adds a dedicated /redis health endpoint, introduces Redis-backed rate limiting per project, and includes comprehensive load and failure resilience tests.

Changes

Redis Connection Resilience and Rate Limiting

Layer / File(s)	Summary
Redis client resilience configuration and Docker setup `packages/common/src/config/redis.js`, `docker-compose.yml`	Redis client now retries with exponential backoff capped at 30s plus jitter, enables offline queuing and ready checks, sets connection/command/disconnect timeouts and keep-alive. Redis service pinned to `7-alpine` with explicit `redis-server` options for `--maxclients 10000` and `--tcp-keepalive 60`; kernel `sysctls` set to `net.core.somaxconn=65535`.
Redis health check endpoint `apps/public-api/src/controllers/health.controller.js`, `apps/public-api/src/routes/health.js`	New `getRedisHealth` controller checks Redis readiness and retrieves stats/memory/clients INFO fields, returning HTTP 200 with structured health object or HTTP 503 on failure. Endpoint registered at `GET /redis` in health routes.
Project-specific Redis rate limiter `apps/public-api/package.json`, `apps/public-api/src/middlewares/api_usage.js`	`rate-limit-redis` dependency added; `projectRateLimiter` middleware uses Redis store with `rl:project:` prefix, 15-minute window, dynamic per-project `max` from `req.project.rateLimit` (default 500), project-ID key generation, and custom JSON responses on limit exceed.
Load and resilience testing suite `apps/public-api/package.json`, `apps/public-api/tests/load/rate-limit.load.test.js`, `apps/public-api/tests/resilience/redis-failure.resilience.test.js`	`autocannon` dev dependency added. Load test clears/verifies Redis rate-limit keys and benchmarks 100 concurrent requests for 30s, reporting 2xx/429 counts and latency metrics. Resilience test simulates Redis disconnection, verifies health endpoint returns 503, and confirms API degrades gracefully without crashing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

nsoc26, level-3, quality:exceptional

Suggested reviewers

yash-pouranik

Poem

🐰 A pool of connections, now Redis won't strain,
With health checks and limits, no more connection pain!
Exponential backoff keeps retries so neat,
Rate limiting by project makes our API complete!
Tests prove resilience when Redis goes dark—
Our burrow's more stable, right down to the bark! 🌲

🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Linked Issues check	❓ Inconclusive	The PR partially addresses issue `#278`'s objectives. It implements a Redis-backed rate limiter, health endpoint, resilience tests, and Docker tuning, but does not implement connection pooling with configurable pool size (max/min connections) as specified.	Clarify whether the shared Redis client approach meets pooling requirements, or if an explicit pool implementation with min/max connections should be added per issue `#278`.
Description check	❓ Inconclusive	The PR summary documents all changes clearly but lacks detailed inline explanations of the rate-limit-redis implementation and how it differs from expected pooling behavior.	Add explanation of whether the shared Redis client satisfies the pooling requirement or if further pooling implementation is needed.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding Redis resilience, a Redis-backed rate limiter, and health metrics.
Out of Scope Changes check	✅ Passed	All changes are within scope of implementing Redis resilience and rate limiting. Package.json adds rate-limit-redis and autocannon (necessary dependencies), and docker-compose tuning aligns with issue requirements.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/public-api/src/controllers/health.controller.js`:
- Around line 73-78: The catch block in the health controller currently returns
raw err.message to clients; update the handler in the health check function (the
try/catch that references res and err) to wrap or normalize the error using the
project's AppError (or an equivalent sanitizer) and return a safe public message
(e.g., "Service unavailable") instead of err.message; ensure the response still
includes redis.connected: false and an optional non-sensitive error code or
AppError.safeMessage but never expose MongoDB or raw backend messages directly.
- Around line 35-39: The getRedisHealth controller currently returns a raw
object ({ status, timestamp, redis }) — update all return responses in the
getRedisHealth function (and the other similar response blocks at the other two
places referenced) to conform to the required response envelope: return
res.status(...).json({ success: boolean, data: { timestamp: string, redis: {
connected: boolean, error?: string } }, message: string }); ensure success is
false for 503/500 errors and true for 200, place timestamp and redis under data,
and provide a concise message string (e.g., "Redis not ready" or "Redis
healthy") so clients receive { success, data, message }.

In `@apps/public-api/src/middlewares/api_usage.js`:
- Around line 22-48: The projectRateLimiter constant is never exported so
downstream code that imports { limiter, logger } can't use it; update the
module.exports object to include projectRateLimiter (alongside limiter and
logger) so the project-scoped Redis rate limiter is available to middleware
wiring; locate projectRateLimiter in this file and add its identifier to the
exported properties (ensure export naming matches how other modules import it).

In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js`:
- Around line 10-20: The test titled "health endpoint returns 503 when Redis is
unreachable" disconnects Redis with redis.disconnect() but only reconnects on
the happy path; update the test so the disconnect/reconnect is enclosed in a
try/finally (or use finally semantics) around the request and assertions,
calling redis.connect() in the finally block to guarantee reconnection even if
assertions fail; locate the test block (the async test function that calls
redis.disconnect(), request(app).get('/api/health/redis'), and redis.connect())
and modify it to always call redis.connect() in finally.
- Around line 6-8: The test teardown in afterEach currently calls
redis.flushdb(), which is too destructive for shared Redis; change it to only
remove keys created by this suite by tracking and deleting keys by prefix or
pattern instead. Modify the afterEach handler that references redis.flushdb() to
use a safe deletion strategy (e.g., SCAN/KEYS with the test-specific prefix or
maintain a list/set of created keys during tests and call DEL on those entries)
and ensure any helper that creates keys (tests that call redis.set/redis.hset)
uses the agreed prefix so keys can be reliably targeted for cleanup.
- Around line 29-31: The test currently accepts any of [200, 401, 403, 500]
which masks regressions; update the assertion on the response (the expect([...])
that checks res.status) to assert the narrow set required by the middleware
contract—preferably a single expected status (e.g., 401 or 500) or at most two
statuses that are explicitly allowed during Redis degradation—so change the
expect call on res.status in the redis-failure.resilience.test to that narrow
set and add a brief comment linking it to the expected degraded behavior from
the auth/middleware code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d9aae05-9154-48eb-be94-62a99a028370

📥 Commits

Reviewing files that changed from the base of the PR and between b6e59ab and 21e346c.

📒 Files selected for processing (8)

apps/public-api/package.json
apps/public-api/src/controllers/health.controller.js
apps/public-api/src/middlewares/api_usage.js
apps/public-api/src/routes/health.js
apps/public-api/tests/load/rate-limit.load.test.js
apps/public-api/tests/resilience/redis-failure.resilience.test.js
docker-compose.yml
packages/common/src/config/redis.js

coderabbitai · 2026-06-09T06:19:38Z

+        return res.status(503).json({
+            status: 'error',
+            timestamp: new Date().toISOString(),
+            redis: { connected: false, error: 'Redis client not ready' },
+        });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the required controller response envelope for getRedisHealth.

This endpoint currently returns { status, timestamp, redis }, which breaks the controller response contract and can break clients expecting { success, data, message }.

As per coding guidelines, "All API endpoints must return response format: { success: bool, data: {}, message: "" }."

Also applies to: 61-63, 74-78

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/src/controllers/health.controller.js` around lines 35 - 39, The getRedisHealth controller currently returns a raw object ({ status, timestamp, redis }) — update all return responses in the getRedisHealth function (and the other similar response blocks at the other two places referenced) to conform to the required response envelope: return res.status(...).json({ success: boolean, data: { timestamp: string, redis: { connected: boolean, error?: string } }, message: string }); ensure success is false for 503/500 errors and true for 200, place timestamp and redis under data, and provide a concise message string (e.g., "Redis not ready" or "Redis healthy") so clients receive { success, data, message }.

Source: Coding guidelines

coderabbitai · 2026-06-09T06:19:38Z

+    } catch (err) {
+        return res.status(503).json({
+            status: 'error',
+            timestamp: new Date().toISOString(),
+            redis: { connected: false, error: err.message },
+        });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not return raw backend error messages from the health controller.

Line 77 exposes err.message to clients. Please route this through AppError (or equivalent sanitized handling) and return a safe public message.

As per coding guidelines, "Use AppError class for errors. Never use raw throw statements or expose MongoDB errors to clients."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/src/controllers/health.controller.js` around lines 73 - 78, The catch block in the health controller currently returns raw err.message to clients; update the handler in the health check function (the try/catch that references res and err) to wrap or normalize the error using the project's AppError (or an equivalent sanitizer) and return a safe public message (e.g., "Service unavailable") instead of err.message; ensure the response still includes redis.connected: false and an optional non-sensitive error code or AppError.safeMessage but never expose MongoDB or raw backend messages directly.

Source: Coding guidelines

coderabbitai · 2026-06-09T06:19:38Z

+const projectRateLimiter = rateLimit({
+    store: new RedisStore({
+        sendCommand: (...args) => redis.call(...args),
+        prefix: 'rl:project:',
+    }),
+    windowMs: 15 * 60 * 1000,
+    max: async (req, res) => {
+        if (req.project && req.project.rateLimit) {
+            return req.project.rateLimit;
+        }
+        return 500;
+    },
+    keyGenerator: (req, res) => {
+        if (!req.project || !req.project._id) {
+            return 'unauthorized';
+        }
+        return req.project._id.toString();
+    },
+    handler: (req, res, next, options) => {
+        res.status(options.statusCode).json({
+            error: "Too Many Requests",
+            message: "Project rate limit exceeded. Please try again later."
+        });
+    },
+    standardHeaders: true,
+    legacyHeaders: false,
+});


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

projectRateLimiter is defined but not exported, so it is never applied.

The new limiter is created but module.exports omits it, and downstream middleware wiring currently imports only { limiter, logger }. This leaves project-scoped Redis rate limiting inactive.

Suggested minimal fix

-module.exports = { limiter, logger }; +module.exports = { limiter, logger, projectRateLimiter };

Also applies to: 150-150

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/src/middlewares/api_usage.js` around lines 22 - 48, The projectRateLimiter constant is never exported so downstream code that imports { limiter, logger } can't use it; update the module.exports object to include projectRateLimiter (alongside limiter and logger) so the project-scoped Redis rate limiter is available to middleware wiring; locate projectRateLimiter in this file and add its identifier to the exported properties (ensure export naming matches how other modules import it).

coderabbitai · 2026-06-09T06:19:38Z

+    afterEach(async () => {
+        if (redis.status === 'ready') await redis.flushdb();
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

flushdb() is too destructive for shared test Redis instances.

At Line 7, redis.flushdb() clears the full DB. This can erase unrelated fixtures/tests when Redis DB isolation is not guaranteed. Prefer deleting only keys created by this suite/prefix.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around lines 6 - 8, The test teardown in afterEach currently calls redis.flushdb(), which is too destructive for shared Redis; change it to only remove keys created by this suite by tracking and deleting keys by prefix or pattern instead. Modify the afterEach handler that references redis.flushdb() to use a safe deletion strategy (e.g., SCAN/KEYS with the test-specific prefix or maintain a list/set of created keys during tests and call DEL on those entries) and ensure any helper that creates keys (tests that call redis.set/redis.hset) uses the agreed prefix so keys can be reliably targeted for cleanup.

coderabbitai · 2026-06-09T06:19:38Z

+    test('health endpoint returns 503 when Redis is unreachable', async () => {
+        // Simulate Redis failure by disconnecting temporarily
+        await redis.disconnect();
+
+        const res = await request(app).get('/api/health/redis');
+        expect(res.status).toBe(503);
+        expect(res.body.redis.connected).toBe(false);
+
+        // Reconnect for subsequent tests
+        await redis.connect();
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Always reconnect Redis in a finally block.

At Line 19 and Line 32, reconnect only happens on the happy path. If an assertion fails earlier, the suite can leave Redis disconnected and create cascading failures.

Suggested fix

test('health endpoint returns 503 when Redis is unreachable', async () => { - // Simulate Redis failure by disconnecting temporarily - await redis.disconnect(); - - const res = await request(app).get('/api/health/redis'); - expect(res.status).toBe(503); - expect(res.body.redis.connected).toBe(false); - - // Reconnect for subsequent tests - await redis.connect(); + await redis.disconnect(); + try { + const res = await request(app).get('/api/health/redis'); + expect(res.status).toBe(503); + expect(res.body.redis.connected).toBe(false); + } finally { + await redis.connect(); + } }); test('public API endpoints degrade gracefully when Redis is down', async () => { - await redis.disconnect(); - - const res = await request(app) - .get('/api/data') - .set('x-api-key', 'test-key'); - - // Should not crash; may return 401 or 500 depending on auth flow - expect([200, 401, 403, 500]).toContain(res.status); - - await redis.connect(); + await redis.disconnect(); + try { + const res = await request(app) + .get('/api/data') + .set('x-api-key', 'test-key'); + expect([200, 401, 403, 500]).toContain(res.status); + } finally { + await redis.connect(); + } });

Also applies to: 22-33

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around lines 10 - 20, The test titled "health endpoint returns 503 when Redis is unreachable" disconnects Redis with redis.disconnect() but only reconnects on the happy path; update the test so the disconnect/reconnect is enclosed in a try/finally (or use finally semantics) around the request and assertions, calling redis.connect() in the finally block to guarantee reconnection even if assertions fail; locate the test block (the async test function that calls redis.disconnect(), request(app).get('/api/health/redis'), and redis.connect()) and modify it to always call redis.connect() in finally.

coderabbitai · 2026-06-09T06:19:39Z

+        // Should not crash; may return 401 or 500 depending on auth flow
+        expect([200, 401, 403, 500]).toContain(res.status);
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Status assertion is too broad to catch regressions.

At Line 30, allowing 200/401/403/500 makes this test pass for many unrelated failures. Assert the specific expected degraded behavior (or a narrow set tied to current middleware contract) so this test actually protects behavior.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around lines 29 - 31, The test currently accepts any of [200, 401, 403, 500] which masks regressions; update the assertion on the response (the expect([...]) that checks res.status) to assert the narrow set required by the middleware contract—preferably a single expected status (e.g., 401 or 500) or at most two statuses that are explicitly allowed during Redis degradation—so change the expect call on res.status in the redis-failure.resilience.test to that narrow set and add a brief comment linking it to the expected degraded behavior from the auth/middleware code.

yash-pouranik · 2026-06-11T08:53:57Z

@Abhushan187
please fix the issues pointed by coderabbit.

yash-pouranik · 2026-06-11T09:07:40Z

and 4 CI checks are also failing.

Abhushan187 requested review from Nitin-kumar-yadav1307 and yash-pouranik as code owners June 9, 2026 06:15

perf(redis): add resilience, Redis-backed rate limiter, health metrics

21e346c

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

		// Should not crash; may return 401 or 500 depending on auth flow
		expect([200, 401, 403, 500]).toContain(res.status);

Conversation

Abhushan187 commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes by file

Technical Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning, 2 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

yash-pouranik commented Jun 11, 2026

Uh oh!

yash-pouranik commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Abhushan187 commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading