perf(redis): resilience, Redis-backed rate limiter, and health metrics (#278)#296
perf(redis): resilience, Redis-backed rate limiter, and health metrics (#278)#296Abhushan187 wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThis PR implements Redis connection pooling, health monitoring, and project-scoped rate limiting for the public API. It upgrades Redis client resilience with exponential backoff retry logic, adds a dedicated ChangesRedis Connection Resilience and Rate Limiting
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 3❌ Failed checks (1 warning, 2 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/public-api/src/controllers/health.controller.js`:
- Around line 73-78: The catch block in the health controller currently returns
raw err.message to clients; update the handler in the health check function (the
try/catch that references res and err) to wrap or normalize the error using the
project's AppError (or an equivalent sanitizer) and return a safe public message
(e.g., "Service unavailable") instead of err.message; ensure the response still
includes redis.connected: false and an optional non-sensitive error code or
AppError.safeMessage but never expose MongoDB or raw backend messages directly.
- Around line 35-39: The getRedisHealth controller currently returns a raw
object ({ status, timestamp, redis }) — update all return responses in the
getRedisHealth function (and the other similar response blocks at the other two
places referenced) to conform to the required response envelope: return
res.status(...).json({ success: boolean, data: { timestamp: string, redis: {
connected: boolean, error?: string } }, message: string }); ensure success is
false for 503/500 errors and true for 200, place timestamp and redis under data,
and provide a concise message string (e.g., "Redis not ready" or "Redis
healthy") so clients receive { success, data, message }.
In `@apps/public-api/src/middlewares/api_usage.js`:
- Around line 22-48: The projectRateLimiter constant is never exported so
downstream code that imports { limiter, logger } can't use it; update the
module.exports object to include projectRateLimiter (alongside limiter and
logger) so the project-scoped Redis rate limiter is available to middleware
wiring; locate projectRateLimiter in this file and add its identifier to the
exported properties (ensure export naming matches how other modules import it).
In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js`:
- Around line 10-20: The test titled "health endpoint returns 503 when Redis is
unreachable" disconnects Redis with redis.disconnect() but only reconnects on
the happy path; update the test so the disconnect/reconnect is enclosed in a
try/finally (or use finally semantics) around the request and assertions,
calling redis.connect() in the finally block to guarantee reconnection even if
assertions fail; locate the test block (the async test function that calls
redis.disconnect(), request(app).get('/api/health/redis'), and redis.connect())
and modify it to always call redis.connect() in finally.
- Around line 6-8: The test teardown in afterEach currently calls
redis.flushdb(), which is too destructive for shared Redis; change it to only
remove keys created by this suite by tracking and deleting keys by prefix or
pattern instead. Modify the afterEach handler that references redis.flushdb() to
use a safe deletion strategy (e.g., SCAN/KEYS with the test-specific prefix or
maintain a list/set of created keys during tests and call DEL on those entries)
and ensure any helper that creates keys (tests that call redis.set/redis.hset)
uses the agreed prefix so keys can be reliably targeted for cleanup.
- Around line 29-31: The test currently accepts any of [200, 401, 403, 500]
which masks regressions; update the assertion on the response (the expect([...])
that checks res.status) to assert the narrow set required by the middleware
contract—preferably a single expected status (e.g., 401 or 500) or at most two
statuses that are explicitly allowed during Redis degradation—so change the
expect call on res.status in the redis-failure.resilience.test to that narrow
set and add a brief comment linking it to the expected degraded behavior from
the auth/middleware code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d9aae05-9154-48eb-be94-62a99a028370
📒 Files selected for processing (8)
apps/public-api/package.jsonapps/public-api/src/controllers/health.controller.jsapps/public-api/src/middlewares/api_usage.jsapps/public-api/src/routes/health.jsapps/public-api/tests/load/rate-limit.load.test.jsapps/public-api/tests/resilience/redis-failure.resilience.test.jsdocker-compose.ymlpackages/common/src/config/redis.js
| return res.status(503).json({ | ||
| status: 'error', | ||
| timestamp: new Date().toISOString(), | ||
| redis: { connected: false, error: 'Redis client not ready' }, | ||
| }); |
There was a problem hiding this comment.
Use the required controller response envelope for getRedisHealth.
This endpoint currently returns { status, timestamp, redis }, which breaks the controller response contract and can break clients expecting { success, data, message }.
As per coding guidelines, "All API endpoints must return response format: { success: bool, data: {}, message: "" }."
Also applies to: 61-63, 74-78
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/src/controllers/health.controller.js` around lines 35 - 39,
The getRedisHealth controller currently returns a raw object ({ status,
timestamp, redis }) — update all return responses in the getRedisHealth function
(and the other similar response blocks at the other two places referenced) to
conform to the required response envelope: return res.status(...).json({
success: boolean, data: { timestamp: string, redis: { connected: boolean,
error?: string } }, message: string }); ensure success is false for 503/500
errors and true for 200, place timestamp and redis under data, and provide a
concise message string (e.g., "Redis not ready" or "Redis healthy") so clients
receive { success, data, message }.
Source: Coding guidelines
| } catch (err) { | ||
| return res.status(503).json({ | ||
| status: 'error', | ||
| timestamp: new Date().toISOString(), | ||
| redis: { connected: false, error: err.message }, | ||
| }); |
There was a problem hiding this comment.
Do not return raw backend error messages from the health controller.
Line 77 exposes err.message to clients. Please route this through AppError (or equivalent sanitized handling) and return a safe public message.
As per coding guidelines, "Use AppError class for errors. Never use raw throw statements or expose MongoDB errors to clients."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/src/controllers/health.controller.js` around lines 73 - 78,
The catch block in the health controller currently returns raw err.message to
clients; update the handler in the health check function (the try/catch that
references res and err) to wrap or normalize the error using the project's
AppError (or an equivalent sanitizer) and return a safe public message (e.g.,
"Service unavailable") instead of err.message; ensure the response still
includes redis.connected: false and an optional non-sensitive error code or
AppError.safeMessage but never expose MongoDB or raw backend messages directly.
Source: Coding guidelines
| const projectRateLimiter = rateLimit({ | ||
| store: new RedisStore({ | ||
| sendCommand: (...args) => redis.call(...args), | ||
| prefix: 'rl:project:', | ||
| }), | ||
| windowMs: 15 * 60 * 1000, | ||
| max: async (req, res) => { | ||
| if (req.project && req.project.rateLimit) { | ||
| return req.project.rateLimit; | ||
| } | ||
| return 500; | ||
| }, | ||
| keyGenerator: (req, res) => { | ||
| if (!req.project || !req.project._id) { | ||
| return 'unauthorized'; | ||
| } | ||
| return req.project._id.toString(); | ||
| }, | ||
| handler: (req, res, next, options) => { | ||
| res.status(options.statusCode).json({ | ||
| error: "Too Many Requests", | ||
| message: "Project rate limit exceeded. Please try again later." | ||
| }); | ||
| }, | ||
| standardHeaders: true, | ||
| legacyHeaders: false, | ||
| }); |
There was a problem hiding this comment.
projectRateLimiter is defined but not exported, so it is never applied.
The new limiter is created but module.exports omits it, and downstream middleware wiring currently imports only { limiter, logger }. This leaves project-scoped Redis rate limiting inactive.
Suggested minimal fix
-module.exports = { limiter, logger };
+module.exports = { limiter, logger, projectRateLimiter };Also applies to: 150-150
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/src/middlewares/api_usage.js` around lines 22 - 48, The
projectRateLimiter constant is never exported so downstream code that imports {
limiter, logger } can't use it; update the module.exports object to include
projectRateLimiter (alongside limiter and logger) so the project-scoped Redis
rate limiter is available to middleware wiring; locate projectRateLimiter in
this file and add its identifier to the exported properties (ensure export
naming matches how other modules import it).
| afterEach(async () => { | ||
| if (redis.status === 'ready') await redis.flushdb(); | ||
| }); |
There was a problem hiding this comment.
flushdb() is too destructive for shared test Redis instances.
At Line 7, redis.flushdb() clears the full DB. This can erase unrelated fixtures/tests when Redis DB isolation is not guaranteed. Prefer deleting only keys created by this suite/prefix.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around
lines 6 - 8, The test teardown in afterEach currently calls redis.flushdb(),
which is too destructive for shared Redis; change it to only remove keys created
by this suite by tracking and deleting keys by prefix or pattern instead. Modify
the afterEach handler that references redis.flushdb() to use a safe deletion
strategy (e.g., SCAN/KEYS with the test-specific prefix or maintain a list/set
of created keys during tests and call DEL on those entries) and ensure any
helper that creates keys (tests that call redis.set/redis.hset) uses the agreed
prefix so keys can be reliably targeted for cleanup.
| test('health endpoint returns 503 when Redis is unreachable', async () => { | ||
| // Simulate Redis failure by disconnecting temporarily | ||
| await redis.disconnect(); | ||
|
|
||
| const res = await request(app).get('/api/health/redis'); | ||
| expect(res.status).toBe(503); | ||
| expect(res.body.redis.connected).toBe(false); | ||
|
|
||
| // Reconnect for subsequent tests | ||
| await redis.connect(); | ||
| }); |
There was a problem hiding this comment.
Always reconnect Redis in a finally block.
At Line 19 and Line 32, reconnect only happens on the happy path. If an assertion fails earlier, the suite can leave Redis disconnected and create cascading failures.
Suggested fix
test('health endpoint returns 503 when Redis is unreachable', async () => {
- // Simulate Redis failure by disconnecting temporarily
- await redis.disconnect();
-
- const res = await request(app).get('/api/health/redis');
- expect(res.status).toBe(503);
- expect(res.body.redis.connected).toBe(false);
-
- // Reconnect for subsequent tests
- await redis.connect();
+ await redis.disconnect();
+ try {
+ const res = await request(app).get('/api/health/redis');
+ expect(res.status).toBe(503);
+ expect(res.body.redis.connected).toBe(false);
+ } finally {
+ await redis.connect();
+ }
});
test('public API endpoints degrade gracefully when Redis is down', async () => {
- await redis.disconnect();
-
- const res = await request(app)
- .get('/api/data')
- .set('x-api-key', 'test-key');
-
- // Should not crash; may return 401 or 500 depending on auth flow
- expect([200, 401, 403, 500]).toContain(res.status);
-
- await redis.connect();
+ await redis.disconnect();
+ try {
+ const res = await request(app)
+ .get('/api/data')
+ .set('x-api-key', 'test-key');
+ expect([200, 401, 403, 500]).toContain(res.status);
+ } finally {
+ await redis.connect();
+ }
});Also applies to: 22-33
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around
lines 10 - 20, The test titled "health endpoint returns 503 when Redis is
unreachable" disconnects Redis with redis.disconnect() but only reconnects on
the happy path; update the test so the disconnect/reconnect is enclosed in a
try/finally (or use finally semantics) around the request and assertions,
calling redis.connect() in the finally block to guarantee reconnection even if
assertions fail; locate the test block (the async test function that calls
redis.disconnect(), request(app).get('/api/health/redis'), and redis.connect())
and modify it to always call redis.connect() in finally.
| // Should not crash; may return 401 or 500 depending on auth flow | ||
| expect([200, 401, 403, 500]).toContain(res.status); | ||
|
|
There was a problem hiding this comment.
Status assertion is too broad to catch regressions.
At Line 30, allowing 200/401/403/500 makes this test pass for many unrelated failures. Assert the specific expected degraded behavior (or a narrow set tied to current middleware contract) so this test actually protects behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/public-api/tests/resilience/redis-failure.resilience.test.js` around
lines 29 - 31, The test currently accepts any of [200, 401, 403, 500] which
masks regressions; update the assertion on the response (the expect([...]) that
checks res.status) to assert the narrow set required by the middleware
contract—preferably a single expected status (e.g., 401 or 500) or at most two
statuses that are explicitly allowed during Redis degradation—so change the
expect call on res.status in the redis-failure.resilience.test to that narrow
set and add a brief comment linking it to the expected degraded behavior from
the auth/middleware code.
|
@Abhushan187 |
|
and 4 CI checks are also failing. |
Closes #278
Changes by file
packages/common/src/config/redis.jsapps/public-api/src/middlewares/api_usage.jsprojectRateLimiterwithrate-limit-redisstore; keeps global IP limiterapps/public-api/src/controllers/health.controller.jsgetRedisHealthwithINFO stats/memory/clientsmetricsapps/public-api/src/routes/health.jsGET /api/health/redisapps/public-api/package.jsonrate-limit-redisdependency,autocannondevDependencydocker-compose.ymlmaxclients 10000,tcp-keepalive 60,sysctlstuningapps/public-api/tests/load/rate-limit.load.test.jsapps/public-api/tests/resilience/redis-failure.resilience.test.jsTechnical Notes
rate-limit-redisusesredis.call(...)for pipelining compatibility with ioredis 5.x.limiteris untouched; only the project-scoped rate limiter is migrated to Redis.maxRetriesPerRequest: 3prevents hung requests during transient outages.INFOoutput with regex to avoid extra dependencies.Summary by CodeRabbit
New Features
Improvements