Merged
12 changes: 8 additions & 4 deletions apps/docs/src/pages/en/courses/section.md
@@ -55,22 +55,26 @@ If drip configuration is enabled for a section, a student won't be able to acces
### Drip by Date

1. Choose this option if you want a section to become available to users on a specific date.
2. Exact-date sections unlock only when their chosen date and time arrive.
3. Unlocking an exact-date section does not change the timing of any relative drip sections.

![Drip by Date](/assets/products/drip-by-date.png)

4. Select the date on which this section will be dripped.
5. Click `Continue` to save it.

### Drip After a Certain Number of Days From Last Dripped Content

1. Choose this option if you want a section to become available to users after a certain number of days have elapsed since the last dripped content.
2. Relative-date sections are released in section order. A later relative section waits until the earlier relative section is released before its own delay begins.
3. The first relative section counts from the student's enrollment date. After that, each newly released relative section becomes the anchor for the next one.

> For the first dripped section, the enrollment date is treated as the last dripped content date.

![Drip After a Certain Number of Days Have Elapsed](/assets/products/drip-by-specific-days.png)

4. Select the number of days.
5. Click `Continue` to save it.
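The release rules described above can be sketched as follows. This is a minimal TypeScript illustration, not CourseLit's implementation: the `Section` and `Drip` shapes and the `scheduleReleases` function are hypothetical (the `exact-date`/`relative-date` type names and the `dateInUTC`/`delayInMillis` fields are borrowed from the roadmap document in this PR), and it assumes each relative section is released exactly on schedule, ignoring the queue worker's polling delay.

```typescript
// Hypothetical drip settings; type names and fields borrowed from this PR.
type Drip =
  | { type: "exact-date"; dateInUTC: number }
  | { type: "relative-date"; delayInMillis: number };

interface Section {
  id: string;
  drip?: Drip; // undefined: the section is not dripped
}

// Returns the scheduled release time (ms since epoch) of every dripped section.
// Assumes relative sections are released exactly on schedule, ignoring the
// worker's polling delay.
function scheduleReleases(sections: Section[], enrolledAt: number): Map<string, number> {
  const releases = new Map<string, number>();
  // The first relative section counts from the enrollment date.
  let anchor = enrolledAt;
  for (const section of sections) {
    // `sections` is assumed to already be in section order.
    const drip = section.drip;
    if (!drip) continue;
    if (drip.type === "exact-date") {
      // Exact-date sections never move the relative anchor.
      releases.set(section.id, drip.dateInUTC);
    } else {
      // Each relative section's delay starts when the previous one is released.
      anchor += drip.delayInMillis;
      releases.set(section.id, anchor);
    }
  }
  return releases;
}
```

For example, with enrollment on day 0 and sections A (relative, 2 days), B (exact date, day 10) and C (relative, 3 days), A is released on day 2, B on day 10 and C on day 5, because B does not shift C's timer.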

### Notify Users When a Section Has Dripped

27 changes: 26 additions & 1 deletion apps/queue/.env
@@ -1,11 +1,36 @@
# Port for the queue service to listen on
PORT=4000

# Email configuration for sending notifications (e.g., course updates, reminders)
EMAIL_USER=email_user
EMAIL_PASS=email_pass
EMAIL_HOST=email_host
EMAIL_FROM=no-reply@example.com

# MongoDB connection string for storing queue data and job information
DB_CONNECTION_STRING=mongodb://db.string

# Redis configuration for caching and job queue management
REDIS_HOST=localhost
REDIS_PORT=6379

# Maximum number of bounces allowed for email sequences (default: 3)
SEQUENCE_BOUNCE_LIMIT=3

# Domain for constructing URLs in notifications and links
DOMAIN=courselit.app

# Secret key for signing tracking pixels (used in email notifications)
PIXEL_SIGNING_SECRET=super_secret_string

# Optional: enables PostHog exception tracking when set.
POSTHOG_API_KEY=

# Optional: PostHog host URL (default shown here).
POSTHOG_HOST=https://us.i.posthog.com

# Optional: per-source exception cap per minute (default: 100).
POSTHOG_ERROR_CAP_PER_SOURCE_PER_MINUTE=100

# Optional: deployment environment label sent in telemetry (default: unknown).
DEPLOY_ENV=local
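A minimal sketch of how the optional PostHog variables above might be read, falling back to the defaults stated in the comments. The `readPosthogEnv` helper and `PosthogEnv` shape are hypothetical; the queue service may parse these values differently, and treating a missing, non-numeric or non-positive cap as "use the default" is an assumption.

```typescript
// Hypothetical parsed view of the optional PostHog settings.
interface PosthogEnv {
  apiKey?: string; // tracking stays disabled when this is undefined
  host: string;
  errorCapPerSourcePerMinute: number;
  deployEnv: string;
}

function readPosthogEnv(env: Record<string, string | undefined>): PosthogEnv {
  // An empty POSTHOG_API_KEY (as in the sample .env) leaves tracking disabled.
  const apiKey = env.POSTHOG_API_KEY?.trim() || undefined;
  const cap = Number.parseInt(env.POSTHOG_ERROR_CAP_PER_SOURCE_PER_MINUTE ?? "", 10);
  return {
    apiKey,
    host: env.POSTHOG_HOST || "https://us.i.posthog.com",
    // Assumption: invalid or non-positive caps fall back to the documented default.
    errorCapPerSourcePerMinute: Number.isFinite(cap) && cap > 0 ? cap : 100,
    deployEnv: env.DEPLOY_ENV || "unknown",
  };
}
```

In the service this would be called as `readPosthogEnv(process.env)`.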
4 changes: 4 additions & 0 deletions apps/queue/README.md
@@ -21,6 +21,10 @@ The following environment variables are used by the queue service:
- `EMAIL_PORT` - SMTP server port (default: `587`)
- `PORT` - HTTP server port (default: `80`)
- `NODE_ENV` - Environment mode. When set to `production`, emails are actually sent; otherwise they are only logged
- `POSTHOG_API_KEY` - Enables PostHog error tracking when set
- `POSTHOG_HOST` - PostHog host URL (default: `https://us.i.posthog.com`)
- `POSTHOG_ERROR_CAP_PER_SOURCE_PER_MINUTE` - Per-source exception cap (default: `100`)
- `DEPLOY_ENV` - Deployment environment label used in telemetry (default: `unknown`)
- `SEQUENCE_BOUNCE_LIMIT` - Maximum number of bounces allowed for email sequences (default: `3`)
- `PROTOCOL` - Protocol used for generating site URLs (default: `https`)
- `DOMAIN` - Base domain name for generating site URLs
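`POSTHOG_ERROR_CAP_PER_SOURCE_PER_MINUTE` describes a per-source cap on exceptions per minute. One simple way to enforce such a cap is a fixed one-minute window per source, sketched below. The `PerSourceMinuteCap` class is hypothetical and may not match how the queue service actually counts exceptions or resets windows.

```typescript
// Fixed-window limiter: at most `capPerMinute` exceptions per source per
// clock minute (hypothetical sketch, not the service's implementation).
class PerSourceMinuteCap {
  private readonly capPerMinute: number;
  private readonly buckets = new Map<string, { minute: number; count: number }>();

  constructor(capPerMinute: number) {
    this.capPerMinute = capPerMinute;
  }

  // Returns true if the exception should be reported, false if it is dropped.
  allow(source: string, nowMs: number = Date.now()): boolean {
    const minute = Math.floor(nowMs / 60_000);
    let bucket = this.buckets.get(source);
    if (!bucket || bucket.minute !== minute) {
      // First exception from this source in this minute: start a new window.
      bucket = { minute, count: 0 };
      this.buckets.set(source, bucket);
    }
    if (bucket.count >= this.capPerMinute) return false;
    bucket.count += 1;
    return true;
  }
}
```

A fixed window is the simplest option; it can let up to twice the cap through across a minute boundary, which is usually acceptable for error telemetry.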
261 changes: 261 additions & 0 deletions apps/queue/docs/wip/DRIP_HARDENING_GAPS_AND_ROADMAP.md
@@ -0,0 +1,261 @@
# Drip Hardening Gaps And Roadmap

## Context

This document captures the pieces still missing from drip functionality (queue and web integration), with a focus on scalability, maintainability, and industry-standard operational behavior.

Date: 2026-03-21

Scope:

- `apps/queue/src/domain/process-drip.ts`
- `apps/queue/src/domain/queries.ts`
- `apps/web/graphql/courses/logic.ts`

This list reflects remaining open hardening gaps on the current branch.
Last updated after:

- Domain-scoped membership query contract in queue drip flow.
- Regression test coverage for domain-scoped membership lookup and multi-group drip email behavior.

## 1) Duplicate Emails Under Horizontal Scaling (Critical)

Current behavior:

- Multiple queue instances can execute `processDrip()` concurrently.
- They can compute the same unlocked groups and queue duplicate emails.
- User progress updates are idempotent (`$addToSet`); email queueing is not.

Impact:

- Duplicate notifications to learners.
- Hard-to-debug race conditions.

Recommendation:

- Add distributed lock for drip worker cycle (Redis lock / BullMQ job lock / leader election).
- Add idempotency key for drip emails (for example: `drip:<domain>:<course>:<user>:<group>`).

Acceptance criteria:

- Running N workers concurrently does not create duplicate drip emails for the same user/group release event.
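A sketch of the idempotency-key recommendation, assuming a store with an atomic "set if absent" operation. In Redis this corresponds to `SET key value NX PX <ttl>`; an in-memory `Map` stands in here so the example runs on its own. `KeyStore`, `InMemoryKeyStore`, `enqueueDripEmailOnce` and the 7-day TTL are illustrative assumptions, not existing CourseLit APIs.

```typescript
// Store with an atomic "set if absent" (Redis: SET key value NX PX ttl).
interface KeyStore {
  setIfAbsent(key: string, ttlMs: number, nowMs: number): boolean;
}

// In-memory stand-in so the sketch is runnable without Redis.
class InMemoryKeyStore implements KeyStore {
  private readonly expiries = new Map<string, number>();

  setIfAbsent(key: string, ttlMs: number, nowMs: number): boolean {
    const expiresAt = this.expiries.get(key);
    if (expiresAt !== undefined && expiresAt > nowMs) return false; // already claimed
    this.expiries.set(key, nowMs + ttlMs);
    return true;
  }
}

// Key format suggested above: drip:<domain>:<course>:<user>:<group>
function dripEmailKey(domain: string, course: string, user: string, group: string): string {
  return `drip:${domain}:${course}:${user}:${group}`;
}

// Only the first caller for a release event enqueues; concurrent workers that
// computed the same unlock skip it. Returns true if this call enqueued.
function enqueueDripEmailOnce(
  store: KeyStore,
  key: string,
  enqueue: () => void,
  nowMs: number,
  ttlMs = 7 * 24 * 60 * 60 * 1000, // assumed retention window for keys
): boolean {
  if (!store.setIfAbsent(key, ttlMs, nowMs)) return false;
  enqueue();
  return true;
}
```

The same `setIfAbsent` call with a short TTL could also serve as the per-cycle worker lock. Because the key is claimed before enqueueing, a failed enqueue leaves the key set and the email unsent, which is the gap described in item 2. BullMQ's `jobId` option provides similar deduplication for as long as a job with that ID still exists in the queue.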

## 2) Non-Atomic Unlock + Notify Flow (High)

Current behavior:

- Unlock state is written first, email queueing happens afterwards.
- If email enqueueing fails, the unlock still succeeds but the notification may be partially or fully lost.

Impact:

- Inconsistent learner communication.
- No deterministic retry for missed notifications.

Recommendation:

- Introduce the outbox pattern:
  - Persist release events atomically with the unlock update.
  - A separate, reliable sender consumes the outbox with retries and idempotency.

Acceptance criteria:

- If the queue or email subsystem is temporarily down, unlock notifications are eventually delivered without duplicates.
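The outbox flow can be illustrated with an in-memory sketch. In MongoDB, writing the unlock and the outbox record atomically would need either a multi-document transaction (which requires a replica set) or storing the pending event in the same document as the progress update; here a single method call stands in for that atomic write. `DripStore`, `OutboxEvent` and `drainOutbox` are hypothetical names.

```typescript
// One pending notification, keyed by the drip idempotency key.
interface OutboxEvent {
  key: string;
  attempts: number;
  status: "pending" | "sent" | "dead";
}

class DripStore {
  unlocked = new Set<string>();
  outbox = new Map<string, OutboxEvent>();

  // Unlock and record the release event together (atomic in this sketch).
  unlockWithEvent(key: string): void {
    if (this.unlocked.has(key)) return; // idempotent, like $addToSet
    this.unlocked.add(key);
    this.outbox.set(key, { key, attempts: 0, status: "pending" });
  }
}

// Separate sender: retries pending events on each pass and moves an event to
// "dead" (dead-letter) after maxAttempts failures. Returns how many were sent.
function drainOutbox(
  store: DripStore,
  send: (key: string) => void, // expected to throw on failure
  maxAttempts = 5,
): number {
  let sent = 0;
  store.outbox.forEach((event) => {
    if (event.status !== "pending") return;
    event.attempts += 1;
    try {
      send(event.key);
      event.status = "sent";
      sent += 1;
    } catch {
      if (event.attempts >= maxAttempts) event.status = "dead";
    }
  });
  return sent;
}
```

Delivery here is at-least-once: if `send` succeeds but the status update is lost, the event is retried, so the sender should still pass the idempotency key downstream to meet the "without duplicates" criterion.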

## 3) Full Polling + In-Memory Expansion (High)

Current behavior:

- Every minute, the worker:
  - loads all courses with drip,
  - loads memberships per course,
  - loads users per course,
  - iterates over the results in application memory.

Impact:

- High DB load and memory pressure as tenants/courses/users scale.
- Runtime grows with total data size, not just with the amount of due work.

Recommendation:

- Move to due-work driven execution:
  - precompute the next drip due time per user/course purchase, or
  - enqueue per-user drip check jobs at enrollment/release events.
- Use cursor/batch processing where a full scan remains necessary.

Acceptance criteria:

- Runtime and DB pressure scale roughly with due drip events, not total historical volume.
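To show why due-work driven execution scales with due events, the sketch below keeps enrollments in a min-heap keyed by their next drip time, so each tick only touches entries that are due (roughly O(k log n) for k due items out of n). In production the same idea would more likely be a persisted, indexed due-time field or delayed queue jobs; `DueItem`, `nextDripAt` and `DueQueue` are hypothetical.

```typescript
// Hypothetical record: when the next drip check is due for one enrollment.
interface DueItem {
  userId: string;
  courseId: string;
  nextDripAt: number; // ms since epoch
}

// Minimal binary min-heap ordered by nextDripAt.
class DueQueue {
  private heap: DueItem[] = [];

  push(item: DueItem): void {
    const h = this.heap;
    h.push(item);
    let i = h.length - 1;
    // Sift up until the parent is due no later than this item.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (h[parent].nextDripAt <= h[i].nextDripAt) break;
      [h[parent], h[i]] = [h[i], h[parent]];
      i = parent;
    }
  }

  // Pops every item due at or before `now`; untouched items cost nothing.
  popDue(now: number): DueItem[] {
    const due: DueItem[] = [];
    while (this.heap.length > 0 && this.heap[0].nextDripAt <= now) {
      due.push(this.popMin());
    }
    return due;
  }

  private popMin(): DueItem {
    const h = this.heap;
    const top = h[0];
    const last = h.pop()!;
    if (h.length > 0) {
      // Move the last item to the root and sift it down.
      h[0] = last;
      let i = 0;
      for (;;) {
        const l = 2 * i + 1;
        const r = l + 1;
        let smallest = i;
        if (l < h.length && h[l].nextDripAt < h[smallest].nextDripAt) smallest = l;
        if (r < h.length && h[r].nextDripAt < h[smallest].nextDripAt) smallest = r;
        if (smallest === i) break;
        [h[smallest], h[i]] = [h[i], h[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}
```

After processing a popped item, the worker would push it back with its next due time, if the enrollment has further dripped sections.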

## 4) Missing/Weak Indexing For Drip Query Paths (Medium)

Current behavior:

- Frequent query shapes in the drip path lack explicit compound indexes aligned with their usage.

Primary candidates:

- `Membership`: `(domain, entityType, entityId, status, userId)`
- `User`: `(domain, userId)` (for bulk lookup by userIds in a domain)
- `Course`: if the scan approach is kept, index drip-enabled groups and domain where feasible.

Impact:

- Degraded throughput and elevated DB CPU at scale.

Recommendation:

- Add and validate compound indexes for hot predicates.
- Run explain plans before/after.

Acceptance criteria:

- Explain plans avoid broad collection scans for hot drip queries.
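With Mongoose, the `Membership` candidate could be declared as `schema.index({ domain: 1, entityType: 1, entityId: 1, status: 1, userId: 1 })`. Field order matters because MongoDB can use a compound index for queries on a prefix of its keys. The helper below is a rough, hypothetical heuristic for checking that a query's equality predicates line up with a leading prefix of an index; `explain()` output remains the authoritative check.

```typescript
// Compound index keys in MongoDB key order (1 = ascending, -1 = descending).
type IndexSpec = Array<[field: string, direction: 1 | -1]>;

// The Membership candidate listed above.
const membershipDripIndex: IndexSpec = [
  ["domain", 1],
  ["entityType", 1],
  ["entityId", 1],
  ["status", 1],
  ["userId", 1],
];

// True if every equality field is covered by a contiguous leading prefix of
// the index keys (the order of fields in the query filter does not matter).
function usesIndexPrefix(index: IndexSpec, equalityFields: string[]): boolean {
  const wanted = new Set(equalityFields);
  let matched = 0;
  for (const [field] of index) {
    if (!wanted.has(field)) break; // prefix ends at the first unused key
    matched += 1;
  }
  return matched === wanted.size;
}
```

A query on `domain`, `entityType`, `entityId` and `status` fits this index; one on `domain` and `userId` alone skips the middle keys and would be served less efficiently.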

## 5) Rank Reorder Semantics For Relative Drip (Medium)

Current behavior:

- Relative drip order is recomputed from group rank on every run.
- If groups are reordered after learners enroll, the release path of in-flight learners changes.

Impact:

- Learner-facing release schedule may shift unexpectedly.
- Support burden and potential trust issues.

Recommendation (choose one explicit product policy):

- Policy A (simple): lock relative rank editing once enrollments exist.
- Policy B (robust): persist per-user drip cursor/snapshot so future rank changes affect only new learners (or only after explicit migration).

Acceptance criteria:

- Reordering behavior is deterministic and documented.
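Policy B could look roughly like this: the relative drip order is captured per learner at enrollment, and the worker consults that snapshot instead of current ranks. `DripCursor`, `snapshotAtEnrollment` and the other names are hypothetical, and the sketch does not handle groups deleted after enrollment.

```typescript
interface Group {
  id: string;
  rank: number;
}

// Per-learner snapshot stored at enrollment time.
interface DripCursor {
  order: string[]; // relative group ids in release order, frozen at enrollment
  released: number; // how many groups from `order` have been released so far
}

function snapshotAtEnrollment(relativeGroups: Group[]): DripCursor {
  const order = [...relativeGroups]
    .sort((a, b) => a.rank - b.rank)
    .map((g) => g.id);
  return { order, released: 0 };
}

// Next group to release for this learner, unaffected by later rank edits.
function nextGroup(cursor: DripCursor): string | undefined {
  return cursor.order[cursor.released];
}

function markReleased(cursor: DripCursor): DripCursor {
  return { ...cursor, released: cursor.released + 1 };
}
```

Reordering then changes only the order seen by learners who enroll afterwards, unless an explicit migration rewrites existing cursors.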

## 6) Multi-Email Burst For Same-Run Unlocks (Medium)

Current behavior:

- If multiple groups unlock in one run and each has a drip email configured, the user receives multiple emails.

Impact:

- Notification fatigue.

Recommendation:

- Add a configurable notification policy:
  - `per_group` (current),
  - `digest_per_run` (recommended default for larger schools).
- If digest mode is enabled, provide an editable digest template with a localization strategy.

Acceptance criteria:

- Multi-unlock runs follow configured policy and are test-covered.
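A sketch of the proposed notification policy as a pure function that turns one learner's unlocks from a single run into emails to queue. The policy names come from the list above; the types, placeholder subject lines and the choice to fall back to a normal per-group email when a digest would contain only one section are illustrative assumptions.

```typescript
type NotificationPolicy = "per_group" | "digest_per_run";

interface UnlockedGroup {
  groupId: string;
  title: string;
  hasDripEmail: boolean;
}

interface OutgoingEmail {
  userId: string;
  groupIds: string[];
  subject: string; // placeholder; real content would come from the template
}

function planEmails(
  userId: string,
  unlocked: UnlockedGroup[],
  policy: NotificationPolicy,
): OutgoingEmail[] {
  // Only groups with a drip email configured produce notifications.
  const withEmail = unlocked.filter((g) => g.hasDripEmail);
  if (withEmail.length === 0) return [];
  if (policy === "per_group" || withEmail.length === 1) {
    return withEmail.map((g) => ({
      userId,
      groupIds: [g.groupId],
      subject: `New section available: ${g.title}`,
    }));
  }
  // digest_per_run: a single email listing every section unlocked in this run.
  return [
    {
      userId,
      groupIds: withEmail.map((g) => g.groupId),
      subject: `${withEmail.length} new sections are available`,
    },
  ];
}
```

Keeping the policy decision in a pure function like this makes the "test-covered" acceptance criterion straightforward to meet.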

## 7) Data Validation Hardening For Drip Inputs (Medium)

Current behavior:

- Some server-side constraints are implicit/incomplete (for example, exact-date vs relative-date required fields).

Impact:

- Invalid drip configs can be stored and silently skipped.

Recommendation:

- Enforce stricter validation in `updateGroup`:
  - `relative-date` requires numeric `delayInMillis >= 0`,
  - `exact-date` requires valid numeric `dateInUTC`,
  - optional sanity checks for email schema consistency.

Acceptance criteria:

- Invalid drip payloads are rejected with explicit errors.
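The validation rules above might be expressed as a small function that returns explicit error messages. The `DripInput` shape and `validateDrip` name are assumptions based on the field names listed; inside `updateGroup` the errors would presumably be thrown rather than returned, and the email-schema checks are left out.

```typescript
// Hypothetical incoming payload; values are untrusted, hence `unknown`.
interface DripInput {
  type?: string;
  delayInMillis?: unknown;
  dateInUTC?: unknown;
}

function isFiniteNumber(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value);
}

// Returns explicit error messages; an empty array means the payload is valid.
function validateDrip(input: DripInput): string[] {
  const errors: string[] = [];
  switch (input.type) {
    case "relative-date":
      if (!isFiniteNumber(input.delayInMillis) || input.delayInMillis < 0) {
        errors.push("relative-date requires a numeric delayInMillis >= 0");
      }
      break;
    case "exact-date":
      // Finite numbers outside the Date range still produce an invalid Date.
      if (!isFiniteNumber(input.dateInUTC) || Number.isNaN(new Date(input.dateInUTC).getTime())) {
        errors.push("exact-date requires a valid numeric dateInUTC");
      }
      break;
    default:
      errors.push(`unknown drip type: ${String(input.type)}`);
  }
  return errors;
}
```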

## 8) Observability Gaps For Release Lifecycle (Medium)

Current behavior:

- Error capture exists, but release lifecycle visibility is limited.

Recommendation:

- Add structured counters/events:
  - `drip_courses_scanned`
  - `drip_users_evaluated`
  - `drip_groups_unlocked`
  - `drip_emails_queued`
  - `drip_emails_failed`
  - `drip_loop_duration_ms`
- Add per-domain dimensions where safe.

Acceptance criteria:

- Can answer: "How many unlocks and drip emails happened per domain/day and failure rate?"
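Once counters are emitted with a domain dimension, the question in the acceptance criteria reduces to a group-by over domain and UTC day. The `MetricEvent` shape and `dailySummary` function are hypothetical, and defining the failure rate as `failed / (queued + failed)` is an assumption about what `drip_emails_queued` counts.

```typescript
type DripMetric = "drip_groups_unlocked" | "drip_emails_queued" | "drip_emails_failed";

interface MetricEvent {
  name: DripMetric;
  domain: string;
  at: number; // ms since epoch
  value: number;
}

interface DailyRow {
  domain: string;
  day: string; // YYYY-MM-DD in UTC
  unlocks: number;
  queued: number;
  failed: number;
  failureRate: number;
}

function dailySummary(events: MetricEvent[]): DailyRow[] {
  const rows = new Map<string, DailyRow>();
  events.forEach((e) => {
    const day = new Date(e.at).toISOString().slice(0, 10);
    const key = `${e.domain}|${day}`;
    let row = rows.get(key);
    if (!row) {
      row = { domain: e.domain, day, unlocks: 0, queued: 0, failed: 0, failureRate: 0 };
      rows.set(key, row);
    }
    if (e.name === "drip_groups_unlocked") row.unlocks += e.value;
    else if (e.name === "drip_emails_queued") row.queued += e.value;
    else row.failed += e.value;
  });
  // Failure rate over all email attempts recorded for that domain and day.
  rows.forEach((row) => {
    const attempts = row.queued + row.failed;
    row.failureRate = attempts === 0 ? 0 : row.failed / attempts;
  });
  return Array.from(rows.values());
}
```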

## 9) Test Coverage Still Missing For Some Hard Cases (Medium)

Current behavior:

- Unit-level coverage has improved significantly, but concurrency, idempotency, outbox, rank-reorder policy, and digest policy behavior are not covered yet.

Recommendation:

- Add tests for:
  - concurrent worker execution idempotency,
  - outbox retry semantics,
  - rank reorder policy behavior,
  - digest mode behavior.

Acceptance criteria:

- Regression suite catches duplicate-email races and policy regressions.

## Proposed Implementation Plan

### Phase 1 (P0 - Safety)

- Add strict drip input validation in `updateGroup`.
- Add key indexes for hot query paths.
- Add release metrics counters.

### Phase 2 (P1 - Correctness Under Scale)

- Introduce distributed locking for drip worker cycle.
- Add email idempotency key to avoid duplicate sends.

### Phase 3 (P1/P2 - Reliability)

- Implement outbox for unlock-notification events.
- Add sender worker retry + dead-letter handling.

### Phase 4 (P2 - Product Policy)

- Decide and implement rank-reorder semantics for relative drip.
- Add digest mode and editable template/localization design.

## Suggested Ownership Split

- Queue worker correctness/scalability: `apps/queue`
- GraphQL validation + admin constraints: `apps/web/graphql`
- Schema/index migrations: `packages/common-logic`, `packages/orm-models`, migration scripts
- Metrics/dashboarding: observability owner

## Decision Log Needed

Before implementation, confirm:

- Reorder policy for in-flight learners (lock vs cursor snapshot).
- Email policy for multi-unlock runs (per-group vs digest).
- Preferred reliability model (direct enqueue with idempotency vs outbox).

## References

- `apps/queue/src/domain/process-drip.ts`
- `apps/queue/src/domain/queries.ts`
- `apps/web/graphql/courses/logic.ts`
- `apps/queue/src/domain/__tests__/process-drip.test.ts`
- `apps/web/graphql/courses/__tests__/update-group-drip.test.ts`
1 change: 0 additions & 1 deletion apps/queue/jest.config.ts
@@ -1,5 +1,4 @@
const config = {
preset: "@shelf/jest-mongodb",
setupFilesAfterEnv: ["<rootDir>/setupTests.ts"],
watchPathIgnorePatterns: ["globalConfig"],
moduleNameMapper: {
6 changes: 6 additions & 0 deletions apps/queue/package.json
@@ -18,6 +18,11 @@
"@courselit/email-editor": "workspace:^",
"@courselit/orm-models": "workspace:^",
"@courselit/utils": "workspace:^",
"@opentelemetry/api-logs": "^0.204.0",
"@opentelemetry/exporter-logs-otlp-http": "^0.204.0",
"@opentelemetry/resources": "^2.1.0",
"@opentelemetry/sdk-logs": "^0.204.0",
"@opentelemetry/sdk-node": "^0.204.0",
"@types/jsdom": "^21.1.7",
"bullmq": "^4.14.0",
"express": "^4.18.2",
@@ -26,6 +31,7 @@
"mongodb": "^6.15.0",
"mongoose": "^8.13.1",
"nodemailer": "^6.9.2",
"posthog-node": "^5.9.1",
"pino": "^8.14.1",
"pino-mongodb": "^4.3.0",
"zod": "^3.22.4"