MODSetter (Owner) commented Oct 23, 2025

Description

Added periodic indexing for indexable search source connectors.

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR implements periodic indexing functionality for indexable search source connectors, allowing automated data synchronization at configurable intervals. The implementation uses a meta-scheduler pattern with Celery Beat, where a single scheduled task checks the database every minute (configurable) for connectors due for indexing, rather than creating individual Beat schedules per connector. This includes database schema changes to store scheduling information (periodic_indexing_enabled, indexing_frequency_minutes, next_scheduled_at), backend API updates to manage periodic schedules, a new Celery Beat service in Docker, and a comprehensive UI for configuring periodic indexing with preset frequency options (15m, 1h, 6h, 12h, daily, weekly) or custom intervals.
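As a rough illustration of the meta-scheduler idea in the summary above, the sketch below picks out connectors that are due from an in-memory list. The `Connector` dataclass and `find_due_connectors` are hypothetical stand-ins; only the three field names come from this PR, and the real task queries the database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Connector:
    # Hypothetical stand-in for a search_source_connectors row; only the
    # three periodic-indexing field names are taken from the PR.
    id: int
    periodic_indexing_enabled: bool = False
    indexing_frequency_minutes: Optional[int] = None
    next_scheduled_at: Optional[datetime] = None


def find_due_connectors(connectors: list[Connector], now: datetime) -> list[Connector]:
    """Return enabled connectors whose next scheduled run time has passed."""
    return [
        c
        for c in connectors
        if c.periodic_indexing_enabled
        and c.next_scheduled_at is not None
        and c.next_scheduled_at <= now
    ]


now = datetime(2025, 10, 23, 8, 0, tzinfo=timezone.utc)
connectors = [
    Connector(1, True, 15, now - timedelta(minutes=1)),   # due
    Connector(2, True, 60, now + timedelta(minutes=30)),  # not yet due
    Connector(3, False, None, None),                      # periodic indexing off
]
due_ids = [c.id for c in find_due_connectors(connectors, now)]
print(due_ids)
```

With this shape, adding or removing a connector never touches the Beat configuration; only the one checker task is scheduled.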

⏱️ Estimated Review Time: 30-90 minutes

💡 Review Order Suggestion

| Order | File Path |
|-------|-----------|
| 1 | surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py |
| 2 | surfsense_backend/app/db.py |
| 3 | surfsense_backend/app/schemas/search_source_connector.py |
| 4 | surfsense_backend/app/celery_app.py |
| 5 | surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py |
| 6 | surfsense_backend/app/utils/periodic_scheduler.py |
| 7 | surfsense_backend/app/routes/search_source_connectors_routes.py |
| 8 | surfsense_web/hooks/use-search-source-connectors.ts |
| 9 | surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx |
| 10 | docker-compose.yml |
| 11 | surfsense_backend/.env.example |
| 12 | surfsense_web/content/docs/docker-installation.mdx |
| 13 | surfsense_web/content/docs/manual-installation.mdx |
| 14 | surfsense_backend/.gitignore |
| 15 | surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py |
| 16 | README.md |
| 17 | surfsense_web/components/homepage/navbar.tsx |

⚠️ Inconsistent Changes Detected

| File Path | Warning |
|-----------|---------|
| README.md | Removal of AWS/Vercel outage announcement appears to be housekeeping unrelated to periodic indexing feature |
| surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py | Bug fixes to migration script handling NULL values and missing user_id column are unrelated to periodic indexing functionality |
| surfsense_web/components/homepage/navbar.tsx | Mobile navigation improvements including touch-manipulation classes and Discord link updates are unrelated to periodic indexing feature |

Summary by CodeRabbit

  • New Features

    • Added periodic indexing scheduling for search connectors with configurable frequency intervals.
    • New UI controls to enable/disable periodic indexing and set custom frequencies in the connectors management dashboard.
    • Celery Beat scheduler service now included for automated periodic task execution.
  • Documentation

    • Updated Docker and manual installation guides with Celery Beat setup and scheduling configuration instructions.
  • UI/UX Improvements

    • Enhanced mobile menu toggle with improved accessibility and button styling.
    • Updated navigation links and improved touch interaction experience.

vercel bot commented Oct 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
|---------|------------|---------|----------|---------------|
| surf-sense-frontend | Ready | Preview | Comment | Oct 23, 2025 8:01am |

coderabbitai bot commented Oct 23, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This PR introduces periodic indexing functionality for search source connectors. Changes include adding database fields to track periodic indexing state, creating a Celery Beat scheduler service to check and trigger indexing tasks on schedule, updating API schemas and routes to support periodic configuration with validation, building UI components for periodic indexing management, and updating documentation and environment configuration.
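The walkthrough mentions validation of the periodic configuration without spelling out the rules. A minimal sketch of one plausible consistency check follows, assuming that an enabled schedule must carry a positive frequency. The function name `check_periodic_settings` and both rules are assumptions for illustration, not the PR's actual validator.

```python
from typing import Optional


def check_periodic_settings(enabled: bool, frequency_minutes: Optional[int]) -> None:
    """Raise ValueError if the (assumed) consistency rules are violated."""
    # Assumed rule 1: enabling periodic indexing requires a frequency.
    if enabled and frequency_minutes is None:
        raise ValueError(
            "indexing_frequency_minutes is required when periodic indexing is enabled"
        )
    # Assumed rule 2: a frequency, when given, must be a positive number of minutes.
    if frequency_minutes is not None and frequency_minutes <= 0:
        raise ValueError("indexing_frequency_minutes must be greater than 0")


check_periodic_settings(True, 15)      # passes
check_periodic_settings(False, None)   # passes: disabled needs no frequency

try:
    check_periodic_settings(True, None)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```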

Changes

| Cohort | File(s) | Summary |
|--------|---------|---------|
| Configuration & Docker Setup | docker-compose.yml, surfsense_backend/.env.example, surfsense_backend/.gitignore | Added celery_beat service to docker-compose with dependencies and shared configuration; introduced SCHEDULE_CHECKER_INTERVAL environment variable with 5m default; added .gitignore patterns for celery beat schedule artifacts. |
| Database Schema Migrations | surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py, surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py | Updated migration 25 with conditional guards to handle missing user_id columns; introduced migration 32 to add three new columns (periodic_indexing_enabled, indexing_frequency_minutes, next_scheduled_at) to search_source_connectors table with idempotent, guarded operations. |
| Backend Database Model | surfsense_backend/app/db.py | Extended SearchSourceConnector table with three periodic indexing fields: periodic_indexing_enabled (Boolean, default False), indexing_frequency_minutes (Integer, nullable), and next_scheduled_at (TIMESTAMP, nullable). |
| Backend API Schema & Validation | surfsense_backend/app/schemas/search_source_connector.py | Added periodic indexing fields to SearchSourceConnectorBase and SearchSourceConnectorUpdate schemas; introduced model-level validator validate_periodic_indexing to enforce consistency requirements. |
| Celery Scheduling Core | surfsense_backend/app/celery_app.py | Added parse_schedule_interval helper function to convert interval strings (1m, 5m, etc.) to crontab parameters; introduced SCHEDULE_CHECKER_INTERVAL configuration constant; extended beat_schedule with new check_periodic_schedules task. |
| Periodic Scheduling Tasks | surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py, surfsense_backend/app/utils/periodic_scheduler.py | Created schedule_checker_task.py implementing check_periodic_schedules_task that queries and triggers due connectors; created periodic_scheduler.py with utility functions (create_periodic_schedule, delete_periodic_schedule, update_periodic_schedule) to manage periodic schedule lifecycle and initial task dispatch. |
| API Routes Integration | surfsense_backend/app/routes/search_source_connectors_routes.py | Integrated periodic scheduling utilities into connector create, update, and delete operations; added automatic next_scheduled_at computation; added validation and schedule lifecycle management (create/update/delete) on connector changes. |
| Frontend TypeScript Types | surfsense_web/hooks/use-search-source-connectors.ts | Extended SearchSourceConnector interface with three new fields: periodic_indexing_enabled, indexing_frequency_minutes, and next_scheduled_at. |
| Frontend UI Components | surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx | Added periodic indexing configuration dialog with enable switch, frequency presets, custom input, and preview; introduced new "Periodic" table column displaying per-connector status; integrated updateConnector function for persisting periodic settings. |
| Documentation | surfsense_web/content/docs/docker-installation.mdx, surfsense_web/content/docs/manual-installation.mdx | Added Docker Services Overview section and environment variable documentation; introduced Celery Beat startup instructions for manual installation with scheduling configuration notes. |
| Homepage Component & Readme | surfsense_web/components/homepage/navbar.tsx, README.md | Refactored mobile menu toggle into dedicated button wrapper with improved accessibility; changed bottom CTA to Link component; updated URLs for Discord/GitHub links; removed announcement block from README. |
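To make the interval-string conversion listed for celery_app.py more concrete, here is a minimal sketch of how strings such as 1m or 5m could map to crontab-style parameters. The function name `interval_to_crontab_kwargs`, the support for an `h` suffix, and the returned dictionary shape are assumptions; the PR's parse_schedule_interval may behave differently.

```python
import re


def interval_to_crontab_kwargs(interval: str) -> dict[str, str]:
    """Convert '5m' or '2h' into keyword arguments for a crontab-style schedule."""
    match = re.fullmatch(r"(\d+)([mh])", interval.strip().lower())
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value <= 0:
        raise ValueError("interval must be positive")
    if unit == "m":
        # Every N minutes, e.g. minute="*/5".
        return {"minute": f"*/{value}"}
    # Every N hours, on the hour (assumed behavior for the 'h' suffix).
    return {"minute": "0", "hour": f"*/{value}"}


print(interval_to_crontab_kwargs("5m"))
print(interval_to_crontab_kwargs("2h"))
```

Note that a cron step such as `*/7` restarts at the top of each hour, so intervals that do not divide 60 evenly produce one shorter gap per hour.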

Sequence Diagram(s)

sequenceDiagram
    participant CeleryBeat as Celery Beat<br/>(Scheduler)
    participant CheckTask as check_periodic<br/>_schedules_task
    participant DB as Database
    participant PeriodicScheduler as periodic_scheduler<br/>(utilities)
    participant IndexingTasks as Connector-Specific<br/>Indexing Tasks

    CeleryBeat->>CheckTask: Triggers at configured interval<br/>(SCHEDULE_CHECKER_INTERVAL)
    CheckTask->>DB: Query connectors where<br/>periodic_indexing_enabled=true<br/>AND next_scheduled_at <= NOW
    DB-->>CheckTask: Return due connectors
    
    alt Connectors are due
        CheckTask->>IndexingTasks: .delay() for each<br/>connector's mapped task
        IndexingTasks->>DB: Execute indexing
        CheckTask->>DB: Update next_scheduled_at<br/>to NOW + frequency
        DB-->>CheckTask: Confirm update
    else No due connectors
        CheckTask->>CheckTask: Log and continue
    end
    
    CheckTask-->>CeleryBeat: Task complete
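The "due connector" branch of the diagram above can be sketched as follows. The connector is modeled as a plain dictionary and `dispatch` is a hypothetical callback standing in for the `.delay()` call on the connector's mapped task; neither is taken from the PR's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def run_due_connector(
    connector: dict, now: datetime, dispatch: Callable[[int], None]
) -> dict:
    """Dispatch indexing for one due connector and return it with a new run time."""
    dispatch(connector["id"])  # stand-in for task.delay(...)
    updated = dict(connector)
    # As in the diagram: next run is NOW + frequency, not old schedule + frequency.
    updated["next_scheduled_at"] = now + timedelta(
        minutes=connector["indexing_frequency_minutes"]
    )
    return updated


dispatched: list[int] = []
now = datetime(2025, 10, 23, 8, 0, tzinfo=timezone.utc)
connector = {
    "id": 7,
    "indexing_frequency_minutes": 15,
    "next_scheduled_at": now - timedelta(minutes=40),  # overdue
}
updated = run_due_connector(connector, now, dispatched.append)
print(dispatched, updated["next_scheduled_at"].isoformat())
```

Because the new time is computed from the current time, a connector that was overdue by several intervals is triggered once rather than once per missed interval.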
sequenceDiagram
    participant User as User
    participant ConnectorsUI as Connectors Page<br/>(UI)
    participant API as API Routes
    participant PeriodicScheduler as periodic_scheduler
    participant DB as Database

    User->>ConnectorsUI: Click configure<br/>periodic indexing
    ConnectorsUI->>ConnectorsUI: Open dialog with<br/>enable/frequency options
    User->>ConnectorsUI: Enable & set frequency,<br/>click Save
    
    ConnectorsUI->>API: updateConnector with<br/>periodic_indexing_enabled,<br/>indexing_frequency_minutes
    
    API->>DB: Validate & persist<br/>connector updates
    
    alt Enabling periodic indexing
        API->>PeriodicScheduler: create_periodic_schedule()
        PeriodicScheduler->>IndexingTasks: Dispatch initial indexing<br/>task via .delay()
        PeriodicScheduler-->>API: Return success
    else Disabling periodic indexing
        API->>PeriodicScheduler: delete_periodic_schedule()
        PeriodicScheduler-->>API: Return success
    end
    
    API-->>ConnectorsUI: Update response
    ConnectorsUI->>User: Show success toast<br/>& update UI
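The enable/disable branching in the diagram above can be sketched as a small decision function. The name `schedule_action` is hypothetical, and the rule that a frequency change on an already enabled connector leads to an update is an assumption; the diagram itself only shows the create and delete paths.

```python
from typing import Optional


def schedule_action(
    was_enabled: bool,
    now_enabled: bool,
    old_frequency: Optional[int],
    new_frequency: Optional[int],
) -> str:
    """Pick which schedule operation a connector update should trigger."""
    if now_enabled and not was_enabled:
        return "create"  # maps to create_periodic_schedule()
    if was_enabled and not now_enabled:
        return "delete"  # maps to delete_periodic_schedule()
    if now_enabled and old_frequency != new_frequency:
        return "update"  # assumption: maps to update_periodic_schedule()
    return "none"


print(schedule_action(False, True, None, 60))
print(schedule_action(True, False, 60, None))
print(schedule_action(True, True, 60, 360))
```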

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The changes span multiple subsystems: database migrations with conditional logic, asynchronous Celery task orchestration with scheduling, RESTful API integration with business logic validation, and frontend UI state management with async operations. The feature threads through database layer, backend scheduling infrastructure, API routes with side effects, and frontend forms/dialogs. Cross-layer integration points and conditional migration logic increase complexity.

Possibly related PRs

Suggested reviewers

  • Utkarsh-Patel-13

Poem

🐰 Hops through schedules with glee,
Celery Beat keeps things on spree,
Periodic indexing, precise and clean,
A scheduler's dream, a rabbit's keen scene! ✨🕐

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70808eb and aed8163.

📒 Files selected for processing (17)
  • README.md (0 hunks)
  • docker-compose.yml (1 hunks)
  • surfsense_backend/.env.example (1 hunks)
  • surfsense_backend/.gitignore (1 hunks)
  • surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py (1 hunks)
  • surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py (1 hunks)
  • surfsense_backend/app/celery_app.py (4 hunks)
  • surfsense_backend/app/db.py (1 hunks)
  • surfsense_backend/app/routes/search_source_connectors_routes.py (6 hunks)
  • surfsense_backend/app/schemas/search_source_connector.py (4 hunks)
  • surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py (1 hunks)
  • surfsense_backend/app/utils/periodic_scheduler.py (1 hunks)
  • surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx (9 hunks)
  • surfsense_web/components/homepage/navbar.tsx (3 hunks)
  • surfsense_web/content/docs/docker-installation.mdx (2 hunks)
  • surfsense_web/content/docs/manual-installation.mdx (2 hunks)
  • surfsense_web/hooks/use-search-source-connectors.ts (1 hunks)

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on 70808eb..aed8163

| Severity | Location | Issue |
|----------|----------|-------|
| High | surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py:111 | Type mismatch in task parameters |

✅ Files analyzed, no issues (16)

README.md
docker-compose.yml
surfsense_backend/.env.example
surfsense_backend/.gitignore
surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py
surfsense_backend/app/celery_app.py
surfsense_backend/app/db.py
surfsense_backend/app/routes/search_source_connectors_routes.py
surfsense_backend/app/schemas/search_source_connector.py
surfsense_backend/app/utils/periodic_scheduler.py
surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
surfsense_web/components/homepage/navbar.tsx
surfsense_web/content/docs/docker-installation.mdx
surfsense_web/content/docs/manual-installation.mdx
surfsense_web/hooks/use-search-source-connectors.ts

connector.id,
connector.search_space_id,
str(connector.user_id),
None, # start_date - uses last_indexed_at

Type mismatch in task invocation. The indexing tasks declare start_date: str and end_date: str parameters (see connector_tasks.py, lines 34-35), but None is passed for both. These None values flow into indexing functions such as run_slack_indexing() (lines 788-789), which forward them to index_slack_messages(), a function that likely expects string dates. If the indexing logic parses or compares these values without a None check, it will raise a TypeError or AttributeError. The same issue occurs in periodic_scheduler.py, line 104, where tasks are also invoked with None for both dates.

Evidence:

  1. The task signature in connector_tasks.py defines start_date: str, end_date: str
  2. schedule_checker_task.py, lines 111-112, passes None, None
  3. periodic_scheduler.py, line 104, passes None, None
  4. These None values are passed through to run_slack_indexing() at lines 788-789
  5. The indexing functions will fail when they try to use these None values as strings
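One possible way to make the None case explicit is sketched below: the parameters are typed as optional and resolved to concrete date strings before any indexing logic runs. The helper `resolve_date_range`, the `YYYY-MM-DD` format, and the 30-day fallback are all assumptions for illustration; the inline comment in the diff only indicates that a missing start_date is meant to fall back to last_indexed_at.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_LOOKBACK_DAYS = 30  # assumption for illustration, not taken from the PR


def resolve_date_range(
    start_date: Optional[str],
    end_date: Optional[str],
    last_indexed_at: Optional[datetime],
    now: datetime,
) -> tuple[str, str]:
    """Turn optional date strings into concrete YYYY-MM-DD strings."""
    if end_date is None:
        end_date = now.strftime("%Y-%m-%d")
    if start_date is None:
        # Fall back to the last indexing time, or to a fixed lookback window.
        base = last_indexed_at or now - timedelta(days=DEFAULT_LOOKBACK_DAYS)
        start_date = base.strftime("%Y-%m-%d")
    return start_date, end_date


now = datetime(2025, 10, 23, 8, 0, tzinfo=timezone.utc)
last = datetime(2025, 10, 20, 12, 0, tzinfo=timezone.utc)
print(resolve_date_range(None, None, last, now))
print(resolve_date_range(None, None, None, now))
```

If the indexing functions already perform a fallback like this internally, widening the annotations to `Optional[str]` would be enough to remove the mismatch.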

MODSetter merged commit a44deb7 into main on Oct 23, 2025
7 of 9 checks passed

@recurseml recurseml bot left a comment

Review by RecurseML

🔍 Review performed on aed8163..aed8163

✨ No files to analyze

2 participants