feat: added periodic indexing for indexable search source connectors #428
Conversation
- TODO: Add celery redbeat and create tasks dynamically in our redis
Walkthrough

This PR introduces periodic indexing functionality for search source connectors. Changes include adding database fields to track periodic indexing state, creating a Celery Beat scheduler service to check and trigger indexing tasks on schedule, updating API schemas and routes to support periodic configuration with validation, building UI components for periodic indexing management, and updating documentation and environment configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CeleryBeat as Celery Beat<br/>(Scheduler)
    participant CheckTask as check_periodic<br/>_schedules_task
    participant DB as Database
    participant PeriodicScheduler as periodic_scheduler<br/>(utilities)
    participant IndexingTasks as Connector-Specific<br/>Indexing Tasks
    CeleryBeat->>CheckTask: Triggers at configured interval<br/>(SCHEDULE_CHECKER_INTERVAL)
    CheckTask->>DB: Query connectors where<br/>periodic_indexing_enabled=true<br/>AND next_scheduled_at <= NOW
    DB-->>CheckTask: Return due connectors
    alt Connectors are due
        CheckTask->>IndexingTasks: .delay() for each<br/>connector's mapped task
        IndexingTasks->>DB: Execute indexing
        CheckTask->>DB: Update next_scheduled_at<br/>to NOW + frequency
        DB-->>CheckTask: Confirm update
    else No due connectors
        CheckTask->>CheckTask: Log and continue
    end
    CheckTask-->>CeleryBeat: Task complete
```
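The due-check and reschedule steps in the diagram can be sketched as plain functions. The field names below match the columns this PR adds; everything else (function names, the dict shape) is illustrative, not the PR's actual code, which uses async SQLAlchemy inside a Celery task.

```python
from datetime import datetime, timedelta


def find_due_connectors(connectors, now):
    """Return connectors with periodic indexing enabled whose next run is due.

    Mirrors the diagram's query: periodic_indexing_enabled=true
    AND next_scheduled_at <= NOW.
    """
    return [
        c for c in connectors
        if c["periodic_indexing_enabled"]
        and c["next_scheduled_at"] is not None
        and c["next_scheduled_at"] <= now
    ]


def advance_schedule(connector, now):
    """Compute the new next_scheduled_at as NOW + frequency, per the diagram."""
    return now + timedelta(minutes=connector["indexing_frequency_minutes"])
```

After dispatching each due connector's mapped task with `.delay()`, the checker would persist the value returned by `advance_schedule` so the connector is not picked up again until the next interval.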
```mermaid
sequenceDiagram
    participant User as User
    participant ConnectorsUI as Connectors Page<br/>(UI)
    participant API as API Routes
    participant PeriodicScheduler as periodic_scheduler
    participant DB as Database
    User->>ConnectorsUI: Click configure<br/>periodic indexing
    ConnectorsUI->>ConnectorsUI: Open dialog with<br/>enable/frequency options
    User->>ConnectorsUI: Enable & set frequency,<br/>click Save
    ConnectorsUI->>API: updateConnector with<br/>periodic_indexing_enabled,<br/>indexing_frequency_minutes
    API->>DB: Validate & persist<br/>connector updates
    alt Enabling periodic indexing
        API->>PeriodicScheduler: create_periodic_schedule()
        PeriodicScheduler->>IndexingTasks: Dispatch initial indexing<br/>task via .delay()
        PeriodicScheduler-->>API: Return success
    else Disabling periodic indexing
        API->>PeriodicScheduler: delete_periodic_schedule()
        PeriodicScheduler-->>API: Return success
    end
    API-->>ConnectorsUI: Update response
    ConnectorsUI->>User: Show success toast<br/>& update UI
```
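The enable/disable branching in the second diagram might look roughly like this. This is a hypothetical sketch: `apply_periodic_config` and its callback parameters stand in for the route logic in `search_source_connectors_routes.py` and for `periodic_scheduler.create_periodic_schedule()` / `delete_periodic_schedule()`.

```python
def apply_periodic_config(connector, enabled, frequency_minutes,
                          create_schedule, delete_schedule):
    """Persist the periodic-indexing flags, then create or delete the schedule.

    `create_schedule` / `delete_schedule` are stand-ins for the
    periodic_scheduler helpers; per the diagram, creating a schedule also
    dispatches an initial indexing task via .delay().
    """
    connector["periodic_indexing_enabled"] = enabled
    connector["indexing_frequency_minutes"] = frequency_minutes if enabled else None
    if enabled:
        create_schedule(connector)
    else:
        delete_schedule(connector)
    return connector
```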
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The changes span multiple subsystems: database migrations with conditional logic, asynchronous Celery task orchestration with scheduling, RESTful API integration with business logic validation, and frontend UI state management with async operations. The feature threads through the database layer, backend scheduling infrastructure, API routes with side effects, and frontend forms/dialogs. Cross-layer integration points and conditional migration logic increase complexity.
Review by RecurseML
🔍 Review performed on 70808eb..aed8163
| Severity | Location | Issue |
|---|---|---|
| | surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py:111 | Type mismatch in task parameters |
✅ Files analyzed, no issues (16)
• README.md
• docker-compose.yml
• surfsense_backend/.env.example
• surfsense_backend/.gitignore
• surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
• surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py
• surfsense_backend/app/celery_app.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/search_source_connectors_routes.py
• surfsense_backend/app/schemas/search_source_connector.py
• surfsense_backend/app/utils/periodic_scheduler.py
• surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
• surfsense_web/components/homepage/navbar.tsx
• surfsense_web/content/docs/docker-installation.mdx
• surfsense_web/content/docs/manual-installation.mdx
• surfsense_web/hooks/use-search-source-connectors.ts
```python
connector.id,
connector.search_space_id,
str(connector.user_id),
None,  # start_date - uses last_indexed_at
```
Type mismatch in task invocation. The indexing tasks expect start_date: str and end_date: str parameters (see connector_tasks.py line 34-35), but None is being passed. When these None values reach the indexing functions like run_slack_indexing() at line 788-789, they will be passed to index_slack_messages() which likely expects string date parameters. This will cause a TypeError or AttributeError when the indexing logic tries to process these dates (e.g., parsing, comparison operations). The same issue occurs in periodic_scheduler.py line 104 where tasks are also invoked with None for dates.
Evidence:
- Task signature in connector_tasks.py defines `start_date: str, end_date: str`
- schedule_checker_task.py lines 111-112 pass `None, None`
- periodic_scheduler.py line 104 passes `None, None`
- These None values are passed through to run_slack_indexing() at lines 788-789
- The indexing functions will fail when they try to use these None values as strings
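One possible shape for a fix is to coerce the optional dates into the strings the task signatures expect before dispatch. The helper name, the `'%Y-%m-%d'` format, and the 365-day default lookback below are all assumptions for illustration, not the PR's code.

```python
from datetime import datetime, timedelta


def resolve_date_range(start_date, end_date, last_indexed_at=None, now=None):
    """Coerce optional dates into date strings before calling .delay().

    Falls back to last_indexed_at (or an assumed 365-day lookback) for
    start_date, and to the current date for end_date, so the indexing
    tasks never receive None where a str is expected.
    """
    now = now or datetime.utcnow()
    if end_date is None:
        end_date = now.strftime("%Y-%m-%d")
    if start_date is None:
        base = last_indexed_at or (now - timedelta(days=365))
        start_date = base.strftime("%Y-%m-%d")
    return start_date, end_date
```

With a guard like this at both call sites (schedule_checker_task.py and periodic_scheduler.py), the downstream indexing functions would always receive strings.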
Description
Added periodic indexing for indexable search source connectors.
Screenshots
API Changes
Change Type
Testing Performed
Checklist
High-level PR Summary
This PR implements periodic indexing functionality for indexable search source connectors, allowing automated data synchronization at configurable intervals. The implementation uses a meta-scheduler pattern with Celery Beat: a single scheduled task checks the database every minute (configurable) for connectors due for indexing, rather than creating individual Beat schedules per connector. This includes database schema changes to store scheduling information (`periodic_indexing_enabled`, `indexing_frequency_minutes`, `next_scheduled_at`), backend API updates to manage periodic schedules, a new Celery Beat service in Docker, and a comprehensive UI for configuring periodic indexing with preset frequency options (15m, 1h, 6h, 12h, daily, weekly) or custom intervals.

⏱️ Estimated Review Time: 30-90 minutes
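For reference, the preset options mentioned above map onto `indexing_frequency_minutes` values as shown in this hypothetical sketch; the labels and the validation rule are illustrative, not taken from the PR's schema code.

```python
# Assumed mapping from UI preset labels to minutes; the backend stores
# only the integer indexing_frequency_minutes value.
FREQUENCY_PRESETS = {
    "15m": 15,
    "1h": 60,
    "6h": 360,
    "12h": 720,
    "daily": 1440,
    "weekly": 10080,
}


def validate_frequency(minutes):
    """Accept any positive integer, covering presets and custom intervals."""
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("indexing_frequency_minutes must be a positive integer")
    return minutes
```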
💡 Review Order Suggestion

1. surfsense_backend/alembic/versions/32_add_periodic_indexing_fields.py
2. surfsense_backend/app/db.py
3. surfsense_backend/app/schemas/search_source_connector.py
4. surfsense_backend/app/celery_app.py
5. surfsense_backend/app/tasks/celery_tasks/schedule_checker_task.py
6. surfsense_backend/app/utils/periodic_scheduler.py
7. surfsense_backend/app/routes/search_source_connectors_routes.py
8. surfsense_web/hooks/use-search-source-connectors.ts
9. surfsense_web/app/dashboard/[search_space_id]/connectors/(manage)/page.tsx
10. docker-compose.yml
11. surfsense_backend/.env.example
12. surfsense_web/content/docs/docker-installation.mdx
13. surfsense_web/content/docs/manual-installation.mdx
14. surfsense_backend/.gitignore
15. surfsense_backend/alembic/versions/25_migrate_llm_configs_to_search_spaces.py
16. README.md
17. surfsense_web/components/homepage/navbar.tsx

Summary by CodeRabbit
New Features
Documentation
UI/UX Improvements