bugs(fix): Fixed broken date range filtering #407
Conversation
@vaishcodescape is attempting to deploy a commit to Rohan Verma's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough

Adds scheduling capability for connector indexing: a new DB model and migration, scheduling utilities, a background scheduler service with app lifecycle integration, API routes for managing schedules and scheduler status, connector date-range fixes (Jira, ClickUp), schemas for schedules, and a dashboard page to create/manage schedules. Includes IDE config files.

Changes

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Client
participant FastAPI App
participant SchedulerSvc as ConnectorSchedulerService
participant DB
Note over FastAPI App,SchedulerSvc: Application startup
FastAPI App->>DB: create_all() / migrations applied
FastAPI App->>SchedulerSvc: start()
SchedulerSvc-->>SchedulerSvc: background loop (check_interval)
SchedulerSvc->>DB: fetch active schedules where next_run_at <= now
alt Due schedules found
loop up to max_concurrent_jobs
SchedulerSvc->>SchedulerSvc: _execute_schedule(schedule)
SchedulerSvc->>DB: update last_run_at
SchedulerSvc->>Indexers: run connector indexer (async)
Indexers-->>SchedulerSvc: success/failure
SchedulerSvc->>DB: update next_run_at (calculate_next_run)
end
else None due
SchedulerSvc-->>SchedulerSvc: sleep(check_interval)
end
Note over FastAPI App,SchedulerSvc: Application shutdown
FastAPI App->>SchedulerSvc: stop() and cancel tasks
```

```mermaid
sequenceDiagram
autonumber
participant User
participant API as /api/v1/connector-schedules/*
participant Helper as schedule_helpers
participant DB
User->>API: POST create schedule
API->>DB: validate ownership and uniqueness
API->>Helper: calculate_next_run(...)
Helper-->>API: next_run_at (UTC)
API->>DB: insert schedule with next_run_at
API-->>User: 201 Created (schedule)
User->>API: POST /scheduler/schedules/{id}/force-execute
API->>Scheduler: queue force_execute_schedule(id)
API-->>User: 202 Accepted
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Review by RecurseML
🔍 Review performed on d868bae..c044d42
✨ No bugs found, your code is sparkling clean
✅ Files analyzed, no issues (17)
• .idea/.gitignore
• .idea/SurfSense.iml
• .idea/inspectionProfiles/profiles_settings.xml
• .idea/modules.xml
• .idea/vcs.xml
• surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
• surfsense_backend/app/app.py
• surfsense_backend/app/connectors/clickup_connector.py
• surfsense_backend/app/connectors/jira_connector.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/connector_schedules_routes.py
• surfsense_backend/app/routes/scheduler_routes.py
• surfsense_backend/app/schemas/__init__.py
• surfsense_backend/app/schemas/connector_schedule.py
• surfsense_backend/app/services/connector_scheduler_service.py
• surfsense_backend/app/utils/schedule_helpers.py
• surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
⏭️ Files skipped (1)
• surfsense_web/package-lock.json
Actionable comments posted: 10
🧹 Nitpick comments (11)
surfsense_backend/app/connectors/jira_connector.py (2)
224-227: LGTM! OR logic correctly captures created or updated issues.

The date filter now properly includes issues that were either created or updated within the specified range, which is essential for incremental synchronization.
However, consider validating the date format (YYYY-MM-DD) before constructing the JQL to prevent potential JQL injection or syntax errors.
Add validation like this before constructing the date_filter:
```python
from datetime import datetime

def get_issues_by_date_range(
    self,
    start_date: str,
    end_date: str,
    include_comments: bool = True,
    project_key: str | None = None,
) -> tuple[list[dict[str, Any]], str | None]:
    """..."""
    try:
        # Validate date format
        datetime.strptime(start_date, "%Y-%m-%d")
        datetime.strptime(end_date, "%Y-%m-%d")

        # Build JQL query for date range
        date_filter = (
            f"(createdDate >= '{start_date}' AND createdDate <= '{end_date}') "
            f"OR (updatedDate >= '{start_date}' AND updatedDate <= '{end_date}')"
        )
        # ... rest of the function
```
230-233: LGTM! Parentheses ensure correct operator precedence.

The parentheses around `date_filter` are essential when combining with the project filter, preventing the OR from breaking out of the intended grouping.

To reduce duplication, consider extracting the common pattern:
```diff
-        _jql = f"{date_filter} ORDER BY created DESC"
-        if project_key:
-            _jql = (
-                f'project = "{project_key}" AND ({date_filter}) ORDER BY created DESC'
-            )
+        base_filter = date_filter
+        if project_key:
+            base_filter = f'project = "{project_key}" AND ({date_filter})'
+        _jql = f"{base_filter} ORDER BY created DESC"
```

.idea/SurfSense.iml (1)
1-8: Consider excluding IDE project files from VCS.

Project-level .idea files are environment-specific. Prefer ignoring them at the repo root (or use .editorconfig/devcontainer) to avoid churn across contributors.
surfsense_backend/app/app.py (1)
44-49: Remove task-cancel block if you await startup.

If you switch to awaiting start_scheduler(), there's no persistent task to cancel here; rely on stop_scheduler().
```diff
-        # Cancel the scheduler task
-        if not scheduler_task.done():
-            scheduler_task.cancel()
-            with suppress(asyncio.CancelledError):
-                await scheduler_task
+        # Scheduler stopped above; no task cancellation needed when startup was awaited
```

surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1)
75-83: params should not be a Promise

Next.js app router passes params as a plain object. Typing it as a Promise adds confusion. Prefer direct typing and remove the resolveParams effect.
Example:
```diff
-export default function ConnectorSchedulesPage({
-	params,
-}: {
-	params: Promise<{ search_space_id: string }>;
-}) {
+export default function ConnectorSchedulesPage({
+	params,
+}: {
+	params: { search_space_id: string };
+}) {
-	const [searchSpaceId, setSearchSpaceId] = useState<string>("");
-	useEffect(() => { /* resolveParams ... */ }, [params]);
+	const [searchSpaceId, setSearchSpaceId] = useState<string>(params.search_space_id);
```

surfsense_backend/app/routes/scheduler_routes.py (1)
48-81: Force-execute route implementation is reasonable

BackgroundTasks usage and user auth are fine. Consider returning 202 Accepted to reflect async enqueueing (optional).
surfsense_backend/app/services/connector_scheduler_service.py (2)
49-51: Make scheduler knobs configurable via config

Hardcoded values limit ops flexibility. Read from config with sane defaults.
```diff
-        self.check_interval = 60  # Check every 60 seconds
-        self.max_concurrent_jobs = 5  # Maximum concurrent indexing jobs
+        self.check_interval = getattr(config, "SCHEDULER_CHECK_INTERVAL", 60)
+        self.max_concurrent_jobs = getattr(config, "SCHEDULER_MAX_CONCURRENT_JOBS", 5)
```
371-375: Confirm non-blocking startup

start_scheduler awaits scheduler.start(), which runs an indefinite loop. Ensure app startup creates a background task (e.g., asyncio.create_task(scheduler.start())) instead of awaiting it on the main startup coroutine.
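For illustration, here is a minimal sketch of the non-blocking pattern this comment asks for, using FastAPI's lifespan hook; DummyScheduler and all names below are stand-ins for the actual service, not the real SurfSense API:

```python
import asyncio
from contextlib import asynccontextmanager, suppress

from fastapi import FastAPI


class DummyScheduler:
    """Stand-in for ConnectorSchedulerService; only the loop shape matters here."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True
        while self.running:  # indefinite loop, as described in the comment
            await asyncio.sleep(60)  # check_interval

    async def stop(self) -> None:
        self.running = False


scheduler = DummyScheduler()


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Run the loop in the background so startup is not blocked
    task = asyncio.create_task(scheduler.start())
    try:
        yield
    finally:
        await scheduler.stop()  # signal the loop to exit
        task.cancel()  # then tear down the background task
        with suppress(asyncio.CancelledError):
            await task


app = FastAPI(lifespan=lifespan)
```

Cancelling after stop() keeps shutdown deterministic even when the loop is mid-sleep.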
surfsense_backend/app/routes/connector_schedules_routes.py (1)
164-179: Optional: add stable ordering to list results

Ordering by next_run_at (then id) improves predictability for clients.
```diff
-    result = await session.execute(query.offset(skip).limit(limit))
+    result = await session.execute(
+        query.order_by(ConnectorSchedule.next_run_at.asc(), ConnectorSchedule.id.asc())
+        .offset(skip)
+        .limit(limit)
+    )
```

surfsense_backend/app/db.py (1)
2-2: Avoid datetime.UTC for broader Python compatibility

datetime.UTC requires Python 3.11+. If your runtime is 3.10, the import will fail. Use timezone.utc instead.
```diff
-from datetime import UTC, datetime
+from datetime import datetime, timezone
@@ class TimestampMixin:
-        default=lambda: datetime.now(UTC),
+        default=lambda: datetime.now(timezone.utc),
```

Also applies to: 148-152
surfsense_backend/app/schemas/connector_schedule.py (1)
118-127: Relax update validators to allow time-only updates

Currently, clients must include schedule_type when updating time fields. Allow updates without schedule_type (the route uses the effective type).
```diff
 def validate_daily_time_update(cls, v: time | None, info: FieldValidationInfo) -> time | None:
     """Validate daily_time is only provided for DAILY schedule type."""
     schedule_type = info.data.get("schedule_type")
-    if v is not None and schedule_type != ScheduleType.DAILY:
+    if v is not None and schedule_type is not None and schedule_type != ScheduleType.DAILY:
         raise ValueError(
             "daily_time should only be provided for DAILY schedule_type"
         )
     return v
@@
 def validate_weekly_day_update(cls, v: int | None, info: FieldValidationInfo) -> int | None:
     """Validate weekly_day is only provided for WEEKLY schedule type."""
     schedule_type = info.data.get("schedule_type")
-    if v is not None and schedule_type != ScheduleType.WEEKLY:
+    if v is not None and schedule_type is not None and schedule_type != ScheduleType.WEEKLY:
         raise ValueError(
             "weekly_day should only be provided for WEEKLY schedule_type"
         )
     if v is not None and not (0 <= v <= 6):
         raise ValueError("weekly_day must be between 0 (Monday) and 6 (Sunday)")
     return v
@@
 def validate_hourly_minute_update(cls, v: int | None, info: FieldValidationInfo) -> int | None:
     """Validate hourly_minute is only provided for HOURLY schedule type."""
     schedule_type = info.data.get("schedule_type")
-    if v is not None and schedule_type != ScheduleType.HOURLY:
+    if v is not None and schedule_type is not None and schedule_type != ScheduleType.HOURLY:
         raise ValueError(
             "hourly_minute should only be provided for HOURLY schedule_type"
         )
     if v is not None and not (0 <= v <= 59):
         raise ValueError("hourly_minute must be between 0 and 59")
     return v
```

Also applies to: 129-141, 153-164
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
surfsense_web/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (17)
• .idea/.gitignore (1 hunks)
• .idea/SurfSense.iml (1 hunks)
• .idea/inspectionProfiles/profiles_settings.xml (1 hunks)
• .idea/modules.xml (1 hunks)
• .idea/vcs.xml (1 hunks)
• surfsense_backend/alembic/versions/23_add_connector_schedules_table.py (1 hunks)
• surfsense_backend/app/app.py (3 hunks)
• surfsense_backend/app/connectors/clickup_connector.py (2 hunks)
• surfsense_backend/app/connectors/jira_connector.py (2 hunks)
• surfsense_backend/app/db.py (4 hunks)
• surfsense_backend/app/routes/connector_schedules_routes.py (1 hunks)
• surfsense_backend/app/routes/scheduler_routes.py (1 hunks)
• surfsense_backend/app/schemas/__init__.py (2 hunks)
• surfsense_backend/app/schemas/connector_schedule.py (1 hunks)
• surfsense_backend/app/services/connector_scheduler_service.py (1 hunks)
• surfsense_backend/app/utils/schedule_helpers.py (1 hunks)
• surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{jsx,tsx}
📄 CodeRabbit inference engine (.rules/require_unique_id_props.mdc)
**/*.{jsx,tsx}: When mapping arrays to React elements in JSX/TSX, each rendered element must include a unique key prop
Keys used for React list items should be stable, predictable, and unique among siblings
Files:
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
🧬 Code graph analysis (8)
surfsense_backend/app/utils/schedule_helpers.py (1)
surfsense_backend/app/db.py (1)
ScheduleType(133-137)
surfsense_backend/app/schemas/__init__.py (1)
surfsense_backend/app/schemas/connector_schedule.py (4)
ConnectorScheduleBase (11-85), ConnectorScheduleCreate (88-91), ConnectorScheduleRead (167-173), ConnectorScheduleUpdate (94-164)
surfsense_backend/app/routes/connector_schedules_routes.py (4)
surfsense_backend/app/db.py (7)
ConnectorSchedule (299-330), ScheduleType (133-137), SearchSourceConnector (264-296), SearchSpace (221-261), User (424-433), User (437-443), get_async_session (484-486)

surfsense_backend/app/schemas/connector_schedule.py (3)
ConnectorScheduleCreate (88-91), ConnectorScheduleRead (167-173), ConnectorScheduleUpdate (94-164)

surfsense_backend/app/utils/check_ownership.py (1)

check_ownership (9-19)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-80)
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx (1)
surfsense_backend/app/db.py (1)
ConnectorSchedule(299-330)
surfsense_backend/app/schemas/connector_schedule.py (2)
surfsense_backend/app/db.py (2)
BaseModel (155-159), ScheduleType (133-137)

surfsense_backend/app/schemas/base.py (2)

IDModel (11-13), TimestampModel (6-8)
surfsense_backend/app/services/connector_scheduler_service.py (4)
surfsense_backend/app/db.py (4)
ConnectorSchedule (299-330), SearchSourceConnector (264-296), SearchSourceConnectorType (56-72), get_async_session (484-486)

surfsense_backend/app/services/task_logging_service.py (4)

TaskLoggingService (13-243), log_task_start (20-58), log_task_failure (107-162), log_task_success (60-105)

surfsense_backend/app/utils/schedule_helpers.py (1)

calculate_next_run (10-80)

surfsense_backend/app/routes/scheduler_routes.py (2)

get_scheduler_status (24-45), force_execute_schedule (49-80)
surfsense_backend/app/app.py (2)
surfsense_backend/app/services/connector_scheduler_service.py (2)
start_scheduler (371-374), stop_scheduler (377-382)

surfsense_backend/app/db.py (1)

create_db_and_tables (477-481)
surfsense_backend/app/routes/scheduler_routes.py (3)
surfsense_backend/app/db.py (3)
get_async_session (484-486), ConnectorSchedule (299-330), SearchSpace (221-261)

surfsense_backend/app/schemas/connector_schedule.py (1)

ConnectorScheduleRead (167-173)

surfsense_backend/app/services/connector_scheduler_service.py (3)

get_scheduler (363-368), get_scheduler_status (310-325), force_execute_schedule (327-356)
🪛 GitHub Actions: Code Quality Checks
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
[error] 3-3: Biome: Organize Imports (FIXABLE). The imports and exports are not sorted.
[error] 3-3: Biome: Import organization suggested fix available. Run Biome organize imports to fix.
🪛 GitHub Actions: pre-commit
surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
[error] 412-412: lint/correctness/useUniqueElementIds: id attribute should not be a static string literal. Generate unique IDs using useId().
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Python Backend Quality
🔇 Additional comments (7)
surfsense_backend/app/connectors/jira_connector.py (2)
229-229: LGTM! Consistent ordering added.

The ORDER BY clause ensures predictable result ordering for pagination.
252-254: LGTM! Date filtering now properly activated.

The JQL query is now correctly passed to the API request, enabling the date range filtering functionality. This is the key fix that restores incremental synchronization.
surfsense_backend/app/connectors/clickup_connector.py (2)
8-8: LGTM!

The datetime import is correctly added to support date parsing and timestamp conversion.
196-221: LGTM with previous concerns addressed.

The pagination logic, response validation, and error handling are well-implemented. The function correctly handles empty results and API errors.
surfsense_backend/app/app.py (1)
100-101: Routers wiring LGTM.

New connector schedules and scheduler routes are correctly mounted under /api/v1.
surfsense_backend/app/schemas/__init__.py (1)
11-16: Exports wired correctly

ConnectorSchedule schemas are imported and exposed via __all__. Looks good.
Also applies to: 61-64
surfsense_backend/app/routes/scheduler_routes.py (1)
23-45: Scheduler status endpoint looks good

Auth, error handling, and delegation to the service are appropriate.
```sql
CREATE TABLE IF NOT EXISTS connector_schedules (
    id SERIAL PRIMARY KEY,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    connector_id INTEGER NOT NULL REFERENCES search_source_connectors(id) ON DELETE CASCADE,
    search_space_id INTEGER NOT NULL REFERENCES searchspaces(id) ON DELETE CASCADE,
    schedule_type scheduletype NOT NULL,
    cron_expression VARCHAR(100),
    daily_time TIME,
    weekly_day SMALLINT,
    weekly_time TIME,
    hourly_minute SMALLINT,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    last_run_at TIMESTAMPTZ,
    next_run_at TIMESTAMPTZ,
    CONSTRAINT uq_connector_search_space UNIQUE (connector_id, search_space_id)
);
```
Add missing updated_at column to match ORM TimestampMixin
The ConnectorSchedule model inherits TimestampMixin (created_at, updated_at). Table DDL creates created_at but omits updated_at, which will cause runtime SQL errors (column not found) when ORM selects/updates it.
Apply this diff to add updated_at:
```diff
 CREATE TABLE IF NOT EXISTS connector_schedules (
     id SERIAL PRIMARY KEY,
     created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
     connector_id INTEGER NOT NULL REFERENCES search_source_connectors(id) ON DELETE CASCADE,
     search_space_id INTEGER NOT NULL REFERENCES searchspaces(id) ON DELETE CASCADE,
     schedule_type scheduletype NOT NULL,
```

Optional: the explicit index on id is redundant (the PK is already indexed). Consider dropping creation of ix_connector_schedules_id.
🤖 Prompt for AI Agents
In surfsense_backend/alembic/versions/23_add_connector_schedules_table.py around
lines 38 to 53, the CREATE TABLE for connector_schedules defines created_at but
omits updated_at which the ORM's TimestampMixin expects; add an updated_at
TIMESTAMPTZ NOT NULL DEFAULT NOW() column (positioned with created_at) so
selects/updates succeed, and optionally remove the explicit
ix_connector_schedules_id index since the PRIMARY KEY already creates an index.
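If the original migration has already run somewhere, a follow-up migration can add the column instead. A hypothetical Alembic sketch (revision identifiers are placeholders, not files in this repo; the server default backfills existing rows):

```python
import sqlalchemy as sa
from alembic import op

# Placeholder revision identifiers for illustration only
revision = "24_add_updated_at_to_connector_schedules"
down_revision = "23_add_connector_schedules_table"


def upgrade() -> None:
    # Backfill existing rows via the server default, then keep NOT NULL
    op.add_column(
        "connector_schedules",
        sa.Column(
            "updated_at",
            sa.TIMESTAMP(timezone=True),
            nullable=False,
            server_default=sa.text("NOW()"),
        ),
    )


def downgrade() -> None:
    op.drop_column("connector_schedules", "updated_at")
```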
```python
    # Start the connector scheduler service
    scheduler_task = asyncio.create_task(start_scheduler())
    logger.info("Connector scheduler service started")
```
Avoid spawning an unobserved startup task; await and handle errors.
create_task(start_scheduler()) can fail silently; the “started” log is emitted even if startup fails. Await it and log/propagate errors.
Apply this diff:

```diff
-    # Start the connector scheduler service
-    scheduler_task = asyncio.create_task(start_scheduler())
-    logger.info("Connector scheduler service started")
+    # Start the connector scheduler service
+    try:
+        await start_scheduler()
+        logger.info("Connector scheduler service started")
+    except Exception:
+        logger.exception("Failed to start connector scheduler")
+        raise
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In surfsense_backend/app/app.py around lines 31 to 34, the code spawns
start_scheduler() with asyncio.create_task and immediately logs "Connector
scheduler service started", which can hide startup failures; change this to
await start_scheduler() (or await the created task) inside a try/except so
startup errors are caught, log a success message only after await returns, and
on exception log the error and re-raise or handle/propagate it appropriately so
failures are not silent.
```python
        # Convert date strings to Unix timestamps (milliseconds)
        start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
        end_datetime = datetime.strptime(end_date, "%Y-%m-%d")

        # Set time to start and end of day for complete coverage
        start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
        end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)

        start_timestamp = int(start_datetime.timestamp() * 1000)
        end_timestamp = int(end_datetime.timestamp() * 1000)
```
Use UTC timezone explicitly to avoid inconsistent filtering.
The code creates naive datetime objects (without timezone info) and calls .timestamp(), which uses the system's local timezone. This can lead to incorrect date filtering if the server timezone differs from the expected timezone or when handling tasks across timezones.
Apply this diff to use UTC explicitly:
```diff
-        # Convert date strings to Unix timestamps (milliseconds)
-        start_datetime = datetime.strptime(start_date, "%Y-%m-%d")
-        end_datetime = datetime.strptime(end_date, "%Y-%m-%d")
-
-        # Set time to start and end of day for complete coverage
-        start_datetime = start_datetime.replace(hour=0, minute=0, second=0, microsecond=0)
-        end_datetime = end_datetime.replace(hour=23, minute=59, second=59, microsecond=999999)
-
-        start_timestamp = int(start_datetime.timestamp() * 1000)
-        end_timestamp = int(end_datetime.timestamp() * 1000)
+        # Convert date strings to Unix timestamps (milliseconds) using UTC
+        from datetime import timezone
+
+        start_datetime = datetime.strptime(start_date, "%Y-%m-%d").replace(
+            hour=0, minute=0, second=0, microsecond=0, tzinfo=timezone.utc
+        )
+        end_datetime = datetime.strptime(end_date, "%Y-%m-%d").replace(
+            hour=23, minute=59, second=59, microsecond=999999, tzinfo=timezone.utc
+        )
+
+        start_timestamp = int(start_datetime.timestamp() * 1000)
+        end_timestamp = int(end_datetime.timestamp() * 1000)
```
🤖 Prompt for AI Agents
In surfsense_backend/app/connectors/clickup_connector.py around lines 172 to
181, the code builds naive datetimes and calls .timestamp(), which uses the
system local timezone; make the datetimes timezone-aware in UTC before
converting to timestamps: parse the dates as date objects or datetimes as
currently done, set their time to start/end of day, attach UTC tzinfo (e.g.,
replace or combine with datetime.timezone.utc) so they are aware, then call
.timestamp() to produce consistent Unix millisecond timestamps in UTC.
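To see concretely why the review flags this, a small self-contained demonstration of how a naive datetime's .timestamp() depends on the host timezone while a UTC-aware one does not (the date below is arbitrary):

```python
from datetime import datetime, timezone

naive = datetime.strptime("2024-01-15", "%Y-%m-%d")
aware = naive.replace(tzinfo=timezone.utc)

# naive.timestamp() interprets the value in the *local* timezone, so two
# servers in different timezones would send different filter boundaries.
print(int(naive.timestamp() * 1000))  # varies with the host's TZ setting
print(int(aware.timestamp() * 1000))  # always 1705276800000 (UTC midnight)
```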
```python
            # Date filtering - filter by both created and updated dates
            "date_created_gt": start_timestamp,
            "date_created_lt": end_timestamp,
            "date_updated_gt": start_timestamp,
            "date_updated_lt": end_timestamp,
```
🧩 Analysis chain
Verify that ClickUp API uses OR logic for multiple date filters.
The implementation sends all four date parameters (date_created_gt/lt AND date_updated_gt/lt) simultaneously. Depending on how the ClickUp API interprets these parameters, the result could follow either:
- OR logic (desired): Return tasks created OR updated within the range, enabling proper incremental sync.
- AND logic (problematic): Return only tasks where BOTH created AND updated dates fall within the range, missing tasks created before the range but updated within it (or vice versa).
The PR objectives mention Jira uses explicit OR logic. Verify the ClickUp API behavior to ensure incremental synchronization works correctly.
Run the following script to check the ClickUp API documentation and behavior:
Additionally, search the web for ClickUp API documentation on date filtering:
🏁 Script executed:
```bash
#!/bin/bash
# Verify ClickUp API date filtering behavior and search for existing integration examples

# Search codebase for any documentation or comments about ClickUp date filter behavior
rg -i "clickup.*date.*filter" -A 5 -B 5

# Search for any tests that might clarify the expected behavior
rg -i "date_created.*date_updated" -A 5 -B 5
```

Length of output: 0
🌐 Web query:
ClickUp API v2 date filtering parameters date_created_gt date_updated_gt behavior AND OR logic
💡 Result:
Short answer
- date_created_gt and date_updated_gt are Unix‑epoch millisecond filters (strict "greater than") that compare against a task's dateCreated / dateUpdated fields. [1][2][3]
- When you pass multiple filters (dates, status, assignees, etc.) the API applies them conjunctively (AND) — a task must satisfy all supplied filters to be returned. [1][3]
Details / notes
- Use milliseconds since 1970-01-01 UTC (or stringified millisecond value). The API returns timestamps in UTC. [2]
- Operators documented: gt (greater than), lt (less than), gte/lte for inclusive comparisons where available. Use date_created_lt / date_updated_lt for upper bounds. [1][3]
- If you need OR logic (e.g., created after A OR updated after B) you must perform multiple requests and merge results client‑side (no server-side OR across different date fields is documented). [1][3]
Sources
- ClickUp developer — Filter Views (dateCreated/dateUpdated, operators). [1]
- ClickUp developer — Date formatting / timestamps (milliseconds, UTC). [2]
- ClickUp API task endpoints and community docs referencing date_created_gt / date_updated_gt. [3]
Handle ClickUp date filters with OR logic by splitting requests
ClickUp applies multiple date filters conjunctively (AND), so passing both created and updated bounds returns only tasks matching both ranges, missing tasks where only one date qualifies. Break into two queries (created range AND updated range), then merge and dedupe results client-side.
🤖 Prompt for AI Agents
In surfsense_backend/app/connectors/clickup_connector.py around lines 189 to
193, the current request includes both created and updated date bounds which
ClickUp applies as AND; instead, perform two queries: one with only
date_created_gt/lt set to start_timestamp/end_timestamp and another with only
date_updated_gt/lt set to the same timestamps, then combine the two response
lists and deduplicate by task id (e.g., use a dict or set of ids) before
returning so tasks that match either created OR updated ranges are preserved.
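A rough sketch of that split-and-merge approach; fetch_tasks below is a hypothetical helper standing in for the connector's real paginated ClickUp request:

```python
from typing import Any


def fetch_tasks(params: dict[str, Any]) -> list[dict[str, Any]]:
    """Placeholder for the real API call; returns task dicts with an 'id' key."""
    raise NotImplementedError


def get_tasks_created_or_updated(start_ms: int, end_ms: int) -> list[dict[str, Any]]:
    created = fetch_tasks({"date_created_gt": start_ms, "date_created_lt": end_ms})
    updated = fetch_tasks({"date_updated_gt": start_ms, "date_updated_lt": end_ms})

    # Merge and dedupe by task id so a task matching both ranges appears once
    merged: dict[str, dict[str, Any]] = {}
    for task in created + updated:
        merged[task["id"]] = task
    return list(merged.values())
```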
```python
    connector_schedules = relationship(
        "ConnectorSchedule",
        back_populates="search_space",
        order_by="ConnectorSchedule.id",
        cascade="all, delete-orphan",
    )
```
Fix broken SQLAlchemy relationships (missing back_populates)
Several relationships reference back_populates that don’t exist, causing mapper configuration errors at startup.
Apply these additions:
```diff
@@ class SearchSpace(BaseModel, TimestampMixin):
     logs = relationship(
         "Log",
         back_populates="search_space",
         order_by="Log.id",
         cascade="all, delete-orphan",
     )
+    # Add missing relationships for back_populates used elsewhere
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="search_space",
+        order_by="SearchSourceConnector.id",
+        cascade="all, delete-orphan",
+    )
     connector_schedules = relationship(
         "ConnectorSchedule",
         back_populates="search_space",
         order_by="ConnectorSchedule.id",
         cascade="all, delete-orphan",
     )
+    llm_configs = relationship(
+        "LLMConfig",
+        back_populates="search_space",
+        order_by="LLMConfig.id",
+        cascade="all, delete-orphan",
+    )
+    user_preferences = relationship(
+        "UserSearchSpacePreference",
+        back_populates="search_space",
+        order_by="UserSearchSpacePreference.id",
+        cascade="all, delete-orphan",
+    )
@@ class SearchSourceConnector(BaseModel, TimestampMixin):
-    search_space = relationship(
-        "SearchSpace", back_populates="search_source_connectors"
-    )
+    search_space = relationship("SearchSpace", back_populates="search_source_connectors")
@@ class LLMConfig(BaseModel, TimestampMixin):
-    search_space = relationship("SearchSpace", back_populates="llm_configs")
+    search_space = relationship("SearchSpace", back_populates="llm_configs")
@@ class UserSearchSpacePreference(BaseModel, TimestampMixin):
-    search_space = relationship("SearchSpace", back_populates="user_preferences")
+    search_space = relationship("SearchSpace", back_populates="user_preferences")
@@ class User(SQLAlchemyBaseUserTableUUID, Base):  # GOOGLE path
     search_spaces = relationship("SearchSpace", back_populates="user")
     search_space_preferences = relationship(
         "UserSearchSpacePreference",
         back_populates="user",
         cascade="all, delete-orphan",
     )
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="user",
+    )
@@ else path: class User(SQLAlchemyBaseUserTableUUID, Base):
     search_spaces = relationship("SearchSpace", back_populates="user")
     search_space_preferences = relationship(
         "UserSearchSpacePreference",
         back_populates="user",
         cascade="all, delete-orphan",
     )
+    search_source_connectors = relationship(
+        "SearchSourceConnector",
+        back_populates="user",
+    )
```

Also applies to: 284-291, 333-356, 391-392, 424-444
🤖 Prompt for AI Agents
In surfsense_backend/app/db.py around lines 256-261 (and similarly for ranges
284-291, 333-356, 391-392, 424-444), several relationship() calls reference
back_populates names that are missing on the related models; add the reciprocal
relationship attributes to those related ORM classes using the exact
back_populates string used here, matching relationship parameters (e.g., class
name, back_populates value, order_by/cascade where appropriate) so each
relationship has a corresponding partner — ensure names and casing match, set
backref behavior consistently (use relationship on both sides with
back_populates, not backref), and include any order_by or cascade clauses on the
appropriate side to mirror the original intended behavior.
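For reference, a minimal self-contained example of a correctly paired back_populates relationship in SQLAlchemy 2.0 style; the classes are illustrative stand-ins, not the actual SurfSense models:

```python
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class SearchSpace(Base):
    __tablename__ = "searchspaces"
    id: Mapped[int] = mapped_column(primary_key=True)
    # back_populates names the attribute on ConnectorSchedule, and vice versa;
    # a missing partner attribute fails at mapper configuration time.
    connector_schedules: Mapped[list["ConnectorSchedule"]] = relationship(
        back_populates="search_space",
        cascade="all, delete-orphan",
    )


class ConnectorSchedule(Base):
    __tablename__ = "connector_schedules"
    id: Mapped[int] = mapped_column(primary_key=True)
    search_space_id: Mapped[int] = mapped_column(ForeignKey("searchspaces.id"))
    search_space: Mapped["SearchSpace"] = relationship(
        back_populates="connector_schedules",
    )
```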
```python
    def __init__(self):
        self.running = False
        self.check_interval = 60  # Check every 60 seconds
        self.max_concurrent_jobs = 5  # Maximum concurrent indexing jobs
        self.active_jobs: Dict[int, asyncio.Task] = {}
```
Prevent duplicate concurrent runs with a lock around check/set
There’s a race if the main loop and force-execute call _execute_schedule concurrently; both can pass the check before active_jobs is updated. Add an asyncio.Lock to guard the check and assignment.
```diff
@@
     def __init__(self):
         self.running = False
         self.check_interval = 60  # Check every 60 seconds
         self.max_concurrent_jobs = 5  # Maximum concurrent indexing jobs
         self.active_jobs: Dict[int, asyncio.Task] = {}
+        self._jobs_lock = asyncio.Lock()
@@
-        # Check if we already have an active job for this schedule
-        if schedule_id in self.active_jobs and not self.active_jobs[schedule_id].done():
-            logger.warning(f"Schedule {schedule_id} is already running, skipping")
-            return
+        # Check under lock to prevent concurrent duplicate runs
+        async with self._jobs_lock:
+            if schedule_id in self.active_jobs and not self.active_jobs[schedule_id].done():
+                logger.warning(f"Schedule {schedule_id} is already running, skipping")
+                return
@@
-        self.active_jobs[schedule_id] = task
+        # Set under lock to avoid races
+        async with self._jobs_lock:
+            self.active_jobs[schedule_id] = task
```

Also applies to: 153-160, 181-181
🤖 Prompt for AI Agents
In surfsense_backend/app/services/connector_scheduler_service.py around lines
47-52 (and apply same pattern at 153-160 and 181), add an asyncio.Lock instance
on the class (e.g., self._lock = asyncio.Lock()) in __init__ and wrap the
check-and-set sections in _execute_schedule and the force-execute path with
"async with self._lock:" so the check of active_jobs and the insertion of a new
task into active_jobs are atomic; ensure you hold the lock only for the minimal
duration needed to check active_jobs and add the task (release it before
awaiting the task itself) and also protect any concurrent removal/update of
active_jobs when tasks complete.
```python
    if schedule_type == ScheduleType.HOURLY:
        # Run at the specified minute of the next hour
        minute = hourly_minute if hourly_minute is not None else 0
        next_run = (now + timedelta(hours=1)).replace(minute=minute, second=0, microsecond=0)
        return next_run.replace(tzinfo=timezone.utc)
```
HOURLY next-run skips same-hour executions
Current logic always jumps to the next hour, so if now=12:10 and minute=30 it incorrectly returns 13:30 instead of 12:30.
Apply this diff:
```diff
 if schedule_type == ScheduleType.HOURLY:
     # Run at the specified minute of the next hour
     minute = hourly_minute if hourly_minute is not None else 0
-    next_run = (now + timedelta(hours=1)).replace(minute=minute, second=0, microsecond=0)
-    return next_run.replace(tzinfo=timezone.utc)
+    candidate = now.replace(minute=minute, second=0, microsecond=0)
+    if candidate <= now:
+        candidate = (now + timedelta(hours=1)).replace(minute=minute, second=0, microsecond=0)
+    return candidate.replace(tzinfo=timezone.utc)
```
🤖 Prompt for AI Agents
In surfsense_backend/app/utils/schedule_helpers.py around lines 37 to 42, the
HOURLY branch always advances to the next hour which skips same-hour executions;
change the logic to compute the target minute (default 0), build a candidate
time by replacing minute/second/microsecond on now, and if that candidate is
later than now return it, otherwise add one hour and return that; preserve
timezone by setting tzinfo=timezone.utc on the returned datetime.
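A quick self-contained check of the corrected behavior; next_hourly_run below inlines only the fixed branch (the real calculate_next_run takes more parameters), so treat it as a sketch:

```python
from datetime import datetime, timedelta, timezone


def next_hourly_run(now: datetime, minute: int) -> datetime:
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate = (now + timedelta(hours=1)).replace(
            minute=minute, second=0, microsecond=0
        )
    return candidate


now = datetime(2024, 1, 15, 12, 10, tzinfo=timezone.utc)
# Same-hour execution is no longer skipped: 12:10 with minute=30 -> 12:30
assert next_hourly_run(now, 30) == datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc)
# A minute already past rolls to the next hour: 12:10 with minute=5 -> 13:05
assert next_hourly_run(now, 5) == datetime(2024, 1, 15, 13, 5, tzinfo=timezone.utc)
```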
```diff
@@ -0,0 +1,572 @@
+"use client";
+
+import { useState, useEffect } from "react";
```
Organize and prune imports to satisfy Biome
Imports need organizing; also remove unused (Tabs, TabsContent, TabsList, TabsTrigger, AlertCircle) to keep the tree clean.
Run your Biome organize imports task (autofix). If you prefer a manual tweak, drop the unused imports and group/sort remaining imports.
🧰 Tools
🪛 GitHub Actions: Code Quality Checks
[error] 3-3: Biome: Organize Imports (FIXABLE). The imports and exports are not sorted.
[error] 3-3: Biome: Import organization suggested fix available. Run Biome organize imports to fix.
🤖 Prompt for AI Agents
In surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
around line 3, imports are disorganized and include unused symbols (Tabs,
TabsContent, TabsList, TabsTrigger, AlertCircle) which Biome will complain
about; remove those unused imports, keep only used imports (e.g., useState,
useEffect, and any others referenced in the file), then reorder/group and sort
the remaining imports according to project convention (external packages first,
then absolute/internal paths, then relative), and finally run Biome's
organize/import autofix or save the file to ensure the import list is pruned and
formatted.
Fix static id to satisfy pre-commit (useId) and link Label/Control correctly
Static id violates lint rule. Use useId to generate a unique id and wire Label.htmlFor to it.
```diff
-import { useState, useEffect } from "react";
+import { useEffect, useId, useState } from "react";
@@
 	const [isCreateDialogOpen, setIsCreateDialogOpen] = useState(false);
 	const [newSchedule, setNewSchedule] = useState({
@@
 		is_active: true,
 	});
+	const activeId = useId();
@@
-							<Switch
-								id="is_active"
+							<Switch
+								id={activeId}
 								checked={newSchedule.is_active}
 								onCheckedChange={(checked) => setNewSchedule({ ...newSchedule, is_active: checked })}
 							/>
-							<Label htmlFor="is_active">Active</Label>
+							<Label htmlFor={activeId}>Active</Label>
```

Also applies to: 85-96, 412-418
🧰 Tools
🪛 GitHub Actions: Code Quality Checks
[error] 3-3: Biome: Organize Imports (FIXABLE). The imports and exports are not sorted.
[error] 3-3: Biome: Import organization suggested fix available. Run Biome organize imports to fix.
🤖 Prompt for AI Agents
In surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
around lines 3, 85-96, and 412-418 the component uses hard-coded/static id
attributes which violate the lint rule; replace those static ids by importing
and calling React's useId() at the top of the component to generate unique ids,
assign the generated id to the form control's id prop, and set the corresponding
Label's htmlFor to that same generated id so the label and control are correctly
linked; ensure multiple controls each get distinct ids (e.g., append suffixes)
and remove the static literal id strings.
```tsx
	useEffect(() => {
		if (searchSpaceId) {
			fetchSchedules();
			fetchConnectors();
			fetchSchedulerStatus();
		}
	}, [searchSpaceId]);
```
Loading spinner never stops (setLoading(false) missing)
loading is initialized true and never set to false; the page will spin indefinitely.
Apply this diff to load all data and flip loading off:
```diff
 	useEffect(() => {
-		if (searchSpaceId) {
-			fetchSchedules();
-			fetchConnectors();
-			fetchSchedulerStatus();
-		}
+		if (!searchSpaceId) return;
+		let cancelled = false;
+		const run = async () => {
+			setLoading(true);
+			try {
+				await Promise.all([fetchSchedules(), fetchConnectors(), fetchSchedulerStatus()]);
+			} finally {
+				if (!cancelled) setLoading(false);
+			}
+		};
+		run();
+		return () => {
+			cancelled = true;
+		};
 	}, [searchSpaceId]);
```

🤖 Prompt for AI Agents
In surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
around lines 105 to 112, the useEffect triggers
fetchSchedules/fetchConnectors/fetchSchedulerStatus but never sets loading back
to false causing an infinite spinner; update the effect to await all three
fetches (e.g., Promise.all or sequential awaits) and call setLoading(false) in a
finally block so loading is turned off regardless of success or error, and
ensure any thrown errors are caught/logged inside the effect to avoid unhandled
rejections.
Review by RecurseML
🔍 Review performed on 6b88c92..c044d42
✨ No bugs found, your code is sparkling clean
✅ Files analyzed, no issues (17)
• .idea/.gitignore
• .idea/SurfSense.iml
• .idea/inspectionProfiles/profiles_settings.xml
• .idea/modules.xml
• .idea/vcs.xml
• surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
• surfsense_backend/app/app.py
• surfsense_backend/app/connectors/clickup_connector.py
• surfsense_backend/app/connectors/jira_connector.py
• surfsense_backend/app/db.py
• surfsense_backend/app/routes/connector_schedules_routes.py
• surfsense_backend/app/routes/scheduler_routes.py
• surfsense_backend/app/schemas/__init__.py
• surfsense_backend/app/schemas/connector_schedule.py
• surfsense_backend/app/services/connector_scheduler_service.py
• surfsense_backend/app/utils/schedule_helpers.py
• surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
⏭️ Files skipped (1)
• surfsense_web/package-lock.json
Summary
Fixes broken date range filtering in Jira and ClickUp connectors, enabling proper incremental data synchronization.
🔧 Changes Made
Jira Connector (jira_connector.py)
ClickUp Connector (clickup_connector.py)
Linear Connector (linear_connector.py)

🚀 Impact
✅ Testing
🔗 Related Issues
Fixes the connector date range filtering issues mentioned in the codebase TODOs.
Closes #406
Types of changes
Testing
Checklist:
High-level PR Summary
This PR fixes broken date range filtering in Jira and ClickUp connectors and introduces a comprehensive automated connector scheduling system. The date filtering fixes enable proper incremental data synchronization by correctly applying date parameters to API queries (fixing TODOs in the connector code). The major addition is a new scheduling infrastructure that allows users to automate connector syncs with HOURLY, DAILY, WEEKLY, or CUSTOM (cron) schedules, including a background scheduler service, database models, API routes, and a React frontend for schedule management.
⏱️ Estimated Review Time: 30-90 minutes
💡 Review Order Suggestion
1. surfsense_backend/app/db.py
2. surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
3. surfsense_backend/app/schemas/connector_schedule.py
4. surfsense_backend/app/schemas/__init__.py
5. surfsense_backend/app/utils/schedule_helpers.py
6. surfsense_backend/app/services/connector_scheduler_service.py
7. surfsense_backend/app/routes/connector_schedules_routes.py
8. surfsense_backend/app/routes/scheduler_routes.py
9. surfsense_backend/app/app.py
10. surfsense_backend/app/connectors/jira_connector.py
11. surfsense_backend/app/connectors/clickup_connector.py
12. surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
13. .idea/.gitignore
14. .idea/SurfSense.iml
15. .idea/inspectionProfiles/profiles_settings.xml
16. .idea/modules.xml
17. .idea/vcs.xml

Summary by CodeRabbit
High-level PR Summary
This PR fixes broken date range filtering in Jira and ClickUp connectors by implementing proper date-to-timestamp conversion and JQL query logic. Additionally, it introduces a comprehensive automated connector scheduling system that allows users to schedule connector syncs with HOURLY, DAILY, WEEKLY, or CUSTOM (cron) schedules. The implementation includes database schema changes (ConnectorSchedule table and ScheduleType enum), a background scheduler service with configurable concurrency, API routes for CRUD operations on schedules, helper utilities for calculating next run times, and a React frontend for managing schedules with status monitoring and manual execution capabilities.
ConnectorScheduletable andScheduleTypeenum), a background scheduler service with configurable concurrency, API routes for CRUD operations on schedules, helper utilities for calculating next run times, and a React frontend for managing schedules with status monitoring and manual execution capabilities.⏱️ Estimated Review Time: 30-90 minutes
💡 Review Order Suggestion
1. surfsense_backend/app/connectors/jira_connector.py
2. surfsense_backend/app/connectors/clickup_connector.py
3. surfsense_backend/app/db.py
4. surfsense_backend/alembic/versions/23_add_connector_schedules_table.py
5. surfsense_backend/app/schemas/connector_schedule.py
6. surfsense_backend/app/schemas/__init__.py
7. surfsense_backend/app/utils/schedule_helpers.py
8. surfsense_backend/app/services/connector_scheduler_service.py
9. surfsense_backend/app/routes/connector_schedules_routes.py
10. surfsense_backend/app/routes/scheduler_routes.py
11. surfsense_backend/app/app.py
12. surfsense_web/app/dashboard/[search_space_id]/connectors/schedules/page.tsx
13. .idea/.gitignore
14. .idea/SurfSense.iml
15. .idea/inspectionProfiles/profiles_settings.xml
16. .idea/modules.xml
17. .idea/vcs.xml