@tw4l tw4l commented Nov 18, 2025

Fixes #2957

Full backend and frontend implementation, with a new email notification to org admins when a crawl is paused because an org quota has been reached.

Backend changes

  • Modify operator to auto-pause crawls when quotas are reached or archiving is disabled rather than stopping the crawls
  • Add new crawl states: paused_storage_quota_reached, paused_time_quota_reached, paused_org_readonly
  • Add uploaded WACZs to org storage totals immediately after upload so that auto-paused crawls will actually put the org's bytesStored above the storage quota
  • Send an email from a new template to all org admins when a crawl is auto-paused, with information about what to do
  • Fix datetime deprecation in tests
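
As a rough illustration of the auto-pause decision described above, here is a minimal sketch. Only the three state names come from this PR; the function names, arguments, and the precedence between reasons are assumptions for illustration and are not the operator's actual code.

```python
from typing import Optional

# The three state names come from the PR description; the helper names,
# arguments, and precedence order below are hypothetical.
PAUSED_STORAGE_QUOTA_REACHED = "paused_storage_quota_reached"
PAUSED_TIME_QUOTA_REACHED = "paused_time_quota_reached"
PAUSED_ORG_READONLY = "paused_org_readonly"

AUTO_PAUSED_STATES = frozenset(
    {PAUSED_STORAGE_QUOTA_REACHED, PAUSED_TIME_QUOTA_REACHED, PAUSED_ORG_READONLY}
)


def auto_pause_state(
    org_readonly: bool, storage_quota_reached: bool, time_quota_reached: bool
) -> Optional[str]:
    """Return the paused state to move a running crawl into, or None to keep running."""
    # Assumed precedence: archiving disabled, then storage quota, then time quota.
    if org_readonly:
        return PAUSED_ORG_READONLY
    if storage_quota_reached:
        return PAUSED_STORAGE_QUOTA_REACHED
    if time_quota_reached:
        return PAUSED_TIME_QUOTA_REACHED
    return None


def is_auto_paused(state: str) -> bool:
    """True if the crawl was paused automatically rather than by a user."""
    return state in AUTO_PAUSED_STATES
```

Returning a specific state per reason, rather than a single generic paused state, is what would let the notification email and the UI say why the crawl was paused.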

Frontend changes

  • Add new paused crawl states
  • Update checks throughout frontend for whether crawl is paused to compare against all paused states

Needs attention

  • There is a bug/race condition where, sometimes when a crawl is pausing, the uploaded WACZ's size is added to status.filesAddedSize and then added again to stats.size (see the TODO comment in the crawl operator code). This effectively doubles the crawl's stats.size and makes the crawl seem larger than it is. I've attempted a few solutions, such as not adding status.filesAddedSize to stats.size if the crawl is pausing, but none has consistently resolved the issue without introducing other side effects. I think this may at times have a downstream effect on the storage quota check in is_crawl_stopping. That check now subtracts the size of already-uploaded WACZs from the active crawl size used to decide whether active crawls will put the org over its storage quota, but if previously uploaded WACZs are only sometimes included in stats.size, the check may be inaccurate at times.
  • In commit 217e935, I've attempted to fix how workflow crawl counts are handled. Previously, every crawl (whether successful or failed) would increment crawlSuccessfulCount. This change could use a second pair of eyes to make sure it makes sense: I'm not entirely sure whether crawlSuccessfulCount is intended to mean crawls that ended in a successful state or just crawls that completed in any form. The field does not appear to be used anywhere in the frontend, and it might become inconsistent if we change how it's counted without a migration, so maybe this is better handled separately?
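
To make the storage quota concern in the first item concrete, here is a simplified sketch of the subtraction described there. The function name, its parameters, the `>=` comparison, and the treatment of a zero quota as "no quota" are all assumptions for illustration; the real check in is_crawl_stopping may differ.

```python
def would_reach_storage_quota(
    bytes_stored: int,
    active_crawl_size: int,
    uploaded_wacz_size: int,
    storage_quota: int,
) -> bool:
    """Hypothetical quota check.

    bytes_stored already includes uploaded WACZs, so their size is subtracted
    from the active crawl size to avoid counting them twice.
    """
    if storage_quota <= 0:
        # Assumption: a quota of 0 means no quota is set.
        return False
    pending = max(active_crawl_size - uploaded_wacz_size, 0)
    return bytes_stored + pending >= storage_quota


# The org has 900 bytes stored, 200 of which are WACZs this crawl already
# uploaded; the crawl has another 150 bytes not yet uploaded; quota is 1000.
# If stats.size includes the uploaded WACZs (350), the check fires as expected:
includes_uploaded = would_reach_storage_quota(900, 350, 200, 1000)
# If stats.size omits them (150), the subtraction hides the pending data:
omits_uploaded = would_reach_storage_quota(900, 150, 200, 1000)
# If the uploaded size was double-counted (550), pending data is overstated
# and the check can fire early for an org that is still under quota:
double_counted = would_reach_storage_quota(500, 550, 200, 850)
```

Under these assumptions the check is only correct when stats.size reliably includes the uploaded WACZs exactly once, which is why the race condition matters for the quota logic.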

@tw4l tw4l force-pushed the issue-2957-pause-crawl-on-quota-reached branch from f7568a3 to 217e935 Compare November 18, 2025 20:13
Development

Successfully merging this pull request may close these issues.

[Feature]: When a quota is reached, the crawl should be paused instead of stopped.
