Add two new configuration options for periodic retained WAL size check in scaled down mode #3274
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #3274      +/-   ##
==========================================
- Coverage   87.79%   87.76%   -0.03%
==========================================
  Files          18       18
  Lines        1663     1676      +13
  Branches      420      425       +5
==========================================
+ Hits         1460     1471      +11
- Misses        201      203       +2
  Partials        2        2
msfstef
left a comment
Looks good, but I assume this does not work since there's no querying of the WAL size (and no tests, which I assume is for the above reason?)
…leConnection and use OneOffConnection
To avoid the failure state where Electric can no longer connect to the db without first doing the revalidation
This PR introduces periodic monitoring of retained WAL size when Electric has scaled down its database connections due to inactivity. If one of these periodic checks detects that the retained WAL has grown beyond the configured threshold, Electric wakes up the connection subsystem to resume replication stream processing.
Core changes
- `Connection.Restarter`: while Electric is scaled down, periodically checks the retained WAL size, computed as `pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)`.
- `Connection.Manager`: when it validates connection options, it stores the validated ones in StackConfig. `Connection.Restarter` can then use them for one-off DB queries to check the retained WAL size. When `Connection.Manager` restarts (after an error or when the connection subsystem is woken up), it erases these options from StackConfig and repeats the validation process as before. This prevents the system from ending up in an invalid state by ensuring that `Connection.Manager` always starts with the same initial connection options.
Added two new configuration options:
- `ELECTRIC_REPLICATION_IDLE_WAL_SIZE_CHECK_PERIOD`: how often to check the retained WAL size when scaled down (default: 1 hour)
- `ELECTRIC_REPLICATION_IDLE_WAL_SIZE_THRESHOLD`: WAL size threshold that triggers reconnection (default: 100 MB)
Both options support human-readable time/size formats.
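
For readers unfamiliar with LSN arithmetic, the retained WAL check described above can be sketched roughly as follows. This is an illustrative Python sketch, not Electric's Elixir implementation: the function names are hypothetical, the strict "greater than" comparison is an assumption about what "beyond the threshold" means, and treating 100 MB as 100 * 1024 * 1024 bytes is also an assumption. The only part taken from PostgreSQL itself is that an LSN is written as two hexadecimal numbers separated by a slash (high and low 32 bits of a 64-bit byte position) and that `pg_wal_lsn_diff` returns the difference in bytes.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Mirror pg_wal_lsn_diff(current_lsn, restart_lsn): the difference in bytes."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def should_wake_up(current_lsn: str, restart_lsn: str, threshold_bytes: int) -> bool:
    """Hypothetical decision rule: wake up only when retained WAL exceeds the threshold."""
    return retained_wal_bytes(current_lsn, restart_lsn) > threshold_bytes


# Assumed interpretation of the 100 MB default, in bytes.
ASSUMED_DEFAULT_THRESHOLD = 100 * 1024 * 1024

if __name__ == "__main__":
    # The slot's restart_lsn is at 0/0 and the server has written past the threshold.
    print(should_wake_up("0/6400001", "0/0", ASSUMED_DEFAULT_THRESHOLD))
```

In a real deployment the two LSNs would come from the database (the current WAL position and the replication slot's `restart_lsn`), and the subtraction would be done server-side by `pg_wal_lsn_diff` rather than in application code.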
Refactoring
- Introduced a new module named `OneOffConnection` that wraps `Postgrex.SimpleConnection` and provides a simple API for opening a one-off DB connection, running a query (using the simple protocol) and getting the result back, all synchronously.
- Reimplemented the lock breaking logic using `OneOffConnection` and removed `LockBreakerConnection` since it was no longer necessary to have as a separate module.
- Refactored `ConnectionResolver`, replacing its ad-hoc wrapping of `Postgrex.SimpleConnection` with `OneOffConnection`.
Testing
Added a new integration test (`integration-tests/tests/wal-size-check-while-scaled-down.lux`) that verifies Electric's handling of two cases during its periodic WAL size check: 1) the WAL size is under the threshold; 2) the WAL size has exceeded the threshold.
Closes #3260.