From 83c417b0e100bf03c3110b601c8056a1bcc5115c Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Mon, 13 Apr 2026 15:29:46 +0200 Subject: [PATCH 1/5] Document new WAL errors functionality --- .../source-db/postgres-maintenance.mdx | 18 +++++++++++ debugging/error-codes.mdx | 27 +++++++++++++++- .../production-readiness-guide.mdx | 6 +++- maintenance-ops/self-hosting/diagnostics.mdx | 31 ++++++++++++------- 4 files changed, 69 insertions(+), 13 deletions(-) diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx index 1ef5ab3a..47c973f1 100644 --- a/configuration/source-db/postgres-maintenance.mdx +++ b/configuration/source-db/postgres-maintenance.mdx @@ -33,6 +33,24 @@ select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots Postgres prevents active slots from being dropped. If it does happen (e.g. while a PowerSync instance is disconnected), PowerSync would automatically re-create the slot, and restart replication. +### WAL Slot Invalidation + +Postgres can invalidate a replication slot when the amount of retained WAL data exceeds the `max_slot_wal_keep_size` limit. This is most likely to happen during a long-running initial snapshot — PowerSync must hold the slot open while copying your entire dataset, and WAL accumulates throughout that time. + +If the slot is invalidated mid-snapshot, PowerSync detects this early and aborts with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) rather than continuing a doomed snapshot. The fix is to increase `max_slot_wal_keep_size` on the source database and then redeploy your sync config to trigger a fresh snapshot. + +To check the current `max_slot_wal_keep_size` value: + +```sql +SELECT setting AS max_slot_wal_keep_size +FROM pg_settings +WHERE name = 'max_slot_wal_keep_size'; +``` + +A value of `-1` means unlimited (no cap on WAL retention). 
If your database has a cap set, make sure it is large enough to cover the full WAL growth expected during an initial snapshot. See [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for guidance on choosing an appropriate value. + +You can monitor slot health in real time using the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics). The `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size` fields on each connection object show how much WAL budget remains. The PowerSync Service also logs a warning when less than 50% of the WAL budget remains during a snapshot. + ### Maximum Replication Slots Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\. diff --git a/debugging/error-codes.mdx b/debugging/error-codes.mdx index 05eed7a2..a6c15c67 100644 --- a/debugging/error-codes.mdx +++ b/debugging/error-codes.mdx @@ -62,6 +62,11 @@ This reference documents PowerSync error codes organized by component, with trou This may occur if there is very deep nesting in JSON or embedded documents. +- **PSYNC_S1005**: + Storage version not supported. + + This could be caused by a downgrade to a version that does not support the current storage version. + ## PSYNC_S11xx: Postgres replication issues - **PSYNC_S1101**: @@ -143,6 +148,15 @@ This reference documents PowerSync error codes organized by component, with trou An alternative is to create explicit policies for the replication role. If you have done that, you may ignore this warning. +- **PSYNC_S1146**: + Replication slot invalidated. 
+ + The replication slot was invalidated by PostgreSQL, typically because WAL retention exceeded `max_slot_wal_keep_size` during a long-running snapshot. Increase `max_slot_wal_keep_size` on the source database and redeploy Sync Streams/Sync Rules to trigger a fresh snapshot. + + Other causes: `rows_removed` (catalog rows needed by the slot were removed), `wal_level_insufficient`, or `idle_timeout` (PostgreSQL 18+). + + See [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for guidance on sizing `max_slot_wal_keep_size`. + ## PSYNC_S12xx: MySQL replication issues ## PSYNC_S13xx: MongoDB replication issues @@ -235,6 +249,17 @@ This reference documents PowerSync error codes organized by component, with trou Possible causes: - Older data has been cleaned up due to exceeding the retention period. +## PSYNC_S16xx: MSSQL replication issues + +- **PSYNC_S1601**: + A replicated source table's capture instance has been dropped during a polling cycle. + + Possible causes: + - CDC has been disabled for the table. + - The table has been dropped, which also drops the capture instance. + + Replication for the table will only resume once CDC has been re-enabled for the table. + ## PSYNC_S2xxx: Service API - **PSYNC_S2001**: @@ -303,7 +328,7 @@ This does not include auth configuration errors on the service. - **PSYNC_S2203**: IPs in this range are not supported. - Make sure to use a publically-accessible JWKS URI. + Make sure to use a publicly-accessible JWKS URI. - **PSYNC_S2204**: JWKS request failed. 
diff --git a/maintenance-ops/production-readiness-guide.mdx b/maintenance-ops/production-readiness-guide.mdx index 1ce30c9d..ab4bb909 100644 --- a/maintenance-ops/production-readiness-guide.mdx +++ b/maintenance-ops/production-readiness-guide.mdx @@ -288,7 +288,11 @@ WHERE name = 'max_slot_wal_keep_size' ``` It's recommended to check the current replication slot lag and `max_slot_wal_keep_size` when deploying Sync Streams/Sync Rules changes to your PowerSync Service instance, especially when you're working with large database volumes. -If you notice that the replication lag is greater than the current `max_slot_wal_keep_size` it's recommended to increase value of the `max_slot_wal_keep_size` on the connected source Postgres database to accommodate for the lag and to ensure the PowerSync Service can complete initial replication without further delays. +If you notice that the replication lag is greater than the current `max_slot_wal_keep_size` it's recommended to increase the value of `max_slot_wal_keep_size` on the connected source Postgres database to accommodate for the lag and to ensure the PowerSync Service can complete initial replication without further delays. + +If the slot is invalidated, PowerSync aborts the snapshot early and surfaces error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). After increasing `max_slot_wal_keep_size`, redeploy your sync config to trigger a fresh snapshot. + +You can also monitor slot health in real time using the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics). Each connection object in the response includes `wal_status` (slot status from `pg_replication_slots`), `safe_wal_size` (bytes remaining before potential invalidation), and `max_slot_wal_keep_size` (the configured cap). The PowerSync Service logs a warning when less than 50% of the WAL budget is remaining during a snapshot. 
### Managing Replication Slots diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx index 6e6674f6..06ff6b32 100644 --- a/maintenance-ops/self-hosting/diagnostics.mdx +++ b/maintenance-ops/self-hosting/diagnostics.mdx @@ -1,13 +1,9 @@ --- title: "Diagnostics" -description: "How to use the PowerSync Service Diagnostics API" +description: "How to use the PowerSync Service Diagnostics API to inspect replication status, errors, and slot health." --- -All self-hosted PowerSync Service instances ship with a Diagnostics API. -This API provides the following diagnostic information: - -- Connections → Connected backend source database and any active errors associated with the connection. -- Active Sync Streams / Sync Rules → Currently deployed Sync Streams (or legacy Sync Rules) and its status. +All self-hosted PowerSync Service instances ship with a Diagnostics API for inspecting replication state, surfacing errors, and monitoring source database health. ## CLI @@ -17,27 +13,40 @@ If you have the [PowerSync CLI](/tools/cli) installed, use `powersync status` to powersync status # Extract a specific field -powersync status --output=json | jq '.connections[0]' +powersync status --output=json | jq '.data.active_sync_rules' ``` ## Diagnostics API -# Configuration +### Configuration -1. To enable the Diagnostics API, specify an API token in your PowerSync YAML file: +1. Specify an API token in your PowerSync YAML file: ```yaml service.yaml api: tokens: - YOUR_API_TOKEN ``` -Make sure to use a secure API token as part of this configuration + +Use a secure, randomly generated API token. 2. Restart the PowerSync Service. -3. Once configured, send an HTTP request to your PowerSync Service Diagnostics API endpoint. Include the API token set in step 1 as a Bearer token in the Authorization header. +3. 
Send a POST request to the diagnostics endpoint, passing the token as a Bearer token: ```shell curl -X POST http://localhost:8080/api/admin/v1/diagnostics \ -H "Authorization: Bearer YOUR_API_TOKEN" ``` + +### Response structure + +The response `data` object contains information about: + +**`connections`** — whether PowerSync can reach the configured source database, and any connection-level errors. + +**`active_sync_rules`** — the currently serving sync config (Sync Streams/Sync Rules). Shows which replication slot is in use, whether the initial snapshot has completed, replication lag, which tables are being replicated, and any errors. + +**`deploying_sync_rules`** — only present while a new sync config is being deployed. PowerSync runs the new snapshot in parallel so clients continue to be served by the existing active config. Once the snapshot completes, this section disappears and `active_sync_rules` updates. Errors during deployment (snapshot failures, configuration problems) surface here rather than in `active_sync_rules`. + +For Postgres sources on version 13 or later, each connection entry in `active_sync_rules` also includes `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size`. These fields show how much WAL budget remains before the replication slot could be invalidated, which is particularly useful to monitor when deploying a new sync config against a large database. See [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) for details on slot invalidation and how to resolve it. 
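
The WAL fields this patch adds to the Diagnostics API can also drive a client-side check. The following is a minimal sketch, not the PowerSync Service's own logic. It assumes a connection entry has already been extracted from the response and carries `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size` with both sizes in bytes. It also reads "less than 50% of the WAL budget remains" as `safe_wal_size / max_slot_wal_keep_size < 0.5`. `WalInfo`, `wal_budget_fraction`, and `slot_health` are hypothetical names.

```python
from typing import Optional, TypedDict


class WalInfo(TypedDict, total=False):
    """Assumed shape of the WAL fields on one diagnostics connection entry."""

    wal_status: Optional[str]  # e.g. "reserved", "extended", "unreserved", "lost"
    safe_wal_size: Optional[int]  # bytes left before the slot may be invalidated
    max_slot_wal_keep_size: Optional[int]  # configured cap in bytes; -1 means unlimited


def wal_budget_fraction(entry: WalInfo) -> Optional[float]:
    """Fraction of the WAL budget still available, or None when there is no cap."""
    cap = entry.get("max_slot_wal_keep_size")
    remaining = entry.get("safe_wal_size")
    if cap is None or cap < 0 or remaining is None:
        # -1 (unlimited) or a missing value: there is no budget to measure.
        return None
    if cap == 0:
        return 0.0
    # Clamp to [0, 1] in case the reported values drift slightly out of range.
    return max(0.0, min(1.0, remaining / cap))


def slot_health(entry: WalInfo, warn_below: float = 0.5) -> str:
    """Classify slot health; the 0.5 default mirrors the documented 50% warning."""
    if entry.get("wal_status") == "lost":
        return "invalidated"  # would surface as PSYNC_S1146
    fraction = wal_budget_fraction(entry)
    if fraction is not None and fraction < warn_below:
        return "warning"
    return "ok"


# Hypothetical entry: 3 GiB left out of a 10 GiB cap.
print(slot_health({"wal_status": "reserved",
                   "safe_wal_size": 3 * 2**30,
                   "max_slot_wal_keep_size": 10 * 2**30}))  # prints: warning
```

Postgres reports `safe_wal_size` as `NULL` when `max_slot_wal_keep_size` is `-1` or when the slot is already `lost`. Assuming the diagnostics response passes that through as `null`, both cases are handled before any division.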
From 6d0e76a675a79589f0c6bcbe5ac54b9351612f6a Mon Sep 17 00:00:00 2001 From: benitav Date: Thu, 16 Apr 2026 13:44:48 +0200 Subject: [PATCH 2/5] Apply suggestions from code review Co-authored-by: Jose Vargas --- configuration/source-db/postgres-maintenance.mdx | 6 ++---- debugging/error-codes.mdx | 6 ++++-- maintenance-ops/production-readiness-guide.mdx | 2 +- maintenance-ops/self-hosting/diagnostics.mdx | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx index 47c973f1..6662b8ac 100644 --- a/configuration/source-db/postgres-maintenance.mdx +++ b/configuration/source-db/postgres-maintenance.mdx @@ -37,14 +37,12 @@ Postgres prevents active slots from being dropped. If it does happen (e.g. while Postgres can invalidate a replication slot when the amount of retained WAL data exceeds the `max_slot_wal_keep_size` limit. This is most likely to happen during a long-running initial snapshot — PowerSync must hold the slot open while copying your entire dataset, and WAL accumulates throughout that time. -If the slot is invalidated mid-snapshot, PowerSync detects this early and aborts with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) rather than continuing a doomed snapshot. The fix is to increase `max_slot_wal_keep_size` on the source database and then redeploy your sync config to trigger a fresh snapshot. +If the slot is invalidated mid-snapshot, PowerSync detects this early and aborts with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) rather than continuing a doomed snapshot. The fix is to increase `max_slot_wal_keep_size` on the source database and delete the existing replication slot. PowerSync will automatically create a new slot and restart the snapshot. 
To check the current `max_slot_wal_keep_size` value: ```sql -SELECT setting AS max_slot_wal_keep_size -FROM pg_settings -WHERE name = 'max_slot_wal_keep_size'; +SHOW max_slot_wal_keep_size ``` A value of `-1` means unlimited (no cap on WAL retention). If your database has a cap set, make sure it is large enough to cover the full WAL growth expected during an initial snapshot. See [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for guidance on choosing an appropriate value. diff --git a/debugging/error-codes.mdx b/debugging/error-codes.mdx index a6c15c67..058ca57d 100644 --- a/debugging/error-codes.mdx +++ b/debugging/error-codes.mdx @@ -151,9 +151,11 @@ This reference documents PowerSync error codes organized by component, with trou - **PSYNC_S1146**: Replication slot invalidated. - The replication slot was invalidated by PostgreSQL, typically because WAL retention exceeded `max_slot_wal_keep_size` during a long-running snapshot. Increase `max_slot_wal_keep_size` on the source database and redeploy Sync Streams/Sync Rules to trigger a fresh snapshot. + The replication slot was invalidated by PostgreSQL, typically because WAL retention exceeded `max_slot_wal_keep_size` during a long-running snapshot. Increase `max_slot_wal_keep_size` on the source database and delete the existing replication slot to recover. PowerSync will create a new slot and restart replication automatically. - Other causes: `rows_removed` (catalog rows needed by the slot were removed), `wal_level_insufficient`, or `idle_timeout` (PostgreSQL 18+). + Other causes: `rows_removed` (catalog rows needed by the slot were removed), `wal_level_insufficient`, or `idle_timeout`. + + `idle_timeout` is a PostgreSQL 18+ invalidation reason. In that case, increase `idle_replication_slot_timeout` instead of `max_slot_wal_keep_size`.
See [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for guidance on sizing `max_slot_wal_keep_size`. diff --git a/maintenance-ops/production-readiness-guide.mdx b/maintenance-ops/production-readiness-guide.mdx index ab4bb909..115f6da3 100644 --- a/maintenance-ops/production-readiness-guide.mdx +++ b/maintenance-ops/production-readiness-guide.mdx @@ -290,7 +290,7 @@ WHERE name = 'max_slot_wal_keep_size' It's recommended to check the current replication slot lag and `max_slot_wal_keep_size` when deploying Sync Streams/Sync Rules changes to your PowerSync Service instance, especially when you're working with large database volumes. If you notice that the replication lag is greater than the current `max_slot_wal_keep_size` it's recommended to increase the value of `max_slot_wal_keep_size` on the connected source Postgres database to accommodate for the lag and to ensure the PowerSync Service can complete initial replication without further delays. -If the slot is invalidated, PowerSync aborts the snapshot early and surfaces error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). After increasing `max_slot_wal_keep_size`, redeploy your sync config to trigger a fresh snapshot. +If the slot is invalidated, PowerSync aborts the snapshot early and surfaces error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). After increasing `max_slot_wal_keep_size`, delete the existing replication slot. PowerSync will automatically create a new slot and restart the snapshot. You can also monitor slot health in real time using the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics). Each connection object in the response includes `wal_status` (slot status from `pg_replication_slots`), `safe_wal_size` (bytes remaining before potential invalidation), and `max_slot_wal_keep_size` (the configured cap). 
The PowerSync Service logs a warning when less than 50% of the WAL budget is remaining during a snapshot. diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx index 06ff6b32..d251fa9e 100644 --- a/maintenance-ops/self-hosting/diagnostics.mdx +++ b/maintenance-ops/self-hosting/diagnostics.mdx @@ -49,4 +49,4 @@ The response `data` object contains information about: **`deploying_sync_rules`** — only present while a new sync config is being deployed. PowerSync runs the new snapshot in parallel so clients continue to be served by the existing active config. Once the snapshot completes, this section disappears and `active_sync_rules` updates. Errors during deployment (snapshot failures, configuration problems) surface here rather than in `active_sync_rules`. -For Postgres sources on version 13 or later, each connection entry in `active_sync_rules` also includes `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size`. These fields show how much WAL budget remains before the replication slot could be invalidated, which is particularly useful to monitor when deploying a new sync config against a large database. See [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) for details on slot invalidation and how to resolve it. +For Postgres sources on version 13 or later, each connection entry in `active_sync_rules` also includes `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size`. These fields show how much WAL budget remains before the replication slot could be invalidated, which is particularly useful to monitor when deploying a new sync config against a large database. When the WAL budget drops below 50%, a warning appears in the sync rules errors array. If the slot is fully invalidated, the error is reported via `last_fatal_error` with code [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). 
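
With this patch, recovering from `PSYNC_S1146` means dropping the slot, and the setting to raise depends on the invalidation reason. The following is a minimal sketch, assuming the reason string (such as `wal_removed` or `idle_timeout`) has already been read from the error or from Postgres. It maps that reason to recovery steps. `recovery_steps` is a hypothetical helper. The `wal_level_insufficient` entry is an addition based on logical replication requiring `wal_level = logical`. Reasons without a known setting, such as `rows_removed`, only get the drop statement.

```python
from typing import List

# Prevention step per invalidation reason. wal_removed and idle_timeout follow
# the PSYNC_S1146 guidance; wal_level_insufficient reflects the requirement
# that logical replication runs with wal_level = logical.
PREVENTION = {
    "wal_removed": "Increase max_slot_wal_keep_size on the source database.",
    "idle_timeout": "Increase idle_replication_slot_timeout (Postgres 18+).",
    "wal_level_insufficient": "Set wal_level = logical on the source database.",
}


def recovery_steps(reason: str, slot_name: str) -> List[str]:
    """Ordered recovery steps for an invalidated slot (hypothetical helper)."""
    steps: List[str] = []
    prevention = PREVENTION.get(reason)
    if prevention is not None:
        steps.append(prevention)
    # An invalidated slot cannot be reused, so it must be dropped; PowerSync
    # then creates a new slot. Quotes are doubled to keep the SQL literal valid.
    quoted = slot_name.replace("'", "''")
    steps.append(f"SELECT pg_drop_replication_slot('{quoted}');")
    return steps


for step in recovery_steps("wal_removed", "powersync_1_c3c8cf21"):
    print(step)
```

Keeping the prevention step ahead of the drop follows the order given in the error message: raise the limit first, so the recreated slot does not hit the same invalidation.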
From 4de4ceb3629d95a199c9ed2c8b6fb968288d9e37 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Fri, 17 Apr 2026 13:31:49 +0200 Subject: [PATCH 3/5] Simplify wording --- .../source-db/postgres-maintenance.mdx | 16 ++++----------- .../production-readiness-guide.mdx | 20 ++++++++----------- 2 files changed, 12 insertions(+), 24 deletions(-) diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx index 6662b8ac..0109f237 100644 --- a/configuration/source-db/postgres-maintenance.mdx +++ b/configuration/source-db/postgres-maintenance.mdx @@ -27,7 +27,7 @@ While this is desired behavior for slot replication downtime, it could result in Inactive slots can be dropped using: -```bash +```sql select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots where active = false; ``` @@ -35,19 +35,11 @@ Postgres prevents active slots from being dropped. If it does happen (e.g. while ### WAL Slot Invalidation -Postgres can invalidate a replication slot when the amount of retained WAL data exceeds the `max_slot_wal_keep_size` limit. This is most likely to happen during a long-running initial snapshot — PowerSync must hold the slot open while copying your entire dataset, and WAL accumulates throughout that time. - -If the slot is invalidated mid-snapshot, PowerSync detects this early and aborts with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) rather than continuing a doomed snapshot. The fix is to increase `max_slot_wal_keep_size` on the source database and delete the existing replication slot. PowerSync will automatically create a new slot and restart the snapshot. - -To check the current `max_slot_wal_keep_size` value: - -```sql -SHOW max_slot_wal_keep_size -``` +Postgres can invalidate a replication slot when the amount of retained WAL data exceeds the `max_slot_wal_keep_size` limit. 
This is most likely to happen during a long-running [initial snapshot](/architecture/powersync-service#initial-replication-vs-incremental-replication). PowerSync must hold the slot open while copying your entire dataset, and WAL accumulates throughout that time. -A value of `-1` means unlimited (no cap on WAL retention). If your database has a cap set, make sure it is large enough to cover the full WAL growth expected during an initial snapshot. See [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for guidance on choosing an appropriate value. +If the slot is invalidated mid-snapshot, PowerSync detects the problem and stops with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) instead of finishing a snapshot that will fail. On the source database, increase `max_slot_wal_keep_size` and delete the existing replication slot. PowerSync creates a new slot and restarts the snapshot. -You can monitor slot health in real time using the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics). The `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size` fields on each connection object show how much WAL budget remains. The PowerSync Service also logs a warning when less than 50% of the WAL budget remains during a snapshot. +During a snapshot, PowerSync warns when less than 50% of the WAL budget remains. You may see this warning in the PowerSync dashboard, in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) if you self-host, and in PowerSync Service logs. Increase `max_slot_wal_keep_size` or reduce snapshot work before the slot is invalidated. For how to choose a value, see [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag). 
### Maximum Replication Slots diff --git a/maintenance-ops/production-readiness-guide.mdx b/maintenance-ops/production-readiness-guide.mdx index 115f6da3..f4a00b84 100644 --- a/maintenance-ops/production-readiness-guide.mdx +++ b/maintenance-ops/production-readiness-guide.mdx @@ -262,7 +262,7 @@ Because PowerSync relies on Postgres logical replication, it's important to cons The WAL growth rate is expected to increase substantially during the initial replication of large datasets with high update frequency, particularly for tables included in the PowerSync publication. -During normal operation (after Sync Streams (or legacy Sync Rules) are deployed) the WAL growth rate is much smaller than the initial replication period, since the PowerSync Service can replicate ~5k operations per second, meaning the WAL lag is typically in the MB range as opposed to the GB range. +During normal operation (after Sync Streams/Sync Rules are deployed) the WAL growth rate is much smaller than the initial replication period, since the PowerSync Service can replicate ~5k operations per second, meaning the WAL lag is typically in the MB range as opposed to the GB range. When deciding what to set the `max_slot_wal_keep_size` configuration parameter the following should be taken in account: 1. Database size - This impacts the time it takes to complete the initial replication from the source Postgres database. 
@@ -271,7 +271,7 @@ When deciding what to set the `max_slot_wal_keep_size` configuration parameter t To view the current replication slots that are being used by PowerSync you can run the following query: -``` +```sql SELECT slot_name, plugin, slot_type, @@ -281,18 +281,14 @@ FROM pg_replication_slots; ``` To view the current configured value of the `max_slot_wal_keep_size` you can run the following query: -``` -SELECT setting as max_slot_wal_keep_size -FROM pg_settings -WHERE name = 'max_slot_wal_keep_size' -``` -It's recommended to check the current replication slot lag and `max_slot_wal_keep_size` when deploying Sync Streams/Sync Rules changes to your PowerSync Service instance, especially when you're working with large database volumes. -If you notice that the replication lag is greater than the current `max_slot_wal_keep_size` it's recommended to increase the value of `max_slot_wal_keep_size` on the connected source Postgres database to accommodate for the lag and to ensure the PowerSync Service can complete initial replication without further delays. +```sql +SHOW max_slot_wal_keep_size +``` -If the slot is invalidated, PowerSync aborts the snapshot early and surfaces error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). After increasing `max_slot_wal_keep_size`, delete the existing replication slot. PowerSync will automatically create a new slot and restart the snapshot. +If the slot is invalidated mid-snapshot, PowerSync detects the problem and stops replication with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). On the source database, increase `max_slot_wal_keep_size` and delete the existing replication slot. PowerSync creates a new slot and restarts the snapshot. -You can also monitor slot health in real time using the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics). 
Each connection object in the response includes `wal_status` (slot status from `pg_replication_slots`), `safe_wal_size` (bytes remaining before potential invalidation), and `max_slot_wal_keep_size` (the configured cap). The PowerSync Service logs a warning when less than 50% of the WAL budget is remaining during a snapshot. +During a snapshot, PowerSync warns when less than 50% of the WAL budget remains. You may see this warning in the PowerSync dashboard, in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) if you self-host, and in PowerSync Service logs. Increase `max_slot_wal_keep_size` or reduce snapshot work before the slot is invalidated. Use the considerations above to set a high enough cap. ### Managing Replication Slots @@ -315,7 +311,7 @@ FROM pg_replication_slots where active = false; The alternative to manually checking for inactive replication slots would be to configure the `idle_replication_slot_timeout` configuration parameter on the source Postgres database. -The `idle_replication_slot_timeout` [configuration parameter](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT) is only available from PostgresSQL 18 and above. +The `idle_replication_slot_timeout` [configuration parameter](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT) is only available from Postgres 18 onward. The `idle_replication_slot_timeout` will invalidate replication slots that have remained inactive for longer than the value set for the `idle_replication_slot_timeout` parameter. 
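
This patch switches the docs to `SHOW max_slot_wal_keep_size`. That command returns a formatted value such as `-1`, `1024MB`, or `10GB` rather than a plain number. To compare the setting with slot lag measured in bytes (for example via `pg_wal_lsn_diff`), a minimal sketch like the following could normalize it. It assumes Postgres's 1024-based memory units. `parse_wal_keep_size` and `lag_exceeds_cap` are hypothetical names.

```python
import re
from typing import Optional

# Postgres memory units are powers of 1024.
UNIT_BYTES = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_wal_keep_size(value: str) -> Optional[int]:
    """Convert `SHOW max_slot_wal_keep_size` output to bytes; None means unlimited."""
    value = value.strip()
    if value == "-1":
        return None
    match = re.fullmatch(r"(\d+)\s*(B|kB|MB|GB|TB)?", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    # A bare number is read in the parameter's base unit, megabytes.
    return int(number) * UNIT_BYTES[unit or "MB"]


def lag_exceeds_cap(lag_bytes: int, setting: str) -> bool:
    """True if the slot's current lag is already larger than the WAL cap."""
    cap = parse_wal_keep_size(setting)
    return cap is not None and lag_bytes > cap


print(parse_wal_keep_size("10GB"))  # 10737418240
print(lag_exceeds_cap(2 * 1024**3, "1GB"))  # True
```

Returning `None` for `-1` keeps "unlimited" distinct from a real byte count, so an uncapped database never reports its lag as exceeding the cap.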
From 9a9893194f4a9acd55c2c3946611052072356534 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Fri, 17 Apr 2026 13:53:53 +0200 Subject: [PATCH 4/5] Wording polish for diagnostics API --- maintenance-ops/self-hosting/diagnostics.mdx | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx index d251fa9e..2eece2d8 100644 --- a/maintenance-ops/self-hosting/diagnostics.mdx +++ b/maintenance-ops/self-hosting/diagnostics.mdx @@ -45,8 +45,14 @@ The response `data` object contains information about: **`connections`** — whether PowerSync can reach the configured source database, and any connection-level errors. -**`active_sync_rules`** — the currently serving sync config (Sync Streams/Sync Rules). Shows which replication slot is in use, whether the initial snapshot has completed, replication lag, which tables are being replicated, and any errors. +**`active_sync_rules`** — the currently serving sync config (Sync Streams/Sync Rules). Shows which replication slot is in use, whether initial replication has completed, which tables are being replicated, and any replication lag or errors. -**`deploying_sync_rules`** — only present while a new sync config is being deployed. PowerSync runs the new snapshot in parallel so clients continue to be served by the existing active config. Once the snapshot completes, this section disappears and `active_sync_rules` updates. Errors during deployment (snapshot failures, configuration problems) surface here rather than in `active_sync_rules`. +**`deploying_sync_rules`** — only present while a new sync config is being deployed and the initial replication is in progress. PowerSync runs this process in parallel so clients continue to be served by the existing active config. Once initial replication completes, this section disappears and `active_sync_rules` updates. 
Errors during initial replication surface here rather than in `active_sync_rules`. -For Postgres sources on version 13 or later, each connection entry in `active_sync_rules` also includes `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size`. These fields show how much WAL budget remains before the replication slot could be invalidated, which is particularly useful to monitor when deploying a new sync config against a large database. When the WAL budget drops below 50%, a warning appears in the sync rules errors array. If the slot is fully invalidated, the error is reported via `last_fatal_error` with code [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). +From version 1.20.5 of the PowerSync Service, each connection entry under `active_sync_rules` includes WAL information so you can see how much budget remains before Postgres could invalidate the replication slot: + +- `wal_status` — slot status from `pg_replication_slots` (Postgres 13+) +- `safe_wal_size` — bytes remaining before potential invalidation +- `max_slot_wal_keep_size` — configured limit in bytes + +This information is especially useful when you deploy a new sync config against a large database. When less than 50% of the WAL budget remains during a snapshot, PowerSync adds a warning to the response. If the slot is invalidated, the response includes error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). 
\ No newline at end of file From 561e994061d9092ab29f3c379281d7cbfbc2b8f9 Mon Sep 17 00:00:00 2001 From: Benita Volkmann Date: Wed, 22 Apr 2026 16:14:00 +0200 Subject: [PATCH 5/5] Consolidate with PR 404 and wording polish --- .../source-db/postgres-maintenance.mdx | 41 ++++++++++++++----- maintenance-ops/self-hosting/diagnostics.mdx | 35 +++++++++++----- 2 files changed, 55 insertions(+), 21 deletions(-) diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx index 0109f237..2827b83c 100644 --- a/configuration/source-db/postgres-maintenance.mdx +++ b/configuration/source-db/postgres-maintenance.mdx @@ -1,12 +1,13 @@ --- title: "Postgres Maintenance" +description: "Manage Postgres replication slots and WAL lag for reliable PowerSync replication." --- ## Logical Replication Slots Postgres logical replication slots are used to keep track of [replication](/architecture/powersync-service#replication-from-the-source-database) progress (recorded as a [LSN](https://www.postgresql.org/docs/current/datatype-pg-lsn.html)). -Every time a new version of [Sync Streams or Sync Rules](/sync/overview) are deployed, PowerSync creates a new replication slot, then switches over and deletes the old replication slot when the reprocessing of the new Sync Streams/Rules version is done. +Every time a new version of [Sync Streams or Sync Rules](/sync/overview) is deployed, PowerSync creates a new replication slot. Once the new version is fully processed, PowerSync switches to use the new slot and deletes the old one. The replication slots can be viewed using this query: @@ -21,33 +22,51 @@ Example output: | powersync\_1\_c3c8cf21 | 0/70D8240 | 1 | 56 bytes | | powersync\_2\_e62d7e0f | 0/70D8240 | 1 | 56 bytes | -In some cases, a replication slot may remain without being used. In this case, the slot prevents Postgres from deleting older WAL entries. One such example is when a PowerSync instance has been deprovisioned. 
+In some cases, a replication slot may remain without being used, for example after a PowerSync instance has been deprovisioned. Such a slot still prevents Postgres from deleting older WAL entries. -While this is desired behavior for slot replication downtime, it could result in excessive disk usage if the slot is not used anymore. +This can lead to excessive disk usage, so drop any slot that is no longer needed. Inactive slots can be dropped using: -```sql +```sql select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots where active = false; ``` -Postgres prevents active slots from being dropped. If it does happen (e.g. while a PowerSync instance is disconnected), PowerSync would automatically re-create the slot, and restart replication. +Postgres prevents active slots from being dropped. If a slot is dropped while its PowerSync instance is disconnected (and the slot is therefore inactive), PowerSync automatically recreates the slot and restarts replication when it reconnects. + +### Recovering from an invalidated slot + +A replication slot is invalidated when Postgres removes WAL data that the slot still needs, typically because the replication lag exceeded `max_slot_wal_keep_size`. An invalidated slot reports a `wal_status` of `lost`. + +When this occurs, you will see an error such as: + +> Replication slot powersync\_1\_xxxx was invalidated (reason: wal\_removed). Increase max\_slot\_wal\_keep\_size on the source database and delete the existing slot to recover. + +To recover: + +1. Increase `max_slot_wal_keep_size` on the source Postgres database to prevent recurrence. See the [production readiness guide](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for sizing guidance. + +2.
Drop the invalidated slot: + +```sql +SELECT pg_drop_replication_slot('powersync_1_xxxx'); +``` -### WAL Slot Invalidation +Replace `powersync_1_xxxx` with the actual slot name from the error message. -Postgres can invalidate a replication slot when the amount of retained WAL data exceeds the `max_slot_wal_keep_size` limit. This is most likely to happen during a long-running [initial snapshot](/architecture/powersync-service#initial-replication-vs-incremental-replication). PowerSync must hold the slot open while copying your entire dataset, and WAL accumulates throughout that time. +3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch. -If the slot is invalidated mid-snapshot, PowerSync detects the problem and stops with error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues) instead of finishing a snapshot that will fail. On the source database, increase `max_slot_wal_keep_size` and delete the existing replication slot. PowerSync creates a new slot and restarts the snapshot. +If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover. -During a snapshot, PowerSync warns when less than 50% of the WAL budget remains. You may see this warning in the PowerSync dashboard, in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) if you self-host, and in PowerSync Service logs. Increase `max_slot_wal_keep_size` or reduce snapshot work before the slot is invalidated. For how to choose a value, see [Managing & Monitoring Replication Lag](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag). +If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead. 
### Maximum Replication Slots -Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\. +Postgres is configured with a maximum number of replication slots per server. Each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams or Sync Rules version. The maximum number of PowerSync instances you can connect to one Postgres server is equal to the maximum number of replication slots, minus one. If other clients are also using replication slots, this number is reduced further. -The maximum number of slots can be configured by setting `max_replication_slots` (not all hosting providers expose this), and checked using: +To configure the maximum number of slots, set `max_replication_slots` (though not all hosting providers expose this setting). Check the current value using: ```sql select current_setting('max_replication_slots') diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx index 2eece2d8..7f2d735a 100644 --- a/maintenance-ops/self-hosting/diagnostics.mdx +++ b/maintenance-ops/self-hosting/diagnostics.mdx @@ -7,7 +7,7 @@ All self-hosted PowerSync Service instances ship with a Diagnostics API for insp ## CLI -If you have the [PowerSync CLI](/tools/cli) installed, use `powersync status` to check instance status without calling the API directly. This works with any running PowerSync instance — local or remote. +If you have the [PowerSync CLI](/tools/cli) installed, use `powersync status` to check instance status without calling the API directly. This works with any running PowerSync instance, whether local or remote. 
```bash powersync status @@ -41,18 +41,33 @@ curl -X POST http://localhost:8080/api/admin/v1/diagnostics \ ### Response structure -The response `data` object contains information about: +The response `data` object contains: -**`connections`** — whether PowerSync can reach the configured source database, and any connection-level errors. +**`connections`** — whether PowerSync can reach the configured source database, and any connection-level errors. -**`active_sync_rules`** — the currently serving sync config (Sync Streams/Sync Rules). Shows which replication slot is in use, whether initial replication has completed, which tables are being replicated, and any replication lag or errors. +**`active_sync_rules`** — the currently serving sync config (Sync Streams or Sync Rules). Contains a `connections[]` array with details about each replication connection, including the slot name, WAL status, and tables being replicated, plus an `errors[]` array for warnings and errors. -**`deploying_sync_rules`** — only present while a new sync config is being deployed and the initial replication is in progress. PowerSync runs this process in parallel so clients continue to be served by the existing active config. Once initial replication completes, this section disappears and `active_sync_rules` updates. Errors during initial replication surface here rather than in `active_sync_rules`. +**`deploying_sync_rules`** — only present while a new sync config is being deployed and the initial replication is in progress. PowerSync runs this process in parallel so clients continue to be served by the existing active config. Once initial replication completes, this section disappears and `active_sync_rules` updates.
-From version 1.20.5 of the PowerSync Service, each connection entry under `active_sync_rules` includes WAL information so you can see how much budget remains before Postgres could invalidate the replication slot: +Each connection in `active_sync_rules.connections[]` includes the fields below. The `wal_status`, `safe_wal_size`, and `max_slot_wal_keep_size` fields require PowerSync Service 1.20.5 or later and Postgres 13 or later: -- `wal_status` — slot status from `pg_replication_slots` (Postgres 13+) -- `safe_wal_size` — bytes remaining before potential invalidation -- `max_slot_wal_keep_size` — configured limit in bytes +| Field | Description | +| --- | --- | +| `slot_name` | The name of the Postgres replication slot used by this sync config version. | +| `initial_replication_done` | Whether the initial snapshot has completed. | +| `replication_lag_bytes` | Replication lag in bytes. | +| `wal_status` | The WAL status of the replication slot as reported by `pg_replication_slots` (`reserved`, `extended`, `unreserved`, or `lost`). | +| `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. | +| `max_slot_wal_keep_size` | The `max_slot_wal_keep_size` limit configured on the source Postgres database, in bytes. | -This information is especially useful when you deploy a new sync config against a large database. When less than 50% of the WAL budget remains during a snapshot, PowerSync adds a warning to the response. If the slot is invalidated, the response includes error [`PSYNC_S1146`](/debugging/error-codes#psync_s11xx-postgres-replication-issues). \ No newline at end of file +### Errors and warnings + +Warnings and errors appear in the `errors[]` array at the sync rules level (`active_sync_rules.errors[]` or `deploying_sync_rules.errors[]`). These include: + +- **Replication lag warnings** appear when no replicated commit has been received for more than 5 minutes (warning level) or 15 minutes (fatal level). +- **WAL budget warnings** appear when the remaining WAL budget drops below 50%. +- **Replication errors** such as `PSYNC_S1146` appear when a replication slot is invalidated (its `wal_status` is `lost`).
+ + +For guidance on configuring `max_slot_wal_keep_size` and managing replication slots, see [Postgres maintenance](/configuration/source-db/postgres-maintenance). + \ No newline at end of file
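As a rough illustration of how the WAL fields in the Diagnostics API table might be consumed, the following Python sketch computes the remaining WAL budget for each entry in `active_sync_rules.connections[]` and flags slots below 50%. It assumes the fields are named as in the table, that `safe_wal_size` and `max_slot_wal_keep_size` are byte counts, and that a missing value or a negative limit means no cap. The helpers `wal_budget_remaining` and `check_connections` are hypothetical, and PowerSync's own 50% check may be computed differently.

```python
from typing import Any, Optional


def wal_budget_remaining(conn: dict[str, Any]) -> Optional[float]:
    """Fraction (0.0 to 1.0) of the WAL budget left for one connection entry.

    Hypothetical helper. Assumes `safe_wal_size` and `max_slot_wal_keep_size`
    are byte counts and that None or a negative limit means "no cap".
    Returns None when no cap applies.
    """
    if conn.get("wal_status") == "lost":
        return 0.0  # Slot already invalidated: no budget left.
    limit = conn.get("max_slot_wal_keep_size")
    safe = conn.get("safe_wal_size")
    if limit is None or safe is None or limit < 0:
        return None  # Unlimited retention, or fields not reported.
    if limit == 0:
        return 0.0
    # Clamp in case the reported value is negative or exceeds the limit.
    return max(0.0, min(1.0, safe / limit))


def check_connections(connections: list[dict[str, Any]], threshold: float = 0.5) -> list[str]:
    """Return one warning string per connection whose budget is below `threshold`."""
    warnings = []
    for conn in connections:
        remaining = wal_budget_remaining(conn)
        if remaining is not None and remaining < threshold:
            warnings.append(
                f"{conn.get('slot_name')}: {remaining:.0%} of WAL budget left "
                f"(wal_status={conn.get('wal_status')})"
            )
    return warnings


if __name__ == "__main__":
    # Example entry shaped like the documented fields (values are made up).
    sample = [{
        "slot_name": "powersync_1_abcd",
        "wal_status": "extended",
        "safe_wal_size": 1 * 1024**3,
        "max_slot_wal_keep_size": 4 * 1024**3,
    }]
    for line in check_connections(sample):
        print(line)  # powersync_1_abcd: 25% of WAL budget left (wal_status=extended)
```

Passing `response["data"]["active_sync_rules"]["connections"]` (a path assumed from the documented response structure) to `check_connections` would return an empty list when every capped slot has at least half of its budget left.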