
feat: systemd support #295

Open
jason-lynch wants to merge 3 commits into feat/PLAT-417/rename-binary from feat/PLAT-417/systemd

Conversation

@jason-lynch
Member

Summary

Adds preliminary support for orchestrating pgEdge Enterprise Postgres databases using SystemD instead of Docker Swarm.

Testing

This PR includes a Lima-based development environment. See the changes in running-locally.md for complete instructions. If you're already set up to run the Lima-based E2E fixtures, you can do:

# install circus
pipx install --include-deps circus

# deploy the dev-lima environment. this is more similar to the compose environment in
# that all ports are exposed on your host machine and we mount the project directory
# into all of the VMs.
make dev-lima-deploy

# that will take a while to set everything up
# once it's done, you can run the environment
make dev-lima-run

# then, in a new terminal, you can use the new dev-lima environment
use-dev-lima
cp1-req get-version
cp1-ssh
# etc..

# you can also run E2Es against this environment
make test-e2e E2E_PARALLEL=4 E2E_FIXTURE=dev-lima

Notes for Reviewers

As of this PR, the control-plane server does not respond to upgrades or other package changes that happen after the server starts. If you install a minor version update, you will be blocked from updating existing databases until you:

  • Perform the same upgrade on all hosts that are running the database
  • Update the postgres_version in your database spec

This will change in a subsequent PR.
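To illustrate the gate described above, here is a minimal sketch in Go: a spec update is allowed only once every host running the database reports the same installed version as the spec. The names (canUpdateDatabase, the version strings) are hypothetical and stand in for the real control-plane logic:

```go
package main

import "fmt"

// canUpdateDatabase is a hypothetical sketch of the upgrade gate:
// updating is blocked until every host reports the spec's version.
// It returns whether the update may proceed and which hosts lag behind.
func canUpdateDatabase(specVersion string, hostVersions map[string]string) (bool, []string) {
	var lagging []string
	for host, installed := range hostVersions {
		if installed != specVersion {
			lagging = append(lagging, host)
		}
	}
	return len(lagging) == 0, lagging
}

func main() {
	ok, lagging := canUpdateDatabase("17.3", map[string]string{
		"host-1": "17.3",
		"host-2": "17.2", // package not yet upgraded here: update is blocked
	})
	fmt.Println(ok, lagging)
}
```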

PLAT-417

@coderabbitai

coderabbitai bot commented Mar 9, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a5df0b0f-5702-4b58-979c-01c58567eed2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This pull request introduces preliminary support for managing database instances using SystemD orchestration, alongside the existing Docker Swarm support. Changes include new SystemD orchestrator implementation, updated API schemas, test infrastructure enhancements, development environment tools using Lima, and corresponding service/workflow updates.

Changes

• Build and Environment Configuration (.gitignore, Makefile): Added a gitignore entry for the lima/data directory; added five phony Make targets (dev-lima-deploy, dev-lima-build, dev-lima-run, dev-lima-reset, dev-lima-teardown) that delegate to the lima/ subdirectory.
• Dependencies and Licensing (go.mod, NOTICE.txt): Added direct dependencies on github.com/coreos/go-systemd/v22 and github.com/elastic/gosigar, moving go-systemd from indirect to direct; added license notices for gosigar and dbus/v5.
• API Schema Updates (api/apiv1/design/database.go, changes/unreleased/Added-20260309-165318.yaml): Updated the patroni_port field description to clarify random port assignment when set to 0; added a changelog entry for SystemD orchestrator support.
• Database Service and Instance Management (server/internal/database/instance*.go, server/internal/database/service.go, server/internal/database/spec.go, server/internal/database/orchestrator.go): Extended the Instance struct with Port, PatroniPort, and PgEdgeVersion fields; added InstanceStateUpdateOptions and an UpdateState method; added CreatePgBackRestBackup and UpdateInstanceState service methods; updated Orchestrator interface signatures to include port/version parameters and accept *InstanceSpec instead of instanceID; added AllHostIDs to InstanceSpec.
• Validation and Configuration (server/internal/api/apiv1/convert.go, server/internal/api/apiv1/validate*.go, server/internal/config/config.go): Updated database spec validation to accept an orchestrator parameter and enforce port requirements for SystemD; added a SystemD config struct with validation; updated convert.go to pass the orchestrator to validation; expanded test coverage for SystemD-specific validation.
• E2E Testing Infrastructure (e2e/fixture_test.go, e2e/fixtures/ansible.cfg, e2e/*_test.go (multiple files)): Refactored TestFixture to track the orchestrator; replaced S3Enabled() with the SkipIfS3Unsupported(), SkipIfServicesUnsupported(), and SkipIfUpgradesUnsupported() conditional skip helpers; updated all test database specs to include PatroniPort: pointerTo(0) alongside Port: pointerTo(0); added interpreter_python = auto_silent to the Ansible config.
• Development Documentation and Tools (docs/development/running-locally.md, hack/dev-env.zsh, hack/pgedge-cp-env.plugin.zsh): Expanded the running-locally guide with a SystemD orchestrator section covering Lima prerequisites and make commands; added a use-dev-lima() environment function and a dynamic cp-etcdctl() supporting multiple environments; added a ZSH_THEME_PGEDGE_CP_ENV_PROMPT_DEV_LIMA_STYLE theme variable for dev-lima prompt styling.
• Lima Development Environment Setup (lima/Makefile, lima/ansible.cfg, lima/config.json, lima/lima-template.yaml, lima/vars.yaml, lima/inventory.yaml): Added a Lima Makefile with deploy/build/run/reset/teardown targets orchestrated via Ansible; configured Ansible settings; created a SystemD-based runtime config with localhost binding; defined a Lima VM template (Rocky 9, 4 CPUs, 8GB RAM, 20GB disk); configured a six-host development inventory and machine variables.
• Lima Ansible Automation (lima/deploy.yaml, lima/stop-dbs.yaml, lima/teardown.yaml, lima/roles/*/tasks/main.yaml, lima/roles/*/vars/main.yaml, lima/roles/*/templates/*): Added three orchestration playbooks (deploy, stop-dbs, teardown); implemented four Ansible roles: start_vms (initializes Lima VMs), install_prerequisites (installs PostgreSQL, Patroni, Go, etcd), stop_dbs (manages Patroni systemd services), and generate_test_config (renders test configuration); added version pinning and architecture-aware tool downloads.
• Lima Runtime Script (lima/run.sh, lima/circus.ini): Added a bash startup script that configures host-specific environment variables, manages data directories, and runs the control-plane binary with prefixed logging; added a Circus process manager configuration for SSH-based remote process supervision of the six hosts.
• SystemD Orchestrator Implementation (server/internal/orchestrator/systemd/orchestrator.go, server/internal/orchestrator/systemd/client.go, server/internal/orchestrator/systemd/packages.go, server/internal/orchestrator/systemd/dnf.go): Implemented the core SystemD orchestrator with instance/host management; added a DBus-based systemd client for unit lifecycle operations; added package management via DNF for PostgreSQL version detection; implemented parsed version tracking for Spock/PostgreSQL combinations.
• SystemD Resource Types (server/internal/orchestrator/systemd/unit.go, server/internal/orchestrator/systemd/patroni_unit.go, server/internal/orchestrator/systemd/patroni_config.go, server/internal/orchestrator/systemd/pgbackrest_restore.go, server/internal/orchestrator/systemd/resources.go): Implemented resource types including UnitResource for systemd unit file management, PatroniConfig for Patroni clustering, and PgBackRestRestore for restore operations; added a registration function for the resource types.
• SystemD Dependency Injection (server/internal/orchestrator/systemd/provide.go, server/internal/orchestrator/provide.go): Added a systemd provisioning module wiring Client, PackageManager, and Orchestrator; updated the root orchestrator provider to support OrchestratorSystemD alongside Swarm; added a Start method to the Orchestrator interface.
• Swarm Orchestrator Updates (server/internal/orchestrator/swarm/orchestrator.go): Updated the Swarm orchestrator to implement the new Start method and to match the updated interface signatures for GetInstanceConnectionInfo and CreatePgBackRestBackup.
• Application and Service Integration (server/cmd/root.go, server/internal/app/app.go): Added SystemD resource type registration in the root command; refactored app initialization to inject and start the orchestrator during the initialization phase.
• Database Service Workflows (server/internal/workflows/activities/create_pgbackrest_backup.go, server/internal/workflows/activities/update_db_state.go, server/internal/workflows/activities/update_planned_instance_states.go, server/internal/workflows/delete_database.go, server/internal/monitor/instance_monitor.go): Updated workflow activities to use UpdateInstanceState instead of UpdateInstance; refactored the backup activity to use a service method instead of a direct orchestrator call; added a plan persistence step to the delete-database workflow; updated the monitor to use state-specific updates.
• Version and Configuration Utilities (server/internal/ds/versions.go, server/internal/postgres/hba/entry.go): Added MajorString() and MajorVersion() methods to Version; tightened the semver regex; updated pg_hba entry formatting to normalize IP addresses to CIDR notation with /32 or /128 masks; updated test golden files with the normalized addresses.
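The pg_hba address normalization mentioned in the last item can be sketched as follows. This is a hypothetical helper (the real code lives in server/internal/postgres/hba/entry.go) and assumes the input is a bare IP or an existing CIDR string, not a hostname keyword like "all":

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeToCIDR appends a full-length mask to a bare IP address,
// mirroring the /32 (IPv4) and /128 (IPv6) normalization described above.
// Hypothetical name; assumes addr is an IP or CIDR, not a hostname keyword.
func normalizeToCIDR(addr string) string {
	if strings.Contains(addr, "/") {
		return addr // already in CIDR notation, leave untouched
	}
	if strings.Contains(addr, ":") {
		return addr + "/128" // IPv6 host address
	}
	return addr + "/32" // IPv4 host address
}

func main() {
	fmt.Println(normalizeToCIDR("10.0.0.1"))
	fmt.Println(normalizeToCIDR("::1"))
	fmt.Println(normalizeToCIDR("192.168.0.0/24"))
}
```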

Poem

🐰 Hops excitedly
SystemD joins the dance, no Swarm alone,
Six Lima hosts in circles, fully grown,
Patroni ports now dance when set to zero,
While resources emerge—our fluffy hero!
From DBus whispers to Ansible cheer,
Control-plane orchestration, crystal clear!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 29.17%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title 'feat: systemd support' accurately reflects the main objective of the PR, adding SystemD orchestrator support as an alternative to Docker Swarm. It is concise, specific, and clearly summarizes the primary change.
  • Description check (✅ Passed): the PR description includes all key sections from the template: a clear summary, comprehensive testing instructions with actual commands, and detailed notes for reviewers. The description is well-organized and provides sufficient context.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@jason-lynch force-pushed the feat/PLAT-417/rename-binary and feat/PLAT-417/systemd branches repeatedly between March 9 and March 24, 2026.
@jason-lynch
Member Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Mar 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
server/internal/postgres/hba/entry.go (1)

63-67: ⚠️ Potential issue | 🟠 Major

Avoid CIDR normalization when Mask is present.

Line 63 always rewrites bare IPs to CIDR, but Lines 65-67 can still append a separate mask. That can emit mixed address formats in one rule. Keep raw e.Address when e.Mask is non-empty.

Suggested fix
 func (e Entry) String() string {
 	var entry string
 	switch e.Type {
 	case EntryTypeInclude, EntryTypeIncludeIfExists, EntryTypeIncludeDir:
 		return fmt.Sprintf("%-17s %s", e.Type, e.IncludePath)
 	default:
-		entry = fmt.Sprintf("%-7s %-15s %-15s %-23s", e.Type, e.Database, e.User, transformAddress(e.Address))
+		address := e.Address
+		if e.Mask == "" {
+			address = transformAddress(address)
+		}
+		entry = fmt.Sprintf("%-7s %-15s %-15s %-23s", e.Type, e.Database, e.User, address)
 	}

Also applies to: 76-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/postgres/hba/entry.go` around lines 63 - 67, The code
currently always calls transformAddress(e.Address) when building the hba entry
(the variable entry) which normalizes bare IPs to CIDR even when e.Mask is set,
leading to mixed formats; change the logic in the entry construction to use the
raw e.Address (do not call transformAddress) whenever e.Mask != "" so the
address and separate mask remain unmodified, and apply the same correction to
the other HBA entry construction block that mirrors this behavior (the block
covering the subsequent entry building around the transformAddress usage for
e.Mask handling).
server/internal/database/spec.go (1)

554-580: ⚠️ Potential issue | 🟠 Major

InstanceSpec.Clone() drops AllHostIDs and can break downstream systemd config generation.

Line 560+ builds a cloned InstanceSpec, but the new field from Line 527 is never copied. Any flow that relies on cloned specs will silently lose peer host membership.

💡 Proposed fix
 func (s *InstanceSpec) Clone() *InstanceSpec {
 	users := make([]*User, len(s.DatabaseUsers))
 	for i, user := range s.DatabaseUsers {
 		users[i] = user.Clone()
 	}

 	return &InstanceSpec{
 		InstanceID:       s.InstanceID,
 		TenantID:         utils.ClonePointer(s.TenantID),
 		DatabaseID:       s.DatabaseID,
 		HostID:           s.HostID,
 		DatabaseName:     s.DatabaseName,
 		NodeName:         s.NodeName,
 		NodeOrdinal:      s.NodeOrdinal,
 		PgEdgeVersion:    s.PgEdgeVersion.Clone(),
 		Port:             utils.ClonePointer(s.Port),
 		PatroniPort:      utils.ClonePointer(s.PatroniPort),
 		CPUs:             s.CPUs,
 		MemoryBytes:      s.MemoryBytes,
 		DatabaseUsers:    users,
 		BackupConfig:     s.BackupConfig.Clone(),
 		RestoreConfig:    s.RestoreConfig.Clone(),
 		PostgreSQLConf:   maps.Clone(s.PostgreSQLConf),
 		ClusterSize:      s.ClusterSize,
 		OrchestratorOpts: s.OrchestratorOpts.Clone(),
+		AllHostIDs:       slices.Clone(s.AllHostIDs),
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/spec.go` around lines 554 - 580,
InstanceSpec.Clone() currently omits the AllHostIDs field so cloned specs lose
peer host membership; update the returned struct in InstanceSpec.Clone to
include a cloned copy of s.AllHostIDs (e.g., deep-copy the slice rather than
assigning the original) so the new InstanceSpec preserves AllHostIDs alongside
the other cloned fields (use the project’s slice-clone helper if present or
create a new slice and copy elements) — modify the return value in
InstanceSpec.Clone to set AllHostIDs: <cloned slice of s.AllHostIDs>.
e2e/cancel_task_test.go (1)

48-55: ⚠️ Potential issue | 🟡 Minor

Dead code after require.NoError.

The require.NoError(t, err) at line 48 will terminate the test if err != nil, making the subsequent error check at lines 52-55 unreachable.

🧹 Suggested fix: remove the redundant error handling
 	})
 	require.NoError(t, err)
-
 	creation_task := create_resp.Task
 	database := create_resp.Database
-	if err != nil {
-		t.Logf("problem creating test db %s", err)
-		return
-	}
 	t.Logf("successfully created cancel task test db")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@e2e/cancel_task_test.go` around lines 48 - 55, The test contains redundant
dead-code: after calling require.NoError(t, err) which fails the test on error,
the subsequent if err != nil { t.Logf(...); return } block is unreachable;
remove that redundant error check and logging (the variables creation_task :=
create_resp.Task and database := create_resp.Database can remain) so the test
relies on require.NoError for failure handling and does not contain the unused
conditional that checks err.
server/internal/api/apiv1/validate.go (2)

212-277: ⚠️ Potential issue | 🟠 Major

Validate the effective SystemD port pair, not just node overrides.

Once a node can inherit db.Port / db.PatroniPort, validatePorts(node.Port, node.PatroniPort, ...) is no longer sufficient. A spec like db.Port=5432 plus node.PatroniPort=5432 currently passes, even though postgres and patroni would bind the same host port at runtime. Compute the effective postgres/patroni ports first, then run both the conflict check and the required-field check on that pair.

💡 Suggested change
-	portPath := appendPath(path, "port")
-	errs = append(errs, validatePorts(node.Port, node.PatroniPort, portPath))
+	portPath := appendPath(path, "port")
+	patroniPortPath := appendPath(path, "patroni_port")
+	effectivePostgresPort := node.Port
+	effectivePatroniPort := node.PatroniPort
+	if orchestrator == config.OrchestratorSystemD {
+		if effectivePostgresPort == nil {
+			effectivePostgresPort = db.Port
+		}
+		if effectivePatroniPort == nil {
+			effectivePatroniPort = db.PatroniPort
+		}
+	}
+	errs = append(errs, validatePorts(effectivePostgresPort, effectivePatroniPort, portPath))
@@
 	switch orchestrator {
 	case config.OrchestratorSystemD:
-		if db.Port == nil && node.Port == nil {
-			portPath := appendPath(path, "port")
-			errs = append(errs, newValidationError(errors.New("port must be defined"), portPath))
+		if effectivePostgresPort == nil {
+			errs = append(errs, newValidationError(errors.New("port must be defined"), portPath))
 		}
-		if db.PatroniPort == nil && node.PatroniPort == nil {
-			portPath := appendPath(path, "patroni_port")
-			errs = append(errs, newValidationError(errors.New("patroni_port must be defined"), portPath))
+		if effectivePatroniPort == nil {
+			errs = append(errs, newValidationError(errors.New("patroni_port must be defined"), patroniPortPath))
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/api/apiv1/validate.go` around lines 212 - 277, validateNode
currently calls validatePorts(node.Port, node.PatroniPort, ...) and then
separately checks db.Port/db.PatroniPort presence, which misses conflicts when a
node inherits db-level ports; compute the effective ports by taking node.Port if
non-nil else db.Port (and same for PatroniPort) inside validateNode, pass those
effective values to validatePorts (so port pair conflict detection uses the
runtime values), and use those same effective values when enforcing the "must be
defined" checks for SystemD (replace db.Port==nil && node.Port==nil with
effective postgres port == nil, and same for patroni_port) while keeping
existing path/validation-error creation logic.

65-95: ⚠️ Potential issue | 🟠 Major

Also reject same-host port reuse across nodes for SystemD specs.

The current loop validates nodes independently, so a spec can still place two nodes on the same host_id and give them the same effective port or patroni_port. Under SystemD those listeners are bound directly on the host, so one of the instances will fail to start. This needs a per-host port-conflict check while iterating spec.Nodes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/api/apiv1/validate.go` around lines 65 - 95,
validateDatabaseSpec currently validates nodes independently and misses
conflicts where two nodes on the same host (host_id) request the same effective
port or patroni_port under SystemD; add a per-host port-usage tracker while
iterating spec.Nodes (inside validateDatabaseSpec) that maps host_id -> set of
used ports and patroni_ports, compute each node's effective port and
patroni_port (use node.Port and node.PatroniPort / utils.FromPointer logic
consistent with validatePorts), and when you see a port already used on the same
host add a validation error via newValidationError (use the same nodePath like
other node errors) describing the duplicate port; ensure this check only applies
for SystemD orchestrator values and include both listener ports (port and
patroni_port) so two nodes on the same host cannot reuse the same listener.
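A minimal sketch of the per-host conflict check suggested above, with hypothetical types standing in for the real spec structs (node, effectivePort, and findHostPortConflicts are illustration-only names; port 0 is treated as "assign a random port", per the schema change in this PR):

```go
package main

import "fmt"

// node is a hypothetical minimal stand-in for a database spec node.
type node struct {
	HostID      string
	Port        *int
	PatroniPort *int
}

func intPtr(v int) *int { return &v }

// effectivePort returns the node-level override if set, otherwise the
// database-level default, mirroring the inheritance described above.
func effectivePort(nodePort, dbPort *int) *int {
	if nodePort != nil {
		return nodePort
	}
	return dbPort
}

// findHostPortConflicts flags listener ports reused on the same host.
// Both postgres and patroni listeners are checked; port 0 never conflicts.
func findHostPortConflicts(nodes []node, dbPort, dbPatroniPort *int) []string {
	used := map[string]map[int]bool{} // hostID -> ports already claimed
	var conflicts []string
	for _, n := range nodes {
		if used[n.HostID] == nil {
			used[n.HostID] = map[int]bool{}
		}
		for _, p := range []*int{
			effectivePort(n.Port, dbPort),
			effectivePort(n.PatroniPort, dbPatroniPort),
		} {
			if p == nil || *p == 0 {
				continue // unset or randomly assigned: no conflict possible
			}
			if used[n.HostID][*p] {
				conflicts = append(conflicts, fmt.Sprintf("host %s: port %d reused", n.HostID, *p))
			}
			used[n.HostID][*p] = true
		}
	}
	return conflicts
}

func main() {
	// Two nodes on the same host both asking for postgres port 5432:
	// under SystemD both listeners bind directly on the host, so this
	// spec must be rejected.
	nodes := []node{
		{HostID: "h1", Port: intPtr(5432), PatroniPort: intPtr(8008)},
		{HostID: "h1", Port: intPtr(5432), PatroniPort: intPtr(8009)},
	}
	fmt.Println(findHostPortConflicts(nodes, nil, nil))
}
```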
🧹 Nitpick comments (13)
lima/roles/generate_test_config/tasks/main.yaml (1)

7-10: Use FQCN consistently for the template module.

Line 3 uses the fully qualified collection name (ansible.builtin.file), but line 8 uses the short form (template). For consistency and to avoid potential name collisions, use ansible.builtin.template.

💡 Suggested fix for FQCN consistency
 - name: Generate test_config
-  template:
+  ansible.builtin.template:
     src: test_config.yaml.tmpl
     dest: ../e2e/fixtures/outputs/dev-lima.test_config.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/roles/generate_test_config/tasks/main.yaml` around lines 7 - 10, The
task named "Generate test_config" uses the short module name "template"; change
it to the fully-qualified collection name "ansible.builtin.template" to match
the FQCN usage elsewhere and avoid collisions—update the template invocation in
the "Generate test_config" task so its module is "ansible.builtin.template"
while keeping the same src and dest parameters.
lima/roles/stop_dbs/tasks/main.yaml (1)

2-3: Add changed_when for accurate idempotency reporting.

The command module always reports changed status. Adding changed_when: false (since reset-failed is idempotent) improves playbook output accuracy and allows proper --check mode behavior.

💡 Suggested fix
 - name: Reset failed services
   ansible.builtin.command: systemctl reset-failed
+  changed_when: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/roles/stop_dbs/tasks/main.yaml` around lines 2 - 3, The task named
"Reset failed services" currently uses ansible.builtin.command: systemctl
reset-failed which always reports changed; update that task to explicitly set
changed_when: false so the task is idempotent in reporting (and behaves
correctly in --check mode). Locate the task by the name "Reset failed services"
and add changed_when: false under the task definition to prevent false changed
events from this systemctl command.
lima/config.json (1)

7-9: Consider more verbose logging for development.

Setting api_server log level to error may hide useful information when debugging during development. Typically dev configs benefit from debug or info level logging.

💡 Suggested change for more verbose dev logging
   "component_levels": {
-    "api_server": "error"
+    "api_server": "debug"
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/config.json` around lines 7 - 9, The current development config sets
component_levels.api_server to "error", which suppresses useful debug info;
update the value for "api_server" under the "component_levels" object to a more
verbose level such as "debug" or "info" for local/dev use (keep the key name
"component_levels" and the "api_server" identifier unchanged) so that the API
server emits more helpful logs during development; ensure this change is limited
to dev configuration (not production) and restart the service to pick up the new
level.
lima/roles/install_prerequisites/tasks/main.yaml (2)

13-17: GPG check disabled for pgEdge repository.

Disabling GPG verification (disable_gpg_check: true) reduces package integrity assurance. While acceptable for a development environment, consider enabling GPG checks if a signed repository becomes available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/roles/install_prerequisites/tasks/main.yaml` around lines 13 - 17, The
task labeled "Install pgEdge repository" currently sets disable_gpg_check: true
which bypasses package signature verification; change this to enable GPG checks
by removing or setting disable_gpg_check to false (or make it configurable via a
variable like pgedge_disable_gpg_check with a safe default of false) so the
package module performs signature validation; update any documentation/vars to
explain toggling only for development and ensure the repository provides a
signing key and that the playbook imports it before installation.

82-83: Delve installation task lacks idempotency.

The go install command will run on every playbook execution even if Delve is already installed. Consider adding a creates check or using a conditional.

💡 Proposed fix for idempotency
 - name: Install delve debugger
-  ansible.builtin.command: /usr/local/go/bin/go install github.com/go-delve/delve/cmd/dlv@latest
+  ansible.builtin.command:
+    cmd: /usr/local/go/bin/go install github.com/go-delve/delve/cmd/dlv@latest
+    creates: /root/go/bin/dlv

Note: Adjust the creates path based on the actual GOBIN or GOPATH location where dlv gets installed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/roles/install_prerequisites/tasks/main.yaml` around lines 82 - 83, The
"Install delve debugger" Ansible task currently always runs `go install
github.com/go-delve/delve/cmd/dlv@latest`; make it idempotent by adding a
`creates` parameter (e.g. creates: /usr/local/go/bin/dlv or the actual
GOBIN/GOPATH bin path such as "{{ lookup('env','GOBIN') |
default((lookup('pipe','go env GOPATH') + '/bin')) }}/dlv") or wrap the command
behind a conditional that checks for the `dlv` binary (e.g. `when: not
lookup('ansible.builtin.file', '/usr/local/go/bin/dlv').exists` or use `stat` to
set a fact). Update the task with one of these approaches so the `go install`
only runs when `dlv` is missing.
lima/run.sh (2)

62-65: Stderr output won't be prefixed with host ID.

The sed filter only processes stdout. Any error output from pgedge-control-plane will appear without the [host-X] prefix, which could make debugging multi-host issues more difficult.

💡 Proposed fix to prefix both stdout and stderr
 # The sed prefixes each output line with the host ID.
 /usr/sbin/pgedge-control-plane run \
     --config-path ${LIMA_DIR}/config.json \
-    | sed "s/^/[${PGEDGE_HOST_ID}] /"
+    2>&1 | sed "s/^/[${PGEDGE_HOST_ID}] /"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/run.sh` around lines 62 - 65, Stderr from pgedge-control-plane is not
being piped through sed so error lines lack the [${PGEDGE_HOST_ID}] prefix;
update the invocation of pgedge-control-plane (the "pgedge-control-plane run
--config-path ${LIMA_DIR}/config.json" command) to redirect stderr into stdout
before the pipe (e.g., add "2>&1" or use the shell pipe-fail/operator that
merges streams) so that sed receives both streams and prefixes error output as
well.

45-48: Minor: Trailing whitespace on line 46.

There's trailing whitespace after `instances` on line 46.

🧹 Proposed fix
     export PGEDGE_DATA_DIR=${LIMA_DIR}/data/${PGEDGE_HOST_ID}
-    export PGEDGE_SYSTEMD__INSTANCE_DATA_DIR=${LIMA_DIR}/data/${PGEDGE_HOST_ID}/instances 
-    
+    export PGEDGE_SYSTEMD__INSTANCE_DATA_DIR=${LIMA_DIR}/data/${PGEDGE_HOST_ID}/instances
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/run.sh` around lines 45 - 48, Remove the trailing whitespace at the end
of the export for PGEDGE_SYSTEMD__INSTANCE_DATA_DIR: edit the export statement
that assigns PGEDGE_SYSTEMD__INSTANCE_DATA_DIR (the line exporting
${LIMA_DIR}/data/${PGEDGE_HOST_ID}/instances) so it ends immediately after
"instances" with no trailing spaces or tabs; this is a simple whitespace cleanup
and does not change variable names or values.
docs/development/running-locally.md (1)

427-428: Minor: Use "macOS" instead of "MacOS".

Apple's operating system should be written as "macOS" (lowercase "m", uppercase "OS").

📝 Proposed fix
-Lima is an easy-to-use virtual machine runner that works well on MacOS.
+Lima is an easy-to-use virtual machine runner that works well on macOS.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/development/running-locally.md` around lines 427 - 428, Replace the
incorrect "MacOS" casing with "macOS" in the documentation sentence that
currently reads "Lima is an easy-to-use virtual machine runner that works well
on MacOS." (search for that exact string or the sentence in
docs/development/running-locally.md) and scan the surrounding text for any other
occurrences of "MacOS" to correct them to "macOS" for consistent branding.
lima/Makefile (1)

29-30: Mark run target as .PHONY.

The run target should be declared as .PHONY to be consistent with the other targets and to ensure it always executes regardless of whether a file named run exists.

💡 Proposed fix
 .PHONY: fix-clocks
 fix-clocks:
 	ansible --become -i ./inventory.yaml all -m command -a 'chronyc -a makestep'

+.PHONY: run
 run: build fix-clocks
 	LIMA_DIR=$(shell pwd) circusd ./circus.ini
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lima/Makefile` around lines 29 - 30, Add the run target to the Makefile's
phony declarations so it always executes regardless of a file named "run";
specifically, add a .PHONY declaration that includes the run target (e.g., add
".PHONY: run" alongside the other .PHONY entries) so the existing "run: build
fix-clocks" rule will always run.
server/internal/orchestrator/systemd/provide.go (1)

26-32: Note: TODO for OS-specific package manager selection.

The hardcoded &Dnf{} with the TODO comment is appropriate for preliminary systemd support targeting RHEL-like systems. Consider tracking this as a follow-up issue if Debian/Ubuntu support is planned.

Would you like me to open an issue to track adding OS detection for package manager selection?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/provide.go` around lines 26 - 32, The
providePackageManager function currently returns a hardcoded &Dnf{}; implement
OS detection and return the correct implementation: add a helper (e.g.,
isRHELLike or detectPackageManager) that checks /etc/os-release or uname and/or
lsb_release, then in providePackageManager choose &Dnf{} for RHEL-like or &Apt{}
(or DebianPackageManager) for Debian-like systems and return an error for
unsupported OSes; if you prefer to defer, create a tracked follow-up issue and
replace the hardcoded return with a clear TODO referencing providePackageManager
and Dnf to ensure this is not forgotten.
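The OS-detection approach suggested above can be sketched as a pure function over `/etc/os-release` content. This is illustrative only: the real `providePackageManager`, `Dnf`, and any future `Apt` type live in the project's systemd package and may look different, and the distro ID list here is a partial assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// detectPackageManager scans os-release ID/ID_LIKE entries and picks a
// package manager family, erroring on unrecognized distributions.
func detectPackageManager(osRelease string) (string, error) {
	for _, line := range strings.Split(osRelease, "\n") {
		if !strings.HasPrefix(line, "ID=") && !strings.HasPrefix(line, "ID_LIKE=") {
			continue
		}
		value := strings.Trim(strings.SplitN(line, "=", 2)[1], `"`)
		for _, id := range strings.Fields(value) {
			switch id {
			case "rhel", "centos", "fedora", "rocky", "almalinux":
				return "dnf", nil
			case "debian", "ubuntu":
				return "apt", nil
			}
		}
	}
	return "", fmt.Errorf("unsupported OS: no recognized ID in os-release")
}

func main() {
	mgr, err := detectPackageManager("ID=\"rocky\"\nID_LIKE=\"rhel centos fedora\"\n")
	fmt.Println(mgr, err) // → dnf <nil>
}
```

Keeping the parser separate from the file read keeps it unit-testable without touching the host filesystem.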
server/internal/orchestrator/systemd/dnf.go (1)

88-89: Consider expanding supported version lists as new releases ship.

The hardcoded supportedPostgresVersions and supportedSpockVersions will need updates when new major versions are released. Consider adding a comment noting this maintenance requirement, or potentially making these configurable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/dnf.go` around lines 88 - 89, The two
hardcoded version lists supportedPostgresVersions and supportedSpockVersions
need maintenance when new majors ship; update the file to either (a) add a clear
TODO comment above these variables explaining they must be updated on new
releases, or (b) make them configurable (e.g., load from configuration or
environment and fall back to the current literals) so new versions can be added
without code changes; ensure you update any code that references
supportedPostgresVersions and supportedSpockVersions to use the new configurable
source or retain the literals as a default.
server/internal/orchestrator/systemd/pgbackrest_restore.go (1)

224-235: Minor: afero.Fs is invoked twice in the same Create call.

The filesystem is already obtained at line 87-90 but is invoked again here. Consider passing fs as a parameter to renameDataDir to avoid the redundant DI lookup.

♻️ Proposed refactor
-func (p *PgBackRestRestore) renameDataDir(rc *resource.Context) error {
-	fs, err := do.Invoke[afero.Fs](rc.Injector)
-	if err != nil {
-		return err
-	}
-
+func (p *PgBackRestRestore) renameDataDir(fs afero.Fs) error {
 	if err := fs.Rename(p.Paths.Instance.PgData(), p.Paths.Instance.PgDataRestore()); err != nil {
 		return fmt.Errorf("failed to rename pgdata for restore: %w", err)
 	}

 	return nil
 }

And update the caller:

-	err = p.renameDataDir(rc)
+	err = p.renameDataDir(fs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/pgbackrest_restore.go` around lines 224
- 235, The method PgBackRestRestore.renameDataDir currently performs a redundant
dependency lookup by calling do.Invoke[afero.Fs](rc.Injector) again; change the
signature to accept an afero.Fs parameter (e.g., renameDataDir(fs afero.Fs, rc
*resource.Context) error), remove the do.Invoke call inside renameDataDir, and
use the passed fs to call Rename on p.Paths.Instance.PgData() ->
p.Paths.Instance.PgDataRestore(); update all callers to obtain fs once via
do.Invoke[afero.Fs](rc.Injector) (as already done earlier) and pass that fs into
renameDataDir to avoid the duplicate DI lookup.
server/internal/orchestrator/systemd/unit.go (1)

84-121: Consider cleanup on partial failure in Create.

If LinkUnit, Reload, EnableUnit, or RestartUnit fails after the file is written (line 102), the unit file will exist on disk but the unit may be in an inconsistent state. The resource framework will likely retry Create, which should be idempotent, but consider whether explicit cleanup or rollback would improve reliability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/systemd/unit.go` around lines 84 - 121, The
Create method may leave the unit file on disk when any of client.LinkUnit,
Reload, EnableUnit, or RestartUnit fails; modify UnitResource.Create to perform
rollback on partial failure: after writing and closing f, if any of
LinkUnit/Reload/EnableUnit/RestartUnit returns an error, attempt to undo by (a)
calling a reverse client operation if available (e.g., client.UnlinkUnit(ctx,
path) or client.DisableUnit/StopUnit), (b) calling client.Reload(ctx) again to
apply the undo if applicable, and (c) removing the on-disk file with
os.Remove(path) (ensure f is closed before removal); return the original
operation error (or wrap it) and suppress/aggregate any cleanup errors. Locate
this logic around the calls to LinkUnit, Reload, EnableUnit, RestartUnit in
UnitResource.Create and add a small rollback helper or inline cleanup that uses
the symbols client, path, and the error returned to perform the undo.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@changes/unreleased/Added-20260309-165318.yaml`:
- Around line 1-3: Update the YAML 'body' value to use the official lowercase
project name by replacing "SystemD" with "systemd" in the Added entry (the body
field in changes/unreleased/Added-20260309-165318.yaml) so the text reads
"Preliminary support for managing database instances with systemd."

In `@e2e/fixture_test.go`:
- Around line 153-160: The code assumes ListHosts returned at least one host and
directly indexes hosts.Hosts[0], which can panic; update the logic after calling
cli.ListHosts(ctx) to check that hosts != nil and len(hosts.Hosts) > 0 (or
equivalent) before accessing hosts.Hosts[0], and if empty return a clear error
(e.g., "no hosts returned from ListHosts") instead of proceeding to assign
orchestrator; ensure this guard is applied where orchestrator :=
hosts.Hosts[0].Orchestrator is set.

In `@lima/lima-template.yaml`:
- Around line 15-16: The VM mount entry currently grants write access to the
entire host home via the YAML keys location: "~" and writable: true; change
writable: true to writable: false (or remove the writable key to rely on
read-only default) and instead add narrowly-scoped mount entries for any
specific paths that must be writable (e.g., explicit host paths under mounts
with writable: true), ensuring the location: "~" mount is read-only to avoid
exposing the whole host home.

In `@server/internal/config/config.go`:
- Around line 380-382: The error message in the default branch appends an error
labeled "host_type" while validating the Orchestrator field; update the message
to correctly reference the Orchestrator (e.g., "orchestrator: unsupported
orchestrator %q") so it uses c.Orchestrator and maintains consistent labeling in
the errs append call inside the same switch/default block in config.go.

In `@server/internal/ds/versions.go`:
- Around line 67-70: Guard MajorVersion() against nil/empty receiver and avoid
slice aliasing: if v is nil or has zero components return &Version{Components:
nil} (or an empty slice) instead of slicing into v.Components, and when
returning the first component use a defensive copy (e.g., use slices.Clone on
v.Components[:1] similar to Clone()) so the returned Version does not share
backing storage with v; update MajorVersion to check len(v.Components) and
return a cloned single-element slice.
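The guard and defensive copy suggested for `MajorVersion` can be sketched like this; the `Version` type here is a trimmed stand-in for the project's `ds.Version`, which may carry more fields.

```go
package main

import (
	"fmt"
	"slices"
)

type Version struct {
	Components []int
}

// MajorVersion returns a new Version holding only the first component,
// guarding against nil/empty receivers and cloning the slice so the
// result shares no backing storage with the receiver.
func (v *Version) MajorVersion() *Version {
	if v == nil || len(v.Components) == 0 {
		return &Version{}
	}
	return &Version{Components: slices.Clone(v.Components[:1])}
}

func main() {
	v := &Version{Components: []int{17, 2}}
	m := v.MajorVersion()
	m.Components[0] = 99 // must not alias v
	fmt.Println(v.Components[0], m.Components[0]) // → 17 99
}
```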

In `@server/internal/monitor/instance_monitor.go`:
- Around line 104-108: When transitioning an instance to
database.InstanceStateAvailable you must also clear the persisted error so the
API doesn't surface stale failure messages; update the
m.dbSvc.UpdateInstanceState call (or switch to using updateInstanceStatus) so
the InstanceStateUpdateOptions for m.dbSvc.UpdateInstanceState includes clearing
the stored error (e.g., set the status.Error/instance.Error to nil/empty) when
State == database.InstanceStateAvailable; reference m.dbSvc.UpdateInstanceState,
database.InstanceStateUpdateOptions, updateInstanceStatus(), and
storedToInstance()/instance.Error to locate and ensure the persisted error is
cleared on recovery.

In `@server/internal/orchestrator/systemd/client.go`:
- Around line 92-100: The waitForPid function incorrectly treats non-ESRCH
errors (like EPERM) from os.Process.Signal(syscall.Signal(0)) as “process gone”
and it ignores caller cancellation; change waitForPid to accept a
context.Context parameter and inside its loop only consider the process gone
when Signal(0) returns an error that unwraps to syscall.ESRCH, treat EPERM (or
other permission errors) as the process still present, and select on ctx.Done()
as well as the timeout ticker so the function returns promptly when the caller
cancels; update the caller site that calls waitForPid (the block that currently
calls waitForPid(pid, stopTimeout)) to pass the existing context variable into
the new waitForPid(ctx, pid, stopTimeout) signature.
- Around line 57-74: The StartUnit method (and the analogous functions receiving
job results) currently reads the job result from resCh without checking its
value and blocks without honoring context cancellation; update StartUnit (and
the other unit-operation methods that read res := <-resCh) to use a select that
waits on either resCh or ctx.Done(), returning a context-aware error if ctx is
canceled, and validate the job result string (res) — treat any value other than
"done" (e.g. "failed", "canceled", "timeout", "dependency", "skipped") as an
operation error and return a formatted error including the response and unit
name; ensure you keep the existing pid/logging but return non-nil on non-"done"
responses.

In `@server/internal/orchestrator/systemd/orchestrator.go`:
- Around line 374-376: GenerateInstanceRestoreResources is mutating the caller's
InstanceSpec by setting spec.InPlaceRestore = true before calling
GenerateInstanceResources, which can break retries or later comparisons; fix it
by making a shallow copy (or deep copy if necessary) of the provided
InstanceSpec inside GenerateInstanceRestoreResources, set InPlaceRestore = true
on that copy, and pass the copy to GenerateInstanceResources so the original
spec pointer/values remain unchanged (reference:
GenerateInstanceRestoreResources, GenerateInstanceResources,
spec.InPlaceRestore).
- Around line 499-515: validatePort/checkPortAvailable currently allow 0 and
out-of-range values because checkPortAvailable binds an ephemeral port; update
validation to reject port 0 and any port outside 1-65535 before probing
availability. Specifically, in validatePort (and/or at the top of
checkPortAvailable) add a range check for the dereferenced port value and return
a clear error like "port must be between 1 and 65535" if out of range; only call
net.Listen to probe availability after the range check and keep the existing
ptrEqual behavior intact.
- Around line 260-275: The UnitResource patroniUnit is incorrectly using the
local host config (o.cfg.HostID) for HostID; change patroniUnit.HostID to use
spec.HostID so the resource is anchored to the instance identity. Locate the
patroniUnit construction (the UnitResource literal in orchestrator.go) and
replace the HostID assignment from o.cfg.HostID to spec.HostID to ensure the
unit is instance-scoped and not tied to the local config.
- Around line 361-363: The SystemD orchestrator currently returns
"unimplemented" from GenerateServiceInstanceResources and
GetServiceInstanceStatus which will crash workflows that unconditionally call
these (e.g., plan_update and ServiceInstanceMonitor); fix by adding an explicit
orchestrator-type guard before any service-instance use or by implementing these
functions for SystemD: update callers (plan_update, ServiceInstanceMonitor
construction) to check the orchestrator type on the given ServiceInstanceSpec
and skip/return a no-op when not supported, or implement
GenerateServiceInstanceResources and GetServiceInstanceStatus on Orchestrator to
handle SystemD service units (produce appropriate
database.ServiceInstanceResources from database.ServiceInstanceSpec and return
status), and ensure tests cover both gating and functional behavior for SystemD.

In `@server/internal/orchestrator/systemd/packages.go`:
- Around line 47-58: PackageCmp and InstalledPostgresCmp can panic on nil
fields; add defensive nil checks before dereferencing Version and Postgres: in
PackageCmp verify a and b are non-nil and that a.Version and b.Version are
non-nil and return a sensible ordering (e.g., treat nil < non-nil or equal) or
error value; in InstalledPostgresCmp verify a and b and their Postgres fields
and nested Postgres.Version are non-nil and apply the same ordering rule; update
callers or document behavior if you choose to treat nils as errors, and
reference the InstalledPackage, InstalledPostgres, PackageCmp and
InstalledPostgresCmp symbols when making the changes.
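The nil-before-non-nil ordering suggested for the comparators could look like this. The types here are minimal stand-ins for `InstalledPackage` and its version field; the real `PackageCmp` compares full multi-component versions.

```go
package main

import (
	"cmp"
	"fmt"
)

type Version struct{ Major int }
type InstalledPackage struct{ Version *Version }

// PackageCmp orders nil packages/versions before non-nil ones instead of
// panicking on a dereference.
func PackageCmp(a, b *InstalledPackage) int {
	av, bv := version(a), version(b)
	switch {
	case av == nil && bv == nil:
		return 0
	case av == nil:
		return -1
	case bv == nil:
		return 1
	}
	return cmp.Compare(av.Major, bv.Major)
}

func version(p *InstalledPackage) *Version {
	if p == nil {
		return nil
	}
	return p.Version
}

func main() {
	fmt.Println(PackageCmp(nil, &InstalledPackage{Version: &Version{17}}))               // → -1
	fmt.Println(PackageCmp(&InstalledPackage{Version: &Version{18}},
		&InstalledPackage{Version: &Version{17}})) // → 1
}
```

Treating nil as "smallest" keeps the comparator total, which `slices.SortFunc`-style callers require.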

---

Outside diff comments:
In `@e2e/cancel_task_test.go`:
- Around line 48-55: The test contains redundant dead-code: after calling
require.NoError(t, err) which fails the test on error, the subsequent if err !=
nil { t.Logf(...); return } block is unreachable; remove that redundant error
check and logging (the variables creation_task := create_resp.Task and database
:= create_resp.Database can remain) so the test relies on require.NoError for
failure handling and does not contain the unused conditional that checks err.

In `@server/internal/api/apiv1/validate.go`:
- Around line 212-277: validateNode currently calls validatePorts(node.Port,
node.PatroniPort, ...) and then separately checks db.Port/db.PatroniPort
presence, which misses conflicts when a node inherits db-level ports; compute
the effective ports by taking node.Port if non-nil else db.Port (and same for
PatroniPort) inside validateNode, pass those effective values to validatePorts
(so port pair conflict detection uses the runtime values), and use those same
effective values when enforcing the "must be defined" checks for SystemD
(replace db.Port==nil && node.Port==nil with effective postgres port == nil, and
same for patroni_port) while keeping existing path/validation-error creation
logic.
- Around line 65-95: validateDatabaseSpec currently validates nodes
independently and misses conflicts where two nodes on the same host (host_id)
request the same effective port or patroni_port under SystemD; add a per-host
port-usage tracker while iterating spec.Nodes (inside validateDatabaseSpec) that
maps host_id -> set of used ports and patroni_ports, compute each node's
effective port and patroni_port (use node.Port and node.PatroniPort /
utils.FromPointer logic consistent with validatePorts), and when you see a port
already used on the same host add a validation error via newValidationError (use
the same nodePath like other node errors) describing the duplicate port; ensure
this check only applies for SystemD orchestrator values and include both
listener ports (port and patroni_port) so two nodes on the same host cannot
reuse the same listener.
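The per-host port-usage tracker described above can be sketched as a small pass over the nodes; the `node` struct here is a stand-in for the spec shape, with ports already resolved to their effective (node-or-database-level) values.

```go
package main

import "fmt"

type node struct {
	HostID        string
	Port, Patroni int
}

// findPortConflicts reports each listener port reused on the same host,
// counting postgres and patroni ports against a single per-host pool.
func findPortConflicts(nodes []node) []string {
	used := map[string]map[int]bool{}
	var errs []string
	claim := func(host string, port int, what string, idx int) {
		if used[host] == nil {
			used[host] = map[int]bool{}
		}
		if used[host][port] {
			errs = append(errs, fmt.Sprintf("nodes[%d]: %s port %d already used on host %s", idx, what, port, host))
			return
		}
		used[host][port] = true
	}
	for i, n := range nodes {
		claim(n.HostID, n.Port, "postgres", i)
		claim(n.HostID, n.Patroni, "patroni", i)
	}
	return errs
}

func main() {
	errs := findPortConflicts([]node{
		{"h1", 5432, 8008},
		{"h1", 5432, 8009}, // postgres port collides on h1
		{"h2", 5432, 8008}, // different host, no conflict
	})
	fmt.Println(errs) // one conflict reported
}
```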

In `@server/internal/database/spec.go`:
- Around line 554-580: InstanceSpec.Clone() currently omits the AllHostIDs field
so cloned specs lose peer host membership; update the returned struct in
InstanceSpec.Clone to include a cloned copy of s.AllHostIDs (e.g., deep-copy the
slice rather than assigning the original) so the new InstanceSpec preserves
AllHostIDs alongside the other cloned fields (use the project’s slice-clone
helper if present or create a new slice and copy elements) — modify the return
value in InstanceSpec.Clone to set AllHostIDs: <cloned slice of s.AllHostIDs>.

In `@server/internal/postgres/hba/entry.go`:
- Around line 63-67: The code currently always calls transformAddress(e.Address)
when building the hba entry (the variable entry) which normalizes bare IPs to
CIDR even when e.Mask is set, leading to mixed formats; change the logic in the
entry construction to use the raw e.Address (do not call transformAddress)
whenever e.Mask != "" so the address and separate mask remain unmodified, and
apply the same correction to the other HBA entry construction block that mirrors
this behavior (the block covering the subsequent entry building around the
transformAddress usage for e.Mask handling).
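The rule suggested above, leave the address untouched whenever a separate mask column is present, can be sketched as a pure helper. This is a simplified illustration: it assumes IPv4 for the `/32` normalization and does not cover hostnames or keywords like `all`, which the real HBA code must handle.

```go
package main

import (
	"fmt"
	"strings"
)

// hbaAddress returns the address column for an hba entry: untouched when
// a separate mask is given, normalized to CIDR only for bare IPs.
func hbaAddress(address, mask string) string {
	if mask != "" {
		return address // address + mask columns stay as-is
	}
	if strings.Contains(address, "/") {
		return address // already CIDR
	}
	return address + "/32"
}

func main() {
	fmt.Println(hbaAddress("10.0.0.1", "255.255.255.0")) // → 10.0.0.1
	fmt.Println(hbaAddress("10.0.0.1", ""))              // → 10.0.0.1/32
}
```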

---

Nitpick comments:
In `@lima/config.json`:
- Around line 7-9: The current development config sets
component_levels.api_server to "error", which suppresses useful debug info;
update the value for "api_server" under the "component_levels" object to a more
verbose level such as "debug" or "info" for local/dev use (keep the key name
"component_levels" and the "api_server" identifier unchanged) so that the API
server emits more helpful logs during development; ensure this change is limited
to dev configuration (not production) and restart the service to pick up the new
level.

In `@lima/roles/generate_test_config/tasks/main.yaml`:
- Around line 7-10: The task named "Generate test_config" uses the short module
name "template"; change it to the fully-qualified collection name
"ansible.builtin.template" to match the FQCN usage elsewhere and avoid
collisions—update the template invocation in the "Generate test_config" task so
its module is "ansible.builtin.template" while keeping the same src and dest
parameters.

In `@lima/roles/install_prerequisites/tasks/main.yaml`:
- Around line 13-17: The task labeled "Install pgEdge repository" currently sets
disable_gpg_check: true which bypasses package signature verification; change
this to enable GPG checks by removing or setting disable_gpg_check to false (or
make it configurable via a variable like pgedge_disable_gpg_check with a safe
default of false) so the package module performs signature validation; update
any documentation/vars to explain toggling only for development and ensure the
repository provides a signing key and that the playbook imports it before
installation.

In `@lima/roles/stop_dbs/tasks/main.yaml`:
- Around line 2-3: The task named "Reset failed services" currently uses
ansible.builtin.command: systemctl reset-failed which always reports changed;
update that task to explicitly set changed_when: false so the task is idempotent
in reporting (and behaves correctly in --check mode). Locate the task by the
name "Reset failed services" and add changed_when: false under the task
definition to prevent false changed events from this systemctl command.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4f75273a-59c4-4b48-b12b-634c6c4d1151

📥 Commits

Reviewing files that changed from the base of the PR and between 40fd983 and 9f7c75f.

⛔ Files ignored due to path filters (8)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (74)
  • .gitignore
  • Makefile
  • NOTICE.txt
  • api/apiv1/design/database.go
  • changes/unreleased/Added-20260309-165318.yaml
  • docs/development/running-locally.md
  • e2e/backup_restore_test.go
  • e2e/cancel_task_test.go
  • e2e/custom_db_create_test.go
  • e2e/db_create_test.go
  • e2e/db_update_add_node_test.go
  • e2e/failover_test.go
  • e2e/fixture_test.go
  • e2e/fixtures/ansible.cfg
  • e2e/load_test.go
  • e2e/minor_version_upgrade_test.go
  • e2e/service_provisioning_test.go
  • e2e/switchover_test.go
  • e2e/whole_cluster_test.go
  • go.mod
  • hack/dev-env.zsh
  • hack/pgedge-cp-env.plugin.zsh
  • lima/Makefile
  • lima/ansible.cfg
  • lima/circus.ini
  • lima/config.json
  • lima/deploy.yaml
  • lima/inventory.yaml
  • lima/lima-template.yaml
  • lima/roles/generate_test_config/tasks/main.yaml
  • lima/roles/generate_test_config/templates/test_config.yaml.tmpl
  • lima/roles/install_prerequisites/tasks/main.yaml
  • lima/roles/install_prerequisites/vars/main.yaml
  • lima/roles/start_vms/tasks/main.yaml
  • lima/roles/stop_dbs/tasks/main.yaml
  • lima/roles/teardown_vms/tasks/main.yaml
  • lima/run.sh
  • lima/stop-dbs.yaml
  • lima/teardown.yaml
  • lima/vars.yaml
  • server/cmd/root.go
  • server/internal/api/apiv1/convert.go
  • server/internal/api/apiv1/post_init_handlers.go
  • server/internal/api/apiv1/validate.go
  • server/internal/api/apiv1/validate_test.go
  • server/internal/app/app.go
  • server/internal/config/config.go
  • server/internal/database/instance.go
  • server/internal/database/instance_resource.go
  • server/internal/database/instance_store.go
  • server/internal/database/orchestrator.go
  • server/internal/database/service.go
  • server/internal/database/spec.go
  • server/internal/ds/versions.go
  • server/internal/monitor/instance_monitor.go
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/enable_fast_basebackup.yaml
  • server/internal/orchestrator/common/golden_test/TestPatroniConfigGenerator/minimal_systemd.yaml
  • server/internal/orchestrator/provide.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/client.go
  • server/internal/orchestrator/systemd/dnf.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/packages.go
  • server/internal/orchestrator/systemd/patroni_config.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/provide.go
  • server/internal/orchestrator/systemd/resources.go
  • server/internal/orchestrator/systemd/unit.go
  • server/internal/postgres/hba/entry.go
  • server/internal/workflows/activities/create_pgbackrest_backup.go
  • server/internal/workflows/activities/update_db_state.go
  • server/internal/workflows/activities/update_planned_instance_states.go
  • server/internal/workflows/delete_database.go

@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 40fd983 to 307f1e9 Compare March 24, 2026 18:15
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch 2 times, most recently from b56d8cd to bf6b508 Compare March 25, 2026 18:16
Adds an orchestrator implementation for SystemD. As of this commit,
you'll need to install Postgres on the host before starting the Control
Plane:

```
dnf install -y epel-release dnf
dnf config-manager --set-enabled crb
dnf update -y --allowerasing
dnf install -y https://dnf.pgedge.com/reporpm/pgedge-release-latest.noarch.rpm
dnf install -y \
      pgedge-postgresql18 \
      pgedge-spock50_18 \
      pgedge-snowflake_18 \
      pgedge-lolor_18 \
      pgedge-postgresql18-contrib \
      pgedge-pgbackrest \
      pgedge-python3-psycopg2 \
      python3-pip
pip install 'patroni[etcd,jsonlogger]==4.1.0'
```

PLAT-417
Adds a development environment for the SystemD orchestrator, called
`dev-lima`. Just like the `compose` environment, this environment
exposes all services and databases on your host machine.
Adds support for running E2Es against the `dev-lima` environment.

PLAT-417
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/rename-binary branch from 307f1e9 to 6a1d107 Compare March 25, 2026 19:38
@jason-lynch jason-lynch force-pushed the feat/PLAT-417/systemd branch from bf6b508 to d41571c Compare March 25, 2026 19:38