- Overview
- Installation
- Architecture Diagram
- Component Details
- Network Connectivity
- AAP Deployment Architecture
- AAP Cluster Management
- AAP Cluster Management (Manual Scripts)
- EFM Integration (EDB Failover Manager)
- Troubleshooting
- Ansible Automation
- Disaster Recovery Scenarios
- Scaling Considerations
- EDB Postgres for Kubernetes Architecture
This document describes the architecture of EnterpriseDB Postgres deployed Active/Passive across two clusters in different datacenters, with in-datacenter replication, for the Ansible Automation Platform (AAP). This achieves a near-HA architecture, especially for failover to the databases synchronizing within the same region/datacenter. A DR failover should be reserved for a catastrophic failure. Failing over to an in-site database should require little to no intervention at the application layer. Note that during a DR failover any running jobs are lost; if the failover stays in-site, jobs should continue to run unless the controller itself fails.
Preferred: Use the edb.postgres_operations Ansible collection for repeatable, automated installs. The guides below include Ansible playbooks and role usage, plus manual steps if needed.
| Deployment | Description | Guide |
|---|---|---|
| RHEL with Ansible (recommended) | Install EDB Postgres on RHEL 8/9 using the collection playbook and install_postgres_rhel role | RHEL — Ansible |
| Kubernetes with Ansible (recommended) | Deploy PostgreSQL clusters on OpenShift/Kubernetes using the collection playbooks and deploy_cluster role | Kubernetes — Ansible |
| RHEL (manual) | Traditional VM-based (systemd, PGD, EPAS); manual install and repo steps | RHEL — Manual |
| Kubernetes (manual) | Container-based; install operator and apply cluster YAML manually | Kubernetes — Manual |
For full Ansible collection usage, variables, and execution environment (AAP), see the collection README and GETTING_STARTED.
The global load balancer provides a single entry point for AAP access:
- DNS: `aap.example.com`
- Type: Active-Passive (DC1 primary, DC2 standby)
- Health Checks: Monitors AAP Controller availability in both datacenters
- Failover: Automatic failover to DC2 if DC1 becomes unavailable
- Routing: Priority-based routing (100% traffic to DC1 when healthy)
- Failback: Automatic or manual failback to DC1 when it recovers
- Protocols: HTTPS (port 443), WebSocket support for real-time job updates
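The priority-based routing described above can be illustrated with a minimal sketch. This is not the load balancer's actual implementation; the `Target` type, the `choose_target` function, and the numeric priorities are hypothetical names chosen for illustration, assuming a lower number means a more preferred datacenter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Target:
    """A load balancer backend (hypothetical model, not a vendor API)."""
    name: str
    priority: int  # assumption: lower number = more preferred


def choose_target(targets: List[Target], healthy: Dict[str, bool]) -> Optional[Target]:
    """Return the healthy target with the best priority, or None if all are down."""
    candidates = [t for t in targets if healthy.get(t.name, False)]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.priority)


# DC1 is the active site, DC2 the passive standby.
TARGETS = [Target("dc1", 1), Target("dc2", 2)]
```

With both sites healthy, all traffic goes to DC1; DC2 is selected only when the DC1 health check fails, and automatic failback happens as soon as DC1 reports healthy again.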
For OpenShift, AAP is deployed on separate OpenShift clusters for high availability and geographic distribution. For RHEL, you can do a single install across datacenters; however, you MUST turn off the services on the secondary site.
- Namespace: `ansible-automation-platform`
- AAP Gateway: 3 replicas for HA
- AAP Controller: 3 replicas for HA
- Automation Hub: 2 replicas
- Database: PostgreSQL cluster (1 primary + 2 replicas) managed by EDB operator
- Route: `aap-dc1.apps.ocp1.example.com`
- Namespace: `ansible-automation-platform`
- AAP Gateway: 3 replicas for HA
- AAP Controller: 3 replicas for HA
- Automation Hub: 2 replicas
- Database: PostgreSQL cluster (1 primary + 2 replicas) managed by EDB operator
- Route: `aap-dc2.apps.ocp2.example.com`
The AAP databases are replicated from active to passive datacenter:
- Method: PostgreSQL logical replication (Active → Passive) - Note: AAP's internal database uses logical replication for flexibility
- Direction: DC1 (Active) → DC2 (Passive)
- Mode: Asynchronous replication with minimal lag
- Shared Data: Job templates, inventory, credentials, execution history
- Failover: DC2 database promoted to read-write during failover
- Failback: Data synchronized back to DC1 when it recovers
EDB-managed application database clusters use physical replication:
- Method: PostgreSQL physical replication via streaming replication and WAL shipping
- Primary Method: Streaming replication from Primary to Designated Primary
- Fallback Method: WAL shipping via S3/object store (continuous WAL archiving)
- Within Cluster: Hot standby replicas use streaming replication from primary/designated primary
- Mode: Asynchronous streaming with optional synchronous mode
- Benefits: Block-level replication, faster failover, exact byte-for-byte replica
Users and automation clients connect to AAP through the global load balancer:
- URL: `https://aap.example.com`
- Protocol: HTTPS/443 with WebSocket support
- Load Balancing: Active-Passive (priority-based)
- Active Target: DC1 AAP (100% traffic when healthy)
- Passive Target: DC2 AAP (standby, only receives traffic during failover)
- Health Checks: Layer 7 health checks to AAP Controller endpoints
- Session Affinity: Sticky sessions for long-running jobs
- TLS Termination: At load balancer or end-to-end encryption
AAP can only talk to one read-write (RW) database at a time:
- Protocol: PostgreSQL wire protocol (port 5432)
- Access: Via Kubernetes Services (ClusterIP within cluster, Routes/LoadBalancer for remote)
- Authentication: Certificate-based or password authentication
- Encryption: TLS/SSL enforced
- Connection Pooling: PgBouncer for efficient connection management
- Method: PostgreSQL physical replication (streaming + WAL shipping)
- Primary Mechanism: Streaming replication from Primary to Designated Primary
- Fallback Mechanism: WAL shipping via S3/object store
- Direction: DC1 (Primary Cluster) → DC2 (Replica Cluster)
- Network: Encrypted tunnel (VPN/Direct Connect/WAN) for streaming replication
- Replication Type: Asynchronous (default) or synchronous (configurable)
- Lag Monitoring: Both AAP instances monitor replication lag via EDB operator metrics
- Alerting: Alerts triggered if lag exceeds threshold (e.g., 30 seconds)
- Automatic Service Updates: EDB operator automatically updates the `-rw` service during failover
- Cross-Cluster Limitation: Automated failover across Kubernetes clusters must be handled externally (via AAP or higher-level orchestration)
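As a rough illustration of the lag alerting rule above, the following sketch classifies a measured replication lag against the 30-second example threshold. The function and constant names are hypothetical. The SQL string is an assumption about one way to read lag directly from the primary (the `replay_lag` column of `pg_stat_replication`); the document itself relies on EDB operator metrics.

```python
from typing import Optional

# Example threshold from the text above; tune per environment.
LAG_THRESHOLD_SECONDS = 30.0

# Assumption: one possible query to run on the primary. replay_lag may be
# NULL when the standby has been idle, which is why None is handled below.
LAG_QUERY = """
SELECT application_name,
       EXTRACT(EPOCH FROM replay_lag) AS lag_seconds
FROM pg_stat_replication
"""


def classify_lag(lag_seconds: Optional[float],
                 threshold: float = LAG_THRESHOLD_SECONDS) -> str:
    """Return 'ok', 'alert', or 'unknown' for a single replica's lag."""
    if lag_seconds is None:
        return "unknown"
    # Alert only when the lag strictly exceeds the threshold.
    return "alert" if lag_seconds > threshold else "ok"
```

Treating a missing measurement as `unknown` rather than `ok` keeps a silent metrics outage from looking like a healthy replica.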
For EDB-Managed Application Databases:
- Application → AAP Controller
- AAP Controller → DC1 Primary Database (via `-rw` service)
- DC1 Primary → DC1 Hot Standby Replicas (streaming replication within cluster)
- DC1 Primary → DC2 Designated Primary (streaming replication across clusters)
- DC1 Primary → S3/Object Store (continuous WAL archiving - fallback)
- DC2 Designated Primary → DC2 Hot Standby Replicas (streaming replication within cluster)
EDB-Managed Clusters:
- DC1 Primary Cluster:
  - Write operations via `prod-db-rw` service (routes to primary)
  - Read operations via `prod-db-ro` service (routes to hot standby replicas)
  - Read operations via `prod-db-r` service (routes to any instance)
- DC2 Replica Cluster:
  - Read operations only via `prod-db-replica-ro` service (routes to designated primary or replicas)
  - Cannot accept writes unless promoted
- Load Balancing: EDB operator manages service routing automatically
Service Behavior During Failover:
- EDB operator automatically updates the `-rw` service to point to the newly promoted primary
- Applications experience seamless redirection without connection string changes
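Because clients address the service name rather than an individual pod, the connection string stays the same across a failover. The sketch below shows how a client might derive the service host and a libpq-style connection string from the cluster name. The helper names, the `databases` namespace, the `app` database and user, and the choice of `sslmode=verify-full` are all assumptions for illustration, not values taken from this deployment.

```python
VALID_ROLES = ("rw", "ro", "r")  # service suffixes used in this document


def service_host(cluster: str, role: str, namespace: str) -> str:
    """Build the in-cluster DNS name for a cluster service, e.g. prod-db-rw."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown service role: {role!r}")
    return f"{cluster}-{role}.{namespace}.svc"


def build_dsn(cluster: str, namespace: str, dbname: str, user: str,
              role: str = "rw", port: int = 5432) -> str:
    """Return a libpq keyword/value connection string targeting the service."""
    host = service_host(cluster, role, namespace)
    # sslmode=verify-full is an assumed setting matching "TLS/SSL enforced".
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} sslmode=verify-full")


# Hypothetical example values.
WRITE_DSN = build_dsn("prod-db", "databases", "app", "app")
```

The resulting string points at `prod-db-rw`, so when the operator repoints that service to a newly promoted primary, the client only needs to reconnect.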
EDB-Managed PostgreSQL Backups:
- Scheduled backup job (initiated by AAP or CronJob via EDB operator)
- Backup pod created by EDB operator
- Database backup streamed to S3/object store (using Barman Cloud)
- WAL files continuously archived to S3 (automatic by EDB operator)
- WAL archiving serves dual purpose:
- Point-in-time recovery (PITR)
- Fallback replication mechanism for replica clusters
- Replica clusters can recover from WAL archive if streaming replication fails
- AAP monitors backup completion via operator metrics
- Alerts sent if backup fails
Backup Strategy per Datacenter:
- DC1: Full backups + continuous WAL archiving to S3 bucket (primary region)
- DC2: Independent backups to separate S3 bucket (DR region) for redundancy
Comprehensive documentation is available:
- Collection README: ansible_collections/edb/postgres_operations/README.md
- Collection GETTING_STARTED: ansible_collections/edb/postgres_operations/docs/GETTING_STARTED.md
- AAP Management Guide: ansible_collections/edb/postgres_operations/playbooks/AAP_MANAGEMENT.md
- Role READMEs: Individual README files in each role directory
- Installation: RHEL (Ansible / manual) · Kubernetes (Ansible / manual)
- RHEL AAP: docs/rhel-aap-architecture.md · OpenShift AAP: docs/openshift-aap-architecture.md
- Disaster Recovery Scenarios: docs/dr-scenarios.md
- EFM Integration: docs/enterprisefailovermanager.md
- Troubleshooting: docs/troubleshooting.md
- Manual Scripts: docs/manual-scripts-doc.md
- Kubernetes Architecture: docs/install-kubernetes-manual.md#edb-postgres-for-kubernetes-architecture
