diff --git a/business_continuity/backup_restore/backup_arch.adoc b/business_continuity/backup_restore/backup_arch.adoc
index 93877bbb30..13cf513b8b 100644
--- a/business_continuity/backup_restore/backup_arch.adoc
+++ b/business_continuity/backup_restore/backup_arch.adoc
@@ -153,4 +153,4 @@ You can backup third-party resources with cluster backup and restore by adding t
 == Additional resources
 
 Learn more about the policies and capabilities of the backup and restore component by going to
-xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
\ No newline at end of file
+xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
diff --git a/business_continuity/backup_restore/backup_intro.adoc b/business_continuity/backup_restore/backup_intro.adoc
index 4ceba2dd67..42693cfb94 100644
--- a/business_continuity/backup_restore/backup_intro.adoc
+++ b/business_continuity/backup_restore/backup_intro.adoc
@@ -32,3 +32,5 @@ Complete the following topics to learn more about the backup and restore operato
 * xref:../backup_restore/backup_return_hub.adoc#return-initial-hub[Returning to the initial hub cluster after a restore]
 
 * xref:../backup_restore/backup_hcp.adoc#config-hcp-backup[Backup and restore for hosted control planes and hosted clusters]
+
+* xref:../backup_restore/backup_restore_config_obs.adoc#backup-restore-obs-config[Backup and restore configuration for Observability]
diff --git a/business_continuity/backup_restore/backup_restore_config_obs.adoc b/business_continuity/backup_restore/backup_restore_config_obs.adoc
new file mode 100644
index 0000000000..c93340c84e
--- /dev/null
+++ b/business_continuity/backup_restore/backup_restore_config_obs.adoc
@@ -0,0 +1,26 @@
+[#backup-restore-obs-config]
+= Backup and restore configuration for Observability
+
+The Observability service uses an S3-compatible object store to keep all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure that your data stays safe and keeps its continuity during the hub cluster migration or backup.
+
+*Notes:*
+
+- When a managed cluster is detached from the primary hub cluster and reattached to the backup hub cluster, metrics are not collected. To help connect the metrics, you can script the cluster migration for large fleets.
+
+- For product backup and restore, the Observability service automatically labels its resources with the `cluster.open-cluster-management.io/backup` label.
+
+.Resources that are automatically backed up and restored for Observability
+|====
+| Resource type | Resource name
+
+| ConfigMaps
+| `observability-metrics-custom-allowlist`, `thanos-ruler-custom-rules`, `alertmanager-config`, `policy-acs-central-status`, any ConfigMap labeled with `grafana-custom-dashboard`
+
+| Secrets
+| `thanos-object-storage`, `observability-server-ca-certs`, `observability-client-ca-certs`, `observability-server-certs`, `observability-grafana-certs`, `alertmanager-byo-ca`, `alertmanager-byo-cert`, `proxy-byo-ca`, `proxy-byo-cert`
+|====
+
+== Additional resources
+
+- For the steps to complete the backup and restore for Observability, see xref:../backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].
+
diff --git a/business_continuity/backup_restore/backup_restore_obs.adoc b/business_continuity/backup_restore/backup_restore_obs.adoc
new file mode 100644
index 0000000000..a7688a217b
--- /dev/null
+++ b/business_continuity/backup_restore/backup_restore_obs.adoc
@@ -0,0 +1,98 @@
+[#backup-restore-obs]
+= Backing up and restoring Observability service
+
+Back up and restore the Observability service to keep data safe and to support continuity during the hub cluster migration or backup. To help reduce disruption in metric data collection, use the same S3-compatible object store for both the primary and backup hub clusters.
+
+.Prerequisites
+
+- Ensure that you can run a restore operation for backup types by completing the xref:../business_continuity/backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types] process.
+
+.Procedure
+
+Complete the following steps to back up and restore the Observability service:
+
+. To ensure the Observability service recognizes the hub cluster as the `local-cluster`, a managed hub cluster, change the `spec.disableHubSelfManagement` parameter in the `MultiClusterHub` custom resource to `false`.
+
++
+*Note:* If you change the default name of your `local-cluster` to another value, the results appear within the changed local cluster name.
+
+. To preserve the tenant ID of the `observatorium` resource as you manually back up and restore the `observatorium` resource, run the following command:
+
++
+[source,bash]
+----
+oc get observatorium -n open-cluster-management-observability -o yaml > observatorium-backup.yaml
+----
+
+. To back up the `observability` deployment, run the following command:
+
++
+[source,bash]
+----
+oc get mco observability -o yaml > mco-cr-backup.yaml
+----
+
+. Shut down the Thanos compactor on your primary hub cluster by running the following command:
+
++
+[source,bash]
+----
+oc scale statefulset observability-thanos-compact -n open-cluster-management-observability --replicas=0
+----
+
+.. Verify the compactor is not active by running the following command:
+
++
+[source,bash]
+----
+oc get pods observability-thanos-compact-0 -n open-cluster-management-observability
+----
+
+. Restore the `backup` resources such as the automatically backed-up ConfigMaps and Secrets listed in the backup and restore configuration for Observability.
+
+. To preserve the tenant ID for maintaining continuity in metrics ingestion and querying, restore the `observatorium` resource to the backup hub cluster. Run the following command:
+
++
+[source,bash]
+----
+oc apply -f observatorium-backup.yaml
+----
+
+. Apply the backed-up `MultiClusterObservability` custom resource to start the Observability service on the newly restored hub cluster. Run the following command:
+
++
+[source,bash]
+----
+oc apply -f mco-cr-backup.yaml
+----
++
+The operator starts the Observability service and detects the existing `observatorium` resource, reusing the preserved tenant ID instead of creating a new one.
+
+. Verify that the Observability service runs on your new hub cluster. Run the following command:
+
++
+[source,bash]
+----
+oc get pods -n open-cluster-management-observability
+----
+
+. Verify that the `observability-controller` `managedclusteraddon` does not have a status in the `DEGRADED` column, and that the `PROGRESSING` status is not set to `False`. Run the following command:
+
++
+[source,bash]
+----
+oc get managedclusteraddons -A | awk 'NR==1 || /observability-controller/'
+----
+
+. Verify metrics collection from your managed clusters by accessing Grafana.
+
+. Verify that your managed clusters are connected to your new hub cluster by checking for the `Available` status for each managed cluster.
+
+. Shut down the Observability service on your previous hub cluster by removing the resources. Run the following command:
+
++
+[source,bash]
+----
+oc delete mco observability
+----
+
diff --git a/business_continuity/main.adoc b/business_continuity/main.adoc
index 78319d3a64..9a3d8f7897 100644
--- a/business_continuity/main.adoc
+++ b/business_continuity/main.adoc
@@ -17,6 +17,8 @@ include::backup_restore/use_existing_hub_cluster.adoc[leveloffset=+3]
 include::backup_restore/tag_resources.adoc[leveloffset=+3]
 include::backup_restore/backup_return_hub.adoc[leveloffset=+3]
 include::backup_restore/backup_hcp.adoc[leveloffset=+3]
+include::backup_restore/backup_restore_obs.adoc[leveloffset=+3]
+include::backup_restore/backup_restore_config_obs.adoc[leveloffset=+3]
 include::volsync/volsync.adoc[leveloffset=+2]
 include::volsync/volsync_replicate.adoc[leveloffset=+3]
 include::volsync/volsync_convert_backup.adoc[leveloffset=+3]
diff --git a/observability/observability_arch.adoc b/observability/observability_arch.adoc
index efd883d276..0bbe7f1670 100644
--- a/observability/observability_arch.adoc
+++ b/observability/observability_arch.adoc
@@ -128,10 +128,11 @@ When you install {acm-short} the following persistent volumes (PV) must be creat
 
 To learn more about observability and the integrated components, see the following topics:
 
-- See xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service]
-- See xref:../observability/obs_config.adoc#obs-config[Observability configuration]
-- See xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the observability service]
-- See xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards]
-- See the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation]
-- See the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview]
-- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation]
+- For an introduction to the service, see xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service].
+- To learn about configuring the service, metric types labeling, and pod capacity, see xref:../observability/obs_config.adoc#obs-config[Observability configuration].
+- To enable the Observability service, see xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the Observability service].
+- For more information about viewing hub cluster and managed cluster metrics from Grafana, see xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards].
+- Learn how you can back up and restore the Observability service. See link:../business_continuity/backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].
+- For more details about Thanos, see the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation].
+- For a brief overview of Prometheus, see the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview].
+- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation] to understand how you can send and receive alerts by using Alertmanager.
diff --git a/observability/observe_environments_intro.adoc b/observability/observe_environments_intro.adoc
index 7c7dbd121d..b929c37129 100644
--- a/observability/observe_environments_intro.adoc
+++ b/observability/observe_environments_intro.adoc
@@ -14,13 +14,5 @@ Read the following documentation for more details about the observability compon
 * xref:../observability/use_observability.adoc#using-observability[Using observability]
 * xref:../observability/observability_alerts.adoc#observability-alerts[Managing alerts]
 * xref:../observability/adv_config_obs.adoc#adv-config-obs[Observability advanced configuration]
-** xref:../observability/obs_metrics.adoc#adding-custom-metrics[Adding custom metrics]
-** xref:../observability/obs_proxy.adoc#config-proxy-obs[Configuring proxy settings for observability add-ons]
-** xref:../observability/obs_custom_cert.adoc#customizing-route-cert[Customizing route certificate]
-** xref:../observability/obs_custom_rules.adoc#creating-custom-rules[Creating custom rules]
-** xref:../observability/obs_update_mco.adoc#updating-mco-custom-replicas[Updating the _MultiClusterObservability_ custom resource replicas from the console]
-** xref:../observability/obs_pv_pvc.adoc#increase-decrease-pv-pvc[Increasing and decreasing persistent volumes and persistent volume claims]
-** xref:../observability/obs_custom_alert.adoc#custom-obervatorium-alert-url[Customizing the managed cluster Observatorium API and Alertmanager URLs (Technology Preview)]
-** xref:../observability/obs_rbac.adoc#configure-fine-grain-rbac[Configuring fine-grain RBAC (Technology Preview)]
 * xref:../observability/insights_intro.adoc#using-rh-insights[Using observability with Red Hat Insights]
 * xref:../observability/obs_right_size.adoc#optimize-work-right-size[Optimizing workloads by using right-sizing guides (Technology Preview)]
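
Review note: the primary-hub half of the procedure added in `backup_restore_obs.adoc` (export the `observatorium` resource, export the `MultiClusterObservability` custom resource, scale down the Thanos compactor, check the compactor pod) could be wrapped in a small helper for large fleets. The following is only a sketch under assumptions, not part of this change. The `oc` commands are the ones in the procedure; the `OC` and `OUT_DIR` variables and the `backup_observability` function name are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the primary-hub backup steps from backup_restore_obs.adoc.
# OC, OUT_DIR, and backup_observability are hypothetical names, not product tooling.
set -eu

OC="${OC:-oc}"            # set OC=echo to print the commands instead of running them
OUT_DIR="${OUT_DIR:-.}"   # directory that receives the exported YAML files
NS="open-cluster-management-observability"

backup_observability() {
  # Export the observatorium resource so that the tenant ID can be restored later.
  "$OC" get observatorium -n "$NS" -o yaml > "$OUT_DIR/observatorium-backup.yaml"
  # Export the MultiClusterObservability custom resource.
  "$OC" get mco observability -o yaml > "$OUT_DIR/mco-cr-backup.yaml"
  # Stop the Thanos compactor on the primary hub cluster.
  "$OC" scale statefulset observability-thanos-compact -n "$NS" --replicas=0
  # Query the compactor pod; a "not found" error here means it is no longer running.
  "$OC" get pods observability-thanos-compact-0 -n "$NS" || true
}
```

Call `backup_observability` while logged in to the primary hub cluster. The restore-side `oc apply` steps run against the backup hub cluster and are intentionally left out, because they depend on the restore operation having completed first.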