2 changes: 1 addition & 1 deletion business_continuity/backup_restore/backup_arch.adoc
@@ -153,4 +153,4 @@ You can backup third-party resources with cluster backup and restore by adding t
== Additional resources

Learn more about the policies and capabilities of the backup and restore component by going to
xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
2 changes: 2 additions & 0 deletions business_continuity/backup_restore/backup_intro.adoc
@@ -32,3 +32,5 @@ Complete the following topics to learn more about the backup and restore operato
* xref:../backup_restore/backup_return_hub.adoc#return-initial-hub[Returning to the initial hub cluster after a restore]

* xref:../backup_restore/backup_hcp.adoc#config-hcp-backup[Backup and restore for hosted control planes and hosted clusters]

* xref:../backup_restore/backup_restore_config_obs.adoc#backup-restore-obs-config[Backup and restore configuration for Observability]
26 changes: 26 additions & 0 deletions business_continuity/backup_restore/backup_restore_config_obs.adoc
@@ -0,0 +1,26 @@
[#backup-restore-obs-config]
= Backup and restore configuration for Observability

The Observability service uses an S3-compatible object store to keep all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure that your data stays safe and keeps its continuity during the hub cluster migration or backup.

*Notes:*

- When a managed cluster is detached from the primary hub cluster and reattached to the backup hub cluster, metrics are not collected in the interim. To help keep the metrics connected across large fleets, you can script the cluster migration.

- For product backup and restore, the Observability service automatically labels its resources with the `cluster.open-cluster-management.io/backup` label.

.Resources that are automatically backed up and restored for Observability
|====
| Resource type | Resource name

| ConfigMaps
| `observability-metrics-custom-allowlist`, `thanos-ruler-custom-rules`, `alertmanager-config`, `policy-acs-central-status`, any ConfigMap labeled with `grafana-custom-dashboard`

| Secrets
| `thanos-object-storage`, `observability-server-ca-certs`, `observability-client-ca-certs`, `observability-server-certs`, `observability-grafana-certs`, `alertmanager-byo-ca`, `alertmanager-byo-cert`, `proxy-byo-ca`, `proxy-byo-cert`
|====
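
To see which ConfigMaps and Secrets currently carry the backup label on a hub cluster, a check along the following lines might help. This is a minimal sketch and not part of the product: the `list_backup_labeled` function name is hypothetical, and the default namespace is assumed to be the `open-cluster-management-observability` namespace that the Observability backup procedure uses.

```shell
#!/bin/sh
# Hypothetical helper: list ConfigMaps and Secrets that carry the backup label.
# Assumption: the resources live in the Observability namespace by default.
list_backup_labeled() {
  ns="${1:-open-cluster-management-observability}"
  # A label selector with only a key matches any resource that has the label,
  # regardless of the label value.
  oc get configmaps,secrets -n "$ns" \
    -l cluster.open-cluster-management.io/backup \
    -o name
}
```

Run `list_backup_labeled` with no arguments to use the assumed default namespace, or pass another namespace as the first argument. You can compare the output with the resource names in the previous table.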

== Additional resources

- For the steps to complete the backup and restore for Observability, see xref:../backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].

98 changes: 98 additions & 0 deletions business_continuity/backup_restore/backup_restore_obs.adoc
@@ -0,0 +1,98 @@
[#backup-restore-obs]
= Backing up and restoring Observability service

Back up and restore the Observability service to keep data safe and to support continuity during the hub cluster migration or backup. To reduce disruption in metric data collection, use the same S3-compatible object store for both the primary and backup hub clusters.

.Prerequisites

- Ensure that you can run a restore operation for backup types by completing the xref:../business_continuity/backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types] process.

.Procedure

Complete the following steps to back up and restore the Observability service:

. To ensure that the Observability service recognizes the hub cluster as the `local-cluster`, which is a managed hub cluster, set the `spec.disableHubSelfManagement` parameter in the `MultiClusterHub` custom resource to `false`.

+
*Note:* If you change the default name of your `local-cluster` to another value, the results appear within the changed local cluster name.

. To preserve the tenant ID of the `observatorium` resource, manually back up the resource by running the following command:

+
[source,bash]
----
oc get observatorium -n open-cluster-management-observability -o yaml > observatorium-backup.yaml
----

. To back up the `observability` deployment, run the following command:

+
[source,bash]
----
oc get mco observability -o yaml > mco-cr-backup.yaml
----

. Shut down the Thanos compactor on your primary hub cluster by running the following command:

+
[source,bash]
----
oc scale statefulset observability-thanos-compact -n open-cluster-management-observability --replicas=0
----

.. Verify the compactor is not active by running the following command:

+
[source,bash]
----
oc get pods observability-thanos-compact-0 -n open-cluster-management-observability
----

. Restore the `backup` resources, such as the automatically backed-up ConfigMaps and Secrets that are listed in the backup and restore configuration for Observability.

. To preserve the tenant ID and maintain continuity in metrics ingestion and querying, restore the `observatorium` resource to the backup hub cluster. Run the following command:

+
[source,bash]
----
oc apply -f observatorium-backup.yaml
----

. Apply the backed-up `MultiClusterObservability` custom resource to start the Observability service on the newly restored hub cluster. Run the following command:

+
[source,bash]
----
oc apply -f mco-cr-backup.yaml
----
+
The operator starts the Observability service and detects the existing `observatorium` resource, reusing the preserved tenant ID instead of creating a new one.

. Verify that the Observability service runs on your new hub cluster. Run the following command:

+
[source,bash]
----
oc get pods -n open-cluster-management-observability
----

. Verify that the `observability-controller` `managedclusteraddon` does not have a status in the `DEGRADED` column, and that the `PROGRESSING` status is not set to `False`. Run the following command:

+
[source,bash]
----
oc get managedclusteraddons -A | awk 'NR==1 || /observability-controller/'
----

. Verify metrics collection from your managed clusters by accessing Grafana.

. Verify that your managed clusters are connected to your new hub cluster by checking for the `Available` status for each managed cluster.

. Shut down the Observability service on your previous hub cluster by removing the resources. Run the following command:

+
[source,bash]
----
oc delete mco observability
----
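
The steps that run on the primary hub cluster can be grouped into one function, which might be useful if you repeat the procedure. The following is a minimal sketch that reuses only the commands from the procedure. The `backup_observability_primary` function name and the output directory argument are hypothetical, and the sketch assumes that you are logged in to the primary hub cluster with `oc`.

```shell
#!/bin/sh
# Sketch: primary hub cluster steps from the procedure, wrapped in one function.
# The function name and the output directory argument are hypothetical.
NS=open-cluster-management-observability

backup_observability_primary() {
  out_dir="${1:-.}"
  mkdir -p "$out_dir" || return 1

  # Save the observatorium resource so that the tenant ID can be restored later.
  oc get observatorium -n "$NS" -o yaml \
    > "$out_dir/observatorium-backup.yaml" || return 1

  # Save the MultiClusterObservability custom resource.
  oc get mco observability -o yaml \
    > "$out_dir/mco-cr-backup.yaml" || return 1

  # Shut down the Thanos compactor on the primary hub cluster.
  oc scale statefulset observability-thanos-compact -n "$NS" --replicas=0
}
```

For example, `backup_observability_primary ./obs-backup` writes both YAML files to the `./obs-backup` directory and then scales the compactor to zero replicas. The function stops at the first command that fails, so the compactor is not shut down unless both backup files are written.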

2 changes: 2 additions & 0 deletions business_continuity/main.adoc
@@ -17,6 +17,8 @@ include::backup_restore/use_existing_hub_cluster.adoc[leveloffset=+3]
include::backup_restore/tag_resources.adoc[leveloffset=+3]
include::backup_restore/backup_return_hub.adoc[leveloffset=+3]
include::backup_restore/backup_hcp.adoc[leveloffset=+3]
include::backup_restore/backup_restore_obs.adoc[leveloffset=+3]
include::backup_restore/backup_restore_config_obs.adoc[leveloffset=+3]
include::volsync/volsync.adoc[leveloffset=+2]
include::volsync/volsync_replicate.adoc[leveloffset=+3]
include::volsync/volsync_convert_backup.adoc[leveloffset=+3]
15 changes: 8 additions & 7 deletions observability/observability_arch.adoc
@@ -128,10 +128,11 @@ When you install {acm-short} the following persistent volumes (PV) must be creat

To learn more about observability and the integrated components, see the following topics:

- See xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service]
- See xref:../observability/obs_config.adoc#obs-config[Observability configuration]
- See xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the observability service]
- See xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards]
- See the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation]
- See the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview]
- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation]
- For an introduction to the service, see xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service].
- To learn about configuring the service, metric types labeling, and pod capacity, see xref:../observability/obs_config.adoc#obs-config[Observability configuration].
- To enable the Observability service, see xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the Observability service].
- For more information about viewing hub cluster and managed cluster metrics from Grafana, see xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards].
- To learn how you can back up and restore the Observability service, see xref:../business_continuity/backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].
- For more details about Thanos, see the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation].
- For a brief overview of Prometheus, see the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview].
- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation] to understand how you can send and receive alerts by using Alertmanager.
8 changes: 0 additions & 8 deletions observability/observe_environments_intro.adoc
@@ -14,13 +14,5 @@ Read the following documentation for more details about the observability compon
* xref:../observability/use_observability.adoc#using-observability[Using observability]
* xref:../observability/observability_alerts.adoc#observability-alerts[Managing alerts]
* xref:../observability/adv_config_obs.adoc#adv-config-obs[Observability advanced configuration]
** xref:../observability/obs_metrics.adoc#adding-custom-metrics[Adding custom metrics]
** xref:../observability/obs_proxy.adoc#config-proxy-obs[Configuring proxy settings for observability add-ons]
** xref:../observability/obs_custom_cert.adoc#customizing-route-cert[Customizing route certificate]
** xref:../observability/obs_custom_rules.adoc#creating-custom-rules[Creating custom rules]
** xref:../observability/obs_update_mco.adoc#updating-mco-custom-replicas[Updating the _MultiClusterObservability_ custom resource replicas from the console]
** xref:../observability/obs_pv_pvc.adoc#increase-decrease-pv-pvc[Increasing and decreasing persistent volumes and persistent volume claims]
** xref:../observability/obs_custom_alert.adoc#custom-obervatorium-alert-url[Customizing the managed cluster Observatorium API and Alertmanager URLs (Technology Preview)]
** xref:../observability/obs_rbac.adoc#configure-fine-grain-rbac[Configuring fine-grain RBAC (Technology Preview)]
* xref:../observability/insights_intro.adoc#using-rh-insights[Using observability with Red Hat Insights]
* xref:../observability/obs_right_size.adoc#optimize-work-right-size[Optimizing workloads by using right-sizing guides (Technology Preview)]