3 changes: 2 additions & 1 deletion business_continuity/backup_restore/backup_arch.adoc
@@ -36,6 +36,7 @@ View the following list of the cluster backup and restore process, and how they
** `clusterclaim.hive.openshift.io`
** `clusterimageset.hive.openshift.io`
** `clustersync.hiveinternal.openshift.io`
//do we want to add the api for observability here?
Contributor: API doc is deprecated.

Contributor Author: Not the doc but the observability API name, "observability.open-cluster-management.io".

Contributor Author: @subbarao-meduri what do you think? This part of the doc explains what sources are backed up.

Reviewer: The Obs resources should not be added to this section @dockerymick because they are Secrets and ConfigMaps with a backup label annotation, and some resources which, as I understand, should be manually backed up.

Contributor Author: @birsanv thanks Valentina for the clarity! I will remove my hidden comment.


- Exclude all resources from the following API groups:
** `internal.open-cluster-management.io`
@@ -153,4 +154,4 @@ You can back up third-party resources with cluster backup and restore by adding t
== Additional resources

Learn more about the policies and capabilities of the backup and restore component by going to
xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
2 changes: 2 additions & 0 deletions business_continuity/backup_restore/backup_intro.adoc
@@ -32,3 +32,5 @@ Complete the following topics to learn more about the backup and restore operato
* xref:../backup_restore/backup_return_hub.adoc#return-initial-hub[Returning to the initial hub cluster after a restore]

* xref:../backup_restore/backup_hcp.adoc#config-hcp-backup[Backup and restore for hosted control planes and hosted clusters]

* xref:../backup_restore/backup_restore_config_obs.adoc#backup-restore-obs-config[Backup and restore configuration for Observability]
26 changes: 26 additions & 0 deletions business_continuity/backup_restore/backup_restore_config_obs.adoc
@@ -0,0 +1,26 @@
[#backup-restore-obs-config]
= Backup and restore configuration for Observability

The Observability service uses an S3-compatible object store to persist all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure data continuity and integrity during hub cluster migration or backup.
Collaborator: Similar to my other comment. I'm not used to seeing phrasing like "data continuity and integrity". What do you think of this rewrite?

Suggested change:
Suggested change
The Observability service uses an S3-compatible object store to persist all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure data continuity and integrity during hub cluster migration or backup.
The Observability service uses an S3-compatible object store to persist all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure that your data stays safe and keeps its continuity during the hub cluster migration or backup.


*Notes:*

- When a managed cluster is detached from the primary hub cluster and reattached to the backup hub cluster, metrics are not collected. To minimize gaps, consider scripting the cluster migration for large fleets.
- For product backup and restore, the Observability service automatically labels its resources with the `cluster.open-cluster-management.io/backup` label.
.Resources that are automatically backed up and restored for Observability
|====
| Resource type | Resource name

| ConfigMaps
| `observability-metrics-custom-allowlist`, `thanos-ruler-custom-rules`, `alertmanager-config`, `policy-acs-central-status`, and any ConfigMap labeled with `grafana-custom-dashboard`

| Secrets
| `thanos-object-storage`, `observability-server-ca-certs`, `observability-client-ca-certs`, `observability-server-certs`, `observability-grafana-certs`, `alertmanager-byo-ca`, `alertmanager-byo-cert`, `proxy-byo-ca`, `proxy-byo-cert`
|====
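As a quick check, you can list the resources that carry the backup label described earlier. The namespace and the bare-label selector shown here are assumptions based on the defaults in this document; adjust them for your environment:

[source,bash]
----
# List the Observability ConfigMaps and Secrets that carry the backup label.
# The namespace and label key are assumptions based on this document's defaults.
oc get configmaps,secrets \
  -n open-cluster-management-observability \
  -l cluster.open-cluster-management.io/backup
----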

== Additional resources

- For the steps to complete the backup and restore for Observability, see xref:../backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].

77 changes: 77 additions & 0 deletions business_continuity/backup_restore/backup_restore_obs.adoc
@@ -0,0 +1,77 @@
[#backup-restore-obs]
= Backing up and restoring Observability service

Back up and restore the Observability service to maintain data continuity and integrity during hub cluster migration or backup. This procedure assumes that the same S3-compatible object store is used across both the primary and backup hub clusters to minimize disruption in metric data collection.

.Prerequisites

* Ensure that you can run a restore operation for backup types. See xref:../backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types].
[#backup-restore-obs-procedure]
== Backing up and restoring Observability procedure
Contributor: We don't need this header. You can just use the procedure tag here because it is the same thing as the title.

Contributor Author: Ok, thanks. I thought we were holding off on using tags, as in we would create a separate issue to handle the changes for the tool conversion. I am fine with using it now.


. To ensure the Observability service recognizes the hub cluster as the `local-cluster`, change the `spec.disableHubSelfManagement` parameter in the `MultiClusterHub` custom resource to `false`.
Collaborator: I know Brandi wants us to use "managed hub cluster" instead of "local hub cluster". However, "local-cluster" might be different, and there are times when I've had "local-cluster" be a value in a YAML file, so I couldn't change the name. Nonetheless, it might be worthwhile to confirm that it is best to use "local-cluster" here instead of "a managed cluster".

Contributor Author: @subbarao-meduri can you help confirm that my rewrite here makes sense and is accurate?

Reviewer: Hub cluster metrics are always collected and pushed into Thanos regardless of whether hub self-management is enabled. See: https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_management_for_kubernetes/2.14/html-single/observability/index#obs-config

These metrics show up under the name local-cluster in Grafana by default. However, if the local-cluster is renamed to something else, metrics show up under that name.

@dockerymick I don't believe there is an explicit requirement that hub self-management must be enabled for backup/restore to work. @birsanv can confirm this. My view is that line 13 should be removed.

cc: @coleenquadros to recheck my assertion here.

Contributor Author (@dockerymick, Oct 9, 2025): @subbarao-meduri Thanks Subbarao for your feedback. This line was an attempt to rewrite what you mentioned in the issue description:

"For ACM 2.14 and later: Use the local-cluster renaming feature to assign unique names to each hub. This helps disambiguate metrics for each hub cluster in Grafana."

This info is not listed in the Prerequisites portion. It is part of the "Backing up and restoring Observability procedure" section.

Contributor (@swopebe, Oct 16, 2025): I think there is some confusion (maybe). I never said not to use local cluster; it is better to use that if you are referring to a managed hub. I don't know what we mean by "local hub cluster". I may be missing something; I am just not aware of that term being part of the product lexicon, such as hub cluster, managed cluster, local cluster.

This discussion originally stemmed from topics incorrectly or vaguely referring to the local cluster, because I think the team didn't have the same understanding of what local cluster means or how to disable and enable it.

When we speak of local cluster, or enabling and disabling it, we just need to make sure we understand ourselves how to do it, then present it the same as in the advanced configuration, which is what this line is doing from what I can tell.

If you talk about a managed hub cluster, you should ask if they mean local-cluster and present it as such. That was my request for the team. I think this is fine, but I would have said:

To ensure the Observability service recognizes the hub cluster as a local-cluster, a managed hub cluster, set the spec.disableHubSelfManagement parameter in the MultiClusterHub custom resource to false.

@jc-berger @dockerymick @subbarao-meduri FYI. That is if we don't remove this line.

Contributor: We do also now talk about the ability to rename the local cluster, so if you keep that in there, you may want to say:

If you change the default name of local-cluster to another value, the results appear under the changed local cluster name.

Contributor (@coleenquadros, Oct 28, 2025): @dockerymick Subbarao's assertion that the metrics are collected under the new name for the managed hub or local cluster is correct. Also, yes, the hub cluster continues to send metrics regardless of whether HubSelfManagement is enabled.

. Manually back up and restore the `observatorium` resource and the `observability` deployment to ensure continuity across hub clusters. Complete the following steps:
Contributor: Is this a step with substeps under it, or can we merge this? Technically the first step here doesn't have an action. Can the step under this stand alone and this just be a paragraph?

Contributor Author (@dockerymick, Oct 27, 2025): Initially I went with the substep because the step was becoming lengthy. However, I did make an update to remove the substep (I am just going through the comments/suggestions and updating the branch locally). I changed it to read as the following statement:

"To preserve the tenant ID of the observatorium resource as you manually back up and restore the observatorium resource, run the following command:"

.. To preserve the tenant ID of the `observatorium` resource during the restore, run the following command:

+
[source,bash]
----
oc get observatorium -n open-cluster-management-observability -o yaml > observatorium-backup.yaml
----
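+
Optionally, confirm that the saved file captured the tenant definition before you continue. The `tenants` field name is an assumption about the `observatorium` resource layout, so verify it against your own backup file:
+
[source,bash]
----
# Sketch: inspect the saved backup for the tenant ID.
# The "tenants" field name is an assumption; adjust it to match your file.
grep -A 2 "tenants:" observatorium-backup.yaml
----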

. To back up the `observability` deployment, run the following command:

+
[source,bash]
----
oc get mco observability -o yaml > mco-cr-backup.yaml
----

. Shut down the Thanos compactor on your primary hub cluster. Complete the following steps:
Contributor: Also not a step because there is no action. Can this be just a line to introduce the process?

Contributor: Or add "...complete the following steps" and you can keep it as a step, because then there is an action.

.. To prevent write conflicts and deduplication issues while working on the same object storage, stop the Thanos compactor before starting the restore operation on the backup hub cluster. Run the following command:

+
[source,bash]
----
oc scale statefulset observability-thanos-compact -n open-cluster-management-observability --replicas=0
----

.. Verify that the compactor is stopped by running the following command:

+
[source,bash]
----
oc get pods observability-thanos-compact-0 -n open-cluster-management-observability
----

. To restore the backup resources, see xref:../backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types]. You can restore the automatically backed-up ConfigMaps and Secrets listed in the backup and restore configuration for Observability.
Collaborator: I'm not used to seeing links to other procedures/docs in the middle of a procedure/task. Could this link be a prerequisite? I know we might only need to restore the backup resources here in step 5, but maybe we can have a prerequisite like this:

Prerequisites

* Ensure that you can run a restore operation for backup types by completing xref:../business_continuity/backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types].

Then the step here can be simple, like:

. Restore the backup resources, including the automatically backed-up ConfigMaps and Secrets listed in the backup and restore configuration for Observability.

Contributor Author: Great point and callout for sure.


. To preserve the tenant ID and maintain continuity in metrics ingestion and querying, restore the `observatorium` resource to the backup hub cluster. Run the following command:

+
[source,bash]
----
oc apply -f observatorium-backup.yaml
----

. Apply the backed-up `MultiClusterObservability` custom resource to start the Observability service on the restored hub cluster. Run the following command:

+
[source,bash]
----
oc apply -f mco-cr-backup.yaml
----
+
The operator starts the Observability service and detects the existing `observatorium` resource, reusing the preserved tenant ID instead of creating a new one.

. Migrate managed clusters to the new hub cluster. Complete the following steps:
Contributor: Suggested change:

. Migrate managed clusters to the new hub cluster.
. Migrate managed clusters to the new hub cluster. Complete the following steps:

Contributor: Or let it stand alone without a step:

Migrate managed clusters to the new hub cluster. Complete the following steps:

.. Detach managed clusters from the primary hub cluster and reattach them to the new restored hub cluster. After the managed clusters are attached, they resume sending metrics to the Observability service.
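+
For large fleets, you can script this detach and reattach flow, as suggested in the configuration topic. The following is a minimal sketch only; the cluster names, kubeconfig contexts, and the detach/import mechanism are assumptions that you must adapt to your own migration tooling:
+
[source,bash]
----
# Minimal sketch: move a list of managed clusters between hub clusters.
# CLUSTERS, the kubeconfig contexts, and the saved ManagedCluster manifests
# are hypothetical; adapt them to your environment.
CLUSTERS="cluster-a cluster-b"
for cluster in ${CLUSTERS}; do
  # Detach the cluster from the primary hub.
  oc --context primary-hub delete managedcluster "${cluster}"
  # Reattach it to the restored hub, assuming a saved ManagedCluster manifest.
  oc --context restored-hub apply -f "manifests/${cluster}-managedcluster.yaml"
done
----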

. Shut down the Observability service on your primary hub cluster and flush in-memory metrics to the S3 object store after migrating all managed clusters. Run the following command:
Contributor: Suggested change:

. Shut down Observability on your primary hub cluster and clear in-memory metrics to S3 object after migrating all managed clusters. Run the following command:
. Shut down the Observability service on your primary hub cluster and clear in-memory metrics to S3 object after migrating all managed clusters. Run the following command:

+
[source,bash]
----
oc delete mco observability
----
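+
To confirm that the shutdown completed, you can check that the Observability pods terminate; the namespace name is taken from the earlier commands in this procedure:
+
[source,bash]
----
# Confirm shutdown: the Observability pods terminate after the
# MultiClusterObservability resource is deleted.
oc get pods -n open-cluster-management-observability
----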
2 changes: 2 additions & 0 deletions business_continuity/main.adoc
@@ -17,6 +17,8 @@ include::backup_restore/use_existing_hub_cluster.adoc[leveloffset=+3]
include::backup_restore/tag_resources.adoc[leveloffset=+3]
include::backup_restore/backup_return_hub.adoc[leveloffset=+3]
include::backup_restore/backup_hcp.adoc[leveloffset=+3]
include::backup_restore/backup_restore_obs.adoc[leveloffset=+3]
include::backup_restore/backup_restore_config_obs.adoc[leveloffset=+3]
include::volsync/volsync.adoc[leveloffset=+2]
include::volsync/volsync_replicate.adoc[leveloffset=+3]
include::volsync/volsync_convert_backup.adoc[leveloffset=+3]