Commit ee81308

https://issues.redhat.com/browse/ACM-22475 Backup and restore Observability (#8222)
* https://issues.redhat.com/browse/ACM-22475
* More changes
* more updates, modular writing
* Removed hidden comments
* Apply suggestions from code review Moving a prereq to steps
* Reduced table syntax
* Update backup_restore_obs.adoc
* Updates
* Update backup_restore_obs.adoc
* More updates after reviewing local-cluster details
* Removing hidden comment Addressed by development
* Update business_continuity/backup_restore/backup_restore_config_obs.adoc Co-authored-by: jc-berger <[email protected]>
* Update business_continuity/backup_restore/backup_restore_config_obs.adoc Co-authored-by: jc-berger <[email protected]>
* Update business_continuity/backup_restore/backup_restore_obs.adoc Co-authored-by: jc-berger <[email protected]>
* Few more updates after initial peer review
* Update backup_restore_config_obs.adoc
* Updates after review from peer
* Update business_continuity/backup_restore/backup_restore_obs.adoc
* Updates after dev lead review
* Update backup_restore_obs.adoc
* Updates after second review from peer
---------
Co-authored-by: jc-berger <[email protected]>
1 parent 0cf1995 commit ee81308

6 files changed

+137
-8
lines changed

business_continuity/backup_restore/backup_arch.adoc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -153,4 +153,4 @@ You can backup third-party resources with cluster backup and restore by adding t
153153
== Additional resources
154154

155155
Learn more about the policies and capabilities of the backup and restore component by going to
156-
xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].
156+
xref:../backup_restore/backup_validate.adoc#backup-validation-using-a-policy[Validating your backup or restore configurations].

business_continuity/backup_restore/backup_intro.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,3 +32,5 @@ Complete the following topics to learn more about the backup and restore operato
3232
* xref:../backup_restore/backup_restore_hub.adoc#restore-data-initial-hub[Restoring data to the initial hub cluster]
3333
3434
* xref:../backup_restore/backup_hcp.adoc#config-hcp-backup[Backup and restore for hosted control planes and hosted clusters]
35+
36+
* xref:../backup_restore/backup_restore_config_obs.adoc#backup-restore-obs-config[Backup and restore configuration for Observability]
Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
[#backup-restore-obs-config]
2+
= Backup and restore configuration for Observability
3+
4+
The Observability service uses an S3-compatible object store to keep all time-series data collected from managed clusters. Because Observability is a stateful service, it is sensitive to active and passive backup patterns. You must configure Observability to ensure that your data stays safe and keeps its continuity during the hub cluster migration or backup.
5+
6+
*Notes:*
7+
8+
- When a managed cluster is detached from the primary hub cluster and reattached to the backup hub cluster, metrics are not collected during the transition. To help shorten this gap in metrics, you can script the cluster migration for large fleets.
9+
10+
- For product backup and restore, the Observability service automatically labels its resources with the `cluster.open-cluster-management.io/backup` label.
11+
12+
.Resources that are automatically backed up and restored for Observability
13+
|====
14+
| Resource type | Resource name
15+
16+
| ConfigMaps
17+
| `observability-metrics-custom-allowlist`, `thanos-ruler-custom-rules`, `alertmanager-config`, `policy-acs-central-status`, and any ConfigMap labeled with `grafana-custom-dashboard`
18+
19+
| Secrets
20+
| `thanos-object-storage`, `observability-server-ca-certs`, `observability-client-ca-certs`, `observability-server-certs`, `observability-grafana-certs`, `alertmanager-byo-ca`, `alertmanager-byo-cert`, `proxy-byo-ca`, `proxy-byo-cert`
21+
|====
22+
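To see which ConfigMaps and Secrets currently carry the backup label, you can query by label. The following is a minimal sketch, not part of the documented procedure. It assumes that the resources in the table live in the `open-cluster-management-observability` namespace and that you are logged in to the hub cluster with `oc`. The function name `list_obs_backup_resources` is a hypothetical name introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: list the ConfigMaps and Secrets that have the backup label.
# Assumption: the Observability resources are in this namespace.
NS="open-cluster-management-observability"
BACKUP_LABEL="cluster.open-cluster-management.io/backup"

# Hypothetical helper name. A bare label key in -l selects on label existence.
list_obs_backup_resources() {
  oc get configmaps,secrets -n "$NS" -l "$BACKUP_LABEL" --show-labels
}
```

Run `list_obs_backup_resources` and compare the result with the table to confirm that the resources you expect are labeled.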
23+
== Additional resources
24+
25+
- For the steps to complete the backup and restore for Observability, see xref:../backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].
26+
Lines changed: 98 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
[#backup-restore-obs]
2+
= Backing up and restoring Observability service
3+
4+
Back up and restore the Observability service to keep data safe and to support continuity during the hub cluster migration or backup. To help minimize disruption in metric data collection, use the same S3-compatible object store for both the primary and backup hub clusters.
5+
6+
.Prerequisites
7+
8+
- Ensure that you can run a restore operation for backup types by completing the xref:../business_continuity/backup_restore/backup_restore.adoc#restoring-backup-restore-operation[Using the restore operation for backup types] process.
9+
10+
.Procedure
11+
12+
Complete the following steps to back up and restore the Observability service:
13+
14+
. To ensure the Observability service recognizes the hub cluster as the `local-cluster`, a managed hub cluster, change the `spec.disableHubSelfManagement` parameter in the `MultiClusterHub` custom resource to `false`.
15+
16+
+
17+
*Note:* If you change the default name of your `local-cluster` to another value, the results appear within the changed local cluster name.
18+
19+
. To preserve the tenant ID of the `observatorium` resource as you manually back up and restore the `observatorium` resource, run the following command:
20+
21+
+
22+
[source,bash]
23+
----
24+
oc get observatorium -n open-cluster-management-observability -o yaml > observatorium-backup.yaml
25+
----
26+
27+
. To back up the `observability` deployment, run the following command:
28+
29+
+
30+
[source,bash]
31+
----
32+
oc get mco observability -o yaml > mco-cr-backup.yaml
33+
----
34+
35+
. Shut down the Thanos compactor on your primary hub cluster by running the following command:
36+
37+
+
38+
[source,bash]
39+
----
40+
oc scale statefulset observability-thanos-compact -n open-cluster-management-observability --replicas=0
41+
----
42+
43+
.. Verify the compactor is not active by running the following command:
44+
45+
+
46+
[source,bash]
47+
----
48+
oc get pods observability-thanos-compact-0 -n open-cluster-management-observability
49+
----
50+
51+
. Restore the `backup` resources such as the automatically backed-up ConfigMaps and Secrets listed in the backup and restore configuration for Observability.
52+
53+
. To preserve the tenant ID for maintaining continuity in the metrics ingestion and querying, restore the `observatorium` resource to the backup hub cluster. Run the following command:
54+
55+
+
56+
[source,bash]
57+
----
58+
oc apply -f observatorium-backup.yaml
59+
----
60+
61+
. Apply the backed-up `MultiClusterObservability` custom resource to start the Observability service on the newly restored hub cluster. Run the following command:
62+
63+
+
64+
[source,bash]
65+
----
66+
oc apply -f mco-cr-backup.yaml
67+
----
68+
+
69+
The operator starts the Observability service and detects the existing `observatorium` resource, reusing the preserved tenant ID instead of creating a new one.
70+
71+
. Verify that the Observability service runs on your new hub cluster. Run the following command:
72+
73+
+
74+
[source,bash]
75+
----
76+
oc get pods -n open-cluster-management-observability
77+
----
78+
79+
. Verify that the `observability-controller` `managedclusteraddon` does not have a status in the `DEGRADED` column, and that the `PROGRESSING` status is not set to `False`. Run the following command:
80+
81+
+
82+
[source,bash]
83+
----
84+
oc get managedclusteraddons -A | awk 'NR==1 || /observability-controller/'
85+
----
86+
87+
. Verify metrics collection from your managed clusters by accessing Grafana.
88+
89+
. Verify that your managed clusters are connected to your new hub cluster by checking for the `Available` status for each managed cluster.
90+
91+
. Shut down the Observability service on your previous hub cluster by removing the resources. Run the following command:
92+
93+
+
94+
[source,bash]
95+
----
96+
oc delete mco observability
97+
----
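The primary hub cluster portion of this procedure (saving the `observatorium` and `MultiClusterObservability` resources, then stopping the Thanos compactor) can be collected into one small script. The following is a minimal sketch under stated assumptions, not a supported tool. It reuses only the `oc` commands from the steps and assumes that you are logged in to the primary hub cluster. The function name `backup_observability` and its output directory argument are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the primary hub cluster commands from the procedure.
# backup_observability and its directory argument are illustrative names.
NS="open-cluster-management-observability"

backup_observability() {
  local out_dir="${1:-.}"
  mkdir -p "$out_dir" || return 1

  # Save the observatorium resource so that the tenant ID is preserved.
  oc get observatorium -n "$NS" -o yaml > "$out_dir/observatorium-backup.yaml" || return 1

  # Save the MultiClusterObservability custom resource.
  oc get mco observability -o yaml > "$out_dir/mco-cr-backup.yaml" || return 1

  # Stop the Thanos compactor on the primary hub cluster.
  oc scale statefulset observability-thanos-compact -n "$NS" --replicas=0 || return 1
}
```

For example, `backup_observability ./obs-backup` writes both YAML files to `./obs-backup`. The function stops at the first failing command, so the compactor is scaled down only after both files are written.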
98+
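The add-on verification step filters the `managedclusteraddons` listing with `awk`, keeping the header row and every row that mentions `observability-controller`. The following sketch isolates that filter so that you can reuse it. The function names `filter_observability_addons` and `show_observability_addons` are hypothetical names introduced for illustration, and the sketch assumes that you are logged in to the restored hub cluster.

```shell
#!/usr/bin/env bash
# Sketch only: the awk filter from the verification step as a reusable function.

# Reads an `oc get` table on stdin. Keeps the header (first line) and
# every line that contains "observability-controller".
filter_observability_addons() {
  awk 'NR==1 || /observability-controller/'
}

# Hypothetical wrapper: list the add-ons in all namespaces and apply the filter.
show_observability_addons() {
  oc get managedclusteraddons -A | filter_observability_addons
}
```

Run `show_observability_addons` and inspect the `DEGRADED` and `PROGRESSING` columns as described in the verification step.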

business_continuity/main.adoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@ include::backup_restore/use_existing_hub_cluster.adoc[leveloffset=+3]
1717
include::backup_restore/tag_resources.adoc[leveloffset=+3]
1818
include::backup_restore/backup_restore_hub.adoc[leveloffset=+3]
1919
include::backup_restore/backup_hcp.adoc[leveloffset=+3]
20+
include::backup_restore/backup_restore_obs.adoc[leveloffset=+3]
21+
include::backup_restore/backup_restore_config_obs.adoc[leveloffset=+3]
2022
include::volsync/volsync.adoc[leveloffset=+2]
2123
include::volsync/volsync_replicate.adoc[leveloffset=+3]
2224
include::volsync/volsync_convert_backup.adoc[leveloffset=+3]

observability/observability_arch.adoc

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -128,10 +128,11 @@ When you install {acm-short} the following persistent volumes (PV) must be creat
128128

129129
To learn more about observability and the integrated components, see the following topics:
130130

131-
- See xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service]
132-
- See xref:../observability/obs_config.adoc#obs-config[Observability configuration]
133-
- See xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the observability service]
134-
- See xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards]
135-
- See the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation]
136-
- See the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview]
137-
- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation]
131+
- For an introduction to the service, see xref:../observability/observe_environments_intro.adoc#observing-environments-intro[Observability service].
132+
- To learn about configuring the service, metric types labeling, and pod capacity, see xref:../observability/obs_config.adoc#obs-config[Observability configuration].
133+
- To enable the Observability service, see xref:../observability/observability_enable.adoc#enabling-observability-service[Enabling the Observability service].
134+
- For more information about viewing hub cluster and managed cluster metrics from Grafana, see xref:../observability/design_grafan.adoc#using-grafana-dashboards[Using Grafana dashboards].
135+
- To learn how you can back up and restore the Observability service, see link:../business_continuity/backup_restore/backup_restore_obs.adoc#backup-restore-obs[Backing up and restoring Observability service].
136+
- For more details about Thanos, see the link:https://thanos.io/v0.36/thanos/getting-started.md/[Thanos documentation].
137+
- For a brief overview of Prometheus, see the link:https://prometheus.io/docs/introduction/overview/[Prometheus Overview].
138+
- See the link:https://prometheus.io/docs/alerting/latest/alertmanager/[Alertmanager documentation] to understand how you can send and receive alerts by using Alertmanager.
