* Describe the SLOs (Service Level Objectives) for this project.

-TBD
+OCM defines the following SLOs:
+- **Cluster Availability**: 99.9% of managed clusters should be in "Available" status during business hours
+- **ManifestWork Success Rate**: 99.5% of ManifestWork deployments should succeed within 5 minutes
+- **Addon Availability**: 99% of enabled addons should be in "Available" status across all managed clusters
+- **Certificate Rotation**: 100% of certificate rotations should complete successfully before expiration
+- **Hub Recovery Time**: Hub cluster recovery should complete within 30 minutes in disaster scenarios
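
As a minimal sketch of how the ratio-based targets above could be checked, the following Python snippet compares observed "good" and "total" event counts against the stated thresholds. The `SLOTarget` type, the `evaluate` helper, and the observed counts are hypothetical and only illustrate the arithmetic; they are not part of OCM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOTarget:
    name: str
    target: float  # required fraction of "good" events, e.g. 0.999


# Thresholds copied from the SLO list above (the time-based
# Hub Recovery Time objective is not a ratio and is left out).
TARGETS = [
    SLOTarget("Cluster Availability", 0.999),
    SLOTarget("ManifestWork Success Rate", 0.995),
    SLOTarget("Addon Availability", 0.99),
    SLOTarget("Certificate Rotation", 1.0),
]


def evaluate(target, good, total):
    """Return (observed_ratio, met) for one SLO.

    An empty measurement window is treated as met (assumption).
    """
    if total == 0:
        return 1.0, True
    ratio = good / total
    return ratio, ratio >= target.target


if __name__ == "__main__":
    # Hypothetical counts for one measurement window.
    observed = {
        "Cluster Availability": (998, 1000),
        "ManifestWork Success Rate": (995, 1000),
        "Addon Availability": (300, 300),
        "Certificate Rotation": (12, 12),
    }
    for t in TARGETS:
        good, total = observed[t.name]
        ratio, met = evaluate(t, good, total)
        status = "met" if met else "missed"
        print(f"{t.name}: {ratio:.2%} (target {t.target:.2%}) -> {status}")
```

In practice the counts would come from whatever monitoring system records cluster, ManifestWork, and addon status; the sketch only shows the comparison step.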

* What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

@@ -535,10 +568,39 @@ TBD
### Dependencies

* Describe the specific running services the project depends on in the cluster.
-* Describe the project’s dependency lifecycle policy.
+
+OCM depends on the following cluster services:
+- **Kubernetes API Server**: Core dependency for all OCM operations and CRD storage
+- **etcd**: Stores all OCM custom resources and cluster state information
+- **kube-controller-manager**: Required for leader election and resource management
+- **CoreDNS/kube-dns**: Name resolution for inter-component communication
+- **kubelet**: Manages OCM pods on cluster nodes
+
+* Describe the project's dependency lifecycle policy.
+
+OCM follows a conservative dependency lifecycle policy:
+- Kubernetes dependencies are updated to the latest stable version with each OCM release
+- Go dependencies are updated monthly via Dependabot automated PRs for security patches
+- Major dependency upgrades are planned during quarterly releases with backward compatibility testing
+- Legacy dependencies are deprecated with a minimum 2-release migration period
+- Critical security vulnerabilities in dependencies trigger immediate patch releases
+- All dependency changes require approval from project maintainers and CI validation
+
* How does the project incorporate and consider source composition analysis as part of its development and security hygiene? Describe how this source composition analysis (SCA) is tracked.
+
+OCM incorporates SCA through multiple automated tools and processes:
+- **GitHub Security Scanning**: Enabled for vulnerability detection in source code and dependencies
+- **Dependabot**: Automatically tracks dependency vulnerabilities and creates PRs for security updates
+- **SBOM Generation**: Creates Software Bill of Materials for all container images using SPDX format
+- **License Scanning**: Ensures all dependencies comply with project license requirements
+- **Supply Chain Security**: Uses Cosign and Sigstore for image signing and attestation
+- **Trivy Integration**: Scans container images for known CVEs in CI/CD pipeline
+- **Tracking**: SCA results are monitored via GitHub Security Dashboard and dependency update PRs
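
To illustrate how image scan results like those mentioned above could be tracked in CI, here is a minimal Python sketch that tallies findings by severity. It assumes Trivy's JSON report layout (a top-level `Results` array whose entries carry an optional `Vulnerabilities` list with a `Severity` field). The sample report, its placeholder IDs, and the policy of blocking on HIGH or CRITICAL findings are hypothetical and not taken from the OCM pipeline.

```python
import json
from collections import Counter


def count_by_severity(report):
    """Tally vulnerabilities per severity across all scan targets."""
    counts = Counter()
    # "Results" and "Vulnerabilities" may be missing or null in a report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_block(counts, blocking=("HIGH", "CRITICAL")):
    """Hypothetical gate: fail the pipeline on any blocking severity."""
    return any(counts[severity] > 0 for severity in blocking)


# Made-up report fragment with placeholder identifiers.
SAMPLE = json.loads("""
{
  "Results": [
    {
      "Target": "example-image:latest",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
        {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
      ]
    },
    {"Target": "example-binary"}
  ]
}
""")

if __name__ == "__main__":
    counts = count_by_severity(SAMPLE)
    print(dict(counts))
    print("block release:", should_block(counts))
```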
+
* Describe how the project implements changes based on source composition analysis (SCA) and the timescale.

+N/A
+
### Troubleshooting

* How does this project recover if a key component or feature becomes unavailable? e.g., Kubernetes API server, etcd, database, leader node, etc.
@@ -549,7 +611,14 @@ TBD

* Describe the known failure modes.

-TBD
+Known failure modes in OCM include:
+- **Hub Cluster Failure**: Complete hub unavailability causes loss of centralized management, but managed clusters continue running existing workloads
+- **Network Partitions**: Spoke clusters that cannot reach the hub lose management capabilities until connectivity is restored
+- **Certificate Expiration**: Failed certificate rotation can break hub-spoke communication, requiring manual intervention
+- **etcd Corruption**: Hub cluster data loss requires backup restoration and managed cluster re-registration
+- **Resource Exhaustion**: Too many clusters or ManifestWorks can overwhelm hub resources, causing performance degradation
+- **API Server Overload**: High API request volume can cause timeouts and failed operations
+- **Addon Failures**: Individual addon crashes affect specific functionality but do not impact core cluster management
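
The certificate expiration failure mode above can be caught early by checking how far a certificate is into its validity period. The sketch below is a minimal Python illustration: the `rotation_status` helper and the 80% rotation threshold are assumptions made for the example, not OCM's documented behavior.

```python
from datetime import datetime, timedelta, timezone


def rotation_status(not_before, not_after, now, rotate_at=0.8):
    """Classify a certificate as "ok", "rotate", or "expired".

    rotate_at is the fraction of the validity period after which
    rotation should start (hypothetical default of 80%).
    """
    if not_after <= not_before:
        raise ValueError("not_after must be later than not_before")
    if now >= not_after:
        return "expired"
    # Dividing two timedeltas yields the elapsed fraction as a float.
    elapsed = (now - not_before) / (not_after - not_before)
    return "rotate" if elapsed >= rotate_at else "ok"


if __name__ == "__main__":
    # Hypothetical 30-day certificate.
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=30)
    for day in (10, 27, 31):
        now = issued + timedelta(days=day)
        print(f"day {day}: {rotation_status(issued, expires, now)}")
```

A check like this only reports status; the actual rotation and any manual recovery steps remain specific to the deployment.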

### Security

@@ -616,4 +685,11 @@ TBD

* Cloud Native Threat Modeling
* How does the project ensure its security reporting and response team is representative of its community diversity (organizational and individual)?
-* How does the project invite and rotate security reporting team members?
+
+OCM does not currently have a formal security reporting and response team structure separate from the maintainer team.
+The project would benefit from establishing a dedicated security response team with diverse representation as it matures.
+
+* How does the project invite and rotate security reporting team members?
+
+Currently, OCM does not have a formal process for inviting and rotating security reporting team members, as security
+responsibilities are handled by the general maintainer team.