
(WIP) Network policies POC #5755

Draft
isabella-janssen wants to merge 3 commits into openshift:main from isabella-janssen:network-policies

Conversation

@isabella-janssen (Member) commented Mar 10, 2026

- What I did

- How to verify it

- Description for the changelog

Summary by CodeRabbit

  • Chores

    • Added deny-all NetworkPolicy manifests to isolate machine-config-daemon, machine-config-server, and kube-rbac-proxy-crio (Ingress and Egress).
  • New Features

    • Operator now syncs NetworkPolicy objects and includes cluster network CIDR in rendering.
    • Added an allow NetworkPolicy for machine-config-controller to permit monitoring, kubelet probes, API server and DNS access.

@coderabbitai bot commented Mar 10, 2026

Walkthrough

Adds multiple NetworkPolicy manifests (deny-all and an allow template), extends renderConfig with ClusterNetworkCIDR, adds getClusterNetworkCIDR and syncNetworkPolicies, updates getRenderConfig signatures and call sites, and inserts NetworkPolicies into the operator sync pipeline.

Changes

Cohort / File(s) / Summary

NetworkPolicy manifests (deny-all)
manifests/machineconfigdaemon/networkpolicy.yaml, manifests/machineconfigserver/networkpolicy.yaml, manifests/machineconfigcontroller/networkpolicy.yaml, install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml
Added NetworkPolicy resources selecting pods by k8s-app labels (machine-config-daemon, machine-config-server, machine-config-controller, kube-rbac-proxy-crio) with policyTypes: [Ingress, Egress] and no explicit allow rules (deny-all). The install manifest targets openshift-machine-config-operator and includes release annotations.

NetworkPolicy manifest (allow rules template)
templates/common/network-policies/machine-config-controller-allow.yaml
New templated NetworkPolicy mco-allow-controller permitting specific Ingress (Prometheus metrics, kubelet probes) and Egress (API server, DNS) traffic; parameterized by TargetNamespace, ControllerAppLabel, and ClusterNetworkCIDR.

Operator: render & config plumbing
pkg/operator/render.go, pkg/operator/bootstrap.go, pkg/operator/sync.go, pkg/operator/operator.go
Added a ClusterNetworkCIDR string to renderConfig; updated the getRenderConfig signature and its call sites to accept clusterNetworkCIDR; added a getClusterNetworkCIDR() helper; implemented syncNetworkPolicies(config *renderConfig, ...), which renders and server-side-applies NetworkPolicy objects; inserted this sync step into the operator sync sequence.

Sequence Diagram(s)

sequenceDiagram
  participant Operator
  participant ClusterConfig as "Cluster NetworkStatus"
  participant Render as "Renderer"
  participant KubeAPI as "Kubernetes API"

  Operator->>ClusterConfig: getClusterNetworkCIDR()
  ClusterConfig-->>Operator: clusterNetworkCIDR
  Operator->>Render: getRenderConfig(..., clusterNetworkCIDR)
  Render-->>Operator: rendered NetworkPolicy objs
  Operator->>KubeAPI: Server-Side Apply NetworkPolicies (ForceOwnership, fieldManager)
  KubeAPI-->>Operator: apply result/status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage: ⚠️ Warning
  Docstring coverage is 33.33%, below the required threshold of 80.00%.
  Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

Title check: ❓ Inconclusive
  The title "(WIP) Network policies POC" is vague and generic, using non-descriptive terms like "POC" and "Network policies" without conveying what the change does or why.
  Resolution: Replace it with a more specific title, such as "Add network policies for machine config components" or "Implement deny-all network policies for pod security".
✅ Passed checks (3 passed)
Description Check: ✅ Passed
  Check skipped: CodeRabbit's high-level summary is enabled.

Stable And Deterministic Test Names: ✅ Passed
  Not applicable: the PR only modifies network policy manifests and Go operator code, with no Ginkgo test definitions.

Test Structure And Quality: ✅ Passed
  Not applicable: the PR adds implementation changes to pkg/operator/sync.go and YAML NetworkPolicy manifests; the existing test file uses standard Go testing with testify, not Ginkgo.




@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Mar 10, 2026
@openshift-ci bot (Contributor) commented Mar 10, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai bot left a comment

🧹 Nitpick comments (1)
manifests/machineconfigdaemon/networkpolicy.yaml (1)

38-38: Add a newline at the end of file.

YAML files should end with a newline character to follow POSIX conventions and avoid potential parsing issues with some tools.

Proposed fix
   policyTypes:
   - Ingress
   - Egress
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/machineconfigdaemon/networkpolicy.yaml` at line 38, The file
manifests/machineconfigdaemon/networkpolicy.yaml is missing a trailing newline
(the diff ends with the token "Egress"); open that file and add a single newline
character at the end of the file so the file ends with "\n" (ensure the final
line containing "Egress" is terminated), then save and commit the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d4369a6-8878-453b-89a4-f0f6bb99ef9d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8a698 and 1ed0e29.

📒 Files selected for processing (1)
  • manifests/machineconfigdaemon/networkpolicy.yaml

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml`:
- Around line 1-16: The NetworkPolicy resource named
kube-rbac-proxy-crio-deny-all currently denies all traffic for pods with label
k8s-app: kube-rbac-proxy-crio; update it to allow required traffic instead of a
blanket deny by adding explicit allow rules: permit Ingress from the
Prometheus/monitoring service(s) or the monitoring namespace (e.g.,
label/selectors used by your Prometheus stack) so metrics scraping can reach the
kube-rbac-proxy-crio pods, and permit Egress to the API server (or to the
namespace/service account performing SubjectAccessReview calls) so authorization
requests succeed. Also reconcile the namespace field: replace the hardcoded
namespace openshift-machine-config-operator with the templated value
{{.TargetNamespace}} if other NetworkPolicy files use that template and this
file should be templated as well.

In `@manifests/machineconfigdaemon/networkpolicy.yaml`:
- Around line 1-12: The NetworkPolicy resource machine-config-daemon-deny-all
currently sets spec.policyTypes to [Ingress, Egress] with no allow rules and
will block required outbound traffic; update the NetworkPolicy (metadata.name
machine-config-daemon-deny-all, podSelector matchLabels k8s-app:
machine-config-daemon) to include explicit spec.egress rules that permit egress
to the Kubernetes API server (control-plane IPs/FQDN and port 6443) and to image
registries/container runtime endpoints (HTTP(S) ports and registry CIDRs) and
any node reporting endpoints, or remove Egress from spec.policyTypes if you
intend to only restrict Ingress; ensure the rules are narrowly scoped (by CIDR,
namespaceSelector, or ipBlock) rather than wide-open.

In `@manifests/machineconfigserver/networkpolicy.yaml`:
- Around line 1-12: The current NetworkPolicy (kind: NetworkPolicy,
metadata.name: machine-config-server-deny-all) defines only policyTypes:
[Ingress, Egress] and will block all traffic to pods selected by
podSelector.matchLabels.k8s-app: machine-config-server; update this resource to
add explicit ingress rules that allow TCP traffic on the MCS port (22623) from
the appropriate sources (e.g., node CIDRs, kubelet IP ranges, and any load
balancer CIDRs) and, if needed, add egress rules to permit responses;
specifically modify the spec to include an ingress section with a port:
22623/protocol: TCP and from: entries for the node networks or
service/loadbalancer sources so nodes can fetch Ignition configs while
preserving other restrictions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48690d87-cd2e-48d1-9cca-ec16faea0a5d

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed0e29 and f7c6266.

📒 Files selected for processing (3)
  • install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml
  • manifests/machineconfigdaemon/networkpolicy.yaml
  • manifests/machineconfigserver/networkpolicy.yaml

Comment on lines +1 to +16
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kube-rbac-proxy-crio-deny-all
  namespace: openshift-machine-config-operator
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will block metrics scraping and API server communication.

This policy blocks all traffic for kube-rbac-proxy-crio pods. The kube-rbac-proxy needs:

  • Ingress: To receive metrics scrape requests from Prometheus/monitoring stack
  • Egress: To communicate with the API server for authorization (SubjectAccessReview)

Without allow rules, metrics collection will fail and the proxy cannot authorize requests.

Additionally, this file uses a hardcoded namespace: openshift-machine-config-operator while the other two NetworkPolicy files use the {{.TargetNamespace}} template. Confirm whether this inconsistency is intentional (e.g., this file is applied directly while others go through templating).
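A minimal shape for the allow rules the reviewer is asking for could look like the following. This is a sketch only; the monitoring namespace selector and the port numbers are assumptions, not values from the PR.

```yaml
# Hypothetical allow rules for kube-rbac-proxy-crio; ports are illustrative.
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow Prometheus scrapes from the monitoring namespace
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-monitoring
    ports:
    - protocol: TCP
      port: 9637  # assumed proxy metrics port
  egress:
  # Allow SubjectAccessReview calls to the API server
  - ports:
    - protocol: TCP
      port: 6443
```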


Comment on lines +1 to +12
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-daemon-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will break machine-config-daemon functionality.

Similar to the machine-config-server policy, this defines Ingress and Egress policyTypes without allow rules. The machine-config-daemon requires egress connectivity to:

  • Communicate with the Kubernetes API server
  • Pull container images
  • Report node status and configuration state

Blocking all egress will prevent MCD from functioning. You'll need to add appropriate egress rules for API server access and any other required destinations.
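One possible shape for such egress rules is sketched below. The ports are assumptions, and per the reviewer's note the rules should ultimately be narrowed with CIDRs or namespace selectors rather than left port-only.

```yaml
# Hypothetical egress rules for machine-config-daemon; ports are illustrative
# and should be scoped more tightly (ipBlock, namespaceSelector) in practice.
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress
  egress:
  # Kubernetes API server
  - ports:
    - protocol: TCP
      port: 6443
  # Image registries (HTTPS)
  - ports:
    - protocol: TCP
      port: 443
  # DNS resolution
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```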


Comment on lines +1 to +12
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-server-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-server
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will block machine-config-server traffic.

This NetworkPolicy defines Ingress and Egress policyTypes without any allow rules, which effectively blocks all inbound and outbound traffic for machine-config-server pods. Since machine-config-server serves Ignition configs to nodes during cluster bootstrap and provisioning, this policy will prevent nodes from fetching their configurations.

For a functional policy, you'll need to add ingress rules allowing traffic on the MCS port (typically 22623) from the appropriate sources (e.g., node CIDRs or the load balancer).

Example structure with ingress rules
 spec:
   podSelector:
     matchLabels:
       k8s-app: machine-config-server
   policyTypes:
   - Ingress
   - Egress
+  ingress:
+  - from:
+    - ipBlock:
+        cidr: <node-cidr>  # Or appropriate source selector
+    ports:
+    - protocol: TCP
+      port: 22623

@openshift-ci bot (Contributor) commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 10, 2026
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
pkg/operator/sync.go (3)

2636-2646: Remove or explain commented-out code before merging.

The commented-out manifest entries and conditional block appear to be development/debugging artifacts. For a production PR, these should either be:

  • Removed entirely, or
  • Replaced with TODOs explaining the intended future implementation
♻️ Suggested cleanup
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
-	// 1. Define your manifest list
 	manifests := []string{
-		// "common/network-policies/00-default-deny-all.yaml",
 		"common/network-policies/machine-config-controller-allow.yaml",
-		// "common/network-policies/03-allow-operator.yaml",
 	}
-
-	// if config.IsImageBuildEnabled {
-	//     manifests = append(manifests, "common/network-policies/02-allow-os-builder.yaml")
-	// }
+	// TODO: Add additional network policies as needed:
+	// - deny-all base policy
+	// - allow policies for other MCO components (MCD, MCS, MOB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2636 - 2646, In syncNetworkPolicies,
remove the commented-out manifest entries
("common/network-policies/00-default-deny-all.yaml",
"common/network-policies/03-allow-operator.yaml") and the commented conditional
block for config.IsImageBuildEnabled, or if you intend to keep them as a
reminder, replace them with a single concise TODO comment that explains the
intended future behavior (e.g., when/why to add those manifests and the
condition to append "02-allow-os-builder.yaml"); ensure the manifests slice only
contains active entries and that the TODO references syncNetworkPolicies and the
image-build condition so maintainers can find the intended change.

2003-2016: Consider documenting the single-CIDR limitation for dual-stack clusters.

The function only returns the first cluster network CIDR. For dual-stack clusters with both IPv4 and IPv6 CIDRs, this may not cover all pod traffic. If the network policies need to allow traffic from all cluster CIDRs, this could leave a gap.

📝 Suggested documentation improvement
 func (optr *Operator) getClusterNetworkCIDR() (string, error) {
 	// Fetch the cluster-wide Network configuration
 	network, err := optr.networkLister.Get("cluster")
 	if err != nil {
 		return "", err
 	}
 
-	// Always use Status as it represents the current reality of the CNI
+	// Always use Status as it represents the current reality of the CNI.
+	// Note: For dual-stack clusters, this returns only the primary (first) CIDR.
+	// Network policies using this value may need adjustment for full dual-stack support.
 	if len(network.Status.ClusterNetwork) > 0 {
 		return network.Status.ClusterNetwork[0].CIDR, nil
 	}
 
 	return "", fmt.Errorf("no cluster network CIDR found in status")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2003 - 2016, getClusterNetworkCIDR only
returns the first CIDR from network.Status.ClusterNetwork which silently ignores
additional CIDRs in dual-stack clusters; update the function's doc comment
(above func getClusterNetworkCIDR) to explicitly state this single-CIDR
limitation and the potential gap for dual-stack clusters, and either (a) add a
TODO note recommending changing the signature to return all CIDRs (e.g.,
[]string) so callers can handle IPv4+IPv6, or (b) call out that callers must
handle multiple CIDRs themselves — reference getClusterNetworkCIDR so reviewers
can find and adjust callers or change the signature later.

2648-2681: Consider adding retry logic for transient API errors.

Unlike applyManifests (which wraps operations in retry.OnError), this function doesn't retry on transient failures. A network hiccup during the patch could fail the entire sync unnecessarily.

♻️ Suggested retry wrapper
+import "k8s.io/client-go/util/retry"
+
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
 	manifests := []string{
 		"common/network-policies/machine-config-controller-allow.yaml",
 	}
 
-	for _, path := range manifests {
+	return retry.OnError(retry.DefaultRetry, mcoResourceApply.IsApplyErrorRetriable, func() error {
+		for _, path := range manifests {
 		// Render the template using the operator's internal asset renderer
 		npBytes, err := renderAsset(config, path)
 		if err != nil {
 			return fmt.Errorf("failed to render %s: %w", path, err)
 		}
 
 		// ... rest of the loop body ...
 
 		if err != nil {
 			return fmt.Errorf("failed to apply network policy %s: %w", netPol.Name, err)
 		}
+		}
+		return nil
+	})
-	}
-	return nil
 }
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/sync.go`:
- Around line 656-662: The call to getClusterNetworkCIDR in syncRenderConfig can
return empty/missing data during early bringup and currently aborts the entire
sync; change syncRenderConfig/optr.renderConfig construction to tolerate a
missing CIDR by catching the error from getClusterNetworkCIDR and treating it as
non-fatal (e.g., set clusterNetworkCIDR to "" and log/debug the condition) or
gate syncNetworkPolicies to only run when inClusterBringup is false;
specifically modify the code around getClusterNetworkCIDR and the
optr.renderConfig assignment in syncRenderConfig to not return the error
directly, and ensure syncNetworkPolicies checks for an empty clusterNetworkCIDR
or inClusterBringup before applying NetworkPolicy logic.
- Around line 2649-2654: The manifest path
"common/network-policies/machine-config-controller-allow.yaml" used when calling
renderAsset(config, path) is not embedded in the manifests FS and will fail at
runtime; fix by either moving the template file into the embedded manifests tree
(e.g., add it under manifests/common/network-policies/) or change the path
string to the actual embedded location (for example
"machineconfigcontroller/machine-config-controller-allow.yaml") so that
renderAsset (and any manifests.ReadFile usage) can find the file at runtime.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17c000e3-6754-4a2f-bec2-39477751eaea

📥 Commits

Reviewing files that changed from the base of the PR and between f7c6266 and 5ad247f.

📒 Files selected for processing (6)
  • manifests/machineconfigcontroller/networkpolicy.yaml
  • pkg/operator/bootstrap.go
  • pkg/operator/operator.go
  • pkg/operator/render.go
  • pkg/operator/sync.go
  • templates/common/network-policies/machine-config-controller-allow.yaml
✅ Files skipped from review due to trivial changes (2)
  • manifests/machineconfigcontroller/networkpolicy.yaml
  • templates/common/network-policies/machine-config-controller-allow.yaml

Comment on lines +656 to +662
// Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
if err != nil {
	return err
}

optr.renderConfig = getRenderConfig(optr.namespace, string(kubeAPIServerServingCABytes), spec, &imgs.RenderConfigImages, infra, pointerConfigData, apiServer, fmt.Sprintf("%d", optr.logLevel), clusterNetworkCIDR)

⚠️ Potential issue | 🟠 Major

Network CIDR fetch failure will block entire operator sync.

If the cluster Network resource doesn't have Status.ClusterNetwork populated (e.g., during early bringup before CNI is fully configured), this will cause syncRenderConfig to fail, blocking all subsequent sync operations and potentially degrading the operator.

Consider adding graceful handling similar to how proxy is handled (lines 2112-2114) by tolerating missing/empty state during bringup:

🛡️ Proposed fix for graceful degradation
 	// Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
 	clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
 	if err != nil {
-		return err
+		// During early cluster bringup, the network status may not be populated yet.
+		// Log a warning but continue with empty CIDR; network policies will be applied
+		// on subsequent syncs once the network is ready.
+		if optr.inClusterBringup {
+			klog.Warningf("Could not get cluster network CIDR during bringup: %v", err)
+			clusterNetworkCIDR = ""
+		} else {
+			return err
+		}
 	}

Alternatively, if network policies are critical, consider moving syncNetworkPolicies to only run after inClusterBringup is false, or adding a check within syncNetworkPolicies to skip if CIDR is empty.


Comment on lines +2649 to +2654
for _, path := range manifests {
	// Render the template using the operator's internal asset renderer
	npBytes, err := renderAsset(config, path)
	if err != nil {
		return fmt.Errorf("failed to render %s: %w", path, err)
	}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed: searched templates/ and manifests/ for machine-config-controller-allow and for any network-policies directories (fd), inspected the //go:embed directive in manifests/manifests.go, located the renderAsset and newAssetRenderer implementations in pkg/operator/, checked how other renderAsset call sites reference manifest paths in pkg/operator/sync.go, and listed the manifests/ directory structure. The template was found only under templates/common/network-policies/, and manifests/ contains no common/network-policies/ subdirectory.


The manifest path will fail at runtime because the file is not embedded in the manifests directory.

The path "common/network-policies/machine-config-controller-allow.yaml" references a file that exists in templates/common/network-policies/ but is not included in the embedded filesystem. The manifests/ directory contains no common/network-policies/ subdirectory, and manifests.ReadFile() only reads from files embedded via the //go:embed * directive in manifests/manifests.go. This will cause the code to fail when renderAsset() attempts to load the file.

Either move the file to manifests/common/network-policies/ or update the path to match an existing manifest location (e.g., "machineconfigcontroller/machine-config-controller-allow.yaml" if placing it in manifests/machineconfigcontroller/).
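The failure mode can be sketched with a stand-in filesystem (illustrative only: `fstest.MapFS` mimics the tree that `//go:embed *` in manifests/manifests.go would produce, and `readManifest` stands in for the real `manifests.ReadFile`; the file contents are hypothetical):

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// Stand-in for the embedded manifests tree. embed patterns are resolved
// relative to the directory containing the Go source file, so with
// `//go:embed *` in manifests/manifests.go only paths under manifests/
// are embedded; templates/ is a separate, non-embedded directory.
var embeddedManifests = fstest.MapFS{
	"machineconfigcontroller/networkpolicy.yaml": {Data: []byte("kind: NetworkPolicy")},
}

// readManifest mirrors what a manifests.ReadFile helper effectively does:
// resolve a relative path against the embedded filesystem.
func readManifest(name string) ([]byte, error) {
	return fs.ReadFile(embeddedManifests, name)
}

func main() {
	// The templates/ path was never embedded, so this fails at runtime.
	if _, err := readManifest("common/network-policies/machine-config-controller-allow.yaml"); err != nil {
		fmt.Println("not embedded:", err)
	}
	// A path under the embedded root resolves fine.
	data, _ := readManifest("machineconfigcontroller/networkpolicy.yaml")
	fmt.Println(string(data))
}
```

The same lookup succeeds as soon as the file lives under the embedded root, which is why moving the template into manifests/ (or pointing the path at an existing embedded location) resolves the issue.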

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2649 - 2654, The manifest path
"common/network-policies/machine-config-controller-allow.yaml" used when calling
renderAsset(config, path) is not embedded in the manifests FS and will fail at
runtime; fix by either moving the template file into the embedded manifests tree
(e.g., add it under manifests/common/network-policies/) or change the path
string to the actual embedded location (for example
"machineconfigcontroller/machine-config-controller-allow.yaml") so that
renderAsset (and any manifests.ReadFile usage) can find the file at runtime.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
templates/common/network-policies/machine-config-controller-allow.yaml (1)

28-43: Tighten the egress exceptions with pod selectors.

These rules currently allow TCP/6443 to any pod in openshift-kube-apiserver and DNS on 53 to any pod in openshift-dns. For a deny-by-default rollout, I'd strongly prefer adding pod selectors for the actual apiserver and DNS backends so this stays least-privilege.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/network-policies/machine-config-controller-allow.yaml`
around lines 28 - 43, The egress "to" entries for "Allow API Server Access" and
"Allow DNS Lookups" are too broad — narrow them by adding podSelector blocks
alongside the existing namespaceSelector so the rules target only the apiserver
and DNS backend pods; update the "to: # Allow API Server Access" entry that
permits TCP port 6443 to include a podSelector matching the kube-apiserver pod
label(s) and update the "to: # Allow DNS Lookups" entry that permits UDP/TCP
port 53 to include a podSelector matching the DNS backend pod label(s) used by
openshift-dns (use the actual labels from those deployments), ensuring
least-privilege egress while keeping the existing ports and namespaceSelector.
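A tightened version of those egress rules could look like the sketch below. The pod labels shown are illustrative assumptions, not verified against this repo: check them against the actual openshift-kube-apiserver and openshift-dns deployments before use. Combining namespaceSelector and podSelector in one peer ANDs the two conditions.

```yaml
egress:
- to: # Allow API Server Access (apiserver pods only)
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-kube-apiserver
    podSelector:
      matchLabels:
        app: openshift-kube-apiserver # illustrative label; verify against the deployment
  ports:
  - protocol: TCP
    port: 6443
- to: # Allow DNS Lookups (DNS backend pods only)
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-dns
    podSelector:
      matchLabels:
        dns.operator.openshift.io/daemonset-dns: default # illustrative label; verify
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
```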
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae4392e6-beee-4bcc-adfd-5c9c6a22d83d

📥 Commits

Reviewing files that changed from the base of the PR and between 5ad247f and 570ea58.

📒 Files selected for processing (1)
  • templates/common/network-policies/machine-config-controller-allow.yaml

Comment on lines +21 to +26
- from: # Allow Kubelet Health Probes (from Node Network)
  - ipBlock:
      cidr: {{.ClusterNetworkCIDR}}
  ports:
  - protocol: TCP
    port: 9443

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for ClusterNetworkCIDR usage in the repository
fd -t f '\.yaml\|\.yml\|\.go\|\.md' | xargs grep -l "ClusterNetworkCIDR" | head -20

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Also search for NodeNetworkCIDR or ServiceNetworkCIDR to understand the variable naming
rg "NetworkCIDR" --type yaml --type md -A 2 -B 2 | head -80

Repository: openshift/machine-config-operator

Length of output: 585


🏁 Script executed:

# Look for the actual file and surrounding context
cat templates/common/network-policies/machine-config-controller-allow.yaml

Repository: openshift/machine-config-operator

Length of output: 1237


🏁 Script executed:

# Search for documentation or comments about kubelet probes and network CIDR
rg "kubelet.*probe|health.*probe" --type yaml --type md -A 3 -B 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 6902


🏁 Script executed:

# Search for where ClusterNetworkCIDR is defined as a variable/parameter
rg "ClusterNetworkCIDR" -B 5 -A 5 --type go --type yaml --type md

Repository: openshift/machine-config-operator

Length of output: 8443


🏁 Script executed:

# Look for other network CIDR definitions to understand the pattern
rg "NetworkCIDR|NetworkNetwork" --type go --type yaml | head -50

Repository: openshift/machine-config-operator

Length of output: 5480


🏁 Script executed:

# Search for how these template variables are populated
rg "\.ClusterNetworkCIDR|ClusterNetworkCIDR:" --type go -B 3 -A 3 | head -80

Repository: openshift/machine-config-operator

Length of output: 395


🏁 Script executed:

# Check for any documentation or README files about network setup
fd -t f '\.md$|README' | xargs grep -i "ClusterNetworkCIDR\|network.*cidr" | head -30

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Read the getClusterNetworkCIDR implementation
sed -n '/func (optr \*Operator) getClusterNetworkCIDR/,/^}/p' pkg/operator/sync.go

Repository: openshift/machine-config-operator

Length of output: 517


🏁 Script executed:

# Search for any node network or machine network CIDR usage
rg "NodeNetwork|MachineNetwork|node.*cidr" --type go -A 3 -B 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 9425


🏁 Script executed:

# Look for kubelet configuration or probe documentation in the codebase
rg "9443|kubelet.*probe|Kubelet Health" -A 3 -B 3 --type yaml --type go | head -80

Repository: openshift/machine-config-operator

Length of output: 7475


🏁 Script executed:

# Search for what uses or listens on port 9443
rg "9443|machine.*config.*controller" --type go --type yaml -B 3 -A 3 | grep -A 10 -B 10 "9443\|probe\|webhook" | head -150

Repository: openshift/machine-config-operator

Length of output: 11461


🏁 Script executed:

# Look for webhook configurations or health probe definitions
rg "webhook|healthz|readiness.*probe|liveness.*probe" --type yaml -A 5 -B 5 | head -120

Repository: openshift/machine-config-operator

Length of output: 15321


🏁 Script executed:

# Search the codebase for how the machine-config-controller is accessed on port 9443
fd -t f '\.go$' | xargs grep -l "9443" | head -10

Repository: openshift/machine-config-operator

Length of output: 402


🏁 Script executed:

# Find machine-config-controller deployment manifests
fd -t f "\.yaml\|\.yml" | xargs grep -l "machine-config-controller" | head -20

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look for any deployment/pod specs for machine-config-controller
fd -t f "deployment\|pod\|statefulset" -path "*/machine-config*"

Repository: openshift/machine-config-operator

Length of output: 245


🏁 Script executed:

# Search for webhook or TLS server configuration in machine-config-controller code
rg "ValidatingWebhook|MutatingWebhook|listener.*9443|Serve.*9443" --type go -A 5 -B 5

Repository: openshift/machine-config-operator

Length of output: 50390


🏁 Script executed:

# Look for any references to what calls/probes this service
rg "machine.*config.*controller.*9443|9443.*machine" --type yaml --type go

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Search in the actual pkg directory for machine-config-controller main code
find pkg -name "*.go" -type f | xargs grep -l "port.*9443\|9443.*port" 2>/dev/null | head -10

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Search for where machine-config-controller might have a server/listen on a port
rg "ListenAndServe|server.*Start|Addr.*Port" pkg/controller --type go -B 5 -A 5 | head -80

Repository: openshift/machine-config-operator

Length of output: 789


🏁 Script executed:

# Let's look at what the file names suggest about network policies
ls -la templates/common/network-policies/

Repository: openshift/machine-config-operator

Length of output: 276


🏁 Script executed:

# Search for metrics or server port configuration in machine-config-controller
rg "Addr.*:80|Addr.*:8080|Addr.*:9000|Addr.*:9" pkg/controller --type go -B 3 -A 3 | head -60

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look for any operator code that manages network policies
rg "mco-allow\|mco.*allow\|NetworkPolicy" pkg/operator --type go -B 3 -A 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Check if port 9443 is specifically for webhooks or health checks
rg "webhook|health|readiness" pkg/controller --type go | head -30

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look at what the comment in the NetworkPolicy file says around the problematic rule
head -n 30 templates/common/network-policies/machine-config-controller-allow.yaml

Repository: openshift/machine-config-operator

Length of output: 891


Use the correct CIDR source for the intended access.

The comment says "from Node Network" but the rule uses {{.ClusterNetworkCIDR}}, which represents the pod network overlay, not node infrastructure. This makes the ingress rule much broader than intended—allowing every pod in the cluster to access port 9443—while potentially missing the actual kubelet/node source if probes originate from outside the pod network. Either use the appropriate node/machine network CIDR, or validate and update the comment to match the pod network intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/network-policies/machine-config-controller-allow.yaml`
around lines 21 - 26, The ingress rule labelled "Allow Kubelet Health Probes
(from Node Network)" currently uses {{.ClusterNetworkCIDR}} which is the pod
overlay and is too broad; update the rule to use the correct node/machine
network CIDR (e.g., replace {{.ClusterNetworkCIDR}} with the appropriate
variable for node/machine network such as {{.MachineNetworkCIDR}} or
{{.NodeNetworkCIDR}}) so only node IPs can access TCP port 9443, or if the
intent was to allow pod-to-controller access, change the human-facing comment to
say "from Cluster (pod) Network" to match {{.ClusterNetworkCIDR}}. Ensure the
change touches the ipBlock.cidr entry and/or the comment text associated with
port 9443.
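If node-network scoping is the intent, the corrected rule would look like this sketch. Note that {{.MachineNetworkCIDR}} is a hypothetical template variable: only ClusterNetworkCIDR is rendered by this PR today, so a new field would need to be added to renderConfig alongside it.

```yaml
ingress:
- from: # Allow Kubelet Health Probes (from Node/Machine Network)
  - ipBlock:
      cidr: {{.MachineNetworkCIDR}} # hypothetical variable; not yet in renderConfig
  ports:
  - protocol: TCP
    port: 9443
```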


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.

Projects

None yet


1 participant