
(WIP) Network policies POC #5755

Draft
isabella-janssen wants to merge 3 commits into openshift:main from isabella-janssen:network-policies

Conversation

@isabella-janssen (Member) commented Mar 10, 2026

- What I did

- How to verify it

- Description for the changelog

Summary by CodeRabbit

  • Chores

    • Added deny-all NetworkPolicy manifests to isolate machine-config-daemon, machine-config-server, and kube-rbac-proxy-crio (Ingress and Egress).
  • New Features

    • Operator now syncs NetworkPolicy objects and includes cluster network CIDR in rendering.
    • Added an allow NetworkPolicy for machine-config-controller to permit monitoring, kubelet probes, API server and DNS access.

@coderabbitai bot commented Mar 10, 2026

Walkthrough

Adds multiple NetworkPolicy manifests (deny-all and an allow template), extends renderConfig with ClusterNetworkCIDR, adds getClusterNetworkCIDR and syncNetworkPolicies, updates getRenderConfig signatures and call sites, and inserts NetworkPolicies into the operator sync pipeline.

Changes

Cohort / File(s) / Summary

NetworkPolicy manifests (deny-all)
manifests/machineconfigdaemon/networkpolicy.yaml, manifests/machineconfigserver/networkpolicy.yaml, manifests/machineconfigcontroller/networkpolicy.yaml, install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml
Added NetworkPolicy resources selecting pods by k8s-app labels (machine-config-daemon, machine-config-server, machine-config-controller, kube-rbac-proxy-crio) with policyTypes: [Ingress, Egress] and no explicit allow rules (deny-all). The install manifest targets openshift-machine-config-operator and includes release annotations.

NetworkPolicy manifest (allow rules template)
templates/common/network-policies/machine-config-controller-allow.yaml
New templated NetworkPolicy mco-allow-controller permitting specific Ingress (Prometheus metrics, kubelet probes) and Egress (API server, DNS) traffic; parameterized by TargetNamespace, ControllerAppLabel, and ClusterNetworkCIDR.

Operator: render & config plumbing
pkg/operator/render.go, pkg/operator/bootstrap.go, pkg/operator/sync.go, pkg/operator/operator.go
Added a ClusterNetworkCIDR string to renderConfig; updated the getRenderConfig signature and its call sites to accept clusterNetworkCIDR; added a getClusterNetworkCIDR() helper; implemented syncNetworkPolicies(config *renderConfig, ...), which renders and server-side-applies NetworkPolicy objects; inserted this sync step into the operator sync sequence.

Sequence Diagram(s)

sequenceDiagram
  participant Operator
  participant ClusterConfig as "Cluster NetworkStatus"
  participant Render as "Renderer"
  participant KubeAPI as "Kubernetes API"

  Operator->>ClusterConfig: getClusterNetworkCIDR()
  ClusterConfig-->>Operator: clusterNetworkCIDR
  Operator->>Render: getRenderConfig(..., clusterNetworkCIDR)
  Render-->>Operator: rendered NetworkPolicy objs
  Operator->>KubeAPI: Server-Side Apply NetworkPolicies (ForceOwnership, fieldManager)
  KubeAPI-->>Operator: apply result/status

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage: ⚠️ Warning
  Docstring coverage is 33.33%, below the required threshold of 80.00%.
  Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

Title check: ❓ Inconclusive
  The title "(WIP) Network policies POC" is vague and generic, using non-descriptive terms like "POC" and "Network policies" without conveying what the change does or why.
  Resolution: Replace it with a more specific title, such as "Add network policies for machine config components" or "Implement deny-all network policies for pod security".
✅ Passed checks (3 passed)
Description Check: ✅ Passed
  Check skipped: CodeRabbit's high-level summary is enabled.

Stable And Deterministic Test Names: ✅ Passed
  Not applicable: the PR only modifies network policy manifests and Go operator code, with no Ginkgo test definitions.

Test Structure And Quality: ✅ Passed
  Not applicable: the PR adds implementation changes to pkg/operator/sync.go and YAML NetworkPolicy manifests; the existing test file uses standard Go testing with testify, not Ginkgo.




@openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Mar 10, 2026
@openshift-ci bot (Contributor) commented Mar 10, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai bot left a comment

🧹 Nitpick comments (1)
manifests/machineconfigdaemon/networkpolicy.yaml (1)

38-38: Add a newline at the end of file.

YAML files should end with a newline character to follow POSIX conventions and avoid potential parsing issues with some tools.

Proposed fix
   policyTypes:
   - Ingress
   - Egress
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@manifests/machineconfigdaemon/networkpolicy.yaml` at line 38, The file
manifests/machineconfigdaemon/networkpolicy.yaml is missing a trailing newline
(the diff ends with the token "Egress"); open that file and add a single newline
character at the end of the file so the file ends with "\n" (ensure the final
line containing "Egress" is terminated), then save and commit the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1d4369a6-8878-453b-89a4-f0f6bb99ef9d

📥 Commits

Reviewing files that changed from the base of the PR and between 7a8a698 and 1ed0e29.

📒 Files selected for processing (1)
  • manifests/machineconfigdaemon/networkpolicy.yaml

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml`:
- Around line 1-16: The NetworkPolicy resource named
kube-rbac-proxy-crio-deny-all currently denies all traffic for pods with label
k8s-app: kube-rbac-proxy-crio; update it to allow required traffic instead of a
blanket deny by adding explicit allow rules: permit Ingress from the
Prometheus/monitoring service(s) or the monitoring namespace (e.g.,
label/selectors used by your Prometheus stack) so metrics scraping can reach the
kube-rbac-proxy-crio pods, and permit Egress to the API server (or to the
namespace/service account performing SubjectAccessReview calls) so authorization
requests succeed. Also reconcile the namespace field: replace the hardcoded
namespace openshift-machine-config-operator with the templated value
{{.TargetNamespace}} if other NetworkPolicy files use that template and this
file should be templated as well.

In `@manifests/machineconfigdaemon/networkpolicy.yaml`:
- Around line 1-12: The NetworkPolicy resource machine-config-daemon-deny-all
currently sets spec.policyTypes to [Ingress, Egress] with no allow rules and
will block required outbound traffic; update the NetworkPolicy (metadata.name
machine-config-daemon-deny-all, podSelector matchLabels k8s-app:
machine-config-daemon) to include explicit spec.egress rules that permit egress
to the Kubernetes API server (control-plane IPs/FQDN and port 6443) and to image
registries/container runtime endpoints (HTTP(S) ports and registry CIDRs) and
any node reporting endpoints, or remove Egress from spec.policyTypes if you
intend to only restrict Ingress; ensure the rules are narrowly scoped (by CIDR,
namespaceSelector, or ipBlock) rather than wide-open.

In `@manifests/machineconfigserver/networkpolicy.yaml`:
- Around line 1-12: The current NetworkPolicy (kind: NetworkPolicy,
metadata.name: machine-config-server-deny-all) defines only policyTypes:
[Ingress, Egress] and will block all traffic to pods selected by
podSelector.matchLabels.k8s-app: machine-config-server; update this resource to
add explicit ingress rules that allow TCP traffic on the MCS port (22623) from
the appropriate sources (e.g., node CIDRs, kubelet IP ranges, and any load
balancer CIDRs) and, if needed, add egress rules to permit responses;
specifically modify the spec to include an ingress section with a port:
22623/protocol: TCP and from: entries for the node networks or
service/loadbalancer sources so nodes can fetch Ignition configs while
preserving other restrictions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 48690d87-cd2e-48d1-9cca-ec16faea0a5d

📥 Commits

Reviewing files that changed from the base of the PR and between 1ed0e29 and f7c6266.

📒 Files selected for processing (3)
  • install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml
  • manifests/machineconfigdaemon/networkpolicy.yaml
  • manifests/machineconfigserver/networkpolicy.yaml

Comment on lines +1 to +16
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kube-rbac-proxy-crio-deny-all
  namespace: openshift-machine-config-operator
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will block metrics scraping and API server communication.

This policy blocks all traffic for kube-rbac-proxy-crio pods. The kube-rbac-proxy needs:

  • Ingress: To receive metrics scrape requests from Prometheus/monitoring stack
  • Egress: To communicate with the API server for authorization (SubjectAccessReview)

Without allow rules, metrics collection will fail and the proxy cannot authorize requests.

Additionally, this file uses a hardcoded namespace: openshift-machine-config-operator while the other two NetworkPolicy files use the {{.TargetNamespace}} template. Confirm whether this inconsistency is intentional (e.g., this file is applied directly while others go through templating).
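A minimal shape for the allow rules the reviewer is asking for could look like the following. This is a sketch only; the monitoring namespace selector and the port numbers are assumptions, not values from the PR.

```yaml
# Hypothetical allow rules for kube-rbac-proxy-crio; ports are illustrative.
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow Prometheus scrapes from the monitoring namespace
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-monitoring
    ports:
    - protocol: TCP
      port: 9637  # assumed proxy metrics port
  egress:
  # Allow SubjectAccessReview calls to the API server
  - ports:
    - protocol: TCP
      port: 6443
```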


Comment on lines +1 to +12
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-daemon-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will break machine-config-daemon functionality.

Similar to the machine-config-server policy, this defines Ingress and Egress policyTypes without allow rules. The machine-config-daemon requires egress connectivity to:

  • Communicate with the Kubernetes API server
  • Pull container images
  • Report node status and configuration state

Blocking all egress will prevent MCD from functioning. You'll need to add appropriate egress rules for API server access and any other required destinations.
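One possible shape for such egress rules is sketched below. The ports are assumptions, and per the reviewer's note the rules should ultimately be narrowed with CIDRs or namespace selectors rather than left port-only.

```yaml
# Hypothetical egress rules for machine-config-daemon; ports are illustrative
# and should be scoped more tightly (ipBlock, namespaceSelector) in practice.
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress
  egress:
  # Kubernetes API server
  - ports:
    - protocol: TCP
      port: 6443
  # Image registries (HTTPS)
  - ports:
    - protocol: TCP
      port: 443
  # DNS resolution
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```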


Comment on lines +1 to +12
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-server-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-server
  policyTypes:
  - Ingress
  - Egress

⚠️ Potential issue | 🟠 Major

Deny-all policy will block machine-config-server traffic.

This NetworkPolicy defines Ingress and Egress policyTypes without any allow rules, which effectively blocks all inbound and outbound traffic for machine-config-server pods. Since machine-config-server serves Ignition configs to nodes during cluster bootstrap and provisioning, this policy will prevent nodes from fetching their configurations.

For a functional policy, you'll need to add ingress rules allowing traffic on the MCS port (typically 22623) from the appropriate sources (e.g., node CIDRs or the load balancer).

Example structure with ingress rules
 spec:
   podSelector:
     matchLabels:
       k8s-app: machine-config-server
   policyTypes:
   - Ingress
   - Egress
+  ingress:
+  - from:
+    - ipBlock:
+        cidr: <node-cidr>  # Or appropriate source selector
+    ports:
+    - protocol: TCP
+      port: 22623

@openshift-ci bot (Contributor) commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 10, 2026
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
pkg/operator/sync.go (3)

2636-2646: Remove or explain commented-out code before merging.

The commented-out manifest entries and conditional block appear to be development/debugging artifacts. For a production PR, these should either be:

  • Removed entirely, or
  • Replaced with TODOs explaining the intended future implementation
♻️ Suggested cleanup
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
-	// 1. Define your manifest list
 	manifests := []string{
-		// "common/network-policies/00-default-deny-all.yaml",
 		"common/network-policies/machine-config-controller-allow.yaml",
-		// "common/network-policies/03-allow-operator.yaml",
 	}
-
-	// if config.IsImageBuildEnabled {
-	//     manifests = append(manifests, "common/network-policies/02-allow-os-builder.yaml")
-	// }
+	// TODO: Add additional network policies as needed:
+	// - deny-all base policy
+	// - allow policies for other MCO components (MCD, MCS, MOB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2636 - 2646, In syncNetworkPolicies,
remove the commented-out manifest entries
("common/network-policies/00-default-deny-all.yaml",
"common/network-policies/03-allow-operator.yaml") and the commented conditional
block for config.IsImageBuildEnabled, or if you intend to keep them as a
reminder, replace them with a single concise TODO comment that explains the
intended future behavior (e.g., when/why to add those manifests and the
condition to append "02-allow-os-builder.yaml"); ensure the manifests slice only
contains active entries and that the TODO references syncNetworkPolicies and the
image-build condition so maintainers can find the intended change.

2003-2016: Consider documenting the single-CIDR limitation for dual-stack clusters.

The function only returns the first cluster network CIDR. For dual-stack clusters with both IPv4 and IPv6 CIDRs, this may not cover all pod traffic. If the network policies need to allow traffic from all cluster CIDRs, this could leave a gap.

📝 Suggested documentation improvement
 func (optr *Operator) getClusterNetworkCIDR() (string, error) {
 	// Fetch the cluster-wide Network configuration
 	network, err := optr.networkLister.Get("cluster")
 	if err != nil {
 		return "", err
 	}
 
-	// Always use Status as it represents the current reality of the CNI
+	// Always use Status as it represents the current reality of the CNI.
+	// Note: For dual-stack clusters, this returns only the primary (first) CIDR.
+	// Network policies using this value may need adjustment for full dual-stack support.
 	if len(network.Status.ClusterNetwork) > 0 {
 		return network.Status.ClusterNetwork[0].CIDR, nil
 	}
 
 	return "", fmt.Errorf("no cluster network CIDR found in status")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2003 - 2016, getClusterNetworkCIDR only
returns the first CIDR from network.Status.ClusterNetwork which silently ignores
additional CIDRs in dual-stack clusters; update the function's doc comment
(above func getClusterNetworkCIDR) to explicitly state this single-CIDR
limitation and the potential gap for dual-stack clusters, and either (a) add a
TODO note recommending changing the signature to return all CIDRs (e.g.,
[]string) so callers can handle IPv4+IPv6, or (b) call out that callers must
handle multiple CIDRs themselves — reference getClusterNetworkCIDR so reviewers
can find and adjust callers or change the signature later.

2648-2681: Consider adding retry logic for transient API errors.

Unlike applyManifests (which wraps operations in retry.OnError), this function doesn't retry on transient failures. A network hiccup during the patch could fail the entire sync unnecessarily.

♻️ Suggested retry wrapper
+import "k8s.io/client-go/util/retry"
+
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
 	manifests := []string{
 		"common/network-policies/machine-config-controller-allow.yaml",
 	}
 
-	for _, path := range manifests {
+	return retry.OnError(retry.DefaultRetry, mcoResourceApply.IsApplyErrorRetriable, func() error {
+		for _, path := range manifests {
 		// Render the template using the operator's internal asset renderer
 		npBytes, err := renderAsset(config, path)
 		if err != nil {
 			return fmt.Errorf("failed to render %s: %w", path, err)
 		}
 
 		// ... rest of the loop body ...
 
 		if err != nil {
 			return fmt.Errorf("failed to apply network policy %s: %w", netPol.Name, err)
 		}
+		}
+		return nil
+	})
-	}
-	return nil
 }
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/operator/sync.go`:
- Around line 656-662: The call to getClusterNetworkCIDR in syncRenderConfig can
return empty/missing data during early bringup and currently aborts the entire
sync; change syncRenderConfig/optr.renderConfig construction to tolerate a
missing CIDR by catching the error from getClusterNetworkCIDR and treating it as
non-fatal (e.g., set clusterNetworkCIDR to "" and log/debug the condition) or
gate syncNetworkPolicies to only run when inClusterBringup is false;
specifically modify the code around getClusterNetworkCIDR and the
optr.renderConfig assignment in syncRenderConfig to not return the error
directly, and ensure syncNetworkPolicies checks for an empty clusterNetworkCIDR
or inClusterBringup before applying NetworkPolicy logic.
- Around line 2649-2654: The manifest path
"common/network-policies/machine-config-controller-allow.yaml" used when calling
renderAsset(config, path) is not embedded in the manifests FS and will fail at
runtime; fix by either moving the template file into the embedded manifests tree
(e.g., add it under manifests/common/network-policies/) or change the path
string to the actual embedded location (for example
"machineconfigcontroller/machine-config-controller-allow.yaml") so that
renderAsset (and any manifests.ReadFile usage) can find the file at runtime.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 17c000e3-6754-4a2f-bec2-39477751eaea

📥 Commits

Reviewing files that changed from the base of the PR and between f7c6266 and 5ad247f.

📒 Files selected for processing (6)
  • manifests/machineconfigcontroller/networkpolicy.yaml
  • pkg/operator/bootstrap.go
  • pkg/operator/operator.go
  • pkg/operator/render.go
  • pkg/operator/sync.go
  • templates/common/network-policies/machine-config-controller-allow.yaml
✅ Files skipped from review due to trivial changes (2)
  • manifests/machineconfigcontroller/networkpolicy.yaml
  • templates/common/network-policies/machine-config-controller-allow.yaml

Comment on lines +656 to +662
// Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
if err != nil {
	return err
}

optr.renderConfig = getRenderConfig(optr.namespace, string(kubeAPIServerServingCABytes), spec, &imgs.RenderConfigImages, infra, pointerConfigData, apiServer, fmt.Sprintf("%d", optr.logLevel), clusterNetworkCIDR)

⚠️ Potential issue | 🟠 Major

Network CIDR fetch failure will block entire operator sync.

If the cluster Network resource doesn't have Status.ClusterNetwork populated (e.g., during early bringup before CNI is fully configured), this will cause syncRenderConfig to fail, blocking all subsequent sync operations and potentially degrading the operator.

Consider adding graceful handling similar to how proxy is handled (lines 2112-2114) by tolerating missing/empty state during bringup:

🛡️ Proposed fix for graceful degradation
 	// Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
 	clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
 	if err != nil {
-		return err
+		// During early cluster bringup, the network status may not be populated yet.
+		// Log a warning but continue with empty CIDR; network policies will be applied
+		// on subsequent syncs once the network is ready.
+		if optr.inClusterBringup {
+			klog.Warningf("Could not get cluster network CIDR during bringup: %v", err)
+			clusterNetworkCIDR = ""
+		} else {
+			return err
+		}
 	}

Alternatively, if network policies are critical, consider moving syncNetworkPolicies to only run after inClusterBringup is false, or adding a check within syncNetworkPolicies to skip if CIDR is empty.


Comment on lines +2649 to +2654
for _, path := range manifests {
	// Render the template using the operator's internal asset renderer
	npBytes, err := renderAsset(config, path)
	if err != nil {
		return fmt.Errorf("failed to render %s: %w", path, err)
	}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed: searched templates/ and manifests/ for machine-config-controller-allow and for any network-policies directories (fd), inspected the //go:embed directive in manifests/manifests.go, located the renderAsset and newAssetRenderer implementations in pkg/operator/, checked how other renderAsset call sites reference manifest paths in pkg/operator/sync.go, and listed the manifests/ directory structure. The template was found only under templates/common/network-policies/, and manifests/ contains no common/network-policies/ subdirectory.


The manifest path will fail at runtime because the file is not embedded in the manifests directory.

The path "common/network-policies/machine-config-controller-allow.yaml" references a file that exists in templates/common/network-policies/ but is not included in the embedded filesystem. The manifests/ directory contains no common/network-policies/ subdirectory, and manifests.ReadFile() only reads from files embedded via the //go:embed * directive in manifests/manifests.go. This will cause the code to fail when renderAsset() attempts to load the file.

Either move the file to manifests/common/network-policies/ or update the path to match an existing manifest location (e.g., "machineconfigcontroller/machine-config-controller-allow.yaml" if placing it in manifests/machineconfigcontroller/).
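The failure mode can be sketched with a stand-in filesystem (illustrative only: `fstest.MapFS` mimics the tree that `//go:embed *` in manifests/manifests.go would produce, and `readManifest` stands in for the real `manifests.ReadFile`; the file contents are hypothetical):

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// Stand-in for the embedded manifests tree. embed patterns are resolved
// relative to the directory containing the Go source file, so with
// `//go:embed *` in manifests/manifests.go only paths under manifests/
// are embedded; templates/ is a separate, non-embedded directory.
var embeddedManifests = fstest.MapFS{
	"machineconfigcontroller/networkpolicy.yaml": {Data: []byte("kind: NetworkPolicy")},
}

// readManifest mirrors what a manifests.ReadFile helper effectively does:
// resolve a relative path against the embedded filesystem.
func readManifest(name string) ([]byte, error) {
	return fs.ReadFile(embeddedManifests, name)
}

func main() {
	// The templates/ path was never embedded, so this fails at runtime.
	if _, err := readManifest("common/network-policies/machine-config-controller-allow.yaml"); err != nil {
		fmt.Println("not embedded:", err)
	}
	// A path under the embedded root resolves fine.
	data, _ := readManifest("machineconfigcontroller/networkpolicy.yaml")
	fmt.Println(string(data))
}
```

The same lookup succeeds as soon as the file lives under the embedded root, which is why moving the template into manifests/ (or pointing the path at an existing embedded location) resolves the issue.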

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/sync.go` around lines 2649 - 2654, The manifest path
"common/network-policies/machine-config-controller-allow.yaml" used when calling
renderAsset(config, path) is not embedded in the manifests FS and will fail at
runtime; fix by either moving the template file into the embedded manifests tree
(e.g., add it under manifests/common/network-policies/) or change the path
string to the actual embedded location (for example
"machineconfigcontroller/machine-config-controller-allow.yaml") so that
renderAsset (and any manifests.ReadFile usage) can find the file at runtime.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
templates/common/network-policies/machine-config-controller-allow.yaml (1)

28-43: Tighten the egress exceptions with pod selectors.

These rules currently allow TCP/6443 to any pod in openshift-kube-apiserver and DNS on 53 to any pod in openshift-dns. For a deny-by-default rollout, I'd strongly prefer adding pod selectors for the actual apiserver and DNS backends so this stays least-privilege.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/network-policies/machine-config-controller-allow.yaml`
around lines 28 - 43, The egress "to" entries for "Allow API Server Access" and
"Allow DNS Lookups" are too broad — narrow them by adding podSelector blocks
alongside the existing namespaceSelector so the rules target only the apiserver
and DNS backend pods; update the "to: # Allow API Server Access" entry that
permits TCP port 6443 to include a podSelector matching the kube-apiserver pod
label(s) and update the "to: # Allow DNS Lookups" entry that permits UDP/TCP
port 53 to include a podSelector matching the DNS backend pod label(s) used by
openshift-dns (use the actual labels from those deployments), ensuring
least-privilege egress while keeping the existing ports and namespaceSelector.
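A tightened version of those egress rules could look like the sketch below. The pod labels shown are illustrative assumptions, not verified against this repo: check them against the actual openshift-kube-apiserver and openshift-dns deployments before use. Combining namespaceSelector and podSelector in one peer ANDs the two conditions.

```yaml
egress:
- to: # Allow API Server Access (apiserver pods only)
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-kube-apiserver
    podSelector:
      matchLabels:
        app: openshift-kube-apiserver # illustrative label; verify against the deployment
  ports:
  - protocol: TCP
    port: 6443
- to: # Allow DNS Lookups (DNS backend pods only)
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-dns
    podSelector:
      matchLabels:
        dns.operator.openshift.io/daemonset-dns: default # illustrative label; verify
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
```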
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ae4392e6-beee-4bcc-adfd-5c9c6a22d83d

📥 Commits

Reviewing files that changed from the base of the PR and between 5ad247f and 570ea58.

📒 Files selected for processing (1)
  • templates/common/network-policies/machine-config-controller-allow.yaml

Comment on lines +21 to +26
- from: # Allow Kubelet Health Probes (from Node Network)
  - ipBlock:
      cidr: {{.ClusterNetworkCIDR}}
  ports:
  - protocol: TCP
    port: 9443

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for ClusterNetworkCIDR usage in the repository
fd -t f '\.yaml\|\.yml\|\.go\|\.md' | xargs grep -l "ClusterNetworkCIDR" | head -20

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Also search for NodeNetworkCIDR or ServiceNetworkCIDR to understand the variable naming
rg "NetworkCIDR" --type yaml --type md -A 2 -B 2 | head -80

Repository: openshift/machine-config-operator

Length of output: 585


🏁 Script executed:

# Look for the actual file and surrounding context
cat templates/common/network-policies/machine-config-controller-allow.yaml

Repository: openshift/machine-config-operator

Length of output: 1237


🏁 Script executed:

# Search for documentation or comments about kubelet probes and network CIDR
rg "kubelet.*probe|health.*probe" --type yaml --type md -A 3 -B 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 6902


🏁 Script executed:

# Search for where ClusterNetworkCIDR is defined as a variable/parameter
rg "ClusterNetworkCIDR" -B 5 -A 5 --type go --type yaml --type md

Repository: openshift/machine-config-operator

Length of output: 8443


🏁 Script executed:

# Look for other network CIDR definitions to understand the pattern
rg "NetworkCIDR|NetworkNetwork" --type go --type yaml | head -50

Repository: openshift/machine-config-operator

Length of output: 5480


🏁 Script executed:

# Search for how these template variables are populated
rg "\.ClusterNetworkCIDR|ClusterNetworkCIDR:" --type go -B 3 -A 3 | head -80

Repository: openshift/machine-config-operator

Length of output: 395


🏁 Script executed:

# Check for any documentation or README files about network setup
fd -t f '\.md$|README' | xargs grep -i "ClusterNetworkCIDR\|network.*cidr" | head -30

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Read the getClusterNetworkCIDR implementation
sed -n '/func (optr \*Operator) getClusterNetworkCIDR/,/^}/p' pkg/operator/sync.go

Repository: openshift/machine-config-operator

Length of output: 517


🏁 Script executed:

# Search for any node network or machine network CIDR usage
rg "NodeNetwork|MachineNetwork|node.*cidr" --type go -A 3 -B 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 9425


🏁 Script executed:

# Look for kubelet configuration or probe documentation in the codebase
rg "9443|kubelet.*probe|Kubelet Health" -A 3 -B 3 --type yaml --type go | head -80

Repository: openshift/machine-config-operator

Length of output: 7475


🏁 Script executed:

# Search for what uses or listens on port 9443
rg "9443|machine.*config.*controller" --type go --type yaml -B 3 -A 3 | grep -A 10 -B 10 "9443\|probe\|webhook" | head -150

Repository: openshift/machine-config-operator

Length of output: 11461


🏁 Script executed:

# Look for webhook configurations or health probe definitions
rg "webhook|healthz|readiness.*probe|liveness.*probe" --type yaml -A 5 -B 5 | head -120

Repository: openshift/machine-config-operator

Length of output: 15321


🏁 Script executed:

# Search the codebase for how the machine-config-controller is accessed on port 9443
fd -t f '\.go$' | xargs grep -l "9443" | head -10

Repository: openshift/machine-config-operator

Length of output: 402


🏁 Script executed:

# Find machine-config-controller deployment manifests
fd -t f "\.yaml\|\.yml" | xargs grep -l "machine-config-controller" | head -20

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look for any deployment/pod specs for machine-config-controller
fd -t f "deployment\|pod\|statefulset" -path "*/machine-config*"

Repository: openshift/machine-config-operator

Length of output: 245


🏁 Script executed:

# Search for webhook or TLS server configuration in machine-config-controller code
rg "ValidatingWebhook|MutatingWebhook|listener.*9443|Serve.*9443" --type go -A 5 -B 5

Repository: openshift/machine-config-operator

Length of output: 50390


🏁 Script executed:

# Look for any references to what calls/probes this service
rg "machine.*config.*controller.*9443|9443.*machine" --type yaml --type go

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Search in the actual pkg directory for machine-config-controller main code
find pkg -name "*.go" -type f | xargs grep -l "port.*9443\|9443.*port" 2>/dev/null | head -10

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Search for where machine-config-controller might have a server/listen on a port
rg "ListenAndServe|server.*Start|Addr.*Port" pkg/controller --type go -B 5 -A 5 | head -80

Repository: openshift/machine-config-operator

Length of output: 789


🏁 Script executed:

# Let's look at what the file names suggest about network policies
ls -la templates/common/network-policies/

Repository: openshift/machine-config-operator

Length of output: 276


🏁 Script executed:

# Search for metrics or server port configuration in machine-config-controller
rg "Addr.*:80|Addr.*:8080|Addr.*:9000|Addr.*:9" pkg/controller --type go -B 3 -A 3 | head -60

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look for any operator code that manages network policies
rg "mco-allow\|mco.*allow\|NetworkPolicy" pkg/operator --type go -B 3 -A 3 | head -100

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Check if port 9443 is specifically for webhooks or health checks
rg "webhook|health|readiness" pkg/controller --type go | head -30

Repository: openshift/machine-config-operator

Length of output: 59


🏁 Script executed:

# Look at what the comment in the NetworkPolicy file says around the problematic rule
head -n 30 templates/common/network-policies/machine-config-controller-allow.yaml

Repository: openshift/machine-config-operator

Length of output: 891


Use the correct CIDR source for the intended access.

The comment says "from Node Network" but the rule uses {{.ClusterNetworkCIDR}}, which represents the pod network overlay, not node infrastructure. This makes the ingress rule much broader than intended—allowing every pod in the cluster to access port 9443—while potentially missing the actual kubelet/node source if probes originate from outside the pod network. Either use the appropriate node/machine network CIDR, or validate and update the comment to match the pod network intent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/common/network-policies/machine-config-controller-allow.yaml`
around lines 21 - 26, The ingress rule labelled "Allow Kubelet Health Probes
(from Node Network)" currently uses {{.ClusterNetworkCIDR}} which is the pod
overlay and is too broad; update the rule to use the correct node/machine
network CIDR (e.g., replace {{.ClusterNetworkCIDR}} with the appropriate
variable for node/machine network such as {{.MachineNetworkCIDR}} or
{{.NodeNetworkCIDR}}) so only node IPs can access TCP port 9443, or if the
intent was to allow pod-to-controller access, change the human-facing comment to
say "from Cluster (pod) Network" to match {{.ClusterNetworkCIDR}}. Ensure the
change touches the ipBlock.cidr entry and/or the comment text associated with
port 9443.
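If node-network scoping is the intent, the corrected rule would look like this sketch. Note that {{.MachineNetworkCIDR}} is a hypothetical template variable: only ClusterNetworkCIDR is rendered by this PR today, so a new field would need to be added to renderConfig alongside it.

```yaml
ingress:
- from: # Allow Kubelet Health Probes (from Node/Machine Network)
  - ipBlock:
      cidr: {{.MachineNetworkCIDR}} # hypothetical variable; not yet in renderConfig
  ports:
  - protocol: TCP
    port: 9443
```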


Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.

Projects

None yet


1 participant