(WIP) Network policies POC #5755
Conversation
Walkthrough

Adds multiple NetworkPolicy manifests (a deny-all policy and an allow template), extends renderConfig with ClusterNetworkCIDR, adds getClusterNetworkCIDR and syncNetworkPolicies, updates getRenderConfig signatures and call sites, and inserts NetworkPolicies into the operator sync pipeline.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Operator
    participant ClusterConfig as "Cluster NetworkStatus"
    participant Render as "Renderer"
    participant KubeAPI as "Kubernetes API"
    Operator->>ClusterConfig: getClusterNetworkCIDR()
    ClusterConfig-->>Operator: clusterNetworkCIDR
    Operator->>Render: getRenderConfig(..., clusterNetworkCIDR)
    Render-->>Operator: rendered NetworkPolicy objs
    Operator->>KubeAPI: Server-Side Apply NetworkPolicies (ForceOwnership, fieldManager)
    KubeAPI-->>Operator: apply result/status
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Skipping CI for Draft Pull Request.
🧹 Nitpick comments (1)
manifests/machineconfigdaemon/networkpolicy.yaml (1)
38-38: Add a newline at the end of the file. YAML files should end with a newline character to follow POSIX conventions and avoid potential parsing issues with some tools.
Proposed fix

```diff
 policyTypes:
 - Ingress
 - Egress
+
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@manifests/machineconfigdaemon/networkpolicy.yaml` at line 38: the file is missing a trailing newline (the diff ends with the token "Egress"); add a single newline character at the end of the file so that the final line containing "Egress" is terminated, then save and commit the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1d4369a6-8878-453b-89a4-f0f6bb99ef9d
📒 Files selected for processing (1)
manifests/machineconfigdaemon/networkpolicy.yaml
1ed0e29 to f7c6266
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml`:
- Around line 1-16: The NetworkPolicy resource named
kube-rbac-proxy-crio-deny-all currently denies all traffic for pods with label
k8s-app: kube-rbac-proxy-crio; update it to allow required traffic instead of a
blanket deny by adding explicit allow rules: permit Ingress from the
Prometheus/monitoring service(s) or the monitoring namespace (e.g.,
label/selectors used by your Prometheus stack) so metrics scraping can reach the
kube-rbac-proxy-crio pods, and permit Egress to the API server (or to the
namespace/service account performing SubjectAccessReview calls) so authorization
requests succeed. Also reconcile the namespace field: replace the hardcoded
namespace openshift-machine-config-operator with the templated value
{{.TargetNamespace}} if other NetworkPolicy files use that template and this
file should be templated as well.
In `@manifests/machineconfigdaemon/networkpolicy.yaml`:
- Around line 1-12: The NetworkPolicy resource machine-config-daemon-deny-all
currently sets spec.policyTypes to [Ingress, Egress] with no allow rules and
will block required outbound traffic; update the NetworkPolicy (metadata.name
machine-config-daemon-deny-all, podSelector matchLabels k8s-app:
machine-config-daemon) to include explicit spec.egress rules that permit egress
to the Kubernetes API server (control-plane IPs/FQDN and port 6443) and to image
registries/container runtime endpoints (HTTP(S) ports and registry CIDRs) and
any node reporting endpoints, or remove Egress from spec.policyTypes if you
intend to only restrict Ingress; ensure the rules are narrowly scoped (by CIDR,
namespaceSelector, or ipBlock) rather than wide-open.
In `@manifests/machineconfigserver/networkpolicy.yaml`:
- Around line 1-12: The current NetworkPolicy (kind: NetworkPolicy,
metadata.name: machine-config-server-deny-all) defines only policyTypes:
[Ingress, Egress] and will block all traffic to pods selected by
podSelector.matchLabels.k8s-app: machine-config-server; update this resource to
add explicit ingress rules that allow TCP traffic on the MCS port (22623) from
the appropriate sources (e.g., node CIDRs, kubelet IP ranges, and any load
balancer CIDRs) and, if needed, add egress rules to permit responses;
specifically modify the spec to include an ingress section with a port:
22623/protocol: TCP and from: entries for the node networks or
service/loadbalancer sources so nodes can fetch Ignition configs while
preserving other restrictions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 48690d87-cd2e-48d1-9cca-ec16faea0a5d
📒 Files selected for processing (3)
- install/0000_80_machine-config_00_networkpolicy-kube-rbac-proxy-crio.yaml
- manifests/machineconfigdaemon/networkpolicy.yaml
- manifests/machineconfigserver/networkpolicy.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kube-rbac-proxy-crio-deny-all
  namespace: openshift-machine-config-operator
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress
```
Deny-all policy will block metrics scraping and API server communication.
This policy blocks all traffic for kube-rbac-proxy-crio pods. The kube-rbac-proxy needs:
- Ingress: To receive metrics scrape requests from Prometheus/monitoring stack
- Egress: To communicate with the API server for authorization (SubjectAccessReview)
Without allow rules, metrics collection will fail and the proxy cannot authorize requests.
Additionally, this file uses a hardcoded namespace: openshift-machine-config-operator while the other two NetworkPolicy files use the {{.TargetNamespace}} template. Confirm whether this inconsistency is intentional (e.g., this file is applied directly while others go through templating).
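As a hedged sketch of what the requested allow rules could look like (the metrics port and the namespace labels here are assumptions to verify against the actual kube-rbac-proxy-crio container spec and monitoring stack; host-network apiserver pods may also require an ipBlock instead of a namespaceSelector):

```yaml
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-rbac-proxy-crio
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:  # metrics scraping from the monitoring stack
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-monitoring
    ports:
    - protocol: TCP
      port: 9001  # placeholder: use the proxy's actual metrics port
  egress:
  - to:  # SubjectAccessReview calls to the API server
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-kube-apiserver
    ports:
    - protocol: TCP
      port: 6443
```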
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-daemon-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress
```
Deny-all policy will break machine-config-daemon functionality.
Similar to the machine-config-server policy, this defines Ingress and Egress policyTypes without allow rules. The machine-config-daemon requires egress connectivity to:
- Communicate with the Kubernetes API server
- Pull container images
- Report node status and configuration state
Blocking all egress will prevent MCD from functioning. You'll need to add appropriate egress rules for API server access and any other required destinations.
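A minimal, hedged sketch of the narrower egress shape the review describes (the registry CIDR and namespace label are placeholders to verify for the target cluster, not actual values):

```yaml
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-daemon
  policyTypes:
  - Ingress
  - Egress
  egress:
  - to:  # Kubernetes API server
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-kube-apiserver
    ports:
    - protocol: TCP
      port: 6443
  - to:  # image registries, scoped by CIDR rather than wide-open
    - ipBlock:
        cidr: 10.0.0.0/16  # placeholder registry/node CIDR
    ports:
    - protocol: TCP
      port: 443
```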
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: machine-config-server-deny-all
  namespace: {{.TargetNamespace}}
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-server
  policyTypes:
  - Ingress
  - Egress
```
Deny-all policy will block machine-config-server traffic.
This NetworkPolicy defines Ingress and Egress policyTypes without any allow rules, which effectively blocks all inbound and outbound traffic for machine-config-server pods. Since machine-config-server serves Ignition configs to nodes during cluster bootstrap and provisioning, this policy will prevent nodes from fetching their configurations.
For a functional policy, you'll need to add ingress rules allowing traffic on the MCS port (typically 22623) from the appropriate sources (e.g., node CIDRs or the load balancer).
Example structure with ingress rules

```diff
 spec:
   podSelector:
     matchLabels:
       k8s-app: machine-config-server
   policyTypes:
   - Ingress
   - Egress
+  ingress:
+  - from:
+    - ipBlock:
+        cidr: <node-cidr>  # Or appropriate source selector
+    ports:
+    - protocol: TCP
+      port: 22623
```
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
Actionable comments posted: 2
🧹 Nitpick comments (3)
pkg/operator/sync.go (3)
2636-2646: Remove or explain commented-out code before merging. The commented-out manifest entries and conditional block appear to be development/debugging artifacts. For a production PR, these should either be:
- Removed entirely, or
- Replaced with TODOs explaining the intended future implementation

♻️ Suggested cleanup

```diff
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
-	// 1. Define your manifest list
 	manifests := []string{
-		// "common/network-policies/00-default-deny-all.yaml",
 		"common/network-policies/machine-config-controller-allow.yaml",
-		// "common/network-policies/03-allow-operator.yaml",
 	}
-
-	// if config.IsImageBuildEnabled {
-	// 	manifests = append(manifests, "common/network-policies/02-allow-os-builder.yaml")
-	// }
+	// TODO: Add additional network policies as needed:
+	// - deny-all base policy
+	// - allow policies for other MCO components (MCD, MCS, MOB)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/sync.go` around lines 2636 - 2646, In syncNetworkPolicies, remove the commented-out manifest entries ("common/network-policies/00-default-deny-all.yaml", "common/network-policies/03-allow-operator.yaml") and the commented conditional block for config.IsImageBuildEnabled, or if you intend to keep them as a reminder, replace them with a single concise TODO comment that explains the intended future behavior (e.g., when/why to add those manifests and the condition to append "02-allow-os-builder.yaml"); ensure the manifests slice only contains active entries and that the TODO references syncNetworkPolicies and the image-build condition so maintainers can find the intended change.
2003-2016: Consider documenting the single-CIDR limitation for dual-stack clusters.The function only returns the first cluster network CIDR. For dual-stack clusters with both IPv4 and IPv6 CIDRs, this may not cover all pod traffic. If the network policies need to allow traffic from all cluster CIDRs, this could leave a gap.
📝 Suggested documentation improvement
```diff
 func (optr *Operator) getClusterNetworkCIDR() (string, error) {
 	// Fetch the cluster-wide Network configuration
 	network, err := optr.networkLister.Get("cluster")
 	if err != nil {
 		return "", err
 	}
-	// Always use Status as it represents the current reality of the CNI
+	// Always use Status as it represents the current reality of the CNI.
+	// Note: For dual-stack clusters, this returns only the primary (first) CIDR.
+	// Network policies using this value may need adjustment for full dual-stack support.
 	if len(network.Status.ClusterNetwork) > 0 {
 		return network.Status.ClusterNetwork[0].CIDR, nil
 	}
 	return "", fmt.Errorf("no cluster network CIDR found in status")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/sync.go` around lines 2003 - 2016, getClusterNetworkCIDR only returns the first CIDR from network.Status.ClusterNetwork which silently ignores additional CIDRs in dual-stack clusters; update the function's doc comment (above func getClusterNetworkCIDR) to explicitly state this single-CIDR limitation and the potential gap for dual-stack clusters, and either (a) add a TODO note recommending changing the signature to return all CIDRs (e.g., []string) so callers can handle IPv4+IPv6, or (b) call out that callers must handle multiple CIDRs themselves — reference getClusterNetworkCIDR so reviewers can find and adjust callers or change the signature later.
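As a hedged sketch of option (a), this stdlib-only Go fragment illustrates returning and validating every CIDR rather than only the first; the clusterNetworkEntry type and allClusterCIDRs name are illustrative stand-ins, not the actual configv1 types or operator API:

```go
package main

import (
	"fmt"
	"net/netip"
)

// clusterNetworkEntry loosely mirrors the shape of an OpenShift Network
// status entry; the field name here is illustrative only.
type clusterNetworkEntry struct {
	CIDR string
}

// allClusterCIDRs is a hypothetical alternative to getClusterNetworkCIDR
// that returns every CIDR instead of just the first, so dual-stack
// (IPv4 + IPv6) clusters are fully covered.
func allClusterCIDRs(entries []clusterNetworkEntry) ([]string, error) {
	if len(entries) == 0 {
		return nil, fmt.Errorf("no cluster network CIDR found in status")
	}
	cidrs := make([]string, 0, len(entries))
	for _, e := range entries {
		// Validate each CIDR before handing it to NetworkPolicy templates.
		if _, err := netip.ParsePrefix(e.CIDR); err != nil {
			return nil, fmt.Errorf("invalid cluster network CIDR %q: %w", e.CIDR, err)
		}
		cidrs = append(cidrs, e.CIDR)
	}
	return cidrs, nil
}

func main() {
	cidrs, err := allClusterCIDRs([]clusterNetworkEntry{
		{CIDR: "10.128.0.0/14"},
		{CIDR: "fd01::/48"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(cidrs) // prints: [10.128.0.0/14 fd01::/48]
}
```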
2648-2681: Consider adding retry logic for transient API errors. Unlike applyManifests (which wraps operations in retry.OnError), this function doesn't retry on transient failures. A network hiccup during the patch could fail the entire sync unnecessarily.

♻️ Suggested retry wrapper

```diff
+import "k8s.io/client-go/util/retry"
+
 func (optr *Operator) syncNetworkPolicies(config *renderConfig, _ *configv1.ClusterOperator) error {
 	manifests := []string{
 		"common/network-policies/machine-config-controller-allow.yaml",
 	}
-	for _, path := range manifests {
+	return retry.OnError(retry.DefaultRetry, mcoResourceApply.IsApplyErrorRetriable, func() error {
+		for _, path := range manifests {
 			// Render the template using the operator's internal asset renderer
 			npBytes, err := renderAsset(config, path)
 			if err != nil {
 				return fmt.Errorf("failed to render %s: %w", path, err)
 			}
 			// ... rest of the loop body ...
 			if err != nil {
 				return fmt.Errorf("failed to apply network policy %s: %w", netPol.Name, err)
 			}
+		}
+		return nil
+	})
-	}
-	return nil
 }
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/operator/sync.go`:
- Around line 656-662: The call to getClusterNetworkCIDR in syncRenderConfig can
return empty/missing data during early bringup and currently aborts the entire
sync; change syncRenderConfig/optr.renderConfig construction to tolerate a
missing CIDR by catching the error from getClusterNetworkCIDR and treating it as
non-fatal (e.g., set clusterNetworkCIDR to "" and log/debug the condition) or
gate syncNetworkPolicies to only run when inClusterBringup is false;
specifically modify the code around getClusterNetworkCIDR and the
optr.renderConfig assignment in syncRenderConfig to not return the error
directly, and ensure syncNetworkPolicies checks for an empty clusterNetworkCIDR
or inClusterBringup before applying NetworkPolicy logic.
- Around line 2649-2654: The manifest path
"common/network-policies/machine-config-controller-allow.yaml" used when calling
renderAsset(config, path) is not embedded in the manifests FS and will fail at
runtime; fix by either moving the template file into the embedded manifests tree
(e.g., add it under manifests/common/network-policies/) or change the path
string to the actual embedded location (for example
"machineconfigcontroller/machine-config-controller-allow.yaml") so that
renderAsset (and any manifests.ReadFile usage) can find the file at runtime.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 17c000e3-6754-4a2f-bec2-39477751eaea
📒 Files selected for processing (6)
- manifests/machineconfigcontroller/networkpolicy.yaml
- pkg/operator/bootstrap.go
- pkg/operator/operator.go
- pkg/operator/render.go
- pkg/operator/sync.go
- templates/common/network-policies/machine-config-controller-allow.yaml
✅ Files skipped from review due to trivial changes (2)
- manifests/machineconfigcontroller/networkpolicy.yaml
- templates/common/network-policies/machine-config-controller-allow.yaml
```go
	// Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
	clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
	if err != nil {
		return err
	}

	optr.renderConfig = getRenderConfig(optr.namespace, string(kubeAPIServerServingCABytes), spec, &imgs.RenderConfigImages, infra, pointerConfigData, apiServer, fmt.Sprintf("%d", optr.logLevel), clusterNetworkCIDR)
```
Network CIDR fetch failure will block entire operator sync.
If the cluster Network resource doesn't have Status.ClusterNetwork populated (e.g., during early bringup before CNI is fully configured), this will cause syncRenderConfig to fail, blocking all subsequent sync operations and potentially degrading the operator.
Consider adding graceful handling similar to how proxy is handled (lines 2112-2114) by tolerating missing/empty state during bringup:
🛡️ Proposed fix for graceful degradation

```diff
 // Get the Cluster Network CIDR for the MCC's allow NetworkPolicy
 clusterNetworkCIDR, err := optr.getClusterNetworkCIDR()
 if err != nil {
-	return err
+	// During early cluster bringup, the network status may not be populated yet.
+	// Log a warning but continue with an empty CIDR; network policies will be applied
+	// on subsequent syncs once the network is ready.
+	if optr.inClusterBringup {
+		klog.Warningf("Could not get cluster network CIDR during bringup: %v", err)
+		clusterNetworkCIDR = ""
+	} else {
+		return err
+	}
 }
```

Alternatively, if network policies are critical, consider moving syncNetworkPolicies to only run after inClusterBringup is false, or adding a check within syncNetworkPolicies to skip if the CIDR is empty.
```go
	for _, path := range manifests {
		// Render the template using the operator's internal asset renderer
		npBytes, err := renderAsset(config, path)
		if err != nil {
			return fmt.Errorf("failed to render %s: %w", path, err)
		}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Description: Verify the network policy manifest path exists in the embedded filesystem
echo "Searching for network policy manifest files..."

# Check the templates directory
echo "=== Checking templates/ ==="
fd -t f "machine-config-controller-allow" templates/ 2>/dev/null || echo "Not found in templates/"

# Check the manifests directory
echo ""
echo "=== Checking manifests/ ==="
fd -t f "machine-config-controller-allow" manifests/ 2>/dev/null || echo "Not found in manifests/"

# Check for any network-policies directories
echo ""
echo "=== Searching for network-policies directories ==="
fd -t d "network-policies" . 2>/dev/null

# Check how manifests.ReadFile is used and what paths are valid
echo ""
echo "=== Checking manifests package embed directive ==="
rg -n "//go:embed" pkg/manifests/ manifests/ 2>/dev/null | head -20
```

Repository: openshift/machine-config-operator
Length of output: 430
🏁 Script executed:

```shell
# Search for renderAsset function definition
rg -n "func renderAsset" pkg/operator/ -A 10
```

Repository: openshift/machine-config-operator
Length of output: 643
🏁 Script executed:

```shell
# Examine manifests.go to understand the embed directive
cat -n pkg/manifests/manifests.go | head -30
```

Repository: openshift/machine-config-operator
Length of output: 135
🏁 Script executed:

```shell
# Check how other manifest paths are used in the sync.go file
rg -n "renderAsset" pkg/operator/sync.go -B 2 -A 2 | head -50
```

Repository: openshift/machine-config-operator
Length of output: 1485
🏁 Script executed:

```shell
# Find where manifests.go actually exists
find . -name "manifests.go" -type f 2>/dev/null
```

Repository: openshift/machine-config-operator
Length of output: 101
🏁 Script executed:

```shell
# Find where "common/network-policies/machine-config-controller-allow.yaml" is referenced
rg -n "common/network-policies/machine-config-controller-allow" . -A 2 -B 2
```

Repository: openshift/machine-config-operator
Length of output: 421
🏁 Script executed:

```shell
# Find newAssetRenderer implementation
rg -n "func newAssetRenderer" pkg/operator/ -A 15
```

Repository: openshift/machine-config-operator
Length of output: 833
🏁 Script executed:

```shell
# Read manifests.go to see the embed directive
cat -n ./manifests/manifests.go | head -30
```

Repository: openshift/machine-config-operator
Length of output: 802
🏁 Script executed:

```shell
# List files in the templates directory to see the actual path structure
fd -t f "" templates/common/network-policies/ -x echo
```

Repository: openshift/machine-config-operator
Length of output: 147
🏁 Script executed:

```shell
# Check other path patterns used in sync.go to understand the naming convention
rg -n '"\w+/\w+\.yaml"' pkg/operator/sync.go | head -20
```

Repository: openshift/machine-config-operator
Length of output: 138
🏁 Script executed:

```shell
# Check what's actually in the manifests directory
ls -la manifests/common/network-policies/ 2>/dev/null || echo "Directory does not exist in manifests/"
```

Repository: openshift/machine-config-operator
Length of output: 115
🏁 Script executed:

```shell
# Check the full directory structure of manifests/
fd -t d "" manifests/ | head -20
```

Repository: openshift/machine-config-operator
Length of output: 283
🏁 Script executed:

```shell
# Check if templates are part of the build process
rg -n "templates" go.mod pkg/ Makefile build/ 2>/dev/null | grep -i "copy\|build\|embed" | head -10
```

Repository: openshift/machine-config-operator
Length of output: 522
🏁 Script executed:

```shell
# Check if there are any other embed directives that include templates/common
rg -n "go:embed.*common" . 2>/dev/null
```

Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:

```shell
# List all files in the manifests directory to see what's available
fd -t f "" manifests/ | sort
```

Repository: openshift/machine-config-operator
Length of output: 4338
🏁 Script executed:

```shell
# Check git history or comments around this code to see if this is new
rg -n "network-policies" pkg/operator/sync.go -B 5 -A 5
```

Repository: openshift/machine-config-operator
Length of output: 790
The manifest path will fail at runtime because the file is not embedded in the manifests directory.
The path "common/network-policies/machine-config-controller-allow.yaml" references a file that exists in templates/common/network-policies/ but is not included in the embedded filesystem. The manifests/ directory contains no common/network-policies/ subdirectory, and manifests.ReadFile() only reads from files embedded via the //go:embed * directive in manifests/manifests.go. This will cause the code to fail when renderAsset() attempts to load the file.
Either move the file to manifests/common/network-policies/ or update the path to match an existing manifest location (e.g., "machineconfigcontroller/machine-config-controller-allow.yaml" if placing it in manifests/machineconfigcontroller/).
Actionable comments posted: 1
🧹 Nitpick comments (1)
templates/common/network-policies/machine-config-controller-allow.yaml (1)
28-43: Tighten the egress exceptions with pod selectors. These rules currently allow TCP/6443 to any pod in openshift-kube-apiserver and DNS on port 53 to any pod in openshift-dns. For a deny-by-default rollout, I'd strongly prefer adding pod selectors for the actual apiserver and DNS backends so this stays least-privilege.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@templates/common/network-policies/machine-config-controller-allow.yaml` around lines 28 - 43, The egress "to" entries for "Allow API Server Access" and "Allow DNS Lookups" are too broad — narrow them by adding podSelector blocks alongside the existing namespaceSelector so the rules target only the apiserver and DNS backend pods; update the "to: # Allow API Server Access" entry that permits TCP port 6443 to include a podSelector matching the kube-apiserver pod label(s) and update the "to: # Allow DNS Lookups" entry that permits UDP/TCP port 53 to include a podSelector matching the DNS backend pod label(s) used by openshift-dns (use the actual labels from those deployments), ensuring least-privilege egress while keeping the existing ports and namespaceSelector.
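One hedged way to express the narrowed DNS rule (the DNS pod label below is an assumption; confirm it against the actual openshift-dns daemonset before using it):

```yaml
egress:
- to:  # Allow DNS lookups, limited to the DNS backend pods
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: openshift-dns
    podSelector:
      matchLabels:
        dns.operator.openshift.io/daemonset-dns: default  # assumed label, verify
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
```

Combining namespaceSelector and podSelector in the same `to` entry ANDs the two selectors, which is what keeps this least-privilege.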
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@templates/common/network-policies/machine-config-controller-allow.yaml`:
- Around line 21-26: The ingress rule labelled "Allow Kubelet Health Probes
(from Node Network)" currently uses {{.ClusterNetworkCIDR}} which is the pod
overlay and is too broad; update the rule to use the correct node/machine
network CIDR (e.g., replace {{.ClusterNetworkCIDR}} with the appropriate
variable for node/machine network such as {{.MachineNetworkCIDR}} or
{{.NodeNetworkCIDR}}) so only node IPs can access TCP port 9443, or if the
intent was to allow pod-to-controller access, change the human-facing comment to
say "from Cluster (pod) Network" to match {{.ClusterNetworkCIDR}}. Ensure the
change touches the ipBlock.cidr entry and/or the comment text associated with
port 9443.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ae4392e6-beee-4bcc-adfd-5c9c6a22d83d
📒 Files selected for processing (1)
templates/common/network-policies/machine-config-controller-allow.yaml
- from: # Allow Kubelet Health Probes (from Node Network)
  - ipBlock:
      cidr: {{.ClusterNetworkCIDR}}
  ports:
  - protocol: TCP
    port: 9443

🧩 Analysis chain
🏁 Script executed:
# Search for ClusterNetworkCIDR usage in the repository
fd -t f '\.yaml\|\.yml\|\.go\|\.md' | xargs grep -l "ClusterNetworkCIDR" | head -20
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Also search for NodeNetworkCIDR or ServiceNetworkCIDR to understand the variable naming
rg "NetworkCIDR" --type yaml --type md -A 2 -B 2 | head -80
Repository: openshift/machine-config-operator
Length of output: 585
🏁 Script executed:
# Look for the actual file and surrounding context
cat templates/common/network-policies/machine-config-controller-allow.yaml
Repository: openshift/machine-config-operator
Length of output: 1237
🏁 Script executed:
# Search for documentation or comments about kubelet probes and network CIDR
rg "kubelet.*probe|health.*probe" --type yaml --type md -A 3 -B 3 | head -100
Repository: openshift/machine-config-operator
Length of output: 6902
🏁 Script executed:
# Search for where ClusterNetworkCIDR is defined as a variable/parameter
rg "ClusterNetworkCIDR" -B 5 -A 5 --type go --type yaml --type md
Repository: openshift/machine-config-operator
Length of output: 8443
🏁 Script executed:
# Look for other network CIDR definitions to understand the pattern
rg "NetworkCIDR|NetworkNetwork" --type go --type yaml | head -50
Repository: openshift/machine-config-operator
Length of output: 5480
🏁 Script executed:
# Search for how these template variables are populated
rg "\.ClusterNetworkCIDR|ClusterNetworkCIDR:" --type go -B 3 -A 3 | head -80
Repository: openshift/machine-config-operator
Length of output: 395
🏁 Script executed:
# Check for any documentation or README files about network setup
fd -t f '\.md$|README' | xargs grep -i "ClusterNetworkCIDR\|network.*cidr" | head -30
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Read the getClusterNetworkCIDR implementation
sed -n '/func (optr \*Operator) getClusterNetworkCIDR/,/^}/p' pkg/operator/sync.go
Repository: openshift/machine-config-operator
Length of output: 517
🏁 Script executed:
# Search for any node network or machine network CIDR usage
rg "NodeNetwork|MachineNetwork|node.*cidr" --type go -A 3 -B 3 | head -100
Repository: openshift/machine-config-operator
Length of output: 9425
🏁 Script executed:
# Look for kubelet configuration or probe documentation in the codebase
rg "9443|kubelet.*probe|Kubelet Health" -A 3 -B 3 --type yaml --type go | head -80
Repository: openshift/machine-config-operator
Length of output: 7475
🏁 Script executed:
# Search for what uses or listens on port 9443
rg "9443|machine.*config.*controller" --type go --type yaml -B 3 -A 3 | grep -A 10 -B 10 "9443\|probe\|webhook" | head -150
Repository: openshift/machine-config-operator
Length of output: 11461
🏁 Script executed:
# Look for webhook configurations or health probe definitions
rg "webhook|healthz|readiness.*probe|liveness.*probe" --type yaml -A 5 -B 5 | head -120
Repository: openshift/machine-config-operator
Length of output: 15321
🏁 Script executed:
# Search the codebase for how the machine-config-controller is accessed on port 9443
fd -t f '\.go$' | xargs grep -l "9443" | head -10
Repository: openshift/machine-config-operator
Length of output: 402
🏁 Script executed:
# Find machine-config-controller deployment manifests
fd -t f "\.yaml\|\.yml" | xargs grep -l "machine-config-controller" | head -20
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Look for any deployment/pod specs for machine-config-controller
fd -t f "deployment\|pod\|statefulset" -path "*/machine-config*"
Repository: openshift/machine-config-operator
Length of output: 245
🏁 Script executed:
# Search for webhook or TLS server configuration in machine-config-controller code
rg "ValidatingWebhook|MutatingWebhook|listener.*9443|Serve.*9443" --type go -A 5 -B 5
Repository: openshift/machine-config-operator
Length of output: 50390
🏁 Script executed:
# Look for any references to what calls/probes this service
rg "machine.*config.*controller.*9443|9443.*machine" --type yaml --type go
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Search in the actual pkg directory for machine-config-controller main code
find pkg -name "*.go" -type f | xargs grep -l "port.*9443\|9443.*port" 2>/dev/null | head -10
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Search for where machine-config-controller might have a server/listen on a port
rg "ListenAndServe|server.*Start|Addr.*Port" pkg/controller --type go -B 5 -A 5 | head -80
Repository: openshift/machine-config-operator
Length of output: 789
🏁 Script executed:
# Let's look at what the file names suggest about network policies
ls -la templates/common/network-policies/
Repository: openshift/machine-config-operator
Length of output: 276
🏁 Script executed:
# Search for metrics or server port configuration in machine-config-controller
rg "Addr.*:80|Addr.*:8080|Addr.*:9000|Addr.*:9" pkg/controller --type go -B 3 -A 3 | head -60
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Look for any operator code that manages network policies
rg "mco-allow\|mco.*allow\|NetworkPolicy" pkg/operator --type go -B 3 -A 3 | head -100
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Check if port 9443 is specifically for webhooks or health checks
rg "webhook|health|readiness" pkg/controller --type go | head -30
Repository: openshift/machine-config-operator
Length of output: 59
🏁 Script executed:
# Look at what the comment in the NetworkPolicy file says around the problematic rule
head -n 30 templates/common/network-policies/machine-config-controller-allow.yaml
Repository: openshift/machine-config-operator
Length of output: 891
Use the correct CIDR source for the intended access.
The comment says "from Node Network", but the rule uses {{.ClusterNetworkCIDR}}, which is the pod overlay network, not node infrastructure. This makes the ingress rule far broader than intended: every pod in the cluster can reach port 9443, while the actual kubelet/node source may be missed entirely if probes originate from outside the pod network. Either use the appropriate node/machine network CIDR, or confirm pod-network access is the real intent and update the comment to match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@templates/common/network-policies/machine-config-controller-allow.yaml`
around lines 21 - 26, The ingress rule labelled "Allow Kubelet Health Probes
(from Node Network)" currently uses {{.ClusterNetworkCIDR}} which is the pod
overlay and is too broad; update the rule to use the correct node/machine
network CIDR (e.g., replace {{.ClusterNetworkCIDR}} with the appropriate
variable for node/machine network such as {{.MachineNetworkCIDR}} or
{{.NodeNetworkCIDR}}) so only node IPs can access TCP port 9443, or if the
intent was to allow pod-to-controller access, change the human-facing comment to
say "from Cluster (pod) Network" to match {{.ClusterNetworkCIDR}}. Ensure the
change touches the ipBlock.cidr entry and/or the comment text associated with
port 9443.
- What I did
- How to verify it
- Description for the changelog
Summary by CodeRabbit
Chores
New Features