diff --git a/modules/nw-metallb-collecting-data.adoc b/modules/nw-metallb-collecting-data.adoc index ddd4977bac71..694b299d6df6 100644 --- a/modules/nw-metallb-collecting-data.adoc +++ b/modules/nw-metallb-collecting-data.adoc @@ -7,21 +7,21 @@ = About collecting MetalLB data [role="_abstract"] -To collect diagnostic data for debugging or support analysis, execute the `oc adm must-gather` CLI command. This utility captures essential information regarding the cluster, the MetalLB configuration, and the MetalLB Operator state. +To collect diagnostic data for debugging or support analysis, run the `oc adm must-gather` CLI command. This utility captures essential information regarding the cluster, the MetalLB configuration, and the MetalLB Operator state. The following list details features and objects related to MetalLB and the MetalLB Operator: * The namespace and child objects where you deploy the MetalLB Operator * All MetalLB Operator custom resource definitions (CRDs) -The command collects the following information from FRRouting (FRR), which Red{nbsp} Hat uses to implement BGP and BFD: +The command collects the following information from `FRRouting` (FRR), which Red{nbsp}Hat uses to implement BGP and BFD: * `/etc/frr/frr.conf` * `/etc/frr/frr.log` * `/etc/frr/daemons` configuration file * `/etc/frr/vtysh.conf` -The command collects log and configuration files from the `frr` container that exists in each `speaker` pod. Additionally, the command collects the output from the following `vtysh` commands: +The command collects log and configuration files from the `frr` container that exists in each `frr-k8s` pod in the `openshift-frr-k8s` namespace. 
Additionally, the command collects the output from the following `vtysh` commands: * `show running-config` * `show bgp ipv4` diff --git a/modules/nw-metallb-configure-community-bgp-advertisement.adoc b/modules/nw-metallb-configure-community-bgp-advertisement.adoc index 92194adcc346..d867107ba98b 100644 --- a/modules/nw-metallb-configure-community-bgp-advertisement.adoc +++ b/modules/nw-metallb-configure-community-bgp-advertisement.adoc @@ -97,7 +97,7 @@ spec: aggregationLength: 32 aggregationLengthV6: 128 communities: - - NO_ADVERTISE <1> + - NO_ADVERTISE ipAddressPools: - doc-example-bgp-community peers: diff --git a/modules/nw-metallb-example-addresspool.adoc b/modules/nw-metallb-example-addresspool.adoc index ccb790d5e386..434fb73151ae 100644 --- a/modules/nw-metallb-example-addresspool.adoc +++ b/modules/nw-metallb-example-addresspool.adoc @@ -68,17 +68,17 @@ metadata: namespace: metallb-system spec: addresses: - - 10.0.100.0/28 <1> + - 10.0.100.0/28 - 2002:2:2::1-2002:2:2::100 # ... ---- -`spec.addresses`: Where `10.0.100.0/28` is the local network IP address followed by the `/28` network prefix. +`spec.addresses`: This list defines the IP ranges that MetalLB manages. This specific example is a dual-stack configuration, meaning it provides both IPv4 and IPv6 addresses. Example of assigning IP address pools to services or namespaces:: You can assign IP addresses from an `IPAddressPool` to services and namespaces that you specify. -If you assign a service or namespace to more than one IP address pool, MetalLB uses an available IP address from the higher-priority IP address pool. If no IP addresses are available from the assigned IP address pools with a high priority, MetalLB uses available IP addresses from an IP address pool with lower priority or no priority. +If you assign a service or namespace to more than one IP address pool, MetalLB uses an available IP address from the higher-priority IP address pool. 
If no IP addresses are available from the assigned IP address pools with a high priority, MetalLB uses available IP addresses from an IP address pool with lower priority or no priority. [NOTE] ==== @@ -98,15 +98,15 @@ spec: serviceAllocation: priority: 50 namespaces: - - namespace-a + - namespace-a - namespace-b namespaceSelectors: - matchLabels: - zone: east + zone: east serviceSelectors: - - matchExpressions: - - key: security - operator: In + - matchExpressions: + - key: security + operator: In values: - S1 # ... @@ -118,4 +118,4 @@ where: `serviceAllocation.namespaces`:: Assign one or more namespaces to the IP address pool in a list format. `serviceAllocation.namespaceSelectors`:: Assign one or more namespace labels to the IP address pool by using label selectors in a list format. `serviceAllocation.serviceSelectors`:: Assign one or more service labels to the IP address pool by using label selectors in a list format. - + diff --git a/modules/nw-metallb-levels.adoc b/modules/nw-metallb-levels.adoc index 1537b4ca2942..15e1b30b08ea 100644 --- a/modules/nw-metallb-levels.adoc +++ b/modules/nw-metallb-levels.adoc @@ -7,7 +7,7 @@ = FRRouting (FRR) log levels [role="_abstract"] -To control the verbosity of network logs for troubleshooting or monitoring, refer to the FRRouting (FRR) logging levels. +To control the verbosity of network logs for troubleshooting or monitoring, refer to the `FRRouting` (FRR) logging levels. The following values define the severity of recorded events, so that you can use them to filter output based on operational requirements: @@ -31,7 +31,7 @@ Anything that can potentially cause inconsistent `MetalLB` behaviour. Usually `M | `error` a| -Any error that is fatal to the functioning of `MetalLB`. These errors usually require administrator intervention to fix. +Any unrecoverable error in `MetalLB`. These errors usually require administrator intervention to fix. | `none` |Turn off all logging. 
diff --git a/modules/nw-metallb-loglevel.adoc b/modules/nw-metallb-loglevel.adoc index 05e2aa44c69e..fb66b7d38396 100644 --- a/modules/nw-metallb-loglevel.adoc +++ b/modules/nw-metallb-loglevel.adoc @@ -8,7 +8,7 @@ = Setting the MetalLB logging levels [role="_abstract"] -To manage log verbosity for the FRRouting (FRR) container, configure the `logLevel` specification. By adjusting this setting, you can reduce log volume from the default info level or increase detail for troubleshooting MetalLB configuration issues. +To manage log verbosity for the `FRRouting` (FRR) container, configure the `logLevel` specification. By adjusting this setting, you can reduce log volume from the default info level or increase detail for troubleshooting MetalLB configuration issues. Gain a deeper insight into MetalLB by setting the `logLevel` to `debug`. @@ -19,7 +19,7 @@ Gain a deeper insight into MetalLB by setting the `logLevel` to `debug`. .Procedure -. Create a file, such as `setdebugloglevel.yaml`, with content like the following example: +. 
Create a file, such as `setdebugloglevel.yaml`, with content such as the following example: + [source,yaml] ---- @@ -87,78 +87,42 @@ I0517 09:55:06.515686 95 request.go:665] Waited for 1.026500832s due to cli {"caller":"service_controller.go:113","controller":"ServiceReconciler","enqueueing":"openshift-kube-controller-manager-operator/metrics","epslice":"{\"metadata\":{\"name\":\"metrics-xtsxr\",\"generateName\":\"metrics-\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"uid\":\"ac6766d7-8504-492c-9d1e-4ae8897990ad\",\"resourceVersion\":\"9041\",\"generation\":4,\"creationTimestamp\":\"2022-05-17T07:16:53Z\",\"labels\":{\"app\":\"kube-controller-manager-operator\",\"endpointslice.kubernetes.io/managed-by\":\"endpointslice-controller.k8s.io\",\"kubernetes.io/service-name\":\"metrics\"},\"annotations\":{\"endpoints.kubernetes.io/last-change-trigger-time\":\"2022-05-17T07:21:34Z\"},\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"name\":\"metrics\",\"uid\":\"0518eed3-6152-42be-b566-0bd00a60faf8\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"discovery.k8s.io/v1\",\"time\":\"2022-05-17T07:20:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:addressType\":{},\"f:endpoints\":{},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:endpoints.kubernetes.io/last-change-trigger-time\":{}},\"f:generateName\":{},\"f:labels\":{\".\":{},\"f:app\":{},\"f:endpointslice.kubernetes.io/managed-by\":{},\"f:kubernetes.io/service-name\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"0518eed3-6152-42be-b566-0bd00a60faf8\\\"}\":{}}},\"f:ports\":{}}}]},\"addressType\":\"IPv4\",\"endpoints\":[{\"addresses\":[\"10.129.0.7\"],\"conditions\":{\"ready\":true,\"serving\":true,\"terminating\":false},\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"name\":\"kube-controller-manager-operator-6b98b89ddd-8d4nf\",\
"uid\":\"dd5139b8-e41c-4946-a31b-1a629314e844\",\"resourceVersion\":\"9038\"},\"nodeName\":\"ci-ln-qb8t3mb-72292-7s7rh-master-0\",\"zone\":\"us-central1-a\"}],\"ports\":[{\"name\":\"https\",\"protocol\":\"TCP\",\"port\":8443}]}","level":"debug","ts":"2022-05-17T09:55:08Z"} ---- -. View the FRR logs: +. Display the names of the FRR-K8s pods: + [source,terminal] ---- -$ oc logs -n metallb-system speaker-7m4qw -c frr +$ oc get -n openshift-frr-k8s pods +---- ++ +.Example output +[source,text] +---- +NAME READY STATUS RESTARTS AGE +frr-k8s-bz2dn 7/7 Running 0 4h +frr-k8s-statuscleaner-59cf6f5d44-9wkfr 1/1 Running 0 4h +---- + +. View the FRR logs by specifying the `frr` container in one of the `frr-k8s` pods: ++ +[source,terminal] +---- +$ oc logs -n openshift-frr-k8s frr-k8s-bz2dn -c frr +---- + .Example output ---- -Started watchfrr -2022/05/17 09:55:05 ZEBRA: client 16 says hello and bids fair to announce only bgp routes vrf=0 -2022/05/17 09:55:05 ZEBRA: client 31 says hello and bids fair to announce only vnc routes vrf=0 -2022/05/17 09:55:05 ZEBRA: client 38 says hello and bids fair to announce only static routes vrf=0 -2022/05/17 09:55:05 ZEBRA: client 43 says hello and bids fair to announce only bfd routes vrf=0 -2022/05/17 09:57:25.089 BGP: Creating Default VRF, AS 64500 -2022/05/17 09:57:25.090 BGP: dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0 -2022/05/17 09:57:25.090 BGP: bgp_get: Registering BGP instance (null) to zebra -2022/05/17 09:57:25.090 BGP: Registering VRF 0 -2022/05/17 09:57:25.091 BGP: Rx Router Id update VRF 0 Id 10.131.0.1/32 -2022/05/17 09:57:25.091 BGP: RID change : vrf VRF default(0), RTR ID 10.131.0.1 -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF br0 -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ens4 -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr 10.0.128.4/32 -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr fe80::c9d:84da:4d86:5618/64 -2022/05/17 09:57:25.091 BGP: Rx Intf add
VRF 0 IF lo -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ovs-system -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF tun0 -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr 10.131.0.1/23 -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr fe80::40f1:d1ff:feb6:5322/64 -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2da49fed -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2da49fed addr fe80::24bd:d1ff:fec1:d88/64 -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2fa08c8c -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2fa08c8c addr fe80::6870:ff:fe96:efc8/64 -2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth41e356b7 -2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth41e356b7 addr fe80::48ff:37ff:fede:eb4b/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth1295c6e2 -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth1295c6e2 addr fe80::b827:a2ff:feed:637/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth9733c6dc -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth9733c6dc addr fe80::3cf4:15ff:fe11:e541/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth336680ea -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth336680ea addr fe80::94b1:8bff:fe7e:488c/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vetha0a907b7 -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vetha0a907b7 addr fe80::3855:a6ff:fe73:46c3/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf35a4398 -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf35a4398 addr fe80::40ef:2fff:fe57:4c4d/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf831b7f4 -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf831b7f4 addr fe80::f0d9:89ff:fe7c:1d32/64 -2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vxlan_sys_4789 -2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vxlan_sys_4789 addr 
fe80::80c1:82ff:fe4b:f078/64 -2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Timer (start timer expire). -2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] BGP_Start (Idle->Connect), fd -1 -2022/05/17 09:57:26.094 BGP: Allocated bnc 10.0.0.1/32(0)(VRF default) peer 0x7f807f7631a0 -2022/05/17 09:57:26.094 BGP: sendmsg_zebra_rnh: sending cmd ZEBRA_NEXTHOP_REGISTER for 10.0.0.1/32 (vrf VRF default) -2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Waiting for NHT -2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0 -2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Idle to Connect -2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] TCP_connection_open_failed (Connect->Active), fd -1 -2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0 -2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Connect to Active -2022/05/17 09:57:26.094 ZEBRA: rnh_register msg from client bgp: hdr->length=8, type=nexthop vrf=0 -2022/05/17 09:57:26.094 ZEBRA: 0: Add RNH 10.0.0.1/32 type Nexthop -2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: Evaluate RNH, type Nexthop (force) -2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: NH has become unresolved -2022/05/17 09:57:26.094 ZEBRA: 0: Client bgp registers for RNH 10.0.0.1/32 type Nexthop -2022/05/17 09:57:26.094 BGP: VRF default(0): Rcvd NH update 10.0.0.1/32(0) - metric 0/0 #nhops 0/0 flags 0x6 -2022/05/17 09:57:26.094 BGP: NH update for 10.0.0.1/32(0)(VRF default) - flags 0x6 chgflags 0x0 - evaluate paths -2022/05/17 09:57:26.094 BGP: evaluate_paths: Updating peer (10.0.0.1(VRF default)) status with NHT -2022/05/17 09:57:30.081 ZEBRA: Event driven route-map update triggered -2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-out -2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-in -2022/05/17 09:57:31.104 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0 -2022/05/17 09:57:31.104 ZEBRA: 
Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring -2022/05/17 09:57:31.105 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0 -2022/05/17 09:57:31.105 ZEBRA: Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring +2026/03/02 09:53:09 WATCHFRR: [T83RR-8SM5G] watchfrr 8.5.3 starting: vty@0 +2026/03/02 09:53:09 WATCHFRR: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed +2026/03/02 09:53:09 WATCHFRR: [ZCJ3S-SPH5S] bgpd state -> down : initial connection attempt failed +2026/03/02 09:53:09 WATCHFRR: [ZCJ3S-SPH5S] staticd state -> down : initial connection attempt failed +2026/03/02 09:53:09 WATCHFRR: [ZCJ3S-SPH5S] bfdd state -> down : initial connection attempt failed +2026/03/02 09:53:09 ZEBRA: [NNACN-54BDA][EC 4043309110] Disabling MPLS support (no kernel support) +2026/03/02 09:53:09 WATCHFRR: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 +2026/03/02 09:53:09 WATCHFRR: [QDG3Y-BY5TN] zebra state -> up : connect succeeded +2026/03/02 09:53:09 WATCHFRR: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded +2026/03/02 09:53:09 WATCHFRR: [QDG3Y-BY5TN] staticd state -> up : connect succeeded +2026/03/02 09:53:09 WATCHFRR: [QDG3Y-BY5TN] bfdd state -> up : connect succeeded +2026/03/02 09:53:09 WATCHFRR: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify +2026/03/02 09:53:09 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 +2026/03/02 09:53:09 BGP: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 ---- diff --git a/modules/nw-metallb-metrics.adoc b/modules/nw-metallb-metrics.adoc index 346350e474a2..1b52361954a2 100644 --- a/modules/nw-metallb-metrics.adoc +++ b/modules/nw-metallb-metrics.adoc @@ -67,10 +67,10 @@ To monitor network connectivity and diagnose routing states, refer to the Promet | Counts the number of BGP update messages received from each BGP peer. | `frrk8s_bgp_keepalives_sent` -| Counts the number of BGP keepalive messages sent to each BGP peer. 
+| Counts the number of BGP `keepalive` messages sent to each BGP peer. | `frrk8s_bgp_keepalives_received` -| Counts the number of BGP keepalive messages received from each BGP peer. +| Counts the number of BGP `keepalive` messages received from each BGP peer. | `frrk8s_bgp_route_refresh_sent` | Counts the number of BGP route refresh messages sent to each BGP peer. @@ -81,4 +81,7 @@ To monitor network connectivity and diagnose routing states, refer to the Promet | `frrk8s_bgp_total_received` | Counts the number of total BGP messages received from each BGP peer. +| `frrk8s_bgp_received_prefixes_total` +| Counts the number of load balancer IP address prefixes received from each BGP peer. + |=== diff --git a/modules/nw-metallb-troubleshoot-bfd.adoc b/modules/nw-metallb-troubleshoot-bfd.adoc index b0058fe43450..65123ba2d408 100644 --- a/modules/nw-metallb-troubleshoot-bfd.adoc +++ b/modules/nw-metallb-troubleshoot-bfd.adoc @@ -7,9 +7,9 @@ = Troubleshooting BFD issues [role="_abstract"] -To diagnose and resolve Bidirectional Forwarding Detection (BFD) issues, execute commands directly within the FRRouting (FRR) container. By accessing the container, you can verify that BFD peers are correctly configured with established BGP sessions. +To diagnose and resolve Bidirectional Forwarding Detection (BFD) issues, run commands directly within the `FRRouting` (FRR) container. By accessing the container, you can verify that BFD peers are correctly configured with established BGP sessions. -The BFD implementation that Red{nbsp}Hat supports uses FRRouting (FRR) in a container that exists in a `speaker` pod. +The BFD implementation that Red{nbsp}Hat supports uses `FRRouting` (FRR) in a container that exists in an `frr-k8s` pod in the `openshift-frr-k8s` namespace. .Prerequisites @@ -18,32 +18,30 @@ The BFD implementation that Red{nbsp}Hat supports uses FRRouting (FRR) in a cont .Procedure -. Display the names of the `speaker` pods: +. 
Display the names of the FRR-K8s pods by running the following command: + [source,terminal] ---- -$ oc get -n metallb-system pods -l component=speaker +$ oc get pods -n openshift-frr-k8s -l app=frr-k8s ---- + .Example output [source,text] ---- NAME READY STATUS RESTARTS AGE -speaker-66bth 4/4 Running 0 26m -speaker-gvfnf 4/4 Running 0 26m -... +frr-k8s-bz2dn 7/7 Running 0 106m ---- -. Display the BFD peers: +. Run the following command against the `frr` container in the FRR-K8s pod to display the BFD peers: + [source,terminal] ---- -$ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bfd peers brief" +$ oc exec -n openshift-frr-k8s frr-k8s-bz2dn -c frr -- vtysh -c "show bfd peers brief" ---- + .Example output ---- -Session count: 2 +Session count: 1 SessionId LocalAddress PeerAddress Status ========= ============ =========== ====== 3909139637 10.0.1.2 10.0.2.3 up @@ -52,4 +50,54 @@ SessionId LocalAddress PeerAddress Status where: + `up`:: Specifies that the `PeerAddress` column includes each BFD peer. If the output does not list a BFD peer IP address that you expected the output to include, troubleshoot BGP connectivity with the peer. If the status field indicates `down`, check for connectivity on the links and equipment between the node and the peer. -You can determine the node name for the speaker pod with a command like `oc get pods -n metallb-system speaker-66bth -o jsonpath='{.spec.nodeName}'`. +You can determine the node name for the FRR-K8s pod with a command such as `oc get pods -n openshift-frr-k8s frr-k8s-bz2dn -o jsonpath='{.spec.nodeName}'`. + +. 
Optional: To display detailed BFD peer information, run the following command: ++ +[source,terminal] +---- +$ oc exec -n openshift-frr-k8s frr-k8s-bz2dn -c frr -- vtysh -c "show bfd peers" +---- ++ +.Example output +---- +BFD Peers: + peer 10.0.2.3 local-address 10.0.1.2 vrf default interface br-ex + ID: 3909139637 + Remote ID: 2819913327 + Active mode + Status: up + Uptime: 1 hour(s), 12 minute(s), 30 second(s) + Diagnostics: ok + Remote diagnostics: ok + Peer Type: dynamic + RTT min/avg/max: 301/512/4191 usec + Local timers: + Detect-multiplier: 3 + Receive interval: 300ms + Transmission interval: 300ms + Echo receive interval: 50ms + Echo transmission interval: disabled + Remote timers: + Detect-multiplier: 3 + Receive interval: 300ms + Transmission interval: 300ms + Echo receive interval: disabled +---- ++ +where: ++ +`Status`:: The current state of the BFD session. A value of `up` indicates that the session is established and the link is healthy. A value of `down` indicates that the session has failed. +`Uptime` or `Downtime`:: The duration that the session has been in its current state. +`Remote ID`:: The session ID assigned by the remote peer. A value of `0` indicates that the remote peer has not responded. +`Diagnostics` and `Remote diagnostics`:: The diagnostic codes for the local and remote peers. A value of `ok` indicates no errors. +`Detect-multiplier`:: The number of missed packets before the session is declared down. With a receive interval of `300ms` and a detect-multiplier of `3`, the session is declared down after `900ms` of missed packets. + +.Verification + +To confirm that BFD is functioning correctly, verify that all of the following conditions are met: + +* The `show bfd peers brief` output lists each expected BFD peer with a `Status` of `up`. +* The `show bfd peers` detailed output shows a nonzero `Remote ID`, which confirms that the remote peer has responded. +* The `Diagnostics` and `Remote diagnostics` fields both display `ok`. 
+* If a session shows `down`, check the `Downtime` duration and `Remote diagnostics` field for error codes, and verify network connectivity between the node and the peer. diff --git a/modules/nw-metallb-troubleshoot-bgp.adoc b/modules/nw-metallb-troubleshoot-bgp.adoc index fbd4f626fb75..33d92a708ad2 100644 --- a/modules/nw-metallb-troubleshoot-bgp.adoc +++ b/modules/nw-metallb-troubleshoot-bgp.adoc @@ -7,7 +7,7 @@ = Troubleshooting BGP issues [role="_abstract"] -To diagnose and resolve BGP configuration issues, execute commands directly within the FRR container. By accessing the container, you can verify routing states and identify connectivity errors. +To diagnose and resolve BGP configuration issues, run commands directly within the FRR container. By accessing the container, you can verify routing states and identify connectivity errors. .Prerequisites @@ -16,25 +16,26 @@ To diagnose and resolve BGP configuration issues, execute commands directly with .Procedure -. Display the names of the `frr-k8s` pods by running the following command: +. In {product-title} 4.17 and later, MetalLB delegates BGP to a separate FRR-K8s daemon set running in the `openshift-frr-k8s` namespace. The speaker pod itself no longer contains an FRR container. Display the names of the `frr-k8s` pods by running the following command: + [source,terminal] ---- -$ oc -n metallb-system get pods -l component=frr-k8s +$ oc get pods -n openshift-frr-k8s ---- + .Example output [source,text] ---- -NAME READY STATUS RESTARTS AGE -frr-k8s-thsmw 6/6 Running 0 109m +NAME READY STATUS RESTARTS AGE +frr-k8s-bz2dn 7/7 Running 0 15m +frr-k8s-statuscleaner-59cf6f5d44-9wkfr 1/1 Running 0 15m ---- . 
Display the running configuration for FRR by running the following command: + [source,terminal] ---- -$ oc exec -n metallb-system frr-k8s-thsmw -c frr -- vtysh -c "show running-config" +$ oc exec -n openshift-frr-k8s frr-k8s-bz2dn -c frr -- vtysh -c "show running-config" ---- + .Example output @@ -45,132 +46,206 @@ Current configuration: ! frr version 8.5.3 frr defaults traditional -hostname some-hostname +hostname mysno-sno.demo.lab log file /etc/frr/frr.log informational log timestamp precision 3 no ip forwarding no ipv6 forwarding service integrated-vtysh-config ! -router bgp 64500 - bgp router-id 10.0.1.2 +router bgp 64501 no bgp ebgp-requires-policy no bgp default ipv4-unicast + bgp graceful-restart preserve-fw-state no bgp network import-check - neighbor 10.0.2.3 remote-as 64500 - neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full - neighbor 10.0.2.3 timers 5 15 - neighbor 10.0.2.4 remote-as 64500 - neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full - neighbor 10.0.2.4 timers 5 15 + neighbor 192.168.122.12 remote-as 64500 ! address-family ipv4 unicast - network 203.0.113.200/30 - neighbor 10.0.2.3 activate - neighbor 10.0.2.3 route-map 10.0.2.3-in in - neighbor 10.0.2.4 activate - neighbor 10.0.2.4 route-map 10.0.2.4-in in - exit-address-family - ! - address-family ipv6 unicast - network fc00:f853:ccd:e799::/124 - neighbor 10.0.2.3 activate - neighbor 10.0.2.3 route-map 10.0.2.3-in in - neighbor 10.0.2.4 activate - neighbor 10.0.2.4 route-map 10.0.2.4-in in + network 192.168.122.210/32 + neighbor 192.168.122.12 activate + neighbor 192.168.122.12 route-map 192.168.122.12-in in + neighbor 192.168.122.12 route-map 192.168.122.12-out out exit-address-family +exit +! +ip prefix-list 192.168.122.12-inpl-ipv4 seq 1 deny any +ip prefix-list 192.168.122.12-allowed-ipv4 seq 1 permit 192.168.122.210/32 +! +ipv6 prefix-list 192.168.122.12-allowed-ipv6 seq 1 deny any +ipv6 prefix-list 192.168.122.12-inpl-ipv4 seq 2 deny any +!
+route-map 192.168.122.12-out permit 1 + match ip address prefix-list 192.168.122.12-allowed-ipv4 +exit ! -route-map 10.0.2.3-in deny 20 +route-map 192.168.122.12-out permit 2 + match ipv6 address prefix-list 192.168.122.12-allowed-ipv6 +exit ! -route-map 10.0.2.4-in deny 20 +route-map 192.168.122.12-in permit 3 + match ip address prefix-list 192.168.122.12-inpl-ipv4 +exit +! +route-map 192.168.122.12-in permit 4 + match ipv6 address prefix-list 192.168.122.12-inpl-ipv4 +exit ! ip nht resolve-via-default ! ipv6 nht resolve-via-default ! -line vty +end +---- ++ +where: ++ +* `router bgp 64501`:: This is the local Autonomous System Number (ASN) for your MetalLB speakers. +* `neighbor 192.168.122.12 remote-as 64500`:: This identifies the external BGP Peer. Specifies that a `neighbor remote-as ` line exists for each BGP peer custom resource that you added. +** The local ASN is 64501. +** The remote ASN is 64500. +* `network 192.168.122.210/32`:: This is a specific LoadBalancer IP from your IPAddressPool. It is being advertised as a /32 (a single host route), which is standard for MetalLB. +* `neighbor 192.168.122.12 activate`:: Enables the exchange of IPv4 routing information with that specific neighbor. +* `route-map ... in/out`:: These are "filters" or "policies." They ensure that the speaker only sends the IP addresses you have authorized and does not accidentally learn and install internal routes from your physical router. +* `ip prefix-list ... permit 192.168.122.210/32`:: This creates an allowlist. Only this specific IP is permitted to be advertised. +* `route-map 192.168.122.12-out permit 1`:: This tells the router: "If the IP matches the prefix-list above, permit it to be sent out to the neighbor." +* `ip nht resolve-via-default`:: (Next hop tracking) This is a common setting in MetalLB/FRR to ensure that the BGP next-hop can be resolved using the default route if a more specific route isn't available. 
++ +If BFD is enabled for the BGP peer, the output includes additional `bfd` lines under the neighbor configuration and a `bfd` section at the end. The following example shows the relevant differences: ++ +.Example output with BFD enabled +---- +router bgp 64501 + ... + neighbor 192.168.122.12 remote-as 64500 + neighbor 192.168.122.12 bfd + neighbor 192.168.122.12 bfd profile bfd-profile + ... +exit +! +... ! bfd - profile doc-example-bfd-profile-full - transmit-interval 35 - receive-interval 35 - passive-mode - echo-mode - echo-interval 35 - minimum-ttl 10 + profile bfd-profile + minimum-ttl 1 + exit ! +exit ! end ---- -+ -where: -+ -`router bgp 64500`:: Specifies the `router bgp` that indicates the ASN for MetalLB. -`neighbor 10.0.2.3 remote-as 64500`:: Specifies that a `neighbor remote-as ` line exists for each BGP peer custom resource that you added. -`bfd profile doc-example-bfd-profile-full`:: Specifies that the BFD profile is associated with the correct BGP peer and that the BFD profile shows in the command output. -`network 203.0.113.200/30`:: Specifies that the `network ` lines match the IP address ranges that you specified in address pool custom resources . 
Display the BGP summary by running the following command: + [source,terminal] ---- -$ oc exec -n metallb-system frr-k8s-thsmw -c frr -- vtysh -c "show bgp summary" +$ oc exec -n openshift-frr-k8s frr-k8s-bz2dn -c frr -- vtysh -c "show bgp summary" ---- + .Example output ---- -IPv4 Unicast Summary: -BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0 -BGP table version 1 -RIB entries 1, using 192 bytes of memory -Peers 2, using 29 KiB of memory -Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt -10.0.2.3 4 64500 387 389 0 0 0 00:32:02 0 1 -10.0.2.4 4 64500 0 0 0 0 0 never Active 0 - -Total number of neighbors 2 - -IPv6 Unicast Summary: -BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0 +IPv4 Unicast Summary (VRF default): +BGP router identifier 192.168.122.12, local AS number 64501 vrf-id 0 BGP table version 1 RIB entries 1, using 192 bytes of memory -Peers 2, using 29 KiB of memory +Peers 1, using 725 KiB of memory -Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt -10.0.2.3 4 64500 387 389 0 0 0 00:32:02 NoNeg -10.0.2.4 4 64500 0 0 0 0 0 never Active 0 +Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc +192.168.122.12 4 64500 37 38 0 0 0 00:32:12 0 1 N/A -Total number of neighbors 2 +Total number of neighbors 1 ---- + where: + -`10.0.2.3`:: Specifies that the output includes a line for each BGP peer custom resource that you added. -`10.0.2.4`:: Specifies that the output shows `0` messages received and `0` messages sent, which indicates a BGP peer that does not have a BGP session. Check network connectivity and the BGP configuration of the BGP peer. +`BGP router identifier 192.168.122.12`:: The BGP identifier for this node. It is usually the IP address of the primary interface. +`local AS number 64501`:: The autonomous system number (ASN) that you assigned to MetalLB in your `BGPPeer` or `MetalLB` CR. +`Neighbor`:: The IP address, `192.168.122.12`, of your external router (the peer). 
+`AS`:: The ASN of the external router (the peer), which should match the `remote-as` value in your BGP configuration. +`Up/Down`:: The time since the BGP session was established. In this example, the session has been up for 32 minutes. +`State/PfxRcd`:: A numeric value indicates that the session is established. In this example, 0 routes have been received from the peer. Receiving 0 prefixes is normal for a standard MetalLB deployment. +`PfxSnt`:: Shows that you have successfully advertised 1 route, the LoadBalancer IP, to the peer. This confirms that MetalLB is working: it has taken one LoadBalancer service IP and told the external router to send traffic for that IP to this node. ++ +In the output, the `State/PfxRcd` value is the number `0`, which means the connection is successful. If the connection were broken, you would see text such as `Active`, `Connect`, or `Idle` instead. . Display the BGP peers that received an address pool by running the following command: + [source,terminal] ---- -$ oc exec -n metallb-system frr-k8s-thsmw -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30" +$ oc exec -n openshift-frr-k8s frr-k8s-bz2dn -c frr -- vtysh -c "show bgp ipv4 unicast" ---- + Replace `ipv4` with `ipv6` to display the BGP peers that received an IPv6 address pool. -Replace `203.0.113.200/30` with an IPv4 or IPv6 IP address range from an address pool. + .Example output ---- -BGP routing table entry for 203.0.113.200/30 -Paths: (1 available, best #1, table default) - Advertised to non peer-group peers: - 10.0.2.3 - Local - 0.0.0.0 from 0.0.0.0 (10.0.1.2) - Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received) - Last update: Mon Jan 10 19:49:07 2022 +BGP table version is 1, local router ID is 192.168.122.12, vrf id 0 +Default local pref 100, local AS 64501 +Status codes: s suppressed, d damped, h history, * valid, > best, = multipath, + i internal, r RIB-failure, S Stale, R Removed +Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self +Origin codes: i - IGP, e - EGP, ?
- incomplete +RPKI validation codes: V valid, I invalid, N Not found + + Network Next Hop Metric LocPrf Weight Path + *> 192.168.122.210/32 + 0.0.0.0 0 32768 i + +Displayed 1 routes and 1 total paths ---- + where: + -`10.0.2.3`:: Specifies that the output includes an IP address for a BGP peer. +`BGP table version is 1`:: This increments every time the routing table changes. A low number usually indicates a stable or newly started session. +`local router ID 192.168.122.12`:: The unique identifier for this specific OpenShift node in the BGP topology. +`local AS 64501`:: The Private Autonomous System Number (ASN) assigned to your MetalLB deployment. ++ +The bottom section is the actual "map" being sent to your upstream router: ++ +.BGP Route Entry Field Descriptions +[cols="1,2,3", options="header"] +|=== +|Field/Symbol |Value in Output |Description and Troubleshooting Impact + +|Status Codes +|*> +a|* `*` (Valid): The route is technically correct. +* `>` (Best): The route selected for advertisement. MetalLB only sends "Best" paths to peers. If `>` is missing, the peer will not receive the route. + +|Network +|192.168.122.210/32 +|The specific IP assigned to the LoadBalancer service. The /32 mask indicates a host route, ensuring traffic for this specific IP is attracted to this node. + +|Next Hop +|0.0.0.0 +|Indicates the route is local to the node. In MetalLB, 0.0.0.0 means the current node is the egress point for this service. + +|Metric +|0 +|The Multi-Exit Discriminator (MED). Used to suggest a preferred path to external neighbors. Default is 0. + +|LocPrf +|100 +|Local Preference. Used to prefer an exit point for the entire AS. Default is 100. + +|Weight +|32768 +|An internal FRR priority value. Locally injected routes default to 32768, ensuring the node prefers its own local service path over learned routes. + +|Path +|`i` +|The Origin code. 
`i` (IGP) signifies the route was originated internally through the MetalLB speaker injecting the `IPAddressPool` into the FRR stack. +|=== ++ +If the table is empty, check if the Service has an IP assigned using the `oc get svc` command. Verify that a `BGPAdvertisement` exists and that its `nodeSelector` or `labelSelector` matches the service and the node you are running the command on. +If the Next Hop is not `0.0.0.0`, the node might be trying to forward the traffic elsewhere before it even reaches the pods, which can indicate a complex BGP peering issue or an issue with the node's underlying routing table. + +.Verification + +To confirm that BGP is functioning correctly, verify that all of the following conditions are met: + +* The `show running-config` output contains a `router bgp` section with the correct local ASN and at least one `neighbor` entry with the expected peer IP and remote ASN. +* The `show bgp summary` output shows the BGP session as `Established`. The `State/PfxRcd` column displays a number, such as `0`, rather than a state name such as `Active`, `Connect`, or `Idle`. +* The `PfxSnt` column in the BGP summary shows at least `1`, which confirms that MetalLB is advertising a LoadBalancer IP to the peer. +* The `show bgp ipv4 unicast` output contains at least one route with the `*>` status code, which indicates that the route is valid and selected as the best path for advertisement. +* If BFD is configured, the `show running-config` output includes `neighbor bfd` lines and a `bfd` profile section. 
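When you check many nodes, it can help to script the summary check described in the steps above. The following is a minimal, hypothetical helper (not part of the product or these modules): it assumes the default FRR `show bgp summary` column layout shown in the example output, where a numeric `State/PfxRcd` value indicates an established session and a word such as `Active` or `Idle` indicates a session that is not established. The `check_bgp_summary` name and the pod name in the usage comment are illustrative.

```shell
# Minimal sketch: read "show bgp summary" output on stdin and report, for each
# neighbor row, whether the BGP session is established. A numeric State/PfxRcd
# column means the session is up; a word such as Active or Idle means it is not.
check_bgp_summary() {
  awk '
    # Neighbor rows start with an IPv4/IPv6 address, then the BGP version "4".
    /^[0-9a-fA-F:.]+ +4 /{
      # Column 10 is State/PfxRcd in the default layout.
      if ($10 ~ /^[0-9]+$/)
        printf "%s established (PfxRcd=%s)\n", $1, $10
      else
        printf "%s NOT established (state=%s)\n", $1, $10
    }
  '
}

# Hypothetical usage with a live pod (pod name is illustrative):
# oc exec -n openshift-frr-k8s frr-k8s-thsmw -c frr -- \
#   vtysh -c "show bgp summary" | check_bgp_summary
```

You can also paste captured summary output into the helper for offline triage, which is useful when reviewing must-gather data.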
diff --git a/modules/nw-metallb-ts-fundamentals.adoc b/modules/nw-metallb-ts-fundamentals.adoc
new file mode 100644
index 000000000000..5515b72f14f1
--- /dev/null
+++ b/modules/nw-metallb-ts-fundamentals.adoc
@@ -0,0 +1,21 @@
+// Module included in the following assemblies:
+//
+// * networking/metallb/metallb-troubleshoot-support.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="nw-metallb-fundamental-concepts_{context}"]
+= Troubleshooting fundamental concepts and responsibilities
+
+[role="_abstract"]
+To troubleshoot MetalLB effectively, it is essential to understand the responsibilities of each component in the system. This knowledge allows you to quickly identify the source of issues and apply targeted solutions.
+
+Before troubleshooting, identify which component is responsible for the failure:
+
+* *The Controller*: Responsible for IP address allocation. If a service of `type: LoadBalancer` is stuck in `Pending`, check the controller.
+* *The Speaker*: Responsible for attracting traffic by announcing the assigned IP through Layer 2 (ARP/NDP) or BGP. If the service has an IP address but is unreachable, check the speakers.
+* *The CNI (OVN-Kubernetes)*: Responsible for packet delivery after traffic reaches the node.
+
+[IMPORTANT]
+====
+Being able to reach the `LoadBalancerIP` from a node *inside* the cluster proves that the CNI is working; it does *not* prove that MetalLB is fully operational. MetalLB's job is to make the IP reachable from *outside* the cluster.
+====
diff --git a/modules/nw-metallb-ts-ovn-k-gateway-modes.adoc b/modules/nw-metallb-ts-ovn-k-gateway-modes.adoc
new file mode 100644
index 000000000000..f41fba6211a4
--- /dev/null
+++ b/modules/nw-metallb-ts-ovn-k-gateway-modes.adoc
@@ -0,0 +1,34 @@
+// Module included in the following assemblies:
+//
+// * networking/metallb/metallb-troubleshoot-support.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="nw-metallb-ovn-k-gateway-modes_{context}"]
+= OVN-Kubernetes gateway modes and when to switch
+
+[role="_abstract"]
+MetalLB's compatibility with OVN-Kubernetes depends on the gateway mode configuration. This section explains the differences between shared and local gateway modes and provides guidance on when to switch.
+
+A common troubleshooting point is the gateway mode. The requirement depends on your node configuration:
+
+* *Single interface*: MetalLB functions correctly in the default *shared gateway mode*.
+* *Multiple interfaces (common in telco environments)*: MetalLB requires *local gateway mode* (`routingViaHost: true`). In shared gateway mode with multiple interfaces, traffic steering conflicts often occur.
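+
+The patch commands in this module set fields on the cluster `Network` operator CR. For orientation, a sketch of the relevant fields after enabling local gateway mode and global IP forwarding looks like the following; unrelated `gatewayConfig` fields are omitted:
+
+[source,yaml]
+----
+apiVersion: operator.openshift.io/v1
+kind: Network
+metadata:
+  name: cluster
+spec:
+  defaultNetwork:
+    ovnKubernetesConfig:
+      gatewayConfig:
+        routingViaHost: true  # local gateway mode
+        ipForwarding: Global  # needed only in complex BGP routing scenarios
+----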
+
+Check your current gateway mode configuration by running the following command:
+[source,terminal]
+----
+$ oc get network.operator cluster -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost}'
+----
+
+If your environment uses multiple interfaces and the command returns `false`, enable local gateway mode by running the following command:
+[source,terminal]
+----
+$ oc patch network.operator cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost": true}}}}}'
+----
+
+In complex BGP routing scenarios, which are common in telco environments, global IP forwarding might be required:
+[source,terminal]
+----
+$ oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding": "Global"}}}}}' --type=merge
+----
+
diff --git a/modules/nw-metallb-verifying-bgp-session.adoc b/modules/nw-metallb-verifying-bgp-session.adoc
index 8b7ccc0044e1..e261a13211b3 100644
--- a/modules/nw-metallb-verifying-bgp-session.adoc
+++ b/modules/nw-metallb-verifying-bgp-session.adoc
@@ -3,11 +3,11 @@
= Verifying BGP session state

[role="_abstract"]
-Once you configure MetalLB for border gateway protocol (BGP) mode, you can verify that the system has established BGP sessions and is advertising routes. You can examine the `BGPSessionStatus` custom resource (CR) and the `FRRNodeState` CR to troubleshoot BGP connectivity and confirm proper route advertisement.
+After you configure MetalLB for Border Gateway Protocol (BGP) mode, you can verify that the system has established BGP sessions and is advertising routes. You can examine the `BGPSessionState` custom resource (CR) and the `FRRNodeState` CR to troubleshoot BGP connectivity and confirm proper route advertisement.

[NOTE]
====
-The `BGPSessionStatus` CR is updated at a regular poll interval of 2 minutes. It can take up to 2 minutes for the CR to reflect the actual BGP state.
+The `BGPSessionState` CR is updated at a regular poll interval of 2 minutes.
It can take up to 2 minutes for the CR to reflect the actual BGP state. ==== .Prerequisites @@ -25,18 +25,18 @@ The `BGPSessionStatus` CR is updated at a regular poll interval of 2 minutes. It + [source,terminal] ---- -$ oc get bgpsessionstatus -o wide +$ oc get bgpsessionstates.frrk8s.metallb.io -A -o wide ---- + -This example output shows the BGP session status between each node and its configured BGP peers. Look for the `State` column to confirm that the session is `Established`. +This example output shows the BGP session status between each node and its configured BGP peers. Look for the `BGP` column to confirm that the session is `Established`. + [source,terminal] ---- -NAMESPACE NAME NODE PEER VRF BGP BFD -frr-k8s-system worker-2gjfq worker 10.89.0.64 Active N/A -frr-k8s-system worker-9gtnb worker 10.89.0.63 Established N/A -frr-k8s-system worker2-rknga worker2 10.89.0.66 Established Up -frr-k8s-system worker2-t7bfc worker2 172.30.0.2 Established Down +NAMESPACE NAME NODE PEER VRF BGP BFD +openshift-frr-k8s worker-2gjfq worker 10.89.0.64 Active N/A +openshift-frr-k8s worker-9gtnb worker 10.89.0.63 Established N/A +openshift-frr-k8s worker2-rknga worker2 10.89.0.66 Established Up +openshift-frr-k8s worker2-t7bfc worker2 172.30.0.2 Established Down ---- . 
Check `FRRNodeState` to see the BGP configuration on each node by running the following command: diff --git a/networking/ingress_load_balancing/metallb/metallb-troubleshoot-support.adoc b/networking/ingress_load_balancing/metallb/metallb-troubleshoot-support.adoc index eac7babe66db..7f94f497f6c5 100644 --- a/networking/ingress_load_balancing/metallb/metallb-troubleshoot-support.adoc +++ b/networking/ingress_load_balancing/metallb/metallb-troubleshoot-support.adoc @@ -15,6 +15,10 @@ include::modules/nw-metallb-loglevel.adoc[leveloffset=+1] // Log level descriptions include::modules/nw-metallb-levels.adoc[leveloffset=+2] +include::modules/nw-metallb-ts-fundamentals.adoc[leveloffset=+1] + +include::modules/nw-metallb-ts-ovn-k-gateway-modes.adoc[leveloffset=+1] + // Troubleshooting BGP issues include::modules/nw-metallb-troubleshoot-bgp.adoc[leveloffset=+1]