-
Notifications
You must be signed in to change notification settings - Fork 112
must-gather: Add IPsec tunnel analyzer #157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Assisted-by: Claude <[email protected]> Signed-off-by: Periyasamy Palanisamy <[email protected]>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pperiyasamy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThe changes introduce IPsec analysis capabilities to the Must-Gather Analyzer plugin. A new Python analysis script processes must-gather data to evaluate IPsec configuration and tunnel status. Documentation is added across multiple markdown files describing the IPsec analysis command, usage examples, and integration with existing analysis capabilities. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches✅ Passed checks (7 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
plugins/must-gather/skills/must-gather-analyzer/scripts/analyze_ipsec.py (1)
36-43: Narrow overly broadExceptionhandlers in parsing helpers
parse_yaml_file,parse_ipsec_connections, andcheck_connection_statusall catch bareException, which can mask programming errors and makes troubleshooting harder. The failure modes here are primarily file I/O and parsing issues.Consider tightening these to the expected exception types instead of a blanket
Exception, e.g.:-import yaml +import yaml +from yaml import YAMLError @@ - try: - with open(file_path, 'r') as f: - return yaml.safe_load(f) - except Exception as e: + try: + with open(file_path, "r") as f: + return yaml.safe_load(f) + except (OSError, YAMLError) as e: print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr) return None @@ - try: - with open(openshift_conf_path, 'r', errors='ignore') as f: + try: + with open(openshift_conf_path, "r", errors="ignore") as f: @@ - except Exception as e: + except OSError as e: print(f"Warning: Failed to parse {openshift_conf_path}: {e}", file=sys.stderr) @@ - try: - with open(status_log_path, 'r', errors='ignore') as f: + try: + with open(status_log_path, "r", errors="ignore") as f: @@ - except Exception as e: + except OSError as e: print(f"Warning: Failed to read {status_log_path}: {e}", file=sys.stderr)This keeps the robustness but avoids swallowing unexpected bugs.
Also applies to: 166-183, 187-224
plugins/must-gather/commands/ipsec.md (1)
10-12: Add languages to fenced code blocks for better tooling supportSeveral fenced code blocks here omit a language identifier (markdownlint MD040). Adding explicit languages improves rendering and editor support, e.g.:
-``` +/must-gather:ipsec [must-gather-path] [node-name] -``` +```text @@ -``` +```text network_logs/ └── ipsec/ @@ -``` +``` @@ -``` +```bash /must-gather:ipsec ./must-gather.local.123456789 @@ -``` +```bash /must-gather:ipsec ./must-gather/... worker-1 @@ -``` +```text No IPsec tunnel data found in network_logs/ipsec/ @@ -``` +```text ⚠ Error: openshift.conf not foundAdjust languages (
bash,text, etc.) as appropriate for each block. This keeps markdownlint happy and helps downstream consumers.Also applies to: 30-37, 87-88, 133-172, 176-199
plugins/must-gather/commands/analyze.md (1)
155-207: Specify a language for theOutput Structurecode blockmarkdownlint flags the example output block (including
IPSEC STATUS:) for missing a language identifier (MD040). Adding something liketextimproves tooling support without changing semantics:-``` +```text ================================================================================ MUST-GATHER ANALYSIS SUMMARY @@ Logs to Review: - [Specific log files to examine based on findings] -``` +```This is purely a formatting/tooling improvement; behavior is unaffected.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (4)
plugins/must-gather/README.md(2 hunks)plugins/must-gather/commands/analyze.md(6 hunks)plugins/must-gather/commands/ipsec.md(1 hunks)plugins/must-gather/skills/must-gather-analyzer/scripts/analyze_ipsec.py(1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
plugins/must-gather/commands/ipsec.md
10-10: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
30-30: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
42-42: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
87-87: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
133-133: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
139-139: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
145-145: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
151-151: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
157-157: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
163-163: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
169-169: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
184-184: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
190-190: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
196-196: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
plugins/must-gather/commands/analyze.md
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.4)
plugins/must-gather/skills/must-gather-analyzer/scripts/analyze_ipsec.py
41-41: Do not catch blind exception: Exception
(BLE001)
181-181: Do not catch blind exception: Exception
(BLE001)
219-219: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (3)
plugins/must-gather/README.md (2)
254-303: IPsec analysis script documentation aligns with implementationThe
analyze_ipsec.pysection correctly describes the script’s behavior and CLI:
- Usage matches
main()(./analyze_ipsec.py <must-gather-path> [node-name]).- Reported outputs (configuration status, ovn-ipsec-host pods, tunnel counts and statuses) correspond to
print_ipsec_analysis().This block looks consistent with the Python implementation.
350-377:/must-gather:ipseccommand description is consistent with the analyzerThe new slash-command section correctly mirrors what
analyze_ipsec.pydoes:
- Arguments and examples line up with the
must_gather_path+ optional node filter.- “What it provides” matches the script’s sections (config, daemonset pods, tunnel status, per-connection details).
No functional discrepancies spotted here.
plugins/must-gather/commands/analyze.md (1)
25-25: IPsec integration inmust-gather:analyzematches the new analyzerThe updates here integrate IPsec cleanly into the existing analysis flow:
- Bullet list correctly mentions “IPsec configuration and tunnel status”.
analyze_ipsec.pyis listed alongside other scripts in the expected path.- Keyword routing (“ipsec”, “ipsec tunnels”, etc.) maps to
analyze_ipsec.pyonly, which matches the intended behavior.- Full-analysis execution order now includes IPsec between network and events, aligned with the PR description.
- The “IPSEC STATUS” section and example usage (
analyze ipsec tunnels) match the Python script’s output and CLI.This doc block is consistent with the implementation.
Also applies to: 59-64, 109-116, 119-130, 177-179, 235-240
| The must-gather should contain: | ||
| ``` | ||
| network_logs/ | ||
| └── ipsec/ | ||
| └── <pod>_ipsec.d/ | ||
| ├── openshift.conf # IPsec connection configurations | ||
| └── status/ | ||
| └── <pod>.log # Connection status logs | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align documented IPsec log directory layout with implementation
This doc repeatedly states that status logs live under each pod’s directory, e.g.:
network_logs/ipsec/<pod>_ipsec.d/status/<pod>.log(Prerequisites, Implementation step 4, Notes).
In the Python implementation (analyze_ipsec_tunnels() in analyze_ipsec.py), the status directory is currently resolved as network_logs/ipsec/status (sibling of <pod>_ipsec.d) rather than inside <pod>_ipsec.d, and then status/<pod>.log is read from there.
Right now, docs and code disagree on where <pod>.log is expected. Whichever layout is actually produced by must-gather should be the single source of truth; both the documentation here and the script should be updated to match it. The script-side fix can also support both layouts via a small fallback (see comment on analyze_ipsec_tunnels in analyze_ipsec.py).
I’d recommend:
- Confirming the real must-gather directory structure for IPsec logs.
- Updating both this file and
analyze_ipsec.pyso the described path and the path used in code are consistent.
Also applies to: 68-77, 204-211
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
30-30: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
| tunnel_analysis = { | ||
| 'pods': [], | ||
| 'total_connections': 0, | ||
| 'established_connections': 0, | ||
| 'failed_connections': 0 | ||
| } | ||
|
|
||
| # Build a mapping of pod name to node name from the pod analysis | ||
| # This requires re-reading pod info, so we'll do a lightweight version | ||
| pod_to_node = {} | ||
| patterns = [ | ||
| "namespaces/openshift-ovn-kubernetes/pods/ovn-ipsec-host-*/*.yaml", | ||
| "*/namespaces/openshift-ovn-kubernetes/pods/ovn-ipsec-host-*/*.yaml", | ||
| ] | ||
|
|
||
| seen_pods = set() | ||
| for pattern in patterns: | ||
| for pod_file in must_gather_path.glob(pattern): | ||
| if pod_file.name == 'pods.yaml': | ||
| continue | ||
|
|
||
| pod = parse_yaml_file(pod_file) | ||
| if pod: | ||
| name = pod.get('metadata', {}).get('name', '') | ||
| if name.startswith('ovn-ipsec-host-') and name not in seen_pods: | ||
| seen_pods.add(name) | ||
| node = pod.get('spec', {}).get('nodeName', 'Unknown') | ||
| pod_to_node[name] = node | ||
|
|
||
| # Look for ipsec directories | ||
| ipsec_patterns = [ | ||
| "network_logs/ipsec/*_ipsec.d", | ||
| "*/network_logs/ipsec/*_ipsec.d" | ||
| ] | ||
|
|
||
| for pattern in ipsec_patterns: | ||
| for ipsec_dir in must_gather_path.glob(pattern): | ||
| if not ipsec_dir.is_dir(): | ||
| continue | ||
|
|
||
| # Extract pod name from directory (e.g., "ovn-ipsec-host-abc_ipsec.d" -> "ovn-ipsec-host-abc") | ||
| pod_name = ipsec_dir.name.replace('_ipsec.d', '') | ||
|
|
||
| # Apply node filter if specified | ||
| node_name = pod_to_node.get(pod_name, 'Unknown') | ||
| if filter_node and filter_node.lower() != 'all' and node_name != filter_node: | ||
| continue | ||
|
|
||
| openshift_conf = ipsec_dir / 'openshift.conf' | ||
| # Status directory is at network_logs/ipsec/status, not inside the pod's ipsec.d directory | ||
| status_dir = ipsec_dir.parent / 'status' | ||
|
|
||
| pod_info = { | ||
| 'pod': pod_name, | ||
| 'node': node_name, | ||
| 'config_found': openshift_conf.exists(), | ||
| 'status_dir_found': status_dir.exists(), | ||
| 'connections': [] | ||
| } | ||
|
|
||
| if not openshift_conf.exists(): | ||
| pod_info['error'] = 'openshift.conf not found' | ||
| tunnel_analysis['pods'].append(pod_info) | ||
| continue | ||
|
|
||
| # Parse connections from openshift.conf | ||
| connections = parse_ipsec_connections(openshift_conf) | ||
| tunnel_analysis['total_connections'] += len(connections) | ||
|
|
||
| # Status log file follows naming convention: status/<pod_name>.log | ||
| status_log = status_dir / f"{pod_name}.log" if status_dir.exists() else None | ||
| print ("status log directory") | ||
| print (pod_name) | ||
| print (status_log) | ||
| # Check status for each connection | ||
| for conn_name in connections: | ||
| if status_log: | ||
|
|
||
| conn_status = check_connection_status(status_log, conn_name) | ||
| established = conn_status['established'] | ||
| info = conn_status['info'] | ||
| else: | ||
| established = False | ||
| info = 'Status log not found' | ||
|
|
||
| pod_info['connections'].append({ | ||
| 'name': conn_name, | ||
| 'established': established, | ||
| 'info': info | ||
| }) | ||
|
|
||
| if established: | ||
| tunnel_analysis['established_connections'] += 1 | ||
| else: | ||
| tunnel_analysis['failed_connections'] += 1 | ||
|
|
||
| if not status_dir.exists(): | ||
| pod_info['status_warning'] = 'status directory not found' | ||
|
|
||
| tunnel_analysis['pods'].append(pod_info) | ||
|
|
||
| return tunnel_analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status log directory path and debug prints will break tunnel detection
In analyze_ipsec_tunnels, the status log directory handling and extra prints have two problems:
- Path mismatch vs documented layout
The docstring here and plugins/must-gather/commands/ipsec.md both describe the status logs as:
network_logs/ipsec/<pod>_ipsec.d/status/<pod>.log
But the implementation uses:
# Status directory is at network_logs/ipsec/status, not inside the pod's ipsec.d directory
status_dir = ipsec_dir.parent / 'status'This will miss logs stored under <pod>_ipsec.d/status/ and cause all connections to report "Status log not found" even when logs exist.
To support both the per‑pod and shared status layouts safely, you can prefer the per‑pod status/ under <pod>_ipsec.d and fall back to the shared directory if needed:
- # Status directory is at network_logs/ipsec/status, not inside the pod's ipsec.d directory
- status_dir = ipsec_dir.parent / 'status'
+ # Status directory is typically inside the pod's ipsec.d directory:
+ # network_logs/ipsec/<pod>_ipsec.d/status/<pod>.log
+ # Fall back to a shared status directory (network_logs/ipsec/status)
+ # for compatibility with alternative layouts.
+ pod_status_dir = ipsec_dir / "status"
+ shared_status_dir = ipsec_dir.parent / "status"
+ status_dir = pod_status_dir if pod_status_dir.exists() else shared_status_dir
@@
- status_log = status_dir / f"{pod_name}.log" if status_dir.exists() else None
- print ("status log directory")
- print (pod_name)
- print (status_log)
+ status_log = status_dir / f"{pod_name}.log" if status_dir.exists() else None- Leftover debug prints
The print("status log directory"), print(pod_name), and print(status_log) lines will pollute the user-facing report. Removing them (as in the diff above) keeps output aligned with the documented format.
As-is, this combination is likely to misreport tunnel state even when valid logs are present.
🤖 Prompt for AI Agents
In plugins/must-gather/skills/must-gather-analyzer/scripts/analyze_ipsec.py
around lines 251 to 352, the code incorrectly assumes the status directory is
only at network_logs/ipsec/status and also leaves debug print statements; update
the logic to first look for a per-pod status directory at
<pod>_ipsec.d/status/<pod>.log (i.e., status_dir = ipsec_dir / 'status' and
status_log = status_dir / f"{pod_name}.log" if exists), and if not present fall
back to the shared parent status directory (ipsec_dir.parent / 'status') and its
<pod>.log; remove the three print(...) lines; ensure status_dir_found reflects
the actual directory checked and that status_log is None only when neither
location contains the log so connection status falls back to "Status log not
found".
| issues = [] | ||
|
|
||
| if ipsec_config and ipsec_config.get('enabled'): | ||
| if not ipsec_pods: | ||
| if filter_node and filter_node.lower() != 'all': | ||
| issues.append(f"No ovn-ipsec-host pods found for node {filter_node}") | ||
| else: | ||
| issues.append("IPsec is enabled but no ovn-ipsec-host pods found") | ||
|
|
||
| if any(p['status'] != 'Running' for p in ipsec_pods): | ||
| issues.append("Some ovn-ipsec-host pods are not in Running state") | ||
|
|
||
| if tunnel_analysis['failed_connections'] > 0: | ||
| issues.append(f"{tunnel_analysis['failed_connections']} IPsec connections not established") | ||
|
|
||
| if not tunnel_analysis['pods'] and not filter_node: | ||
| issues.append("No IPsec tunnel data found in network_logs/ipsec/") | ||
|
|
||
| if issues: | ||
| print("⚠ Issues Detected:") | ||
| for idx, issue in enumerate(issues, 1): | ||
| print(f"{idx}. {issue}") | ||
| else: | ||
| if ipsec_config and ipsec_config.get('enabled'): | ||
| if tunnel_analysis['total_connections'] > 0: | ||
| print(f"✓ All {tunnel_analysis['established_connections']} IPsec connections are established") | ||
| else: | ||
| print("✓ IPsec is enabled (no tunnel data available for verification)") | ||
| else: | ||
| print("IPsec is not enabled on this cluster") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid treating External IPsec mode as a failure in the summary
get_ipsec_config() sets enabled to True for any mode other than "Disabled" (i.e., both "Full" and "External"). analyze_ipsec() only analyzes pods and tunnels when mode == "Full", which is correct because ovn-ipsec-host is not used in External mode.
However, the summary logic here uses ipsec_config.get('enabled') and therefore:
- Treats External mode as “enabled”.
- Then flags “IPsec is enabled but no ovn-ipsec-host pods found” and “No IPsec tunnel data found…” even though both are expected in External mode.
This will generate false alarms on clusters using External IPsec.
Consider keying the summary off mode instead, e.g.:
- issues = []
-
- if ipsec_config and ipsec_config.get('enabled'):
+ issues = []
+ mode = ipsec_config.get("mode") if ipsec_config else None
+
+ if mode == "Full":
@@
- if issues:
+ if issues:
print("⚠ Issues Detected:")
@@
- else:
- if ipsec_config and ipsec_config.get('enabled'):
- if tunnel_analysis['total_connections'] > 0:
- print(f"✓ All {tunnel_analysis['established_connections']} IPsec connections are established")
- else:
- print("✓ IPsec is enabled (no tunnel data available for verification)")
- else:
- print("IPsec is not enabled on this cluster")
+ else:
+ if mode == "Full":
+ if tunnel_analysis["total_connections"] > 0:
+ print(f"✓ All {tunnel_analysis['established_connections']} IPsec connections are established")
+ else:
+ print("✓ IPsec is enabled in Full mode (no tunnel data available for verification)")
+ elif mode == "External":
+ print("IPsec is enabled in External mode (ovn-ipsec-host pods and tunnel data are not expected)")
+ else:
+ print("IPsec is not enabled on this cluster")This keeps the issue list meaningful for Full mode while avoiding spurious warnings for External mode.
🤖 Prompt for AI Agents
In plugins/must-gather/skills/must-gather-analyzer/scripts/analyze_ipsec.py
around lines 488 to 518, the summary currently keys off
ipsec_config.get('enabled') which treats External mode as enabled and produces
false positives; change the conditional logic to check ipsec_config.get('mode')
== 'Full' when deciding whether to validate ovn-ipsec-host pods, tunnel data,
and when emitting the summary about missing pods/tunnels or established
connections so that only Full mode triggers those checks and External mode does
not produce those warnings.
|
lgtm pending the fix for coderabbit suggestions and linting. |
Introduces
/must-gather:ipseccommand to analyze IPsec configuration and tunnel establishment status in OpenShift clusters. The analyzer:status logs for ESTABLISHED_CHILD_SA markers
The analysis script examines network_logs/ipsec/ data to report on all IPsec connections across the cluster mesh.
Assisted-by: Claude [email protected]
Example output 1:
Example output 2:
Example output 3:
Summary by CodeRabbit
New Features
Documentation