Add multi-distro support #15

Open
manoj-freyr wants to merge 7 commits into main from multiOs-support



@manoj-freyr commented Nov 12, 2025

Motivation

Makes CVS OS-agnostic so the install and test scripts work on Linux distributions beyond apt-based Debian/Ubuntu.

Technical Details

After the changes, I ran the RVS install test:
pytest -vvv --log-file=../logs/rvs_cvs_test_mi325__install_rvs.log -s tests/health/install/install_rvs.py --cluster_file ./input/cluster_file/manojsk_cluster.json --config_file ./input/config_file/health/mi300_health_config.json

[Screenshot: install_rvs.py test output]

The same command also works on RHEL:

[Screenshot: install_rvs.py test output on RHEL]

@solaiys (Contributor) left a comment


LGTM.

@cijohnson (Contributor) left a comment


I see `sys.path.insert(0, './lib')` used in multiple files. Are these still required with the new cvs package?

Outdated review comment threads on cvs/tests/health/install/install_rvs.py and cvs/tests/health/rocblas_cvs.py
lfrischm and others added 7 commits December 29, 2025 15:09
- Added detect_distro() function to identify Linux distribution
- Added package name translation for RHEL/SUSE equivalents
- Added multi-distro package management functions (install_package, update_package_cache, map_packages)
- Added Docker installation support for RHEL/CentOS and SUSE
- Updated test files with proper cvs.lib imports for multi-distro functions

Supports: Debian/Ubuntu (apt-get), RHEL/CentOS/Rocky/Alma (dnf), SUSE (zypper)
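The commit describes detect_distro() only as identifying the Linux distribution. A minimal sketch of one common approach follows, assuming detection reads /etc/os-release (the PR text does not say how it works). parse_os_release, detect_distro_family and FAMILY_BY_ID are hypothetical names, and the ID list is illustrative rather than the PR's.

```python
import shlex

# Hypothetical mapping from os-release ID / ID_LIKE tokens to the three
# families the PR lists (apt-get, dnf, zypper). Illustrative, not exhaustive.
FAMILY_BY_ID = {
    "debian": "debian", "ubuntu": "debian",
    "rhel": "rhel", "centos": "rhel", "rocky": "rhel", "almalinux": "rhel", "fedora": "rhel",
    "sles": "suse", "suse": "suse", "opensuse": "suse", "opensuse-leap": "suse",
}


def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips the optional single or double quotes around the value.
        parts = shlex.split(value)
        fields[key] = parts[0] if parts else ""
    return fields


def detect_distro_family(os_release_text):
    """Return 'debian', 'rhel' or 'suse'; try ID first, then ID_LIKE tokens."""
    fields = parse_os_release(os_release_text)
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for candidate in candidates:
        family = FAMILY_BY_ID.get(candidate.lower())
        if family:
            return family
    raise ValueError(f"unsupported distribution: {fields.get('ID', 'unknown')}")
```

On a node this would be called with the file contents, e.g. `detect_distro_family(Path("/etc/os-release").read_text())`; on Python 3.10+ `platform.freedesktop_os_release()` returns the same fields. Falling back to ID_LIKE lets derivatives such as Pop!_OS resolve to the debian family without listing each one.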
… commands

- Replace hardcoded 'apt update' and 'apt-get install' with detect_distro() and install_package()
- Add proper package name translation using map_packages()
- Update all affected test files:
  - tests/health/install/install_babelstream.py
  - tests/health/install/install_rocblas.py
  - tests/health/install/install_rvs.py
  - tests/health/rocblas_cvs.py
  - tests/ibperf/install_ibperf_tools.py
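The commit replaces hardcoded apt calls with detect_distro(), install_package() and map_packages(), but their signatures are not shown here. The following is a hypothetical sketch of the translate-then-build-command idea, using the family names debian/rhel/suse. The PACKAGE_NAME_MAP entries are illustrative examples, not the PR's table; the commands are the standard non-interactive forms of apt-get, dnf and zypper.

```python
# Illustrative Debian -> RHEL/SUSE name translations (examples for this
# sketch, not the PR's actual table).
PACKAGE_NAME_MAP = {
    "libnuma-dev": {"rhel": "numactl-devel", "suse": "libnuma-devel"},
    "python3-dev": {"rhel": "python3-devel", "suse": "python3-devel"},
}

# Non-interactive cache-refresh and install commands per family.
REFRESH_CMD = {
    "debian": ["apt-get", "update"],
    "rhel": ["dnf", "makecache"],
    "suse": ["zypper", "--non-interactive", "refresh"],
}
INSTALL_CMD = {
    "debian": ["apt-get", "install", "-y"],
    "rhel": ["dnf", "install", "-y"],
    "suse": ["zypper", "--non-interactive", "install"],
}


def translate_packages(family, packages):
    """Map Debian package names to the family's names; unmapped names pass through."""
    return [PACKAGE_NAME_MAP.get(pkg, {}).get(family, pkg) for pkg in packages]


def build_install_commands(family, packages):
    """Return [refresh_cmd, install_cmd] as argv lists for the given family."""
    if family not in INSTALL_CMD:
        raise ValueError(f"unsupported distro family: {family}")
    return [
        list(REFRESH_CMD[family]),
        INSTALL_CMD[family] + translate_packages(family, packages),
    ]
```

Returning argv lists instead of executing keeps the sketch testable; a caller would prefix `sudo` and run each command on every node. Names without a mapping pass through unchanged, which suits packages such as git that share a name across distributions.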
@manoj-freyr requested a review from solaiys on January 16, 2026 09:37
atnair-amd added a commit that referenced this pull request Apr 17, 2026
Consolidated fix for four small, narrow bugs in cvs/lib/linux_utils.py.
Each fix has its own regression test in cvs/lib/unittests/test_linux_utils.py
following the existing MagicMock pattern; all four new tests fail against
the pre-fix code and pass here.

1. get_rdma_nic_dict missing match-guard (real impact on banff today):
   match.group(1) was called without checking that the strict inner
   pattern matched. DOWN rdma links omit the `netdev <name>` clause, so
   the first DOWN device raised AttributeError and aborted the parse for
   all nodes. The sibling get_active_rdma_nic_dict already guards with
   `if match:`; add the same guard here. Caller check_cluster_health.py:73
   wraps the call in try/except, so the health-report HTML was silently
   missing RDMA NIC data on banff.

   Live observation on banff-cyxtera-s70-2:
     sudo rdma link
     link mlx5_0/1 ... state DOWN physical_state DISABLED
     ...
     link mlx5_8/1 state ACTIVE physical_state LINK_UP netdev ens14np0

     ### get_rdma_nic_dict on banff ###
     AttributeError: 'NoneType' object has no attribute 'group'

2. get_ip_addr_dict int_nam leaks across nodes (latent, multi-node):
   int_nam was initialized once outside the per-node loop, so after
   parsing node A it retained A's last interface name. If node B's first
   matching line was a property line (mtu/state/mac/inet/inet6) rather
   than an interface header, the code did
   `ip_dict['nodeB']['<nodeA-iface>']['mtu'] = ...` and raised KeyError.
   Move `int_nam = None` inside the per-node loop and add an
   `if int_nam is None: continue` guard after the header block so
   property handlers no-op until the first header is seen.
   Header lines carry `mtu N`/`state UP` inline, so they still fall
   through to the property handlers, because int_nam has just been set
   by the header block above.

3. get_linux_perf_tuning_dict never returns (no callers today, primes
   the function for future wiring-in):
   The function built out_dict but ended without a `return`, so every
   caller received None. The docstring itself flagged it. Append the
   missing return and update the docstring. Live-confirmed on banff
   (TYPE: NoneType, VALUE: None).

4. get_dns_dict dead branches (no callers today, same reasoning):
   The `elif re.search('Protocols', ...)` branch was duplicated and every
   branch body was `print('')`, so dns_dict[node] was always {}. Replace the dead
   branches with proper regex captures for Protocols (collected as a
   list since it appears once globally and once per Link),
   Current DNS Server, DNS Servers (space-split list), and DNS Domain.
   Live-confirmed on banff: resolvectl returned usable data but
   dns_dict was {'banff-...': {}}.
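Fixes 1 and 2 apply the same defensive rule: skip any line that cannot be attributed, either because the strict pattern did not match or because no interface header has been seen yet on the current node. A minimal sketch of both guards follows, assuming plain output strings instead of the real cluster handles. parse_rdma_links, parse_ip_addr and the regexes are hypothetical and are not the code in cvs/lib/linux_utils.py; parse_ip_addr takes a dict of node name to `ip addr` output so the per-node reset is visible.

```python
import re

# Fix 1 (sketch): on banff, DOWN links have no "netdev <name>" clause, so the
# strict pattern does not match and .group() must not be called.
RDMA_LINK_RE = re.compile(r"^link\s+(\S+)/(\d+)\b.*?\bstate\s+(\S+).*?\bnetdev\s+(\S+)")


def parse_rdma_links(output):
    """Map netdev name -> RDMA device info for one node's `rdma link` output."""
    nics = {}
    for line in output.splitlines():
        match = RDMA_LINK_RE.search(line)
        if not match:  # guard: match is None for links without a netdev
            continue
        device, port, state, netdev = match.groups()
        nics[netdev] = {"device": device, "port": port, "state": state}
    return nics


# Fix 2 (sketch): interface headers look like "2: eth0: <...> mtu 1500 ... state UP".
HEADER_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:")
PROPERTY_RES = (
    ("mtu", re.compile(r"\bmtu\s+(\d+)")),
    ("state", re.compile(r"\bstate\s+(\S+)")),
    ("mac", re.compile(r"^\s+link/\S+\s+(\S+)")),
    ("inet", re.compile(r"^\s+inet\s+(\S+)")),
    ("inet6", re.compile(r"^\s+inet6\s+(\S+)")),
)


def parse_ip_addr(output_by_node):
    """Build {node: {iface: {prop: value}}} from per-node `ip addr` output."""
    ip_dict = {}
    for node, output in output_by_node.items():
        ip_dict[node] = {}
        int_nam = None  # reset per node so the previous node's last iface cannot leak in
        for line in output.splitlines():
            header = HEADER_RE.match(line)
            if header:
                int_nam = header.group(1)
                ip_dict[node][int_nam] = {}
            if int_nam is None:
                continue  # property line before any header on this node: skip it
            # Header lines fall through here too and pick up their inline mtu/state.
            for key, pattern in PROPERTY_RES:
                found = pattern.search(line)
                if found:
                    ip_dict[node][int_nam][key] = found.group(1)
    return ip_dict
```

With the pre-fix behaviour, node B's leading `inet` line below would have been written under node A's last interface name and raised KeyError; here it is skipped until node B's first header appears.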

Coordination: no overlap with PR #15 (multi-OS additions appended after
line 814) or PR #122 (logging-only conversions elsewhere in the file).
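For fix 4, the described regex captures can be sketched as below, assuming the `Key: value` layout that recent systemd versions print for `resolvectl status`. parse_resolvectl, DNS_FIELDS and the dictionary keys are hypothetical, not the actual get_dns_dict code.

```python
import re

# Hypothetical key names; one pattern per field named in fix 4.
DNS_FIELDS = (
    ("protocols", re.compile(r"^\s*Protocols:\s+(.*\S)")),
    ("current_dns_server", re.compile(r"^\s*Current DNS Server:\s+(\S+)")),
    ("dns_servers", re.compile(r"^\s*DNS Servers:\s+(.*\S)")),
    ("dns_domain", re.compile(r"^\s*DNS Domain:\s+(.*\S)")),
)


def parse_resolvectl(output):
    """Extract DNS settings from one node's `resolvectl status` output."""
    dns = {"protocols": []}
    for line in output.splitlines():
        for key, pattern in DNS_FIELDS:
            match = pattern.match(line)
            if not match:
                continue
            value = match.group(1)
            if key == "protocols":
                dns["protocols"].append(value)  # Global section + one per Link
            elif key == "dns_servers":
                dns["dns_servers"] = value.split()  # space-separated list
            else:
                dns[key] = value
            break  # at most one field per line
    return dns
```

Protocols is appended rather than assigned because it appears once in the Global section and again under each Link. The single-valued fields are simply overwritten, so on a host with several links the last link wins in this sketch.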
