Skip to content

Conversation

@grondo
Copy link
Contributor

@grondo grondo commented Nov 13, 2025

Currently invalid hostnames in a jobspec hostlist constraint either result in mysterious unsatisfiable errors or are silently ignored. The jobspec validator only ensures the hostlist can be decoded, but does no further validation.

This PR adds support for validating constraint hosts against the instance hostlist (if it is available), and gives users a more meaningful error:

$ flux run -n1 --requires=host:badhost true
flux-run: ERROR: host constraint contains invalid hosts: host 'badhost' not found

This approach will not detect excluded hosts, nor if hosts are requested that are not also currently in the selected queue, etc. However, this ensures the hosts are at least valid hostnames for the current instance, and the implementation is very lightweight. Further validation would be much more heavyweight, but could potentially be done in the future in a separate validator plugin.

Problem: In test t2260-job-list.t, a fake resource set forces the
nodelist to `node[0-3]`, but this fake hostlist does not also apply to
the `hostlist` broker attribute. Since the jobspec validator will soon
check for valid hostlist constraints using the `hostlist` attribute,
this may cause job submissions using `--requires=host:nodeX` to fail.

Use the `--test-hosts` option of `flux start` to ensure the `hostlist`
attribute matches the hostlist in the resource set.
Problem: Hostlist constraints set in jobspec are currently validated
only to determine if they conform to RFC 29 Hostlist format. Invalid
hosts required in a constraint currently result in an empty set of
potential resources in the scheduler and thereby an "unsatisfiable"
error, and invalid excluded hosts are silently ignored. Neither of
these are helpful to users who may have mistyped the intended hosts.

When possible, fetch the enclosing hostlist and ensure all hosts in
a constraint are contained within it, and raise an exception calling
out any invalid hostnames.

Add a function to create an on-demand, per-thread Flux handle in
the Jobspec module for use by the validation functions. Use this to
fetch the local hostlist attribute (if there is one) and cache it,
and validate any hosts in constraints using that hostlist.

Fixes flux-framework#7200
Problem: No tests in the testsuite ensure that the job validator
catches invalid hosts in a hostlist constraint.

Add a couple tests to t2110-job-ingest-validator.t.
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! That worked out nicely.

@mergify mergify bot added the queued label Nov 17, 2025
@mergify mergify bot merged commit 44d90af into flux-framework:master Nov 17, 2025
35 checks passed
@mergify mergify bot removed the queued label Nov 17, 2025
@grondo grondo deleted the issue#7200 branch November 17, 2025 14:48
@codecov
Copy link

codecov bot commented Nov 17, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.70%. Comparing base (4b5a517) to head (c6c417d).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/bindings/python/flux/job/Jobspec.py 90.90% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7203      +/-   ##
==========================================
- Coverage   83.70%   83.70%   -0.01%     
==========================================
  Files         553      553              
  Lines       92350    92370      +20     
==========================================
+ Hits        77304    77320      +16     
- Misses      15046    15050       +4     
Files with missing lines Coverage Δ
src/bindings/python/flux/job/Jobspec.py 92.85% <90.90%> (-0.10%) ⬇️

... and 10 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants