job-manager: improve policy limit error messages #7204
Merged
garlick approved these changes on Nov 14, 2025
LGTM!
After rebasing, the ENOSPC tests are failing on every builder. Looking into it.
Problem: The limit-job-size plugin repeats nearly identical checks for over-limit and under-limit requests, which makes updating the similar error messages tedious. Add a few helper functions so that these error messages are built in a single place.
Problem: The error message from the limit-job-size plugin does not include the requested resource count or the target queue. This could leave users confused about the source of the policy. Add the requested resource count as computed by the plugin, as well as any queue name when generating error messages in the limit-job-size plugin.
Problem: The tests for the limit-job-size plugin in the testsuite do not ensure queue policy limit errors include the queue name. Add a test that ensures the queue name is present in a queue-specific error.
Problem: The job update tests expect specific errors when moving a job to a new queue or updating its duration would exceed policy limits, but these error messages may be expanded in the future. Update the error patterns to allow for future changes.
Problem: The limit-duration plugin does not include the requested duration or target queue in the error message when rejecting a job. This can lead to user confusion about the source of the policy limit. Add the requested duration (formatted as fsd) and the queue (if any) to the error message sent back to the user.
Problem: The tests for the limit-duration job-manager plugin do not ensure error messages contain expected details. Add a couple of new tests to t2221-job-manager-limit-duration.t.
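The job update test change described above (loosening error patterns so they survive future message changes) can be illustrated with a short Python sketch. This is not the testsuite's actual code: the pattern `LIMIT_ERROR` and the helper `is_limit_error` are hypothetical names chosen for illustration, and the two sample messages are the before and after strings quoted in this PR's description.

```python
import re

# Hypothetical pattern: match only the stable core of the message, so extra
# detail (requested count, queue name) can be added without breaking the check.
LIMIT_ERROR = re.compile(r"exceeds policy limit of \d+")

# Sample messages taken from the PR description (before and after this change).
old_message = "requested nnodes exceeds policy limit of 16"
new_message = "requested nnodes (20) exceeds policy limit of 16 for queue debug"


def is_limit_error(message: str) -> bool:
    """Return True if the message reports an exceeded policy limit."""
    return LIMIT_ERROR.search(message) is not None
```

Because the pattern anchors on neither the start nor the end of the message, both the old and the new wording are accepted, which is the property the updated tests rely on.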
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master    #7204      +/-   ##
==========================================
- Coverage   83.71%   83.71%   -0.01%
==========================================
  Files         553      553
  Lines       92370    92379       +9
==========================================
+ Hits        77329    77336       +7
- Misses      15041    15043       +2
This PR improves error messages from the limit-job-size and limit-duration plugins
to include the requested resource count/duration and the target queue name
when rejecting jobs.
Example improvement:
Before: requested nnodes exceeds policy limit of 16
After: requested nnodes (20) exceeds policy limit of 16 for queue debug
Fixes: #7201
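As a rough illustration of building the improved message in one place, here is a minimal Python sketch. It is not the plugin's actual code: the function name `exceeds_limit_message` and its parameters are hypothetical, and only the output format is taken from the example in this description.

```python
from typing import Optional


def exceeds_limit_message(resource: str,
                          requested: int,
                          limit: int,
                          queue: Optional[str] = None) -> str:
    """Build an over-limit error message (hypothetical helper).

    The requested count is always included; the queue name is appended
    only when the limit comes from a queue-specific policy.
    """
    msg = f"requested {resource} ({requested}) exceeds policy limit of {limit}"
    if queue is not None:
        msg += f" for queue {queue}"
    return msg


# Queue-specific limit, matching the example in this description.
print(exceeds_limit_message("nnodes", 20, 16, queue="debug"))
# General limit with no queue involved.
print(exceeds_limit_message("nnodes", 20, 16))
```

Keeping the format string in a single helper means a later wording change touches one function rather than every limit check.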