Skip to content

Add a Job Status enum + scheduler refactoring#92

Open
AlexJones0 wants to merge 5 commits intolowRISC:masterfrom
AlexJones0:scheduler_cleanup
Open

Add a Job Status enum + scheduler refactoring#92
AlexJones0 wants to merge 5 commits intolowRISC:masterfrom
AlexJones0:scheduler_cleanup

Conversation

@AlexJones0
Copy link
Contributor

@AlexJones0 AlexJones0 commented Jan 30, 2026

This PR performs some small cleanup of the scheduler in preparation for potential larger-scale changes. Specifically, it aims to fix any outstanding pyright and ruff check failures that exist within the scheduler itself, to provide a good base for further refactoring or modification.

One other clear deficiency in the design that I wanted to address was the use of a string (always a single upper-case character) like "D", "P" or "K" to refer to the Job status, rather than the more standard Enumerated type. An enum would improve readability & ergonomics, and add stricter typing of status values - so I've opted to make this change, which is quite wide-spread (albeit simple).

Address the `S101` failure with ruff check.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Break apart a couple of the more complex functions to resolve ruff lint
warnings related to method complexity. Also take the opportunity to
slightly improve some of the documentation on these functions and to
refactor some of the code to be nicer to read.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
There may still be more work needed on the typing front but this
resolves the singular remaining type error given by pyright within the
scheduler.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
This 'E' error launcher status is not well-defined or well-known within
the rest of DVSim, but can be seen as meaningfully different from the
'F'/'FAILED' status which describes the dispatched job failing, as this
denotes a failure within the launcher.

As DVSim currently does not handle this status (almost at-all!) and this
is the single occurrence of the 'E' status in the code, we instead
convert this into a more appropriate raised `LauncherError`, as in
reality this case only happens when calling `poll()` either before
`launch()` or after `launch()` already reported some error.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Currently the job status is passed around DVSim as a string, where the
expectation is that it is always a single upper case character in ('Q',
'D', 'P', 'F', 'K') - some limited parts of the code also handle an 'E'
error case.

Turn these into a small Enum type so that we have much stricter typing
on these values and improved ergonomics when referring to different job
statuses in code.

This status Enum lives in its own module instead of with the other job
data to avoid circular imports (as the job status' registered callbacks
reference the launcher definition).

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Copy link
Contributor

@rswarbrick rswarbrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of nitty comments (sorry!) but I like this. I especially like the enum change: it's a great improvement from magic one-character strings!

assert job.dependencies
if not job.dependencies:
err = "Job item exists but the next_item's dependency list is empty?"
raise RuntimeError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to skip the variable, I think:

                    if not job.dependencies:
                        raise RuntimeError("Job item exists but the "
                                           "next_item's dependency list is empty?")

I'd probably also write it as "... the next item's...", to make a bit more sense to the person reading it.

job: JobSpec = self._jobs[job_name]

if job.target not in self._scheduled:
msg = f"Scheduler does not contain target {job.target}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I'd compute the message inside the function call parentheses.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linting rules explicitly reject that. Can't remember why exactly, something to do with the stack trace. I think you end up with the message there twice, pre formatted and then formatted.

We could disable the lint rule if you feel strongly about it. But that's how it is at the moment.


We choose the target that follows the 'item''s current target and find
the list of successors whose dependency list contains 'item'. If 'item'
We choose the target that follows the "item"'s current target and find
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct change here is to remove the quotes entirely for the first occurrence and replace the other two with "job_name".

cfgs = set(self._scheduled[target])

target = next(iter(self._scheduled), None)
cfgs = {} if target is None else set(self._scheduled[target])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops: {} is an empty dictionary, not a set.

if not job.dependencies:
err = "Job item exists but the next_item's dependency list is empty?"
raise RuntimeError(err)
msg = "Job item exists but the next_item's dependency list is empty?"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this code was here already, but you're touching it! Probably put the message inside the function call and get rid of a variable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is due to the linting rules. Doesn't mean it's what we want, but that's why.

def _check_status(self) -> tuple[str, ErrorMessage | None]:
"""Determine the outcome of the job (P/F if it ran to completion).
def _check_status(self) -> tuple[JobStatus, ErrorMessage | None]:
"""Determine the outcome of the job (Passed/Failed if it ran to completion).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think the enum names need to be ALLCAPS


Args:
status: status of the job, either 'P', 'F' or 'K'.
status: status of the completed job (must be either Passed, Failed or Killed).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALL_CAPS here as well, I think.

This returns 'D', 'P', 'F', or 'K'. If 'D', the job is still running.
If 'P', the job finished successfully. If 'F', the job finished with
an error. If 'K' it was killed.
This returns a job status. If dispatched, the job is still running.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More capitals here, I think

This returns 'D', 'P', 'F', or 'K'. If 'D', the job is still running.
If 'P', the job finished successfully. If 'F', the job finished with
an error. If 'K' it was killed.
This returns a job status. If dispatched, the job is still running.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet more capitals (sorry).


This returns 'D', 'P' or 'F'. If 'D', the job is still running. If 'P',
the job finished successfully. If 'F', the job finished with an error.
This returns a job status. If dispatched, the job is still running.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More capitals! :-)

Copy link
Collaborator

@machshev machshev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AlexJones0 this is a really nice tidy up. Some comments, but nothing major.

JobStatus.DISPATCHED: self.scratch_path + "/" + "dispatched",
JobStatus.PASSED: self.scratch_path + "/" + "passed",
JobStatus.FAILED: self.scratch_path + "/" + "failed",
JobStatus.KILLED: self.scratch_path + "/" + "killed",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be improved further by converting it to a dictionary comprehension. The path based on lowercase enumeration key.

That way further job statuses can be added by just extending the enum.

return (datetime.datetime.now() - self.start_time).total_seconds() / 60

def poll(self):
def poll(self) -> JobStatus | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions is the status None? Are we missing a value in JobStatus?

"poll() was called either before calling launch() or after "
"ignoring a LauncherError from launch()."
)
raise LauncherError(msg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Exception caught by the scheduler? We don't want to kill the DVSim process due to exceptions in launchers.

cfgs = set(self._scheduled[target])

target = next(iter(self._scheduled), None)
cfgs = {} if target is None else set(self._scheduled[target])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly sure {} is an empty dictionary. I'm assuming you wanted an empty set instead?

cfgs = {job.block.name}

if target is None:
return []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the typing is correct and Sequence is all that's needed then you could return an empty tuple instead (). Generally I'd lean towards immutable types where possible.

JobStatus.PASSED: self.scratch_path + "/" + "passed",
JobStatus.FAILED: self.scratch_path + "/" + "failed",
JobStatus.KILLED: self.scratch_path + "/" + "killed",
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a comprehension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants