Skip to content

Conversation

@colinthomas-z80
Copy link
Contributor

@colinthomas-z80 colinthomas-z80 commented Nov 11, 2025

Proposed Changes

Addresses #4276

We cannot partially iterate the ready list and leave it out of order if we wish to maintain priority scheduling. If a task priority is set by the user, revert to iterating the entire list each scheduling attempt.

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

Copy link
Member

@btovar btovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ready to merge?

@dthain
Copy link
Member

dthain commented Nov 11, 2025

Hold on a minute...

@dthain
Copy link
Member

dthain commented Nov 11, 2025

I don't think send_one_task has the desired effect. If no tasks are schedulable, it will rotate through the entire list, yes. But if it starts working through the list and finds one halfway, then it will return after rotating half the list, and the priority order is lost. Or am I misunderstanding something?

@colinthomas-z80
Copy link
Contributor Author

You're right I think. Maybe I can just switch back to list_iterate

@btovar
Copy link
Member

btovar commented Nov 11, 2025

list_rotate was added for performance advantages, right?
Would it be better to keep it and count how many rotations where done and then rotate back to the start if needed?

@colinthomas-z80
Copy link
Contributor Author

That could get complicated quickly if we are pushing new tasks with priorities while the list is out of order

@btovar
Copy link
Member

btovar commented Nov 11, 2025

We can avoid the code duplication if you use a function pointer for whether to use list_nextitem or list_rotate, and update q->attempt_schedule_depth to the length of the ready_queue every time when using priorities.

@colinthomas-z80
Copy link
Contributor Author

Is it necessary to call list_first_item before iterating the list normally?

@dthain
Copy link
Member

dthain commented Nov 11, 2025

Careful! You are iterating over the list but using list_pop_tail to remove a task. Those don't match up.

@colinthomas-z80
Copy link
Contributor Author

@dthain any thoughts? Maybe we create a new branch for Ben C to test first?

@benclifford
Copy link

I'm happy to test whatever branch. I'm also happy to build from source so you don't to do more than point me at a suitable git branch/commit.

@dthain
Copy link
Member

dthain commented Nov 17, 2025

Colin, if you think it's ready to go, then let's have BenC give it a test drive.

@colinthomas-z80
Copy link
Contributor Author

colinthomas-z80 commented Nov 17, 2025

@benclifford If you don't mind cloning my repo feel free to take the latest commit here and see if it helps your issue. I am uncertain about the throughput implications with your particular application.

@benclifford
Copy link

I tried this PR with the test run that I was originally complaining about in issue #4276 and it looks good. This plot is of a run which is intended to run longer tasks before shorter tasks, assuming the submitter knows the task duration, and it looks like that is true (apart from one startup task that I think is probably expected due to submission time/order). The context for that is this blog post https://parsl-project.org/2025/11/08/priority.html

image

@dthain
Copy link
Member

dthain commented Nov 19, 2025

Thank you for brining this up and then testing.
@colinthomas-z80 and @btovar can you coordinate a release for Ben?

change_task_state(q, t, WORK_QUEUE_TASK_RETRIEVED);
expired++;
list_pop_tail(q->ready_list);
list_remove(q->ready_list, t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you changed to list_remove? This would cause wq to check the whole list as the task is in the tail?

@dthain
Copy link
Member

dthain commented Nov 20, 2025

The issue here is that list_pop only makes sense when using list_rotate. When iterating over the list, you want to remove the item at the cursor. list_remove works but runs in linear time. list_drop is the handy thing to use here: it drops the item at the location of a list_cursor.

@btovar
Copy link
Member

btovar commented Nov 21, 2025

I'll update this pr with list_item_remove, merge it and make a release.

@colinthomas-z80
Copy link
Contributor Author

It looks like list_drop requires a reference to the existing cursor which is not available outside list.c unless we re implement the iteration method using the more primitive functions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants