docs: KEP-6915: Label based switch to control Workload activeness #7295
Conversation
Signed-off-by: Vassilis Vassiliadis <[email protected]>
/ok-to-test
```
- No new fields; reuse `Workload.spec.active`
- Label contract: `kueue.x-k8s.io/active` values `true` or `false`
- Invalid values are ignored and assumed to be `true`
- Feature gate: `WorkloadActiveLabel` (Alpha, default off)
```
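For illustration, a minimal sketch of the label contract quoted above (the helper name `isActivePerLabel` is hypothetical; invalid values fall back to active, per the last bullet):

```go
// isActivePerLabel interprets the kueue.x-k8s.io/active label contract:
// "false" means inactive; "true", a missing label, or any invalid value
// means active.
func isActivePerLabel(labels map[string]string) bool {
	value, ok := labels["kueue.x-k8s.io/active"]
	if !ok {
		return true // no label means active
	}
	switch value {
	case "false":
		return false
	case "true":
		return true
	default:
		return true // invalid values are ignored and assumed to be "true"
	}
}
```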
How are you planning to implement this so that all workloads can leverage this functionality?
That's a good question. I think we should only need to change reconciler.go (and unit/integration test code) when implementing the feature.
So there are 3 questions to answer:
- Which Workloads do we patch?
- What changes do we make?
- Which parts of reconciler.go do we change?
For question 1: In the interest of simplicity we should only patch the Workloads of non-composable jobs.
For question 2: The intention of this KEP is to force a Workload to the inactive state.
To this end, in the places where we patch the Workload object for a Job we should:
- get the `kueue.x-k8s.io/active` label of the Job object
- if it is a falsy value, set `workload.spec.active` to `false`
- if it is a truthy value, do nothing; this is as if the feature does not exist
This means that the user can only force a Workload to transition from active to inactive, but not the other way around. This reduces the chances of someone abusing this feature.
For question 3:
First, I'd create a utility function that works like this (a sketch filled in under the assumption that the existing `features.Enabled` helper, `ComposableJob` interface, and `GenericJob.Object()` accessor are used):

```go
func shouldDeactivateJob(job GenericJob) bool {
	// inspect the WorkloadActiveLabel feature gate; if it is off, return false
	if !features.Enabled(features.WorkloadActiveLabel) {
		return false
	}
	// if this is a composable job, return false
	if _, isComposable := job.(ComposableJob); isComposable {
		return false
	}
	// grab the value of the `kueue.x-k8s.io/active` label;
	// if it is falsy return true, else return false
	return job.Object().GetLabels()["kueue.x-k8s.io/active"] == "false"
}
```

There is at least one place to hook this in: the part of the code that creates the Workload:
kueue/pkg/controller/jobframework/reconciler.go, lines 1369 to 1377 in bb2b22f:

```go
// Create the corresponding workload.
wl, err := r.constructWorkload(ctx, job)
if err != nil {
	return err
}
err = r.prepareWorkload(ctx, job, wl)
if err != nil {
	return err
}
```
Something like this:
```go
func (r *JobReconciler) prepareWorkload(ctx context.Context, job GenericJob, wl *kueue.Workload) error {
	priorityClassName, source, p, err := r.extractPriority(ctx, wl.Spec.PodSets, job)
	if err != nil {
		return err
	}

	wl.Spec.PriorityClassName = priorityClassName
	wl.Spec.Priority = &p
	wl.Spec.PriorityClassSource = source

	wl.Spec.PodSets = clearMinCountsIfFeatureDisabled(wl.Spec.PodSets)

	// ***** start new code *****
	if shouldDeactivateJob(job) && wl.Spec.Active {
		wl.Spec.Active = false
		// Optionally, record an event in Job and/or Workload explaining why the Workload got deactivated
	}
	// ***** end new code *****

	if workloadSliceEnabled(job) {
		return prepareWorkloadSlice(ctx, r.client, job, wl)
	}

	return nil
}
```

That covers the scenario in which a user creates a Job that contains the `kueue.x-k8s.io/active=false` label. However, I don't think this change is enough to cover the scenario in which a user labels a Job object that Kueue has already created a Workload for.
I think for this case we need to add some code here:
kueue/pkg/controller/jobframework/reconciler.go, lines 396 to 400 in bb2b22f:

```go
// 1. Attempt to retrieve an existing workload (if any) for this job.
wl, err := r.ensureOneWorkload(ctx, job, object)
if err != nil {
	return ctrl.Result{}, err
}
```
So I'm thinking of something like this:
```go
wl, err := r.ensureOneWorkload(ctx, job, object)
if err != nil {
	return ctrl.Result{}, err
}

// ***** start new code *****
if wl != nil && shouldDeactivateJob(job) && wl.Spec.Active {
	wl.Spec.Active = false
	// Optionally, record an event in Job and/or Workload explaining why the Workload got deactivated
}
// ***** end new code *****
```

Let me know what you think and I can update the KEP to reflect this!
I forgot to mention that whenever I set `wl.Spec.Active = false` I'm also patching the Workload object to persist the changes.
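For illustration, a minimal sketch of what that patch could look like, assuming a controller-runtime merge patch and that `Spec.Active` is a `*bool` as in the Workload API (names are illustrative):

```go
// Persist the deactivation by patching the Workload object.
wlOriginal := wl.DeepCopy()
wl.Spec.Active = ptr.To(false) // ptr is k8s.io/utils/ptr
if err := r.client.Patch(ctx, wl, client.MergeFrom(wlOriginal)); err != nil {
	return ctrl.Result{}, err
}
```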
I also need to double check whether this covers the case of removing the label or setting it to a truthy value. This is going to be easier to verify with a couple of actual unit tests ...
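For example, a hedged sketch of table-driven unit tests over the `isActivePerLabel` helper sketched earlier in this thread (names hypothetical):

```go
import "testing"

func TestIsActivePerLabel(t *testing.T) {
	cases := map[string]struct {
		labels map[string]string
		want   bool
	}{
		"no label":      {labels: nil, want: true},
		"label true":    {labels: map[string]string{"kueue.x-k8s.io/active": "true"}, want: true},
		"label false":   {labels: map[string]string{"kueue.x-k8s.io/active": "false"}, want: false},
		"invalid value": {labels: map[string]string{"kueue.x-k8s.io/active": "bogus"}, want: true},
	}
	for name, tc := range cases {
		t.Run(name, func(t *testing.T) {
			if got := isActivePerLabel(tc.labels); got != tc.want {
				t.Errorf("isActivePerLabel(%v) = %v, want %v", tc.labels, got, tc.want)
			}
		})
	}
}
```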
I'm not that well versed here, but I just wanted to make sure we have a plan for how to incorporate this for all frameworks and not just the batch Job.
I'll leave this to the kueue maintainers but I think this is a good step forward on this KEP.
Thank you. At a high level, the main idea is to operate at the level of GenericJob and Workload objects. I'm currently working out exactly how to configure Kueue to use this label as a hint for activating/deactivating workloads in a way that the user's intentions do not conflict with Kueue's.
/cc @copilot I want to try out whether I have Copilot review available. Ignore for now.
Co-authored-by: Kevin Hannon <[email protected]>
Folks, thanks for working on the KEP. I will try my best to land here next week.
```
Consider including folks who also work outside the SIG or subproject.
-->

## Design Details
```
I think the main thing I would like to see discussed in this section is how to prevent the situation that:
1. user deactivates a workload by setting `kueue.x-k8s.io/active: false`
2. user sets `kueue.x-k8s.io/active: true` to re-activate the workload
3. Kueue tries to de-activate the workload due to, for example, PodsReady backoff exceeded
4. Kueue immediately re-activates the workload due to `kueue.x-k8s.io/active: true`

Basically, how to handle conflicting intentions set by the user and Kueue.
I'm open to proposals. One could be that the user cannot set `kueue.x-k8s.io/active: true`; no label means "active". Then Kueue can always re-activate. However, maybe there are alternatives.
@mimowo this makes a lot of sense to me.
Actually, your comment prompted me to rethink how I was going to implement this. I wrote down a couple of different ways to go about implementing this feature and I'm leaning towards the approach below. If it also makes sense to you, I'd like to update the KEP to reflect it.
Overall, I think that a user can use the label `kueue.x-k8s.io/active` in GenericJobs to perform only the following 2 actions:
- ask Kueue to not consider the GenericJob as eligible for scheduling
- for a GenericJob that they had previously asked Kueue to not consider eligible for scheduling, inform Kueue that they have changed their mind and Kueue may again consider the GenericJob eligible for scheduling

In other words, this label offers a way to temporarily suspend a GenericJob. It cannot override Kueue's decision to mark the Workload of a GenericJob as inactive.
Therefore, I think that the GenericJob reconciler should provide this label as a hint to the Workload reconciler and leave the final decision to the latter.
We want to give Kueue's decisions to deactivate a workload higher priority than this "user hint". Therefore, we need to track the reason why a workload got deactivated. If it was due to the `kueue.x-k8s.io/active=false` label, then it's safe to undo the deactivation when this label gets removed or its value is set to `true`.
There already exists a `DeactivationTarget` condition that the Workload reconciler uses to deactivate a Workload under certain conditions. I didn't find code that flips `spec.Active` back to `true` in the Workload reconciler's logic though. I only found the newly added code in the Job reconciler for the `JobWithCustomWorkloadActivation` interface.
Therefore, my current assumption is that when the Workload reconciler deactivates a Workload, that workload stays deactivated till a human or external controller intervenes.
So there are a couple of different ways to track the reason why a workload got deactivated: a new Condition, a label, an annotation, or a new field in the Workload CRD.
I think it would be overkill to introduce a new field to the Workload CRD, so we should opt for one of the other options. Looking at the code of the Workload reconciler, I see a few places where the reconciler keeps the state of Workloads using Conditions. Therefore my suggestion below opts for a Condition. That said, it's very easy to adapt to any of the alternatives.
So I think we can add the following two conditions to Workloads:
- `DeactivatedByJob`: It acts similarly to `DeactivationTarget`. The difference is that it does not get deleted when setting `spec.Active` to `false`. The Workload reconciler deletes the `DeactivatedByJob` condition when it "undoes" a deactivation (see next bullet).
- `ActivationTarget`: It is an ephemeral Condition that is the opposite of `DeactivationTarget`. It only takes effect if an inactive Workload has the `DeactivatedByJob` condition. Specifically:
  - If the workload is inactive AND has the condition `ActivationTarget` with a status `True`, THEN the workload is activated and the reconciler deletes the `ActivationTarget` condition (i.e. "undoing" the deactivation).
  - If the workload is active AND has the condition `ActivationTarget`, then the reconciler deletes the `ActivationTarget` condition.
To be perfectly honest, I'm not completely sold on the name `DeactivatedByJob`, but I cannot think of a better one at the moment, so I will use it as a placeholder till we find a more appropriate one.
The logic for the GenericJob reconciler (see the sketch after this list):
- if there is a label `kueue.x-k8s.io/active` with value `false`, THEN ensure that there is a condition `DeactivatedByJob` in the associated Workload with the status `True`.
- if there is no label `kueue.x-k8s.io/active` or its value is `true`, AND there is a `DeactivatedByJob` condition in the associated Workload with the status `True`, THEN ensure there is a condition `ActivationTarget` with the status `True`.
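A hedged Go sketch of that GenericJob-side logic (`DeactivatedByJob` and `ActivationTarget` are the placeholder condition types from this discussion, written as string literals, and the helper name is hypothetical):

```go
import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// syncActivenessHint translates the job label into Workload conditions.
func syncActivenessHint(job GenericJob, wl *kueue.Workload) {
	value, hasLabel := job.Object().GetLabels()["kueue.x-k8s.io/active"]
	deactivatedByJob := apimeta.IsStatusConditionTrue(wl.Status.Conditions, "DeactivatedByJob")

	switch {
	case hasLabel && value == "false":
		// the user asks Kueue to deactivate the Workload
		apimeta.SetStatusCondition(&wl.Status.Conditions, metav1.Condition{
			Type:    "DeactivatedByJob",
			Status:  metav1.ConditionTrue,
			Reason:  "JobLabel",
			Message: "job has label kueue.x-k8s.io/active=false",
		})
	case (!hasLabel || value == "true") && deactivatedByJob:
		// the user changed their mind; hint the Workload reconciler to undo
		apimeta.SetStatusCondition(&wl.Status.Conditions, metav1.Condition{
			Type:    "ActivationTarget",
			Status:  metav1.ConditionTrue,
			Reason:  "JobLabel",
			Message: "job no longer labeled kueue.x-k8s.io/active=false",
		})
	}
}
```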
The logic for the Workload reconciler (see the sketch after this list):
- if the workload is active:
  - if there is a condition `DeactivatedByJob` with a status `True` AND there is no `DeactivationTarget` condition, THEN set the condition `DeactivationTarget` with reason `DeactivatedByJob` and message `owner job has label kueue.x-k8s.io/active=false`
  - if there is a condition `ActivationTarget` with status `True`, then delete this condition and exit
- if the workload is inactive AND there is a condition `ActivationTarget` with status `True` AND a condition `DeactivatedByJob` with status `True`:
  - set `spec.Active` to `True` and delete the `DeactivatedByJob` condition
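And a hedged sketch of the Workload reconciler side (again, the condition types are placeholders from this discussion rather than existing Kueue API constants, the function name is hypothetical, and `Spec.Active` is assumed to be a `*bool` as in the Workload API):

```go
import (
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

// reconcileActiveness applies the deactivation/activation hints.
func reconcileActiveness(wl *kueue.Workload) {
	conds := wl.Status.Conditions
	deactivatedByJob := apimeta.IsStatusConditionTrue(conds, "DeactivatedByJob")
	activationTarget := apimeta.IsStatusConditionTrue(conds, "ActivationTarget")

	if ptr.Deref(wl.Spec.Active, true) {
		if deactivatedByJob && apimeta.FindStatusCondition(conds, "DeactivationTarget") == nil {
			apimeta.SetStatusCondition(&wl.Status.Conditions, metav1.Condition{
				Type:    "DeactivationTarget",
				Status:  metav1.ConditionTrue,
				Reason:  "DeactivatedByJob",
				Message: "owner job has label kueue.x-k8s.io/active=false",
			})
		}
		if activationTarget {
			apimeta.RemoveStatusCondition(&wl.Status.Conditions, "ActivationTarget")
		}
		return
	}
	// the workload is inactive: undo the deactivation only if it was job-initiated
	if activationTarget && deactivatedByJob {
		wl.Spec.Active = ptr.To(true)
		apimeta.RemoveStatusCondition(&wl.Status.Conditions, "DeactivatedByJob")
	}
}
```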
@mimowo I'm thinking of putting together an image that explains the state transitions and then updating the KEP with the logic I describe above. Before I do that, I'd just like to ask about the goal of this KEP.

**Feature request**

I'd like the ability to create a Job object (RayCluster, batch v1/Job, etc.) in a Kueue-managed namespace in a way that it remains suspended till I change my mind. I cannot currently do this without interacting with Kueue CustomResource objects (e.g. a Workload CR). Currently, this is not feasible because Kueue manages the "suspension" logic of objects (e.g. the `suspend` field of a `batch/v1` Job).

**Question**

Does it make sense for an end-user to be able to mark a Job as ineligible for scheduling, and at a later point revert it back to being eligible for scheduling, all without interacting with Kueue CRs?
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #6915
Special notes for your reviewer:
Does this PR introduce a user-facing change?