Conversation

@VassilisVassiliadis
Contributor

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #6915

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 16, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: VassilisVassiliadis
Once this PR has been reviewed and has the lgtm label, please assign mimowo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify

netlify bot commented Oct 16, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 2f5548b
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68f23ff38881b40007e07c65

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 16, 2025
@k8s-ci-robot
Contributor

Hi @VassilisVassiliadis. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 16, 2025
@mimowo
Contributor

mimowo commented Oct 16, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 16, 2025
- No new fields; reuse `Workload.spec.active`
- Label contract: `kueue.x-k8s.io/active` values `true` or `false`
- Invalid values are ignored and assumed to be `true`
- Feature gate: `WorkloadActiveLabel` (Alpha, default off)
Contributor


How are you planning to implement this so that all workloads can leverage this functionality?

Contributor Author

@VassilisVassiliadis VassilisVassiliadis Oct 17, 2025


That's a good question. I think we should only need to change reconciler.go (and unit/integration test code) when implementing the feature.

So there are three questions to answer:

  1. Which Workloads do we patch?
  2. What changes do we make?
  3. Which parts of reconciler.go do we change?

For question 1: In the interest of simplicity we should only patch the Workloads of non-composable jobs.

For question 2: The intention of this KEP is to force a Workload to the inactive state.

To this end, in the places that we patch the Workload object for a Job we should:

  1. get the kueue.x-k8s.io/active label of the Job object
  2. if it is a falsy value, set workload.spec.active to false
  3. if it is a truthy value, do nothing; this is as if the feature does not exist

This means that the user can only force a Workload to transition from active to inactive, not the other way around. This reduces the chances of someone abusing this feature.
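Under the contract quoted in the KEP (invalid values are ignored and assumed to be true), the label check boils down to a one-line helper. This is an illustrative sketch, not the actual Kueue code:

```go
package main

import "fmt"

// isLabelActive interprets the kueue.x-k8s.io/active label per the proposed
// contract: only an explicit "false" counts as falsy; a missing or invalid
// value is treated as "true". (Hypothetical helper, not Kueue code.)
func isLabelActive(labels map[string]string) bool {
	return labels["kueue.x-k8s.io/active"] != "false"
}

func main() {
	fmt.Println(isLabelActive(map[string]string{"kueue.x-k8s.io/active": "false"})) // false
	fmt.Println(isLabelActive(map[string]string{"kueue.x-k8s.io/active": "true"}))  // true
	fmt.Println(isLabelActive(nil))                                                 // true: no label means active
	fmt.Println(isLabelActive(map[string]string{"kueue.x-k8s.io/active": "no"}))    // true: invalid values ignored
}
```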

For question 3:

First, I'd create a utility function that works like this:

func shouldDeactivateJob(job GenericJob) bool {
	// If the WorkloadActiveLabel feature gate is off, this feature is a no-op.
	if !features.Enabled(features.WorkloadActiveLabel) {
		return false
	}
	// Composable jobs are out of scope for now.
	if _, ok := job.(ComposableJob); ok {
		return false
	}
	// Deactivate only on an explicit "false"; anything else is treated as truthy.
	return job.Object().GetLabels()["kueue.x-k8s.io/active"] == "false"
}

There is at least one place to change: the part of the code that creates the Workload:

// Create the corresponding workload.
wl, err := r.constructWorkload(ctx, job)
if err != nil {
	return err
}
err = r.prepareWorkload(ctx, job, wl)
if err != nil {
	return err
}

Something like this:

func (r *JobReconciler) prepareWorkload(ctx context.Context, job GenericJob, wl *kueue.Workload) error {
	priorityClassName, source, p, err := r.extractPriority(ctx, wl.Spec.PodSets, job)
	if err != nil {
		return err
	}

	wl.Spec.PriorityClassName = priorityClassName
	wl.Spec.Priority = &p
	wl.Spec.PriorityClassSource = source
	wl.Spec.PodSets = clearMinCountsIfFeatureDisabled(wl.Spec.PodSets)

	// ***** start new code *****
	if shouldDeactivateJob(job) && wl.Spec.Active {
		wl.Spec.Active = false
		// Optionally, record an event on the Job and/or Workload explaining why the Workload got deactivated.
	}
	// ***** end new code *****

	if workloadSliceEnabled(job) {
		return prepareWorkloadSlice(ctx, r.client, job, wl)
	}
	return nil
}

That covers the scenario in which a user creates a Job that contains the kueue.x-k8s.io/active=false label. However, I don't think this change is enough to cover the scenario in which a user labels a Job object that Kueue has already created a Workload for.

I think for this case we need to add some code here:

// 1. Attempt to retrieve an existing workload (if any) for this job.
wl, err := r.ensureOneWorkload(ctx, job, object)
if err != nil {
	return ctrl.Result{}, err
}

So I'm thinking of something like this:

wl, err := r.ensureOneWorkload(ctx, job, object)
if err != nil {
	return ctrl.Result{}, err
}

// ***** start new code *****
if wl != nil && shouldDeactivateJob(job) && wl.Spec.Active {
	wl.Spec.Active = false
	// Optionally, record an event on the Job and/or Workload explaining why the Workload got deactivated.
}
// ***** end new code *****

Let me know what you think and I can update the KEP to reflect this!

Contributor Author


I forgot to mention that whenever I set wl.Spec.Active = false I also patch the Workload object to persist the change.

Contributor Author


I also need to double-check whether this covers the cases of removing the label or setting it to a truthy value. This is going to be easier with a couple of actual unit tests ...

Contributor


I'm not that well versed here but I just wanted to make sure we have a plan for how to incorporate this for all frameworks and not just batch job.

I'll leave this to the kueue maintainers but I think this is a good step forward on this KEP.

Contributor Author


Thank you. At a high level, the main idea is to operate at the level of GenericJob and Workload objects. I'm currently working out exactly how to configure Kueue to use this label as a hint for activating/deactivating workloads in a way that the user's intentions do not conflict with Kueue's.

@kannon92
Contributor

/cc @copilot

I want to try out if I have copilot review. Ignore for now.

@mimowo
Contributor

mimowo commented Oct 17, 2025

Folks, thanks for working on the KEP, I will try my best to land here next week.

Consider including folks who also work outside the SIG or subproject.
-->

## Design Details
Contributor

@mimowo mimowo Oct 17, 2025


I think the main thing I would like to see discussed in this section is how to prevent the situation that:

  1. user deactivates a workload setting kueue.x-k8s.io/active: false
  2. users sets kueue.x-k8s.io/active: true to re-activate a workload
  3. Kueue tries to de-activate the workload because, for example, the PodsReady backoff was exceeded
  4. Kueue immediately re-activates the workload due to kueue.x-k8s.io/active: true

Basically how to handle conflicting intentions set by the user and Kueue.

I'm open to proposals. One could be that the user cannot set kueue.x-k8s.io/active: true, and no label means "active". Then Kueue can always re-activate. However, maybe there are alternatives.

Contributor Author

@VassilisVassiliadis VassilisVassiliadis Oct 20, 2025


@mimowo this makes a lot of sense to me.

Actually, your comment prompted me to rethink how I was going to implement this. I wrote down a couple of different ways to go about implementing this feature and I'm leaning towards the approach below. If it also makes sense to you, I'd like to update the KEP to reflect it.

Overall, I think that a user can use the label kueue.x-k8s.io/active on GenericJobs to perform only the following two actions:

  1. ask Kueue to not consider the GenericJob as eligible for scheduling
  2. for a GenericJob that they previously asked Kueue not to consider eligible for scheduling, inform Kueue that they have changed their mind and Kueue may again consider the GenericJob eligible for scheduling

In other words, this label offers a way to temporarily suspend a GenericJob. It cannot override Kueue's decision to mark the Workload of a GenericJob as inactive.

Therefore, I think that the GenericJob reconciler should provide this label as a hint to the Workload reconciler and leave the final decision to the latter.

We want to give Kueue's decisions to deactivate a workload higher priority than this "user hint". Therefore, we need to track the reason why a workload got deactivated. If it was due to the kueue.x-k8s.io/active=false label then it's safe to undo the deactivation when this label gets removed or its value is set to true.

There already exists a DeactivationTarget condition that the Workload reconciler uses to deactivate a Workload under certain conditions. I didn't find code that flips spec.Active back to true in the Workload reconciler's logic though. I only found the newly added code in the Job Reconciler for the JobWithCustomWorkloadActivation interface.

Therefore, my current assumption is that when the Workload reconciler deactivates a Workload, that Workload stays deactivated until a human or external controller intervenes.

So there are a couple of different ways to track the reason why a workload got deactivated: a new Condition, a label, an annotation, or a new field in the Workload CRD.

I think it would be overkill to introduce a new field to the Workload CRD, so we should opt for one of the other options. Looking at the code of the Workload reconciler, I see a few places where the reconciler keeps the state of Workloads using Conditions. Therefore my suggestion below opts for a Condition. That said, it's very easy to adapt to any of the alternatives.

So I think we can add the following two conditions to Workloads:

  1. DeactivatedByJob: It acts similarly to DeactivationTarget. The difference is that it does not get deleted when spec.Active is set to false. The Workload reconciler deletes the DeactivatedByJob condition when it "undoes" a deactivation (see next bullet).
  2. ActivationTarget: It is an ephemeral Condition that is the opposite of DeactivationTarget. It only takes effect on a Workload that has the DeactivatedByJob condition. Specifically:
    1. If the workload is inactive AND has the condition ActivationTarget with status True, THEN the workload is activated and the reconciler deletes the ActivationTarget condition (i.e. "undoing" the deactivation).
    2. If the workload is active AND has the condition ActivationTarget, THEN the reconciler deletes the ActivationTarget condition.

To be perfectly honest I'm not completely sold on the name DeactivatedByJob but I cannot think of a better one at the moment so I will use it as a placeholder till we find a more appropriate one.

The logic for the GenericJob reconciler:

  1. if there is a label kueue.x-k8s.io/active with value false, THEN ensure that there is a condition DeactivatedByJob with status True on the associated Workload.
  2. if there is no label kueue.x-k8s.io/active or its value is true, AND there is a DeactivatedByJob condition with status True on the associated Workload, THEN ensure there is a condition ActivationTarget with status True.
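A toy model of these two rules, with the Workload's conditions simplified to a string set (the names and types here are illustrative, not the actual Kueue API, which uses metav1.Condition):

```go
package main

import "fmt"

// workload is a toy stand-in for a Kueue Workload; conditions models the
// set of condition types currently at status True.
type workload struct {
	active     bool
	conditions map[string]bool
}

// syncActiveLabel applies rules 1 and 2 above on the GenericJob side.
// (Illustrative sketch only, not the actual reconciler code.)
func syncActiveLabel(jobLabels map[string]string, wl *workload) {
	if jobLabels["kueue.x-k8s.io/active"] == "false" {
		// Rule 1: ensure DeactivatedByJob=True on the associated Workload.
		wl.conditions["DeactivatedByJob"] = true
		return
	}
	// Rule 2: label removed or truthy while DeactivatedByJob=True,
	// so request re-activation via ActivationTarget.
	if wl.conditions["DeactivatedByJob"] {
		wl.conditions["ActivationTarget"] = true
	}
}

func main() {
	wl := &workload{active: true, conditions: map[string]bool{}}
	syncActiveLabel(map[string]string{"kueue.x-k8s.io/active": "false"}, wl)
	fmt.Println(wl.conditions["DeactivatedByJob"]) // true
	syncActiveLabel(map[string]string{}, wl)
	fmt.Println(wl.conditions["ActivationTarget"]) // true
}
```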

The logic for the Workload reconciler:

  1. if the workload is active:
    1. if there is a condition DeactivatedByJob with status True AND there is no DeactivationTarget condition, THEN set the condition DeactivationTarget with reason DeactivatedByJob and message "owner job has label kueue.x-k8s.io/active=false"
    2. if there is a condition ActivationTarget with status True, THEN delete this condition and exit
  2. if the workload is inactive AND there is a condition ActivationTarget with status True AND a condition DeactivatedByJob with status True:
    1. set spec.Active to True and delete the DeactivatedByJob condition
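These rules can be sketched as a toy state machine over the same simplified Workload model (again illustrative; the real reconciler manipulates metav1.Condition entries and patches the object):

```go
package main

import "fmt"

// workload is a toy stand-in for a Kueue Workload; conditions models the
// set of condition types currently at status True.
type workload struct {
	active     bool
	conditions map[string]bool
}

// reconcileActivation is a sketch of the Workload-reconciler rules above.
// (Illustrative only, not the actual reconciler code.)
func reconcileActivation(wl *workload) {
	if wl.active {
		// 1.1: an active workload flagged by its job becomes a deactivation target.
		if wl.conditions["DeactivatedByJob"] && !wl.conditions["DeactivationTarget"] {
			wl.conditions["DeactivationTarget"] = true
		}
		// 1.2: ActivationTarget on an active workload is stale; drop it.
		delete(wl.conditions, "ActivationTarget")
		return
	}
	// 2: undo a label-driven deactivation once the user flips the label back.
	if wl.conditions["ActivationTarget"] && wl.conditions["DeactivatedByJob"] {
		wl.active = true
		delete(wl.conditions, "DeactivatedByJob")
	}
}

func main() {
	wl := &workload{
		active:     false,
		conditions: map[string]bool{"DeactivatedByJob": true, "ActivationTarget": true},
	}
	reconcileActivation(wl)
	fmt.Println(wl.active) // true: the deactivation was undone
}
```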

@VassilisVassiliadis
Contributor Author

@mimowo I'm thinking of putting together an image that explains the state transitions and then updating the KEP with the logic I describe above. Before I do that, I'd just like to ask about the goal of this KEP.

Feature request

I'd like the ability to create a Job object (RayCluster, batch/v1 Job, etc.) in a Kueue-managed namespace in a way that it remains suspended until I change my mind. I cannot currently do this without interacting with Kueue CustomResource objects (e.g. a Workload CR).

Currently, this is not feasible because Kueue manages the "suspension" logic of objects (e.g. the .suspend field of RayCluster and batch/v1 Job objects).

Extended context of feature request

  1. a user cannot "force" the Job object to be scheduled without going through all the regular checks of Kueue (e.g. admission controllers, capacity checks, etc.)
  2. a user cannot "override" Kueue's decision to suspend a Job (i.e. some internal component of Kueue, or some external entity, setting .active=false in the associated Workload object)

Question

Does it make sense for an end-user to be able to mark a Job as ineligible for scheduling and later revert it to being eligible, all without interacting with Kueue CRs?


Development

Successfully merging this pull request may close these issues.

Feature request: Support configuring Kueue to suspend a Job till told otherwise on a per job basis