32 changes: 31 additions & 1 deletion config/crd/jobflow/bases/flow.volcano.sh_jobflows.yaml
@@ -114,8 +114,9 @@ spec:
networkTopology:
properties:
highestTierAllowed:
default: 1
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
@@ -199,6 +200,35 @@ spec:
type: integer
name:
type: string
partitionPolicy:
properties:
minPartitions:
default: 0
format: int32
type: integer
networkTopology:
properties:
highestTierAllowed:
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
- hard
- soft
type: string
type: object
partitionSize:
format: int32
type: integer
totalPartitions:
format: int32
type: integer
required:
- partitionSize
- totalPartitions
type: object
policies:
items:
properties:
32 changes: 31 additions & 1 deletion config/crd/jobflow/bases/flow.volcano.sh_jobtemplates.yaml
@@ -42,8 +42,9 @@ spec:
networkTopology:
properties:
highestTierAllowed:
default: 1
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
@@ -127,6 +128,35 @@ spec:
type: integer
name:
type: string
partitionPolicy:
properties:
minPartitions:
default: 0
format: int32
type: integer
networkTopology:
properties:
highestTierAllowed:
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
- hard
- soft
type: string
type: object
partitionSize:
format: int32
type: integer
totalPartitions:
format: int32
type: integer
required:
- partitionSize
- totalPartitions
type: object
policies:
items:
properties:
32 changes: 31 additions & 1 deletion config/crd/volcano/bases/batch.volcano.sh_cronjobs.yaml
@@ -78,8 +78,9 @@ spec:
networkTopology:
properties:
highestTierAllowed:
default: 1
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
@@ -163,6 +164,35 @@ spec:
type: integer
name:
type: string
partitionPolicy:
properties:
minPartitions:
default: 0
format: int32
type: integer
networkTopology:
properties:
highestTierAllowed:
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
- hard
- soft
type: string
type: object
partitionSize:
format: int32
type: integer
totalPartitions:
format: int32
type: integer
required:
- partitionSize
- totalPartitions
type: object
policies:
items:
properties:
32 changes: 31 additions & 1 deletion config/crd/volcano/bases/batch.volcano.sh_jobs.yaml
@@ -60,8 +60,9 @@ spec:
networkTopology:
properties:
highestTierAllowed:
default: 1
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
@@ -145,6 +146,35 @@ spec:
type: integer
name:
type: string
partitionPolicy:
properties:
minPartitions:
default: 0
format: int32
type: integer
networkTopology:
properties:
highestTierAllowed:
type: integer
highestTierName:
type: string
mode:
default: hard
enum:
- hard
- soft
type: string
type: object
partitionSize:
format: int32
type: integer
totalPartitions:
format: int32
type: integer
required:
- partitionSize
- totalPartitions
type: object
policies:
items:
properties:
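For orientation, the fields added to the Job CRD above might be used in a manifest along the following lines. This is a minimal sketch, not taken from the PR: the job name, image, replica count, and tier name are illustrative. It also assumes that `partitionPolicy` sits on a task entry next to `name` and `policies`, as the hunk context suggests, and that `replicas` equals `partitionSize` times `totalPartitions`, which the schema itself does not state.

```yaml
# Hypothetical example; names, image, and sizes are illustrative only.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: partition-demo
spec:
  schedulerName: volcano
  minAvailable: 8
  networkTopology:
    mode: hard
    highestTierAllowed: 2      # set explicitly; the schema above no longer lists a default
  tasks:
    - name: worker
      replicas: 8              # assumed to equal partitionSize * totalPartitions
      partitionPolicy:
        totalPartitions: 2     # required by the schema
        partitionSize: 4       # required by the schema
        minPartitions: 0       # schema default
        networkTopology:
          mode: hard
          highestTierName: rack  # hypothetical tier name
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: main
              image: busybox:1.36
              command: ["sleep", "3600"]
```

Only one of `highestTierAllowed` and `highestTierName` is set in each `networkTopology` block, because the PodGroup CRD description states that the two cannot be set simultaneously.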
63 changes: 62 additions & 1 deletion config/crd/volcano/bases/scheduling.volcano.sh_podgroups.yaml
@@ -95,10 +95,14 @@ spec:
CRD.
properties:
highestTierAllowed:
default: 1
description: HighestTierAllowed specifies the highest tier that
a job allowed to cross when scheduling.
type: integer
highestTierName:
description: |-
HighestTierName specifies the highest tier name that a job allowed to cross when scheduling.
HighestTierName and HighestTierAllowed cannot be set simultaneously.
type: string
mode:
default: hard
description: Mode specifies the mode of the network topology constrain.
@@ -122,6 +126,63 @@ spec:
Queue defines the queue to allocate resource for PodGroup; if queue does not exist,
the PodGroup will not be scheduled. Defaults to `default` Queue with the lowest weight.
type: string
subGroupPolicy:
description: SubGroupPolicy defines policies for dividing all pods
within the podGroup into multiple groups.
items:
properties:
matchPolicy:
description: |-
MatchPolicy defines matching strategies for different groups, where pods with the same labelKey value are grouped together.
The LabelKey in the list is unique.
items:
properties:
labelKey:
description: LabelKey specifies the label key used to
group pods.
type: string
type: object
type: array
minSubGroups:
default: 0
description: MinSubGroups defines the minimum number of sub-affinity
groups required.
format: int32
type: integer
name:
description: Name specifies the name of SubGroupPolicy
type: string
networkTopology:
description: NetworkTopology defines the NetworkTopology config,
this field works in conjunction with network topology feature
and hyperNode CRD.
properties:
highestTierAllowed:
description: HighestTierAllowed specifies the highest tier
that a job allowed to cross when scheduling.
type: integer
highestTierName:
description: |-
HighestTierName specifies the highest tier name that a job allowed to cross when scheduling.
HighestTierName and HighestTierAllowed cannot be set simultaneously.
type: string
mode:
default: hard
description: Mode specifies the mode of the network topology
constrain.
enum:
- hard
- soft
type: string
type: object
subGroupSize:
description: |-
SubGroupSize defines the number of pods in each sub-affinity group.
Only when a subGroup of pods, with a size of "subGroupSize", can satisfy the network topology constraint then will the subGroup be scheduled.
format: int32
type: integer
type: object
type: array
type: object
status:
description: |-
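A PodGroup using the new `subGroupPolicy` list could look roughly like the sketch below. The field names come from the schema in this diff; the object name, label key, tier name, and numeric values are hypothetical, and how the scheduler interprets them is not confirmed here.

```yaml
# Hypothetical example; values are illustrative only.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: subgroup-demo
spec:
  minMember: 8
  queue: default
  networkTopology:
    mode: soft
    highestTierAllowed: 3
  subGroupPolicy:
    - name: by-replica-group
      subGroupSize: 4          # pods per sub-affinity group
      minSubGroups: 0          # schema default
      matchPolicy:
        - labelKey: example.com/group-index   # hypothetical label key
      networkTopology:
        mode: hard
        highestTierName: rack  # hypothetical; not combined with highestTierAllowed
```

Per the `matchPolicy` description, pods that share the same value for `example.com/group-index` would be grouped together, and each group of `subGroupSize` pods must satisfy its own `networkTopology` constraint before it is scheduled.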
6 changes: 6 additions & 0 deletions config/crd/volcano/bases/topology.volcano.sh_hypernodes.yaml
@@ -20,6 +20,9 @@ spec:
- jsonPath: .spec.tier
name: Tier
type: string
- jsonPath: .spec.tierName
name: TierName
type: string
- jsonPath: .status.nodeCount
name: NodeCount
type: integer
@@ -147,6 +150,9 @@ spec:
tier:
description: Tier categorizes the performance level of the HyperNode.
type: integer
tierName:
description: TierName represents the level name of the HyperNode.
type: string
required:
- tier
type: object
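As an illustration of the new `tierName` field, a HyperNode could be declared as sketched below. The `tier` and `tierName` fields come from this diff; the object name, the tier name `rack`, and the member node names are hypothetical.

```yaml
# Hypothetical example; names are illustrative only.
apiVersion: topology.volcano.sh/v1alpha1
kind: HyperNode
metadata:
  name: rack-a
spec:
  tier: 1            # still required
  tierName: rack     # new optional level name
  members:
    - type: Node
      selector:
        exactMatch:
          name: node-0
    - type: Node
      selector:
        exactMatch:
          name: node-1
```

With the added printer column, `kubectl get hypernodes` should show `TierName` between `Tier` and `NodeCount`.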
33 changes: 17 additions & 16 deletions docs/user-guide/how_to_use_job_policy.md
@@ -13,25 +13,26 @@ the task policy.
* Users can set multiple policy for a job or a task.
* Currently, Volcano provides **6 built-in events** for users. The details are as follows.

-| ID | Event | Description |
-|-----|----------------|-------------------------------------------------------------------------------------------------------------------|
-| 1 | `PodFailed` | Check whether there is any pod' status is `Failed`. |
-| 2 | `PodEvicted` | Check whether there is any pod is evicted. |
-| 3 | `PodPending` | Check whether there is any pod is pending. It is usually used with timeout. If the pod is not pending, the timeout action will be canceled. |
-| 4 | `TaskCompleted`| Check whether there is a task whose all pods are succeed. If `minsuccess` is configured for a task, it will also be regarded as task completes. |
-| 4 | `Unknown` | Check whether the status of a volcano job is `Unknown`. The most possible factor is task unschedulable. It is triggered when part pods can't be scheduled while some are already running in gang-scheduling case. |
-| 5 | `*` | It means all the events, which is not so common used. |
+| ID | Event | Description |
+|----|-----------------|-------------------------------------------------------------------------------------------------------------------|
+| 1 | `PodFailed` | Check whether there is any pod' status is `Failed`. |
+| 2 | `PodEvicted` | Check whether there is any pod is evicted. |
+| 3 | `PodPending` | Check whether there is any pod is pending. It is usually used with timeout. If the pod is not pending, the timeout action will be canceled. |
+| 4 | `TaskCompleted` | Check whether there is a task whose all pods are succeed. If `minsuccess` is configured for a task, it will also be regarded as task completes. |
+| 5 | `Unknown` | Check whether the status of a volcano job is `Unknown`. The most possible factor is task unschedulable. It is triggered when part pods can't be scheduled while some are already running in gang-scheduling case. |
+| 6 | `*` | It means all the events, which is not so common used. |

* Currently, Volcano provides **5 built-in actions** for users. The details are as follows.

-| ID | Action | Description |
-|-----|-------------------|------------------------------------------------------------------------------------------------------------------|
-| 1 | `AbortJob` | Abort the whole job, but it can be resumed. All pods will be evicted and no pod will be recreated. |
-| 2 | `RestartJob` | Restart the whole job. |
-| 3 | `RestartTask` | The task will be restarted. This action **cannot** work with job level events such as `Unknown`. |
-| 2 | `RestartPod` | The pod will be restarted. This action **cannot** work with job level events such as `Unknown`. |
-| 4 | `TerminateJob` | Terminate the whole job and it **cannot** be resumed. All pods will be evicted and no pod will be recreated. |
-| 5 | `CompleteJob` | Regard the job as completed. The unfinished pods will be killed. |
+| ID | Action | Description |
+|----|--------------------|--------------------------------------------------------------------------------------------------------------|
+| 1 | `AbortJob` | Abort the whole job, but it can be resumed. All pods will be evicted and no pod will be recreated. |
+| 2 | `RestartJob` | Restart the whole job. |
+| 3 | `RestartTask` | The task will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 4 | `RestartPod` | The pod will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 5 | `RestartPartition` | The partition will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 6 | `TerminateJob` | Terminate the whole job and it **cannot** be resumed. All pods will be evicted and no pod will be recreated. |
+| 7 | `CompleteJob` | Regard the job as completed. The unfinished pods will be killed. |

## Examples
1. Set a pair of `event` and `action`.
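To make the new `RestartPartition` row concrete, a task-level policy pairing it with a pod-level event might be written as in the fragment below. This is a sketch under assumptions: the task name and sizes are hypothetical, and the PR text shown here does not confirm whether `RestartPartition` requires a `partitionPolicy` on the same task or how a partition is chosen for restart.

```yaml
# Hypothetical fragment of a Volcano Job spec; values are illustrative only.
tasks:
  - name: worker
    replicas: 8
    partitionPolicy:
      totalPartitions: 2
      partitionSize: 4
    policies:
      - event: PodFailed          # pod-level event, not a job-level one such as Unknown
        action: RestartPartition
```

The event is deliberately a pod-level one, since the table notes that `RestartPartition` cannot work with job-level events such as `Unknown`.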
2 changes: 1 addition & 1 deletion go.mod
@@ -52,7 +52,7 @@ require (
sigs.k8s.io/controller-runtime v0.13.0
sigs.k8s.io/yaml v1.6.0
stathat.com/c/consistent v1.0.0
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040
)

require (
4 changes: 2 additions & 2 deletions go.sum
@@ -496,5 +496,5 @@ sigs.k8s.io/yaml v1.6.0 h1:G8fkbMSAFqgEFgh4b1wmtzDnioxFCUgTZhlbj5P9QYs=
sigs.k8s.io/yaml v1.6.0/go.mod h1:796bPqUfzR/0jLAl6XjHl3Ck7MiyVv8dbTdyT3/pMf4=
stathat.com/c/consistent v1.0.0 h1:ezyc51EGcRPJUxfHGSgJjWzJdj3NiMU9pNfLNGiXV0c=
stathat.com/c/consistent v1.0.0/go.mod h1:QkzMWzcbB+yQBL2AttO6sgsQS/JSTapcDISJalmCDS0=
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd h1:LdemLOIsCwABz4wS4wTHB6WA7dXz4gdq5TuUj/RZRKA=
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd/go.mod h1:CKQbxVt0o4lTKisC0MonoXWruGFC0S8KU+UuzaZ5E7k=
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040 h1:FP8O0jjsPbKiNAY0UOwAkW4mEX5JuCOqdnEfzZteE8k=
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040/go.mod h1:CKQbxVt0o4lTKisC0MonoXWruGFC0S8KU+UuzaZ5E7k=