Commit 740881f

Provides group affinity partitioning capability
Signed-off-by: 3sunny <[email protected]>
1 parent 26b56fd commit 740881f

35 files changed (+1434 additions, -111 deletions)

config/crd/jobflow/bases/flow.volcano.sh_jobflows.yaml

Lines changed: 31 additions & 1 deletion
@@ -114,8 +114,9 @@ spec:
 networkTopology:
 properties:
 highestTierAllowed:
-default: 1
 type: integer
+highestTierName:
+type: string
 mode:
 default: hard
 enum:
@@ -199,6 +200,35 @@ spec:
 type: integer
 name:
 type: string
+partitionPolicy:
+properties:
+minPartitions:
+default: 0
+format: int32
+type: integer
+networkTopology:
+properties:
+highestTierAllowed:
+type: integer
+highestTierName:
+type: string
+mode:
+default: hard
+enum:
+- hard
+- soft
+type: string
+type: object
+partitionSize:
+format: int32
+type: integer
+totalPartitions:
+format: int32
+type: integer
+required:
+- partitionSize
+- totalPartitions
+type: object
 policies:
 items:
 properties:
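The new `partitionPolicy` object requires `totalPartitions` and `partitionSize`, defaults `minPartitions` to 0, and carries its own `networkTopology` block that includes the new `highestTierName` field. To give a rough idea of how a client might pre-validate such a spec, the Go sketch below uses hypothetical stand-in types (`PartitionPolicy`, `NetworkTopology`), not the actual volcano.sh/apis types. The positivity check and the `minPartitions <= totalPartitions` check are assumptions. The rule that `highestTierAllowed` and `highestTierName` cannot both be set comes from the PodGroup CRD description in this commit, and the sketch assumes it applies here too.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkTopology is a hypothetical stand-in for the networkTopology
// object in the CRD schema; it is not the volcano.sh/apis Go type.
type NetworkTopology struct {
	Mode               string  // "hard" (schema default) or "soft"
	HighestTierAllowed *int    // optional tier number
	HighestTierName    *string // optional tier name (added in this commit)
}

// PartitionPolicy is a hypothetical stand-in for partitionPolicy.
type PartitionPolicy struct {
	TotalPartitions int32 // required by the schema
	PartitionSize   int32 // required by the schema
	MinPartitions   int32 // schema default: 0
	NetworkTopology *NetworkTopology
}

// validatePartitionPolicy applies illustrative checks. The positivity and
// range checks are assumptions; the tier-field exclusivity mirrors the
// PodGroup CRD description.
func validatePartitionPolicy(p PartitionPolicy) error {
	if p.TotalPartitions <= 0 || p.PartitionSize <= 0 {
		return errors.New("totalPartitions and partitionSize must be positive")
	}
	if p.MinPartitions < 0 || p.MinPartitions > p.TotalPartitions {
		return fmt.Errorf("minPartitions %d must be between 0 and totalPartitions (%d)",
			p.MinPartitions, p.TotalPartitions)
	}
	if nt := p.NetworkTopology; nt != nil {
		// An empty mode is accepted because the schema defaults it to "hard".
		if nt.Mode != "" && nt.Mode != "hard" && nt.Mode != "soft" {
			return fmt.Errorf("mode %q is not one of hard, soft", nt.Mode)
		}
		if nt.HighestTierAllowed != nil && nt.HighestTierName != nil {
			return errors.New("highestTierAllowed and highestTierName cannot be set simultaneously")
		}
	}
	return nil
}

func main() {
	tier, name := 2, "spine" // hypothetical tier values
	ok := PartitionPolicy{TotalPartitions: 4, PartitionSize: 8,
		NetworkTopology: &NetworkTopology{Mode: "hard", HighestTierAllowed: &tier}}
	both := PartitionPolicy{TotalPartitions: 4, PartitionSize: 8,
		NetworkTopology: &NetworkTopology{HighestTierAllowed: &tier, HighestTierName: &name}}
	fmt.Println(validatePartitionPolicy(ok))   // <nil>
	fmt.Println(validatePartitionPolicy(both)) // highestTierAllowed and highestTierName cannot be set simultaneously
}
```

Keeping the tier fields as pointers lets the check tell "unset" apart from an explicit zero, which matters now that the CRD no longer defaults `highestTierAllowed` to 1.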

config/crd/jobflow/bases/flow.volcano.sh_jobtemplates.yaml

Lines changed: 31 additions & 1 deletion
@@ -42,8 +42,9 @@ spec:
 networkTopology:
 properties:
 highestTierAllowed:
-default: 1
 type: integer
+highestTierName:
+type: string
 mode:
 default: hard
 enum:
@@ -127,6 +128,35 @@ spec:
 type: integer
 name:
 type: string
+partitionPolicy:
+properties:
+minPartitions:
+default: 0
+format: int32
+type: integer
+networkTopology:
+properties:
+highestTierAllowed:
+type: integer
+highestTierName:
+type: string
+mode:
+default: hard
+enum:
+- hard
+- soft
+type: string
+type: object
+partitionSize:
+format: int32
+type: integer
+totalPartitions:
+format: int32
+type: integer
+required:
+- partitionSize
+- totalPartitions
+type: object
 policies:
 items:
 properties:

config/crd/volcano/bases/batch.volcano.sh_cronjobs.yaml

Lines changed: 31 additions & 1 deletion
@@ -78,8 +78,9 @@ spec:
 networkTopology:
 properties:
 highestTierAllowed:
-default: 1
 type: integer
+highestTierName:
+type: string
 mode:
 default: hard
 enum:
@@ -163,6 +164,35 @@ spec:
 type: integer
 name:
 type: string
+partitionPolicy:
+properties:
+minPartitions:
+default: 0
+format: int32
+type: integer
+networkTopology:
+properties:
+highestTierAllowed:
+type: integer
+highestTierName:
+type: string
+mode:
+default: hard
+enum:
+- hard
+- soft
+type: string
+type: object
+partitionSize:
+format: int32
+type: integer
+totalPartitions:
+format: int32
+type: integer
+required:
+- partitionSize
+- totalPartitions
+type: object
 policies:
 items:
 properties:

config/crd/volcano/bases/batch.volcano.sh_jobs.yaml

Lines changed: 31 additions & 1 deletion
@@ -60,8 +60,9 @@ spec:
 networkTopology:
 properties:
 highestTierAllowed:
-default: 1
 type: integer
+highestTierName:
+type: string
 mode:
 default: hard
 enum:
@@ -145,6 +146,35 @@ spec:
 type: integer
 name:
 type: string
+partitionPolicy:
+properties:
+minPartitions:
+default: 0
+format: int32
+type: integer
+networkTopology:
+properties:
+highestTierAllowed:
+type: integer
+highestTierName:
+type: string
+mode:
+default: hard
+enum:
+- hard
+- soft
+type: string
+type: object
+partitionSize:
+format: int32
+type: integer
+totalPartitions:
+format: int32
+type: integer
+required:
+- partitionSize
+- totalPartitions
+type: object
 policies:
 items:
 properties:

config/crd/volcano/bases/scheduling.volcano.sh_podgroups.yaml

Lines changed: 62 additions & 1 deletion
@@ -95,10 +95,14 @@ spec:
 CRD.
 properties:
 highestTierAllowed:
-default: 1
 description: HighestTierAllowed specifies the highest tier that
 a job allowed to cross when scheduling.
 type: integer
+highestTierName:
+description: |-
+HighestTierName specifies the highest tier name that a job allowed to cross when scheduling.
+HighestTierName and HighestTierAllowed cannot be set simultaneously.
+type: string
 mode:
 default: hard
 description: Mode specifies the mode of the network topology constrain.
@@ -122,6 +126,63 @@ spec:
 Queue defines the queue to allocate resource for PodGroup; if queue does not exist,
 the PodGroup will not be scheduled. Defaults to `default` Queue with the lowest weight.
 type: string
+subGroupPolicy:
+description: SubGroupPolicy defines policies for dividing all pods
+within the podGroup into multiple groups.
+items:
+properties:
+matchPolicy:
+description: |-
+MatchPolicy defines matching strategies for different groups, where pods with the same labelKey value are grouped together.
+The LabelKey in the list is unique.
+items:
+properties:
+labelKey:
+description: LabelKey specifies the label key used to
+group pods.
+type: string
+type: object
+type: array
+minSubGroups:
+default: 0
+description: MinSubGroups defines the minimum number of sub-affinity
+groups required.
+format: int32
+type: integer
+name:
+description: Name specifies the name of SubGroupPolicy
+type: string
+networkTopology:
+description: NetworkTopology defines the NetworkTopology config,
+this field works in conjunction with network topology feature
+and hyperNode CRD.
+properties:
+highestTierAllowed:
+description: HighestTierAllowed specifies the highest tier
+that a job allowed to cross when scheduling.
+type: integer
+highestTierName:
+description: |-
+HighestTierName specifies the highest tier name that a job allowed to cross when scheduling.
+HighestTierName and HighestTierAllowed cannot be set simultaneously.
+type: string
+mode:
+default: hard
+description: Mode specifies the mode of the network topology
+constrain.
+enum:
+- hard
+- soft
+type: string
+type: object
+subGroupSize:
+description: |-
+SubGroupSize defines the number of pods in each sub-affinity group.
+Only when a subGroup of pods, with a size of "subGroupSize", can satisfy the network topology constraint then will the subGroup be scheduled.
+format: int32
+type: integer
+type: object
+type: array
 type: object
 status:
 description: |-
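According to the field descriptions, `subGroupPolicy` puts pods that share the same value for a `matchPolicy` `labelKey` into one group. Each sub-group has `subGroupSize` pods, and `minSubGroups` sets the minimum number of sub-groups required. The Go sketch below illustrates that grouping with a hypothetical `Pod` type and a hypothetical label key `example.com/partition`. It is not the scheduler's implementation. Three choices in it are assumptions: several label keys are combined into one composite key, pods that lack a key are skipped, and a group counts as complete only when it has exactly `subGroupSize` pods.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Pod is a hypothetical stand-in carrying only a name and labels.
type Pod struct {
	Name   string
	Labels map[string]string
}

// groupByLabelKeys buckets pods that share the same value for every key in
// labelKeys (the matchPolicy labelKey entries). The composite key and the
// skipping of pods that lack a key are assumptions.
func groupByLabelKeys(pods []Pod, labelKeys []string) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range pods {
		parts := make([]string, 0, len(labelKeys))
		complete := true
		for _, k := range labelKeys {
			v, ok := p.Labels[k]
			if !ok {
				complete = false // pod lacks this label key; leave it ungrouped
				break
			}
			parts = append(parts, k+"="+v)
		}
		if !complete {
			continue
		}
		key := strings.Join(parts, ",")
		groups[key] = append(groups[key], p.Name)
	}
	return groups
}

// meetsMinSubGroups reports whether at least minSubGroups groups contain
// exactly subGroupSize pods (the "exactly" test is an assumption).
func meetsMinSubGroups(groups map[string][]string, subGroupSize, minSubGroups int) bool {
	full := 0
	for _, members := range groups {
		if len(members) == subGroupSize {
			full++
		}
	}
	return full >= minSubGroups
}

func main() {
	const key = "example.com/partition" // hypothetical label key
	pods := []Pod{
		{Name: "p0", Labels: map[string]string{key: "a"}},
		{Name: "p1", Labels: map[string]string{key: "a"}},
		{Name: "p2", Labels: map[string]string{key: "b"}},
		{Name: "p3", Labels: map[string]string{}},
	}
	groups := groupByLabelKeys(pods, []string{key})

	// Map iteration order is random, so sort group keys for stable output.
	names := make([]string, 0, len(groups))
	for k := range groups {
		names = append(names, k)
	}
	sort.Strings(names)
	for _, k := range names {
		fmt.Println(k, groups[k])
	}
	fmt.Println(meetsMinSubGroups(groups, 2, 1))
}
```

Running it prints `example.com/partition=a [p0 p1]`, then `example.com/partition=b [p2]`, then `true`, because exactly one group reaches the size of 2 and one sub-group is required. With the schema default `minSubGroups: 0`, the check always passes.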

config/crd/volcano/bases/topology.volcano.sh_hypernodes.yaml

Lines changed: 6 additions & 0 deletions
@@ -20,6 +20,9 @@ spec:
 - jsonPath: .spec.tier
 name: Tier
 type: string
+- jsonPath: .spec.tierName
+name: TierName
+type: string
 - jsonPath: .status.nodeCount
 name: NodeCount
 type: integer
@@ -147,6 +150,9 @@ spec:
 tier:
 description: Tier categorizes the performance level of the HyperNode.
 type: integer
+tierName:
+description: TierName represents the level name of the HyperNode.
+type: string
 required:
 - tier
 type: object
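HyperNodes now carry an optional `spec.tierName` alongside the required `spec.tier`, and the Job and PodGroup schemas accept `highestTierName` as an alternative to `highestTierAllowed`. One plausible way to relate the two is to look up the tier number behind a name. The Go sketch below does that over a hypothetical `HyperNode` stand-in. How the scheduler actually resolves tier names is an assumption here, and so is rejecting a name that several different tiers share.

```go
package main

import "fmt"

// HyperNode is a hypothetical stand-in holding only spec.tier and
// spec.tierName from the HyperNode CRD.
type HyperNode struct {
	Name     string
	Tier     int
	TierName string
}

// resolveTierName maps a tier name to a tier number by scanning HyperNodes.
// Rejecting a name used by more than one tier is an assumption.
func resolveTierName(nodes []HyperNode, tierName string) (int, error) {
	tier, found := 0, false
	for _, hn := range nodes {
		if hn.TierName != tierName {
			continue
		}
		if found && hn.Tier != tier {
			return 0, fmt.Errorf("tierName %q is used by tiers %d and %d", tierName, tier, hn.Tier)
		}
		tier, found = hn.Tier, true
	}
	if !found {
		return 0, fmt.Errorf("no HyperNode has tierName %q", tierName)
	}
	return tier, nil
}

func main() {
	nodes := []HyperNode{ // hypothetical two-tier topology
		{Name: "rack-0", Tier: 1, TierName: "rack"},
		{Name: "rack-1", Tier: 1, TierName: "rack"},
		{Name: "spine-0", Tier: 2, TierName: "spine"},
	}
	tier, err := resolveTierName(nodes, "spine")
	fmt.Println(tier, err) // 2 <nil>
}
```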

docs/user-guide/how_to_use_job_policy.md

Lines changed: 17 additions & 16 deletions
@@ -13,25 +13,26 @@ the task policy.
 * Users can set multiple policy for a job or a task.
 * Currently, Volcano provides **6 built-in events** for users. The details are as follows.
 
-| ID | Event | Description |
-|-----|----------------|-------------------------------------------------------------------------------------------------------------------|
-| 1 | `PodFailed` | Check whether there is any pod' status is `Failed`. |
-| 2 | `PodEvicted` | Check whether there is any pod is evicted. |
-| 3 | `PodPending` | Check whether there is any pod is pending. It is usually used with timeout. If the pod is not pending, the timeout action will be canceled. |
-| 4 | `TaskCompleted`| Check whether there is a task whose all pods are succeed. If `minsuccess` is configured for a task, it will also be regarded as task completes. |
-| 4 | `Unknown` | Check whether the status of a volcano job is `Unknown`. The most possible factor is task unschedulable. It is triggered when part pods can't be scheduled while some are already running in gang-scheduling case. |
-| 5 | `*` | It means all the events, which is not so common used. |
+| ID | Event | Description |
+|----|-----------------|-------------------------------------------------------------------------------------------------------------------|
+| 1 | `PodFailed` | Check whether there is any pod' status is `Failed`. |
+| 2 | `PodEvicted` | Check whether there is any pod is evicted. |
+| 3 | `PodPending` | Check whether there is any pod is pending. It is usually used with timeout. If the pod is not pending, the timeout action will be canceled. |
+| 4 | `TaskCompleted` | Check whether there is a task whose all pods are succeed. If `minsuccess` is configured for a task, it will also be regarded as task completes. |
+| 5 | `Unknown` | Check whether the status of a volcano job is `Unknown`. The most possible factor is task unschedulable. It is triggered when part pods can't be scheduled while some are already running in gang-scheduling case. |
+| 6 | `*` | It means all the events, which is not so common used. |
 
 * Currently, Volcano provides **5 built-in actions** for users. The details are as follows.
 
-| ID | Action | Description |
-|-----|-------------------|------------------------------------------------------------------------------------------------------------------|
-| 1 | `AbortJob` | Abort the whole job, but it can be resumed. All pods will be evicted and no pod will be recreated. |
-| 2 | `RestartJob` | Restart the whole job. |
-| 3 | `RestartTask` | The task will be restarted. This action **cannot** work with job level events such as `Unknown`. |
-| 2 | `RestartPod` | The pod will be restarted. This action **cannot** work with job level events such as `Unknown`. |
-| 4 | `TerminateJob` | Terminate the whole job and it **cannot** be resumed. All pods will be evicted and no pod will be recreated. |
-| 5 | `CompleteJob` | Regard the job as completed. The unfinished pods will be killed. |
+| ID | Action | Description |
+|----|--------------------|--------------------------------------------------------------------------------------------------------------|
+| 1 | `AbortJob` | Abort the whole job, but it can be resumed. All pods will be evicted and no pod will be recreated. |
+| 2 | `RestartJob` | Restart the whole job. |
+| 3 | `RestartTask` | The task will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 4 | `RestartPod` | The pod will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 5 | `RestartPartition` | The partition will be restarted. This action **cannot** work with job level events such as `Unknown`. |
+| 6 | `TerminateJob` | Terminate the whole job and it **cannot** be resumed. All pods will be evicted and no pod will be recreated. |
+| 7 | `CompleteJob` | Regard the job as completed. The unfinished pods will be killed. |
 
 ## Examples
 1. Set a pair of `event` and `action`.
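The updated tables list seven actions, including the new `RestartPartition`. They state that `RestartTask`, `RestartPod`, and `RestartPartition` cannot work with job-level events such as `Unknown`. A hypothetical pre-submit check for a single event/action pair could look like the Go sketch below. `checkPolicy` is an illustrative name, not a Volcano API. Treating `Unknown` as the only job-level event is an assumption, because the table gives it only as an example.

```go
package main

import "fmt"

// Event names as listed in the updated events table.
var knownEvents = map[string]bool{
	"PodFailed": true, "PodEvicted": true, "PodPending": true,
	"TaskCompleted": true, "Unknown": true, "*": true,
}

// Action names as listed in the updated actions table.
var knownActions = map[string]bool{
	"AbortJob": true, "RestartJob": true, "RestartTask": true, "RestartPod": true,
	"RestartPartition": true, "TerminateJob": true, "CompleteJob": true,
}

// Actions the table says cannot work with job-level events.
var nonJobLevelActions = map[string]bool{
	"RestartTask": true, "RestartPod": true, "RestartPartition": true,
}

// The table names Unknown only as an example ("such as"), so this set is an assumption.
var jobLevelEvents = map[string]bool{"Unknown": true}

// checkPolicy is a hypothetical validation of one event/action pair.
func checkPolicy(event, action string) error {
	if !knownEvents[event] {
		return fmt.Errorf("unknown event %q", event)
	}
	if !knownActions[action] {
		return fmt.Errorf("unknown action %q", action)
	}
	if jobLevelEvents[event] && nonJobLevelActions[action] {
		return fmt.Errorf("action %s cannot be used with job-level event %s", action, event)
	}
	return nil
}

func main() {
	fmt.Println(checkPolicy("PodFailed", "RestartPartition")) // <nil>
	fmt.Println(checkPolicy("Unknown", "RestartPartition"))   // action RestartPartition cannot be used with job-level event Unknown
}
```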

go.mod

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ require (
 sigs.k8s.io/controller-runtime v0.13.0
 sigs.k8s.io/yaml v1.6.0
 stathat.com/c/consistent v1.0.0
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040
 )
 
 require (

go.sum

Lines changed: 2 additions & 2 deletions
@@ -496,5 +496,5 @@ sigs.k8s.io/yaml v1.6.0 h1:G8fkbMSAFqgEFgh4b1wmtzDnioxFCUgTZhlbj5P9QYs=
 sigs.k8s.io/yaml v1.6.0/go.mod h1:796bPqUfzR/0jLAl6XjHl3Ck7MiyVv8dbTdyT3/pMf4=
 stathat.com/c/consistent v1.0.0 h1:ezyc51EGcRPJUxfHGSgJjWzJdj3NiMU9pNfLNGiXV0c=
 stathat.com/c/consistent v1.0.0/go.mod h1:QkzMWzcbB+yQBL2AttO6sgsQS/JSTapcDISJalmCDS0=
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd h1:LdemLOIsCwABz4wS4wTHB6WA7dXz4gdq5TuUj/RZRKA=
-volcano.sh/apis v1.13.1-0.20251020121257-f562f82b42fd/go.mod h1:CKQbxVt0o4lTKisC0MonoXWruGFC0S8KU+UuzaZ5E7k=
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040 h1:FP8O0jjsPbKiNAY0UOwAkW4mEX5JuCOqdnEfzZteE8k=
+volcano.sh/apis v1.13.1-0.20251114021538-d1e61c510040/go.mod h1:CKQbxVt0o4lTKisC0MonoXWruGFC0S8KU+UuzaZ5E7k=
