generated from kubernetes/kubernetes-template-project
[#5310] workload controller delete refactor #7585
Closed
Singularity23x0
wants to merge
202
commits into
kubernetes-sigs:main
from
Singularity23x0:5310-reconciliation-logic
Changes from 184 commits
07e9f65
Impelemented the alfa version of the deletion integration solution.
Singularity23x0 6728e2b
Bump github.com/cert-manager/cert-manager from 1.19.0 to 1.19.1 (#7321)
dependabot[bot] 71b8511
Deflake test (#7325)
pajakd 636383f
[Bugfix] Allow to set ClusterName with ElasticJob (#7278)
mszadkow b9466d5
Bump kueueviz frontend dependencies. (#7335)
mbobrovskyi 5d3187f
Bump cypress in /test/e2e/kueueviz in the all group (#7239)
dependabot[bot] 0b08d3d
Bump node from 24-alpine to 25-alpine in /hack/depcheck (#7323)
dependabot[bot] 9a97939
Bump node from 24-slim to 25-slim in /cmd/kueueviz/frontend (#7324)
dependabot[bot] d7bedee
E2e test for Node HotSwap in TAS with slices (#7142)
pajakd 02c34b8
Enable cache in pod integration tests to fix failure with ManagedJob …
kannon92 b231d05
Fix MultiKueue workload re-evaluation bug (#6732)
ravisantoshgudimetla e05dd5e
Remove unnecessary error check. (#7352)
mbobrovskyi 122318e
Use default cluster names. (#7353)
mbobrovskyi 93e6b50
Enable conversion webhooks for v1beta2: LocalQueue, ClusterQueue, Wor…
mimowo 5dc4f0a
Extend immutable error messages. (#7354)
mbobrovskyi 4f9e8e4
chore: Use utiltesting context in DRA UTs (#7356)
tenzen-y 26d469b
Update main after 0.13.7 (#7360)
mimowo 071dd51
Update main with the latest v0.14.2 (#7359)
tenzen-y db6df11
Deprecate LocalQueueFlavorStatus for v1beta1 and v1beta2 (#7337)
iomarsayed 5909cb3
Add TAS support to the Kubeflow Trainer integration (#7249)
kaisoz a416c2e
Add validation for unsupported DRA features (#7226)
harche adca57b
hotswap reschedule evicted (#7376)
pajakd ab74ebe
v1beta2: graduate Config API (#7375)
mbobrovskyi 3e01953
Align imports for Kueue (#7378)
mimowo e0a733c
Remove workers from Pytorch e2e test. (#7381)
mbobrovskyi 02af80c
Fix Should run a kubeflow PyTorchJob on worker if admitted e2e test. …
mbobrovskyi aa299d3
[Trainer] Use podset label to identify Kueue injected config (#7389)
kaisoz ab990ab
Expose contextualized fair sharing weights for cluster queues as metr…
j-skiba 9a4452a
Bump e2e-test-images/agnhost from 2.57 to 2.59 in /hack/agnhost (#7399)
dependabot[bot] 33ceb56
Use clock on preemption. (#7395)
mbobrovskyi c7ef18f
v1beta2-convert-logic-and-tests (#7369)
mimowo 4d627e6
Split preemptions unit tests. (#7403)
mbobrovskyi 72f0e69
Bump cypress/base from 22.20.0 to 22.21.0 in /hack/cypress (#7402)
dependabot[bot] ecc5785
update documentation to use v1beta2 (#7409)
kannon92 1b6910c
Helm: request conversion webhooks only for types requiring it (#7410)
mimowo b4973f7
replace cohort with cohortName for v1beta2 docs (#7412)
kannon92 7c1f588
add v1beta2 api gen docs (#7414)
kannon92 7b78828
formatting issue: add space after comments for apigeneration tags (#7…
kannon92 361f523
Replace preemtion stub with interceptor function in TestPreemption. (…
mbobrovskyi 1de8dba
Bump the all group in /cmd/kueueviz/frontend with 2 updates (#7405)
mbobrovskyi 54091d1
Bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.1 (#7397)
dependabot[bot] 87d985b
Deprecate QueueVisibility for v1beta2 (#7319)
bobsongplus 5062544
doc: add Kueue configuration v1beta2 API document (#7417)
bobsongplus 7491d35
Support mutating workload priority class. (#7289)
mbobrovskyi 430f1db
enable ssa tags for kubernetes api linter (#7339)
kannon92 17a7ec3
increase topology limits to 16 to match topology updates (#7423)
kannon92 fddf1b7
Bump github.com/onsi/ginkgo/v2 in /hack/internal/tools (#7401)
dependabot[bot] b177dcc
Simplify JobSet ReclaimablePods integration (#7420)
PBundyra cfe148a
promote MultiKueueBatchJobWithManagedBy to beta (#7341)
kannon92 9bba991
Remove duplicate env variables in podSet template. (#7425)
mbobrovskyi a24556e
Fix multikueue/provisioning indexer conflict setup (#7432)
IrvingMg 5891828
Fix SanitizePodSets feature gate version. (#7444)
mbobrovskyi d0338b2
MultiKueue remote client kubeconfig validation (#7439)
mszadkow d81e75b
services: update app.kuberntes.io/component for services (#7371)
rphillips 7c4de98
Add License prefix for helm templates. (#7438)
mbobrovskyi cbc270b
Self-nominate IrvingMg as reviewer for internal tool yaml-processor (…
IrvingMg 56bdbcc
Update main with the latest v0.14.3 (#7455)
mimowo b4edcb9
Add v0.13.8 Release note to CHANGELOG (#7458)
tenzen-y ab8b367
Replace preemtion stub with interceptor function in TestHierarchicalP…
mbobrovskyi 4f8172d
Drop graduated ManagedJobsNamespaceSelector feature gate. (#7466)
mbobrovskyi 76a89e0
v1beta2: graduate the visibility API. (#7411)
mbobrovskyi 8e4abe2
add support for maxlength linter command for kubernetes-api-linter (#…
kannon92 fc3c1c4
Fix feature gates tables. (#7467)
mbobrovskyi 4ce6906
Sync feature gate tables. (#7475)
mbobrovskyi 6121826
Drop graduated ProvisioningACC feature gate. (#7465)
mbobrovskyi 0423898
docs(kep): Create delayed admission check retries KEP (#6210)
dhenkel92 8c973f5
v1beta2: drop types related to QueueVisibility (#7447)
mbobrovskyi 83effda
v1beta2: Remove deprecated retryDelayMinutes field and fix conversion…
nerdeveloper 8970bdb
v1beta2: drop deprecated Flavors field from LocalQueueStatus (#7449)
mbobrovskyi 825c4e3
v1beta2: remove all unnecessary wrappers for v1beta1 (#7481)
mbobrovskyi 5225f29
Replace preemption stub with interceptor function in TestSchedule. (#…
mbobrovskyi f483b4f
Extend kubeconfig validation tests (#7483)
mszadkow 0827c45
Prevent StatefulSet scale-up while workload is being deleted (#7479)
IrvingMg d96094c
Promote AdmissionFairSharing to beta (#7463)
kannon92 cedc241
Replace preemption stub with interceptor function in TestLastScheduli…
mbobrovskyi ef12e72
Enable nomaps and nobools kube api linter (#7489)
kannon92 4aa6d05
Replace preemption stub with interceptor function in scheduler TAS un…
mbobrovskyi c7d7ef3
Completed the initial draft of Delete event refactor.
Singularity23x0 1cef4e8
Finished deletion refactor. Added unit tests.
Singularity23x0 c20b6ac
Remove remote client of insecurely setup cluster (#7486)
mszadkow 5a83c45
Remove applyPreemption stub. (#7507)
mbobrovskyi 07e442e
v1beta2: Remove deprecated PodIntegrationOptions API (#7406)
nerdeveloper f134700
Switch Default TAS Placement Algorithm from BestFit to Mixed. (#7416)
iomarsayed a1fb5ae
Cleanup jobframework log (#7426)
PBundyra 05ba58b
Wrap with Eventually to avoid flake (#7523)
mszadkow b205f5d
Flaky sticky workload - fix (#7528)
mimowo 08319fa
Use Equal instead of Equivalent for asserting Suspend (#7526)
mszadkow 933a228
v1beta2: In FlavorFungibility API migrate Preempt/Borrow to MayStopSe…
mbobrovskyi 490d40d
Graduate ManagedJobsNamespaceSelectorAlwaysRespected feature to Beta …
PannagaRao ff7236f
Add Multikueue and ProvReq integration test (#7505)
IrvingMg a806a48
Add feature gate for reclaimable Pods (#7525)
PBundyra 0e7a10d
Cleanup preemption message generation (#7541)
mszadkow bd6e6fb
Use wrappers in cluster_queue_test.go. (#7543)
mbobrovskyi e3fe657
Finalizer implementation finalized.
Singularity23x0 a77639e
enable optional, required and optionalandrequired linter checks (#7488)
kannon92 1edfeaf
Add preemptor and preemptee path to the Preemption message (#7522)
mszadkow 2544a73
Refactor DRA validation to use field.ErrorList (#7529)
harche daaa5c2
Remove redundant type conversions. (#7545)
mbobrovskyi 2c0904f
Ensure roundtrip success for Quantities (#7430)
brejman 62db2e8
Remove deprecated AdmissionChecks field from v1beta2 ClusterQueue API…
nerdeveloper 50c1fb0
Use GomegaMatcher instead of OmegaMatcher. (#7552)
mbobrovskyi c759957
Use ExpectWorkloadsWithWorkloadPriority and ExpectWorkloadsWithPodPri…
mbobrovskyi f8df1e4
Fix test case to check creation workload with empty priorityClassName…
mbobrovskyi e1a8d18
Update wrappers to use utiltesting alias (#7561)
mszadkow fcf5ea3
Add CHANGELOG for v0.13.9 (#7562)
tenzen-y e8c47b8
Update intergation tests to use utiltesting alias (#7563)
mszadkow 22e249f
Update main with the latest v0.14.4 (#7559)
mimowo 25f46f8
api/kueue/v1beta1: add unit tests for workload conversion (#7546)
sohankunkerkar 1e9c2cd
test: Add conversion unit tests for LocalQueue and ClusterQueue (#7567)
sohankunkerkar 89f6f27
Set default image in wrappers to agnhost (#7551)
mszadkow c00e728
Bump github.com/containerd/containerd in /hack/internal/tools (#7568)
dependabot[bot] ae76730
Use util.RealClock in tests. (#7574)
mbobrovskyi ec1b73d
enable linter via regular expressions (#7571)
kannon92 15a6257
[Cleanup] Update e2e tests to use utiltesting alias (#7564)
mszadkow 3b81164
Fix - Workloads requesting TAS cannot run via MultiKueue (#5361)
IrvingMg e2ecd6a
Fixed DelayedTopologyRequestState enum validation. (#7573)
mbobrovskyi 84b835c
Add documentation for Kubeflow Trainer v2 TrainJob integration with K…
NarayanaSabari 71cd4e1
JobReconciler don't update PodsReady condition timely (#7364)
olderTaoist 7d2f0ff
v1beta2: change the API for Workload's spec.priorityClassSource (#7540)
mbobrovskyi f513d78
Merge branch 'main' into 5310-reconciliation-logic
Singularity23x0 f8b1d9a
Update pkg/controller/core/workload_controller.go
Singularity23x0 d816f66
Update pkg/controller/core/workload_controller.go
Singularity23x0 cf8eebe
Update pkg/controller/core/workload_controller.go
Singularity23x0 488819b
Log levels cleanup.
Singularity23x0 61d5822
Refactor.
Singularity23x0 2759e65
Removed unused constant.
Singularity23x0 6e35455
Added value back for merge purposes.
Singularity23x0 d91bff0
Introduce workload.Finish helper function (#7582)
mszadkow 9b7fa28
Bump Kubeflow Trainer to v2.1.0 (#7586)
IrvingMg c188f41
Bump cypress in /test/e2e/kueueviz in the all group (#7595)
dependabot[bot] c5ba474
Bump github.com/kubeflow/mpi-operator from 0.6.0 to 0.7.0 (#7593)
dependabot[bot] 291ba2c
Bump cypress/base from 22.21.0 to 24.11.0 in /hack/cypress (#7596)
dependabot[bot] 213b2d3
Bump github.com/onsi/ginkgo/v2 from 2.27.1 to 2.27.2 (#7590)
dependabot[bot] a2b16f1
Bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 (#7591)
dependabot[bot] 2201beb
Bump golang.org/x/sync from 0.17.0 to 0.18.0 (#7592)
dependabot[bot] 6d01bac
Bump github.com/ray-project/kuberay/ray-operator from 1.4.2 to 1.5.0 …
dependabot[bot] abd37c9
Cleanup workload.Finish (#7588)
mszadkow a3bccf8
Bump the all group in /cmd/kueueviz/frontend with 4 updates (#7603)
mbobrovskyi 5fd9a8f
Remove RuntimeInfo wrapper (#7607)
mszadkow 0a438a9
Refactor Pending() and add PendingTotal(). (#7609)
mbobrovskyi d4c69c1
Merge PendingActiveInLocalQueue and PendingInadmissibleInLocalQueue. …
mbobrovskyi 14772ff
Fix Scheduler when ClusterQueue head has inadmissible workload sticky…
mbobrovskyi 0c08737
Remove support for Kubernetes v1.31. (#7623)
mbobrovskyi 1888303
Cleanup logging for Job MultiKueue adapter (#7624)
mbobrovskyi 76d8785
[TAS] Balanced placement (#6851)
pajakd 6e502d4
Restrict logging of nominating with incremental dispatcher (#7619)
mszadkow 1a59050
Make ExpectWorkloadsToBePreempted() more strict. (#7631)
mbobrovskyi 8f3bd28
Add ginkgo.GinkgoHelper() where it was missed. (#7635)
mbobrovskyi cc0fff0
[KEP] FlavorFungability: replace FlavorFungibilityImplicitPreferenceD…
vladikkuzn 780484e
Fix wait_for_images.sh for release candidates. (#7636)
mbobrovskyi 88ff2d5
Add priorities to workload to make the test deterministic (#7630)
mszadkow 05935b3
Remove offset if using ginkgo.GinkgoHelper(). (#7632)
mbobrovskyi 1f9dde2
Use `constants.PodSetLabel` instead of `controllerconsts.PodSetLabel`…
kshalot a341567
fix: fix typo in docs (#7648)
kennygt51 3ff4e09
v1beta2: Delete .enable field from FairSharing API in config (#7583)
mbobrovskyi 23b0ae6
Cleanup of Balanced TAS (#7645)
pajakd d3018e3
Log Sticky Workload Deletion Path (#7654)
gabesaba b542ac9
test: add TestCompareBool (#7651)
kennygt51 2f5dd3a
update to helm 4.0 (#7653)
kannon92 0666991
KEP changes for v1beta2 TopologyAssignment (#7419)
olekzabl 6bdb2dc
TopologyAssignment v1beta2 (#7544)
olekzabl 08014e4
Disable IcrementalDispatcher if not configured (#7638)
mszadkow 44ff55e
Break looping when workload is already known to run on node (#7658)
olekzabl a04a945
Document changing featureGates with configMap (#7652)
MaysaMacedo 5342857
v1beta2: Delete .enable field from WaitForPodsReady API in config (#7…
mbobrovskyi 79846fc
[KEP] FlavorFungability: replace FlavorFungibilityImplicitPreferenceD…
vladikkuzn 1473a73
Check Cq active before the test to avoid flakiness (#7672)
mszadkow 0f8ca03
Fix the MultiKueue flake issue (#7666)
mimowo 71524e3
Rename variable (#7668)
pajakd 0c6635e
Set blockAdmission to false in workload retention docs (#7676)
kannon92 98c692b
Use job key instead of key. (#7681)
mbobrovskyi 03fdaa0
Fix example links in website. (#7685)
mbobrovskyi 7195107
chore: Use structured loggings for localqueue entry penalty (#7680)
tenzen-y b79f9fa
v1beta2: change default for waitForPodsReady.blockAdmission to false …
mbobrovskyi b9f23bb
KEP-2349: Move MultiKueue external custom Job support to Beta (#7669)
khrm a47c3a5
Bump the kubernetes group across 1 directory with 3 updates (#7694)
dependabot[bot] 25f7e06
feat(KEP-3258): implement delayed admission check retries (#7620)
sohankunkerkar 16c6638
Bump cypress/base from 24.11.0 to 24.11.1 in /hack/cypress (#7696)
dependabot[bot] 0b5ed06
Change `common{Prefix,Suffix}` -> `{prefix,suffix}` (#7697)
olekzabl ab2ef7e
Fix inconsistency in the KEP/2349 README.md (#7698)
khrm db35b6f
Check Lq active before the test to avoid flakiness (#7699)
mszadkow ca7924f
Merge branch 'main' into 5310-reconciliation-logic
Singularity23x0 4a229aa
Post merge cleanup.
Singularity23x0 17fb70c
Code stucturing - minor ix.
Singularity23x0 135a260
Applied review comments.
Singularity23x0 8b10c39
Added safety check with finalizers reconciliation.
Singularity23x0 1362d10
Fix AFS docs (#7705)
PBundyra 57ba36e
Balanced refactor (#7700)
pajakd 5d3c532
Optimize triggerDeactivation() logic. (#7711)
mbobrovskyi 0d4b31e
Bump js-yaml from 3.14.1 to 3.14.2 in /test/e2e/kueueviz (#7717)
dependabot[bot] aca2250
Bump js-yaml from 3.14.1 to 3.14.2 in /cmd/kueueviz/frontend (#7718)
dependabot[bot] 0f7a062
Wait for quota reservation before admission in Should readmit preempt…
mbobrovskyi 4943614
add modernize check (#7704)
dongjiang1989 b72dd61
Use UpdateFunc type. (#7719)
mbobrovskyi af15273
Add MultiKueue with Topology-Aware Scheduling setup guide for Kind (#…
IrvingMg 8296466
Use pointer for PatchOptions. (#7721)
mbobrovskyi 000ea24
docs: Add feature gate documentation for MultiKueueAdaptersForCustomJ…
khrm 2a901de
Format fix.
Singularity23x0 510f516
Merge branch 'kubernetes-sigs:main' into 5310-reconciliation-logic
Singularity23x0 476cc52
fix wl controller test
Singularity23x0 7dc5491
WOrklaod controller tests fix.
Singularity23x0 82491dd
Unit tests update.
Singularity23x0
The old code wrapped `err` with `IgnoreNotFound` (IIUC, to silence errors when the workload has been deleted in the meantime). Have you dropped that wrapping in this PR? If so, why?
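For readers unfamiliar with the wrapping in question, here is a minimal, self-contained Go sketch of the pattern. It is an illustration only, not the code from this PR: `errNotFound`, `errConflict`, and `ignoreNotFound` are hypothetical stand-ins, written on the assumption that the wrapper behaves like controller-runtime's `client.IgnoreNotFound` (returns nil for a not-found error and passes every other error through).

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for API server responses.
var (
	errNotFound = errors.New("workload not found")
	errConflict = errors.New("update conflict")
)

// ignoreNotFound returns nil when err is a not-found error and
// returns every other error (including nil) unchanged.
func ignoreNotFound(err error) error {
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

func main() {
	// A not-found error is silenced; other errors still surface.
	fmt.Println(ignoreNotFound(errNotFound))
	fmt.Println(ignoreNotFound(errConflict))
}
```

With the wrapper in place, a reconcile that races with a deletion returns nil instead of an error; without it, the same race is reported to the caller.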
Yes, I did. The idea is that we do not want the workload to be deleted outside of the controlled environment safeguarded by the deletion finalizer. As such, I advocate for the not-found error to be explicit here.
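The finalizer argument can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not Kueue's implementation: the `store` type, its methods, and the finalizer name `example.com/cleanup` are all hypothetical. It models only the general Kubernetes rule that an object with a pending finalizer is marked as deleting but not removed until the finalizer list is empty.

```go
package main

import "fmt"

// cleanupFinalizer is a hypothetical finalizer name used for illustration.
const cleanupFinalizer = "example.com/cleanup"

// workload is a toy object carrying only the fields needed here.
type workload struct {
	finalizers []string
	deleting   bool
}

// store is a toy stand-in for the API server's object storage.
type store map[string]*workload

// requestDelete marks the object as deleting and removes it only
// if no finalizers are left.
func (s store) requestDelete(name string) {
	wl, ok := s[name]
	if !ok {
		return
	}
	wl.deleting = true
	if len(wl.finalizers) == 0 {
		delete(s, name)
	}
}

// removeFinalizer drops one finalizer and completes a pending deletion
// once the finalizer list becomes empty.
func (s store) removeFinalizer(name, fin string) {
	wl, ok := s[name]
	if !ok {
		return
	}
	kept := wl.finalizers[:0]
	for _, f := range wl.finalizers {
		if f != fin {
			kept = append(kept, f)
		}
	}
	wl.finalizers = kept
	if wl.deleting && len(kept) == 0 {
		delete(s, name)
	}
}

// exists reports whether the object is still stored.
func (s store) exists(name string) bool {
	_, ok := s[name]
	return ok
}

func main() {
	s := store{"wl": &workload{finalizers: []string{cleanupFinalizer}}}
	s.requestDelete("wl")
	fmt.Println("after delete request:", s.exists("wl"))
	s.removeFinalizer("wl", cleanupFinalizer)
	fmt.Println("after finalizer removal:", s.exists("wl"))
}
```

Under this model the object cannot disappear while the controller still holds its finalizer, which is why a not-found error during cleanup would point to something unexpected.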
Well, from our onboarding I recall that, strictly speaking, there's no guarantee that a particular resource change (in this case, deletion) will be processed by `Reconcile` at most once. That was theory, though; in practice, IDK if it's better to throw on duplicated reconciliations (because they're so rare) or to swallow potential mishandling errors as you described.
So for me it looks like a non-obvious tradeoff.
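The tradeoff can be made concrete with a toy simulation of a deletion that is reconciled twice. This is a sketch under assumptions, not the controller's real logic: `cache`, `reconcileDelete`, and the `strict` flag are hypothetical names introduced only to contrast the two policies.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a not-found API error.
var errNotFound = errors.New("workload not found")

// cache is a toy object store keyed by workload name.
type cache map[string]bool

// get returns errNotFound when the workload is absent.
func (c cache) get(name string) error {
	if !c[name] {
		return errNotFound
	}
	return nil
}

// reconcileDelete processes a deletion. With strict set, a missing
// workload is reported as an error; otherwise it is treated as done.
func reconcileDelete(c cache, name string, strict bool) error {
	if err := c.get(name); err != nil {
		if !strict && errors.Is(err, errNotFound) {
			return nil
		}
		return fmt.Errorf("reconciling %q: %w", name, err)
	}
	delete(c, name)
	return nil
}

func main() {
	for _, strict := range []bool{true, false} {
		c := cache{"wl": true}
		first := reconcileDelete(c, "wl", strict)
		second := reconcileDelete(c, "wl", strict) // duplicated event
		fmt.Printf("strict=%v first=%v second=%v\n", strict, first, second)
	}
}
```

The strict policy surfaces the duplicate as an error (and would equally surface a genuine mishandling), while the lenient policy makes the second call a no-op at the cost of hiding both cases.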
Okay, makes sense. I'll amend the logic.