Conversation
Signed-off-by: Qi Wang <qiwan@redhat.com>
Skipping CI for Draft Pull Request.
Walkthrough

This PR introduces CRIOCredentialProviderConfig resource support to the machine configuration system, enabling automatic management of CRI-O credential provider configurations across MachineConfigPools. Changes span RBAC permissions, controller implementation with observer wiring and lifecycle handling, helper functions for credential provider config rendering and merging, constants, kubelet service templates, and comprehensive test coverage.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Watch as Informer<br/>(Watch)
    participant Controller as Controller<br/>(Sync)
    participant ConfigPool as MachineConfigPool<br/>(Iteration)
    participant Ignition as Ignition<br/>(Generation)
    participant MachineConfig as MachineConfig<br/>(Update)
    participant Status as Status<br/>(Propagation)
    Watch->>Controller: CRIOCredentialProviderConfig Added/Updated/Deleted
    Controller->>Controller: Enqueue config reload in criocpQueue
    Controller->>Controller: syncCRIOCredentialProviderConfig()
    Controller->>ConfigPool: Iterate MachineConfigPools
    ConfigPool->>Ignition: generateOriginalCredentialProviderConfig()
    Ignition->>Ignition: Parse/merge credential provider config
    Ignition->>Ignition: Generate drop-in unit (if needed)
    Ignition->>Ignition: crioCredentialProviderConfigIgnition()
    Ignition-->>Controller: Ignition content
    Controller->>MachineConfig: Apply changes via ignition
    Controller->>Status: syncCRIOCredentialProviderConfigStatusOnly()
    Status->>Status: Update observedGeneration & conditions
    Status-->>Controller: Status updated
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ❌ Failed checks (2 warnings, 1 inconclusive) | ✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19. The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing…
Actionable comments posted: 3
🧹 Nitpick comments (2)
pkg/controller/container-runtime-config/container_runtime_config_controller_test.go (1)

2167-2172: Exercise both rendered pool outputs in these CRIOCP tests.

These assertions only inspect `mcs.Name`, but the controller renders role-specific content (`pool.Name` flows into `crioCredentialProviderConfigIgnition`). A worker-only regression, or an unexpected drop-in on AWS/GCP/Azure, would still pass here. Please verify both generated MachineConfigs and the "no drop-in" branches as well.

Also applies to: 2221-2227, 2254-2254, 2316-2320
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go` around lines 2167 - 2172, The test only calls f.verifyCRIOCredentialProviderConfigContents with mcs.Name, so it misses verifying the role-specific MachineConfig rendered for each pool; update the test to call verifyCRIOCredentialProviderConfigContents for both mcs.Name and the generated MachineConfig name that includes pool.Name (i.e., exercise the worker and master pool outputs), passing the same criocp and verifyOpts, and add assertions for the "no drop-in" branch by setting verifyOpts.expectDropin=false and expectDropinPath empty when platform is AWS/GCP/Azure to ensure both presence and absence of constants.KubeletCrioImageCredProviderConfPath are checked.

pkg/controller/container-runtime-config/helpers.go (1)
1356-1357: Minor: Redundant condition check.

The check `crioCredProviderExist && crioCredProviderIdx != -1` is redundant since `crioCredProviderIdx` is set to a non-negative value in the same block where `crioCredProviderExist` becomes true (lines 1305-1306).

♻️ Proposed simplification

```diff
-	if crioCredProviderExist && crioCredProviderIdx != -1 {
+	if crioCredProviderExist {
 		credProviderConfigObject.Providers[crioCredProviderIdx].MatchImages = images
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/container-runtime-config/helpers.go` around lines 1356 - 1357, Remove the redundant boolean check by relying only on crioCredProviderIdx to determine presence: when updating credProviderConfigObject.Providers[crioCredProviderIdx].MatchImages, drop crioCredProviderExist && and keep the index check (crioCredProviderIdx != -1) or, even better, assume the index is valid in the same block that sets it; update the block that currently uses crioCredProviderExist and crioCredProviderIdx to reference only crioCredProviderIdx (and ensure Providers and MatchImages are accessed via credProviderConfigObject.Providers[crioCredProviderIdx]).
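The index-as-presence pattern the reviewer suggests can be shown in a minimal, self-contained sketch. The `provider` type and names below are illustrative stand-ins, not the real credential-provider config structs:

```go
package main

import "fmt"

// provider is a hypothetical stand-in for a credential provider entry.
type provider struct {
	Name        string
	MatchImages []string
}

// findProvider returns the index of the named provider, or -1 if absent.
// The index alone carries the presence signal, so no separate boolean
// flag is needed.
func findProvider(providers []provider, name string) int {
	for i, p := range providers {
		if p.Name == name {
			return i
		}
	}
	return -1
}

func main() {
	providers := []provider{{Name: "crio-image-cred-provider"}}
	// A single `idx != -1` check replaces `exist && idx != -1`.
	if idx := findProvider(providers, "crio-image-cred-provider"); idx != -1 {
		providers[idx].MatchImages = []string{"*.example.com"}
	}
	fmt.Println(providers[0].MatchImages) // [*.example.com]
}
```

Collapsing the flag into the sentinel index removes the possibility of the two variables ever disagreeing, which is the root of the redundancy the review flags.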
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`:
- Around line 1189-1218: The status updates for Validated and
MachineConfigRendered are being written inside the per-pool retry loop, allowing
later pools to overwrite earlier pool results; change the logic to aggregate
per-pool results first and write resource-level status once after looping all
pools: call crioCredentialProviderConfigIgnition and syncIgnitionConfig for each
pool as you do now but collect overlappedEntries (merge unique entries), any
per-pool errors, and a boolean "anyApplied" across pools; remove the per-pool
calls to syncCRIOCredentialProviderConfigStatusOnly and instead after the pool
loop call syncCRIOCredentialProviderConfigStatusOnly once — if merged
overlappedEntries non-empty set ConditionTypeValidated with
ReasonConfigurationPartiallyApplied and include the aggregated list, and for
ConditionTypeMachineConfigRendered set ReasonMachineConfigRenderingFailed if any
error occurred or ReasonMachineConfigRenderingSucceeded only if all pools
succeeded (use the collected anyApplied/anyError flags).
- Around line 549-566: In handleQueueErr ensure maxRetries is an actual cap:
when queue.NumRequeues(key) >= maxRetries you should call
utilruntime.HandleError(err), log the drop, call queue.Forget(key) and NOT
requeue the key; remove the queue.AddAfter(key, 1*time.Minute) in that branch
(or only use AddAfter before Forget if you intend a single delayed retry, but do
not re-add after forgetting), updating the logic around handleQueueErr,
queue.Forget, queue.AddAfter so permanent/validation errors are not retried
forever.
In `@test/framework/envtest.go`:
- Around line 119-121: Add handling for the CRIOCredentialProviderConfig type in
the envtest shared helpers so CreateObjects and CleanEnvironment don't treat it
as unknown or leak instances: update the type-switches/known-type lists used by
CreateObjects, CleanEnvironment (and any deleteObject/ensureDeleted helpers) to
include the CRIOCredentialProviderConfig kind/name (and its API group/version
used in the CRD), implement creation logic in CreateObjects and
deletion/checking logic in CleanEnvironment/delete helpers to create/delete that
resource just like the other vendor CRD types referenced in the file.
---
Nitpick comments:
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller_test.go`:
- Around line 2167-2172: The test only calls
f.verifyCRIOCredentialProviderConfigContents with mcs.Name, so it misses
verifying the role-specific MachineConfig rendered for each pool; update the
test to call verifyCRIOCredentialProviderConfigContents for both mcs.Name and
the generated MachineConfig name that includes pool.Name (i.e., exercise the
worker and master pool outputs), passing the same criocp and verifyOpts, and add
assertions for the "no drop-in" branch by setting verifyOpts.expectDropin=false
and expectDropinPath empty when platform is AWS/GCP/Azure to ensure both
presence and absence of constants.KubeletCrioImageCredProviderConfPath are
checked.
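The two-sided assertion described above can be sketched as a small platform-to-expectation mapping. `verifyOpts`, `optsForPlatform`, and the placeholder path are hypothetical; the real test would pass `constants.KubeletCrioImageCredProviderConfPath`:

```go
package main

import "fmt"

// verifyOpts mirrors the shape the prompt describes: expectDropin=false
// with an empty path on AWS/GCP/Azure, presence elsewhere. The struct and
// placeholder path are illustrative stand-ins.
type verifyOpts struct {
	expectDropin     bool
	expectDropinPath string
}

// optsForPlatform returns what the test should assert for a platform.
func optsForPlatform(platform string) verifyOpts {
	switch platform {
	case "AWS", "GCP", "Azure":
		// Assert the drop-in is absent on these platforms.
		return verifyOpts{expectDropin: false, expectDropinPath: ""}
	default:
		// Placeholder for constants.KubeletCrioImageCredProviderConfPath.
		return verifyOpts{expectDropin: true, expectDropinPath: "<KubeletCrioImageCredProviderConfPath>"}
	}
}

func main() {
	for _, p := range []string{"AWS", "BareMetal"} {
		fmt.Printf("%s: %+v\n", p, optsForPlatform(p))
	}
}
```

Iterating this mapping for both the master and worker pool names gives the test coverage of presence and absence in one table-driven loop.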
In `@pkg/controller/container-runtime-config/helpers.go`:
- Around line 1356-1357: Remove the redundant boolean check by relying only on
crioCredProviderIdx to determine presence: when updating
credProviderConfigObject.Providers[crioCredProviderIdx].MatchImages, drop
crioCredProviderExist && and keep the index check (crioCredProviderIdx != -1)
or, even better, assume the index is valid in the same block that sets it;
update the block that currently uses crioCredProviderExist and
crioCredProviderIdx to reference only crioCredProviderIdx (and ensure Providers
and MatchImages are accessed via
credProviderConfigObject.Providers[crioCredProviderIdx]).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ff4b7c37-4de5-4304-91d1-387d9b9d8c5f
📒 Files selected for processing (13)
- install/0000_80_machine-config_00_rbac.yaml
- manifests/machineconfigcontroller/clusterrole.yaml
- pkg/apihelpers/apihelpers.go
- pkg/controller/container-runtime-config/container_runtime_config_controller.go
- pkg/controller/container-runtime-config/container_runtime_config_controller_test.go
- pkg/controller/container-runtime-config/helpers.go
- pkg/controller/container-runtime-config/helpers_test.go
- pkg/daemon/constants/constants.go
- templates/arbiter/01-arbiter-kubelet/_base/units/kubelet.service.yaml
- templates/arbiter/01-arbiter-kubelet/on-prem/units/kubelet.service.yaml
- templates/master/01-master-kubelet/on-prem/units/kubelet.service.yaml
- templates/worker/01-worker-kubelet/on-prem/units/kubelet.service.yaml
- test/framework/envtest.go
```diff
 // handleQueueErr handles errors for a given queue and resource type.
 func (ctrl *Controller) handleQueueErr(queue workqueue.TypedRateLimitingInterface[string], resourceType string, err error, key string) {
 	if err == nil {
-		ctrl.imgQueue.Forget(key)
+		queue.Forget(key)
 		return
 	}

-	if ctrl.imgQueue.NumRequeues(key) < maxRetries {
-		klog.V(2).Infof("Error syncing image config %v: %v", key, err)
-		ctrl.imgQueue.AddRateLimited(key)
+	if queue.NumRequeues(key) < maxRetries {
+		klog.V(2).Infof("Error syncing %s %v: %v", resourceType, key, err)
+		queue.AddRateLimited(key)
 		return
 	}

 	utilruntime.HandleError(err)
-	klog.V(2).Infof("Dropping image config %q out of the queue: %v", key, err)
-	ctrl.imgQueue.Forget(key)
-	ctrl.imgQueue.AddAfter(key, 1*time.Minute)
+	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
+	queue.Forget(key)
+	queue.AddAfter(key, 1*time.Minute)
 }
```
maxRetries is no longer a real cap.
Line 565 re-adds the key after Forget, so permanently invalid configs and hard failures will be retried forever once per minute. That contradicts the contract documented above and turns validation errors into endless log/queue churn for all three workers.
♻️ Suggested fix

```diff
 func (ctrl *Controller) handleQueueErr(queue workqueue.TypedRateLimitingInterface[string], resourceType string, err error, key string) {
 	if err == nil {
 		queue.Forget(key)
 		return
 	}
 	if queue.NumRequeues(key) < maxRetries {
 		klog.V(2).Infof("Error syncing %s %v: %v", resourceType, key, err)
 		queue.AddRateLimited(key)
 		return
 	}
 	utilruntime.HandleError(err)
 	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
 	queue.Forget(key)
-	queue.AddAfter(key, 1*time.Minute)
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
// handleQueueErr handles errors for a given queue and resource type.
func (ctrl *Controller) handleQueueErr(queue workqueue.TypedRateLimitingInterface[string], resourceType string, err error, key string) {
	if err == nil {
		queue.Forget(key)
		return
	}
	if queue.NumRequeues(key) < maxRetries {
		klog.V(2).Infof("Error syncing %s %v: %v", resourceType, key, err)
		queue.AddRateLimited(key)
		return
	}
	utilruntime.HandleError(err)
	klog.V(2).Infof("Dropping %s %q out of the queue: %v", resourceType, key, err)
	queue.Forget(key)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 549 - 566, In handleQueueErr ensure maxRetries is an actual cap:
when queue.NumRequeues(key) >= maxRetries you should call
utilruntime.HandleError(err), log the drop, call queue.Forget(key) and NOT
requeue the key; remove the queue.AddAfter(key, 1*time.Minute) in that branch
(or only use AddAfter before Forget if you intend a single delayed retry, but do
not re-add after forgetting), updating the logic around handleQueueErr,
queue.Forget, queue.AddAfter so permanent/validation errors are not retried
forever.
```go
if err := retry.RetryOnConflict(updateBackoff, func() error {
	credentialProviderConfigIgn, overlappedEntries, err := crioCredentialProviderConfigIgnition(ctrl.templatesDir, controllerConfig, pool.Name, crioCredentialProviderConfig)
	if err != nil {
		ctrl.syncCRIOCredentialProviderConfigStatusOnly(err, apicfgv1alpha1.ConditionTypeMachineConfigRendered, apicfgv1alpha1.ReasonMachineConfigRenderingFailed, "could not generate CRIOCredentialProvider Ignition config: %v", err)
		return err
	}
	if len(overlappedEntries) > 0 {
		ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeValidated, apicfgv1alpha1.ReasonConfigurationPartiallyApplied, "CRIOCredentialProviderConfig has one or multiple entries that overlap with the original credential provider config. Skip rendering entries: %v.", overlappedEntries)
	} else {
		ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeValidated, "")
	}

	applied, err = ctrl.syncIgnitionConfig(managedKeyCredentialProvider, credentialProviderConfigIgn, pool, ownerReferenceCredentialProviderConfig(crioCredentialProviderConfig))
	if err != nil {
		ctrl.syncCRIOCredentialProviderConfigStatusOnly(err, apicfgv1alpha1.ConditionTypeMachineConfigRendered, apicfgv1alpha1.ReasonMachineConfigRenderingFailed, "could not sync CRIOCredentialProvider Ignition config: %v", err)
		return err
	}

	return nil
}); err != nil {
	ctrl.syncCRIOCredentialProviderConfigStatusOnly(err, apicfgv1alpha1.ConditionTypeMachineConfigRendered, apicfgv1alpha1.ReasonMachineConfigRenderingFailed, "could not Create/Update MachineConfig: %v", err)
	return err
}

if applied {
	klog.Infof("Applied CRIOCredentialProviderConfig cluster on MachineConfigPool %v", pool.Name)
	ctrlcommon.UpdateStateMetric(ctrlcommon.MCCSubControllerState, "machine-config-controller-container-runtime-config", "Sync CRIO Credential Provider Config", pool.Name)
}
ctrl.syncCRIOCredentialProviderConfigStatusOnly(nil, apicfgv1alpha1.ConditionTypeMachineConfigRendered, apicfgv1alpha1.ReasonMachineConfigRenderingSucceeded)
```
Aggregate CRIOCP status across pools before writing it.
crioCredentialProviderConfigIgnition(..., pool.Name, ...) is role-specific, but Validated and MachineConfigRendered are updated inside the per-pool loop. If master and worker resolve different overlap sets, the last pool processed can overwrite an earlier ConfigurationPartiallyApplied result with Succeeded, which misreports the resource-wide state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@pkg/controller/container-runtime-config/container_runtime_config_controller.go`
around lines 1189 - 1218, The status updates for Validated and
MachineConfigRendered are being written inside the per-pool retry loop, allowing
later pools to overwrite earlier pool results; change the logic to aggregate
per-pool results first and write resource-level status once after looping all
pools: call crioCredentialProviderConfigIgnition and syncIgnitionConfig for each
pool as you do now but collect overlappedEntries (merge unique entries), any
per-pool errors, and a boolean "anyApplied" across pools; remove the per-pool
calls to syncCRIOCredentialProviderConfigStatusOnly and instead after the pool
loop call syncCRIOCredentialProviderConfigStatusOnly once — if merged
overlappedEntries non-empty set ConditionTypeValidated with
ReasonConfigurationPartiallyApplied and include the aggregated list, and for
ConditionTypeMachineConfigRendered set ReasonMachineConfigRenderingFailed if any
error occurred or ReasonMachineConfigRenderingSucceeded only if all pools
succeeded (use the collected anyApplied/anyError flags).
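The aggregate-then-write pattern the prompt asks for can be sketched with plain Go, collecting per-pool results and deciding the resource-level status once. `poolResult` and `aggregate` are hypothetical stand-ins for the controller's real types and helpers:

```go
package main

import "fmt"

// poolResult is an illustrative stand-in for one pool's sync outcome.
type poolResult struct {
	overlapped []string
	applied    bool
	err        error
}

// aggregate merges unique overlapped entries, records whether any pool
// applied changes, and surfaces the first error across all pools.
func aggregate(results []poolResult) (overlapped []string, anyApplied bool, firstErr error) {
	seen := map[string]bool{}
	for _, r := range results {
		if r.err != nil && firstErr == nil {
			firstErr = r.err
		}
		anyApplied = anyApplied || r.applied
		for _, e := range r.overlapped {
			if !seen[e] {
				seen[e] = true
				overlapped = append(overlapped, e)
			}
		}
	}
	return
}

func main() {
	results := []poolResult{
		{overlapped: []string{"quay.io/app"}, applied: true},
		{overlapped: []string{"quay.io/app", "docker.io/lib"}},
	}
	overlapped, anyApplied, err := aggregate(results)
	// A single status write would follow: PartiallyApplied if
	// len(overlapped) > 0, RenderingFailed if err != nil, and
	// Succeeded only when every pool succeeded.
	fmt.Println(overlapped, anyApplied, err) // [quay.io/app docker.io/lib] true <nil>
}
```

Because the status write happens exactly once, a later pool can no longer overwrite an earlier pool's PartiallyApplied result with Succeeded.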
```go
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-CustomNoUpgrade.crd.yaml"),
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-DevPreviewNoUpgrade.crd.yaml"),
filepath.Join("..", "..", "vendor", "github.com", "openshift", "api", "config", "v1alpha1", "zz_generated.crd-manifests", "0000_10_config-operator_01_criocredentialproviderconfigs-TechPreviewNoUpgrade.crd.yaml"),
```
Extend the shared envtest helpers for the new CRD.
Lines 119-121 install the CRIOCredentialProviderConfig CRDs, but Lines 131-227, 229-325, and 327-414 still don't check, delete, or create that type. Tests using CreateObjects / CleanEnvironment will either fall into the Unknown object type branch or leak the resource between cases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/framework/envtest.go` around lines 119 - 121, Add handling for the
CRIOCredentialProviderConfig type in the envtest shared helpers so CreateObjects
and CleanEnvironment don't treat it as unknown or leak instances: update the
type-switches/known-type lists used by CreateObjects, CleanEnvironment (and any
deleteObject/ensureDeleted helpers) to include the CRIOCredentialProviderConfig
kind/name (and its API group/version used in the CRD), implement creation logic
in CreateObjects and deletion/checking logic in CleanEnvironment/delete helpers
to create/delete that resource just like the other vendor CRD types referenced
in the file.
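The type-switch extension the prompt describes follows a common shape; this sketch shows the pattern only, with stand-in types rather than the real openshift/api structs or envtest helpers:

```go
package main

import "fmt"

// Stand-in object types; the real helpers switch on the vendored
// openshift/api structs.
type CRIOCredentialProviderConfig struct{ Name string }
type ImageConfig struct{ Name string }

// createObject models the known-type switch in CreateObjects: every
// supported kind gets an explicit branch so new CRDs like
// CRIOCredentialProviderConfig never fall into the unknown branch.
func createObject(obj interface{}) string {
	switch o := obj.(type) {
	case *ImageConfig:
		return "created ImageConfig " + o.Name
	case *CRIOCredentialProviderConfig:
		return "created CRIOCredentialProviderConfig " + o.Name
	default:
		return fmt.Sprintf("unknown object type %T", obj)
	}
}

func main() {
	fmt.Println(createObject(&CRIOCredentialProviderConfig{Name: "cluster"}))
}
```

CleanEnvironment's delete helpers would gain a matching branch, so each test case both creates and removes the resource instead of leaking it to the next case.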
- What I did
- How to verify it
- Description for the changelog
Summary by CodeRabbit
Release Notes
New Features
Chores