OCM-23390 | feat: prefetch machine types and versions in interactive mode #1078
rcampos2029 wants to merge 1 commit into openshift-online:main from
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rcampos2029. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the
Use the following commands to manage reviews:
Walkthrough
Adds an execution-scoped, concurrency-safe prefetch cache and background goroutines to asynchronously fetch OpenShift version and machine type options.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Cmd as Cmd (preRun / Flow)
participant Cache as prefetchCache
participant G1 as Goroutine (MachineTypes)
participant G2 as Goroutine (Versions)
participant API as External Fetchers
participant OCM as OCM Connection
Cmd->>Cache: init per-run cache
Cmd->>OCM: open connection
alt interactive
Cmd->>G1: spawn machine-type prefetch (before auth)
Cmd->>G2: spawn version prefetch (after filters computed)
end
par background fetches
G1->>API: fetch machine types (with params)
G1-->>Cache: store results, signal done
G2->>API: fetch versions (with filters)
G2-->>Cache: store results, signal done
end
Cmd->>Cache: request machine type options (cache-aware)
alt prefetched available
Cache-->>Cmd: return prefetched machine types
else
Cmd->>API: fetch machine types synchronously
API-->>Cmd: return results
end
Cmd->>Cache: request version options (cache-aware)
alt prefetched available
Cache-->>Cmd: return prefetched versions
else
Cmd->>API: fetch versions synchronously
API-->>Cmd: return results
end
Cmd->>Cache: wait for prefetch.wg (deferred)
Cache-->>Cmd: all background fetches complete
Cmd->>OCM: close connection
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Nitpick comments (4)
cmd/ocm/create/cluster/cmd.go (4)
206-224: Implicit dependency on global `args` in background goroutine.
`getMachineTypeOptions` (called at line 216) reads `args.provider` and `args.ccs.Enabled` from global state. While these values are set before the goroutine starts (line 918), this is an implicit contract that's easy to break if the code order changes. Consider passing these values explicitly to make the dependency clear:
♻️ Suggested refactor

```diff
-func (p *prefetchCache) startPrefetchMachineTypes(connection *sdk.Connection) {
+func (p *prefetchCache) startPrefetchMachineTypes(connection *sdk.Connection,
+	providerName string, ccsEnabled bool) {
 	// ...
 	go func() {
-		options, err := getMachineTypeOptions(connection)
+		options, err := provider.GetMachineTypeOptions(
+			connection.ClustersMgmt().V1(),
+			providerName, ccsEnabled)
 		// ...
 	}()
 }
```

Then at call site:

```diff
-	prefetch.startPrefetchMachineTypes(connection)
+	prefetch.startPrefetchMachineTypes(connection, args.provider, args.ccs.Enabled)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 206 - 224, startPrefetchMachineTypes launches a background goroutine that calls getMachineTypeOptions which implicitly reads global args.provider and args.ccs.Enabled; instead make that dependency explicit by changing startPrefetchMachineTypes to accept provider and ccsEnabled (or a small struct) and pass those values into the goroutine so getMachineTypeOptions receives them as parameters (update the getMachineTypeOptions signature and all call sites accordingly), preserving existing mutexed state updates to machineTypeOptions, machineTypeErr, machineTypeFetched and the guard machineTypeFetchStarted to keep behavior unchanged.
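The explicit-parameter refactor above can be sketched in isolation. This is a minimal, hypothetical Go example (the `cache` type and `fetch` callback are stand-ins, not the PR's actual code) showing how snapshotting values as goroutine arguments removes the dependency on mutable global state:

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the PR's prefetchCache.
type cache struct {
	mu      sync.Mutex
	wg      sync.WaitGroup
	options []string
	fetched bool
}

// startPrefetch snapshots providerName and ccsEnabled as parameters,
// so the goroutine never reads mutable globals after it starts.
func (c *cache) startPrefetch(providerName string, ccsEnabled bool,
	fetch func(string, bool) []string) {
	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		opts := fetch(providerName, ccsEnabled) // uses the snapshot
		c.mu.Lock()
		c.options = opts
		c.fetched = true
		c.mu.Unlock()
	}()
}

func main() {
	c := &cache{}
	c.startPrefetch("gcp", true, func(p string, ccs bool) []string {
		return []string{p, fmt.Sprint(ccs)}
	})
	c.wg.Wait() // Wait establishes happens-before; fields are safe to read
	fmt.Println(c.options) // prints [gcp true]
}
```

Reordering code at the call site can then no longer change what the goroutine observes.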
929-936: Minor duplication of version filter parameters.
The `gcpMarketplaceEnabled` and `additionalFilters` computation is duplicated: once inside the interactive block (lines 929-933) and again outside (lines 972-976). While `getVersionFilters()` is inexpensive, consolidating this would reduce repetition.
♻️ Optional consolidation

```diff
+	// Compute version filter parameters (used for prefetch and later fetch)
+	var gcpMarketplaceEnabled string
+	if isGcpMarketplace {
+		gcpMarketplaceEnabled = strconv.FormatBool(isGcpMarketplace)
+	}
+	additionalFilters := getVersionFilters()
+
 	// Start prefetching versions in background if interactive mode
 	if args.interactive {
-		var gcpMarketplaceEnabled string
-		if isGcpMarketplace {
-			gcpMarketplaceEnabled = strconv.FormatBool(isGcpMarketplace)
-		}
-		additionalFilters := getVersionFilters()
 		prefetch.startPrefetchVersions(connection, args.channel, args.channelGroup,
 			gcpMarketplaceEnabled, additionalFilters)
 	}
 	// ...
-	var gcpMarketplaceEnabled string
-	if isGcpMarketplace {
-		gcpMarketplaceEnabled = strconv.FormatBool(isGcpMarketplace)
-	}
-	additionalFilters := getVersionFilters()
 	versions, defaultVersion, err := prefetch.getVersionOptionsFromCache(...)
```

Also applies to: 972-976
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 929 - 936, The code duplicates computing gcpMarketplaceEnabled and additionalFilters before calling prefetch.startPrefetchVersions; instead compute gcpMarketplaceEnabled from isGcpMarketplace and call getVersionFilters() once and reuse those values for both the interactive branch and the non-interactive branch so prefetch.startPrefetchVersions(args.channel, args.channelGroup, gcpMarketplaceEnabled, additionalFilters) always uses the same variables; update the logic around isGcpMarketplace, gcpMarketplaceEnabled, getVersionFilters(), and the call to prefetch.startPrefetchVersions to remove the repeated computation and reuse the single derived variables.
172-183: Goroutines may use connection after it's closed.
The `connection` passed to these goroutines is closed via `defer` when `preRun` returns (line 849). If the prefetch hasn't completed by then, the goroutine will attempt API calls on a closed connection. While the fallback sync fetch handles the work, this could cause silent errors or confusing logs. Consider either:
- Using a `context.Context` with cancellation to abort in-flight requests
- Using a `sync.WaitGroup` to wait for goroutines before closing the connection
- Accepting this as acceptable behavior since results are already fetched via fallback
Also applies to: 215-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 172 - 183, The background goroutine calls getVersionOptionsWithDefault using the shared connection which is closed by preRun's defer, risking use-after-close; fix by wiring cancellation/waiting: either (preferred) add a context.Context parameter to getVersionOptionsWithDefault and pass the preRun ctx into the goroutine, check ctx in long-running requests and return early if cancelled so the goroutine never uses a closed connection, or (alternative) introduce a sync.WaitGroup in preRun, call wg.Add(1) before starting the goroutine and wg.Done() at its end, and call wg.Wait() before the deferred connection.Close() so all goroutines finish before closing; reference getVersionOptionsWithDefault, the anonymous goroutine, p.versionOptions/p.versionErr, p.mu and the connection used in preRun.
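The second bullet (waiting on a `sync.WaitGroup` before releasing the connection) can be sketched as a standalone Go program. The timed sleep stands in for the real API call and all names are illustrative, not the PR's:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// run sketches the suggested cleanup order: wait for the prefetch
// goroutine before "closing the connection", and keep a cancellable
// context so in-flight work can abort early on other exit paths.
func run() string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup
	result := "none"

	wg.Add(1)
	go func() {
		defer wg.Done()
		select {
		case <-ctx.Done():
			return // aborted before touching the connection
		case <-time.After(10 * time.Millisecond):
			result = "fetched" // simulated API response
		}
	}()

	wg.Wait() // goroutine is done before the connection would be closed
	return result
}

func main() {
	fmt.Println(run()) // prints fetched
}
```

With `wg.Wait()` ordered before the close, the goroutine can never observe a closed connection.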
186-203: Cache getters don't wait for in-progress prefetch.
If prefetch is started but not yet complete when the getter is called, a redundant synchronous fetch is performed instead of waiting for the in-progress operation. This is functionally correct but potentially wasteful.
For improved efficiency, consider using a `sync.Cond` or a completion channel to wait for in-progress fetches:
♻️ Sketch of waiting pattern using sync.Cond

```diff
 type prefetchCache struct {
 	mu sync.Mutex
+	cond *sync.Cond
 	// ...
 }

+func newPrefetchCache() *prefetchCache {
+	p := &prefetchCache{}
+	p.cond = sync.NewCond(&p.mu)
+	return p
+}

 func (p *prefetchCache) getVersionOptionsFromCache(...) (...) {
 	p.mu.Lock()
+	for p.versionFetchStarted && !p.versionFetched {
+		p.cond.Wait()
+	}
 	if p.versionFetched {
 		// ...
 	}
 	p.mu.Unlock()
 	// ...
 }

 // In the goroutine, after setting versionFetched:
+	p.cond.Broadcast()
```

Also applies to: 226-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 186 - 203, The getter prefetchCache.getVersionOptionsFromCache currently returns immediately and triggers a redundant fetch if prefetch is in-progress; change it to wait for the in-progress prefetch to finish by adding a synchronization primitive (e.g. add a *sync.Cond or a completion channel on the prefetchCache struct) and use it inside getVersionOptionsFromCache: when mu is held and versionFetched is false, wait (cond.Wait or receive on channel) until the fetch completes, then return p.versionOptions, p.defaultVersion, p.versionErr; ensure the prefetch code signals/broadcasts on completion (and sets versionFetched/versionErr) so waiters proceed; apply the same pattern to the other cache getter referenced around lines 226-238 (the analogous function that reads versionFetched/versionOptions/defaultVersion/versionErr and uses p.mu).
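A self-contained sketch of that `sync.Cond` waiting pattern, assuming hypothetical field names (`fetchStarted`, `versions`) rather than the PR's actual ones:

```go
package main

import (
	"fmt"
	"sync"
)

// prefetchCache sketches the waiting pattern: a getter blocks while
// a prefetch is in flight instead of issuing a redundant fetch.
type prefetchCache struct {
	mu           sync.Mutex
	cond         *sync.Cond
	fetchStarted bool
	fetched      bool
	versions     []string
}

func newPrefetchCache() *prefetchCache {
	p := &prefetchCache{}
	p.cond = sync.NewCond(&p.mu)
	return p
}

func (p *prefetchCache) startPrefetch(fetch func() []string) {
	p.mu.Lock()
	p.fetchStarted = true
	p.mu.Unlock()
	go func() {
		v := fetch()
		p.mu.Lock()
		p.versions = v
		p.fetched = true
		p.cond.Broadcast() // wake any waiting getters
		p.mu.Unlock()
	}()
}

func (p *prefetchCache) get(fallback func() []string) []string {
	p.mu.Lock()
	for p.fetchStarted && !p.fetched {
		p.cond.Wait() // wait instead of fetching redundantly
	}
	if p.fetched {
		v := p.versions
		p.mu.Unlock()
		return v
	}
	p.mu.Unlock()
	return fallback() // no prefetch started: synchronous fallback
}

func main() {
	p := newPrefetchCache()
	p.startPrefetch(func() []string { return []string{"4.16", "4.17"} })
	fmt.Println(p.get(nil)) // prints [4.16 4.17]
}
```

Note that `cond.Wait()` atomically releases the mutex while blocked and reacquires it before returning, which is what makes the `for` loop guard safe.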
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/ocm/create/cluster/cmd.go`:
- Around line 206-224: startPrefetchMachineTypes launches a background goroutine
that calls getMachineTypeOptions which implicitly reads global args.provider and
args.ccs.Enabled; instead make that dependency explicit by changing
startPrefetchMachineTypes to accept provider and ccsEnabled (or a small struct)
and pass those values into the goroutine so getMachineTypeOptions receives them
as parameters (update the getMachineTypeOptions signature and all call sites
accordingly), preserving existing mutexed state updates to machineTypeOptions,
machineTypeErr, machineTypeFetched and the guard machineTypeFetchStarted to keep
behavior unchanged.
- Around line 929-936: The code duplicates computing gcpMarketplaceEnabled and
additionalFilters before calling prefetch.startPrefetchVersions; instead compute
gcpMarketplaceEnabled from isGcpMarketplace and call getVersionFilters() once
and reuse those values for both the interactive branch and the non-interactive
branch so prefetch.startPrefetchVersions(args.channel, args.channelGroup,
gcpMarketplaceEnabled, additionalFilters) always uses the same variables; update
the logic around isGcpMarketplace, gcpMarketplaceEnabled, getVersionFilters(),
and the call to prefetch.startPrefetchVersions to remove the repeated
computation and reuse the single derived variables.
- Around line 172-183: The background goroutine calls
getVersionOptionsWithDefault using the shared connection which is closed by
preRun's defer, risking use-after-close; fix by wiring cancellation/waiting:
either (preferred) add a context.Context parameter to
getVersionOptionsWithDefault and pass the preRun ctx into the goroutine, check
ctx in long-running requests and return early if cancelled so the goroutine
never uses a closed connection, or (alternative) introduce a sync.WaitGroup in
preRun, call wg.Add(1) before starting the goroutine and wg.Done() at its end,
and call wg.Wait() before the deferred connection.Close() so all goroutines
finish before closing; reference getVersionOptionsWithDefault, the anonymous
goroutine, p.versionOptions/p.versionErr, p.mu and the connection used in
preRun.
- Around line 186-203: The getter prefetchCache.getVersionOptionsFromCache
currently returns immediately and triggers a redundant fetch if prefetch is
in-progress; change it to wait for the in-progress prefetch to finish by adding
a synchronization primitive (e.g. add a *sync.Cond or a completion channel on
the prefetchCache struct) and use it inside getVersionOptionsFromCache: when mu
is held and versionFetched is false, wait (cond.Wait or receive on channel)
until the fetch completes, then return p.versionOptions, p.defaultVersion,
p.versionErr; ensure the prefetch code signals/broadcasts on completion (and
sets versionFetched/versionErr) so waiters proceed; apply the same pattern to
the other cache getter referenced around lines 226-238 (the analogous function
that reads versionFetched/versionOptions/defaultVersion/versionErr and uses
p.mu).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ac53fc4-3072-437d-aa14-e3ed971adfdc
📒 Files selected for processing (1)
cmd/ocm/create/cluster/cmd.go
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ocm/create/cluster/cmd.go`:
- Around line 213-221: The current getters return cached errors when
p.versionFetched/p.machineTypeFetched is true, making transient prefetch
failures terminal; change the logic in the version getter (symbols:
p.versionFetched, p.versionOptions, p.defaultVersion, p.versionErr,
getVersionOptionsWithDefault) so that if p.versionFetched is true but
p.versionErr != nil you treat it like a cache miss: release the mutex, call
getVersionOptionsWithDefault synchronously, re-acquire the mutex and return the
fresh results (and update cached fields only on success), and apply the same
pattern to the machine-type getter (the analogous fetched/options/error symbols)
so background errors do not prevent a foreground retry.
- Around line 888-892: The deferred cleanup currently blocks on
prefetch.wg.Wait() and connection.Close(), which causes early validation errors
or interactive aborts to stall until speculative prefetch goroutines finish;
change the cleanup so the defer only closes the connection (call
connection.Close()) and do not wait on prefetch.wg there, and instead make
prefetch goroutines cancellable (use a context with cancel or a done channel)
and ensure any wait for prefetch.wg is invoked only after successful completion
of the operation that required prefetch (not on every early exit); update the
prefetch start sites to accept the cancelable context and call cancel on early
error/abort so prefetch goroutines can exit promptly rather than the deferred
wg.Wait blocking exit.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e6bc6545-22ca-465c-8405-6f75063d80b5
📒 Files selected for processing (1)
cmd/ocm/create/cluster/cmd.go
Actionable comments posted: 1
♻️ Duplicate comments (1)
cmd/ocm/create/cluster/cmd.go (1)
217-221: ⚠️ Potential issue | 🟡 Minor
Line exceeds maximum length and unlock/relock pattern is fragile.
Line 219 is 134 characters (max 120). Additionally, the manual unlock at line 218, synchronous fetch, then relock at line 220, combined with `defer p.mu.Unlock()` at line 206, is correct but fragile: the deferred unlock runs after the manual relock, which works but is easy to misread. Consider restructuring to avoid the unlock/relock dance:
Suggested refactor

```diff
 func (p *prefetchCache) getVersionOptionsFromCache(
 	connection *sdk.Connection,
 	channel string,
 	channelGroup string,
 	gcpMarketplaceEnabled string,
 	additionalFilters string,
 ) ([]arguments.Option, string, error) {
 	p.mu.Lock()
-	defer p.mu.Unlock()
 	// Wait for in-progress fetch to complete
 	for p.versionFetchStarted && !p.versionFetched {
 		p.versionCond.Wait()
 	}
-	if p.versionFetched {
-		return p.versionOptions, p.defaultVersion, p.versionErr
+	if p.versionFetched && p.versionErr == nil {
+		opts, ver := p.versionOptions, p.defaultVersion
+		p.mu.Unlock()
+		return opts, ver, nil
 	}
 	// Not prefetched or prefetch failed, fetch synchronously
 	p.mu.Unlock()
-	options, defaultVer, err := getVersionOptionsWithDefault(connection, channel, channelGroup, gcpMarketplaceEnabled, additionalFilters)
-	p.mu.Lock()
-	return options, defaultVer, err
+	return getVersionOptionsWithDefault(
+		connection, channel, channelGroup, gcpMarketplaceEnabled, additionalFilters)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 217 - 221, The code currently does a manual p.mu.Unlock(), calls getVersionOptionsWithDefault(...), then p.mu.Lock() while there's also a defer p.mu.Unlock() earlier, which is fragile and exceeds line length; refactor so the critical-section lock is scoped (e.g., wrap the portion that needs p.mu in an anonymous func that does p.mu.Lock() / defer p.mu.Unlock()) and perform the synchronous call to getVersionOptionsWithDefault(connection, channel, channelGroup, gcpMarketplaceEnabled, additionalFilters) outside that scoped lock; this removes the manual unlock/relock dance around getVersionOptionsWithDefault and shortens the long line.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/ocm/create/cluster/cmd.go`:
- Around line 770-776: The call in getMachineTypeOptionsWithParams is too long;
refactor the return statement so the provider.GetMachineTypeOptions invocation
breaks its arguments across multiple lines (for example place
connection.ClustersMgmt().V1(), providerType, and ccsEnabled each on their own
line) to keep line length under 120 characters while preserving the same
parameters and return values.
---
Duplicate comments:
In `@cmd/ocm/create/cluster/cmd.go`:
- Around line 217-221: The code currently does a manual p.mu.Unlock(), calls
getVersionOptionsWithDefault(...), then p.mu.Lock() while there's also a defer
p.mu.Unlock() earlier, which is fragile and exceeds line length; refactor so the
critical-section lock is scoped (e.g., wrap the portion that needs p.mu in an
anonymous func that does p.mu.Lock() / defer p.mu.Unlock()) and perform the
synchronous call to getVersionOptionsWithDefault(connection, channel,
channelGroup, gcpMarketplaceEnabled, additionalFilters) outside that scoped
lock; this removes the manual unlock/relock dance around
getVersionOptionsWithDefault and shortens the long line.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 59b5d0dc-13d5-40a2-a144-7a4a7a9ba1b3
📒 Files selected for processing (1)
cmd/ocm/create/cluster/cmd.go
…mode

Optimize interactive cluster creation by prefetching machine types and versions in background while user answers other prompts, reducing wait time.

Implementation details:
- Background goroutines fetch data early using explicit parameters to avoid implicit dependencies on global state
- sync.WaitGroup tracks goroutines; sync.Cond allows cache getters to wait for in-progress fetches instead of triggering redundant requests
- Context with cancellation allows early exits (validation errors, user cancellation) to terminate immediately without blocking on network work
- Failed prefetches are treated as cache misses and retried synchronously, preventing transient errors from becoming terminal
- Version filter parameters computed once to avoid duplication

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
🧹 Nitpick comments (2)
cmd/ocm/create/cluster/cmd.go (2)
253-267: Same condition variable signaling concern as `startPrefetchVersions`.
The early return on context cancellation (lines 254-258) has the same issue: waiters in `getMachineTypeOptionsFromCache` could block indefinitely if the goroutine exits without calling `machineTypeCond.Broadcast()`. Apply the same fix pattern suggested above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 253 - 267, The goroutine that sets p.machineTypeOptions exits early on context cancellation without signaling p.machineTypeCond, which can leave waiters in getMachineTypeOptionsFromCache blocked; modify the early-return path so that before returning you acquire p.mu, set p.machineTypeFetched = true (and set p.machineTypeErr to context.Canceled or the current err), call p.machineTypeCond.Broadcast(), and then p.mu.Unlock(); this ensures any waiters are woken even when the context is canceled and mirrors the fix used for startPrefetchVersions.
191-206: Potential deadlock if context is cancelled after fetch completes but before storing results.
If the context is cancelled between the `getVersionOptionsWithDefault` call completing and storing the results (lines 192-197), the goroutine returns without setting `versionFetched = true` or calling `versionCond.Broadcast()`. Any goroutine waiting in `getVersionOptionsFromCache` would wait indefinitely on the condition variable (line 221).
While the current code flow prevents this in practice (cancel is only called after preRun returns, and cache getters are within preRun), this pattern is fragile and could cause issues with future refactoring or in tests.
Consider always signaling completion, storing the cancellation as the error:
Suggested fix

```diff
 go func() {
 	defer p.wg.Done()
 	options, defaultVer, err := getVersionOptionsWithDefault(
 		connection, channel, channelGroup, gcpMarketplaceEnabled, additionalFilters)
 	// Check if context was cancelled before storing results
 	select {
 	case <-p.ctx.Done():
-		// Context cancelled, don't store results
-		return
+		// Context cancelled, store cancellation error so waiters don't block
+		err = p.ctx.Err()
 	default:
 	}
 	p.mu.Lock()
 	p.versionOptions = options
 	p.defaultVersion = defaultVer
 	p.versionErr = err
 	p.versionFetched = true
 	p.versionCond.Broadcast()
 	p.mu.Unlock()
 }()
```
Verify each finding against the current code and only fix it if needed. In `@cmd/ocm/create/cluster/cmd.go` around lines 191 - 206, The goroutine that calls getVersionOptionsWithDefault may return without setting p.versionFetched or calling p.versionCond.Broadcast() if p.ctx.Done() is observed after the fetch; to fix, always acquire p.mu and set p.versionOptions, p.defaultVersion and p.versionFetched = true and call p.versionCond.Broadcast() before returning—if the context is cancelled set p.versionErr = context.Canceled (or p.ctx.Err()) or combine the fetch error with p.ctx.Err() so waiting callers in getVersionOptionsFromCache are unblocked and receive an error instead of hanging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cmd/ocm/create/cluster/cmd.go`:
- Around line 253-267: The goroutine that sets p.machineTypeOptions exits early
on context cancellation without signaling p.machineTypeCond, which can leave
waiters in getMachineTypeOptionsFromCache blocked; modify the early-return path
so that before returning you acquire p.mu, set p.machineTypeFetched = true (and
set p.machineTypeErr to context.Canceled or the current err), call
p.machineTypeCond.Broadcast(), and then p.mu.Unlock(); this ensures any waiters
are woken even when the context is canceled and mirrors the fix used for
startPrefetchVersions.
- Around line 191-206: The goroutine that calls getVersionOptionsWithDefault may
return without setting p.versionFetched or calling p.versionCond.Broadcast() if
p.ctx.Done() is observed after the fetch; to fix, always acquire p.mu and set
p.versionOptions, p.defaultVersion and p.versionFetched = true and call
p.versionCond.Broadcast() before returning—if the context is cancelled set
p.versionErr = context.Canceled (or p.ctx.Err()) or combine the fetch error with
p.ctx.Err() so waiting callers in getVersionOptionsFromCache are unblocked and
receive an error instead of hanging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 67e2e7d1-3a43-48d0-99c3-4e75068388bd
📒 Files selected for processing (1)
cmd/ocm/create/cluster/cmd.go
	defer func() {
		// Cancel any background prefetch operations and close connection
		// Don't wait for goroutines to complete - early exits should not block on speculative work
		prefetch.cancel()

Suggested change:

```diff
-	prefetch.cancel()
+	// Cancel any background prefetch operations and wait for goroutines to finish
+	// before closing the connection they may be using
+	prefetch.cancel()
+	prefetch.wg.Wait()
```
Description
Implement background prefetching for machine types and version options in interactive cluster creation to reduce user wait time.
Problem
In interactive mode for `ocm create cluster`, users experience delays when prompted for machine types and versions because these API calls can take several seconds to complete.
Solution
Prefetch these slow API calls in the background:
Benefits
Technical Details
Changes
Modified `cmd/ocm/create/cluster/cmd.go`:
Testing
Related Jira
https://issues.redhat.com/browse/OCM-23390