perf: reuse Apple runner cache across version bumps #900

Open
thymikee wants to merge 7 commits into main from perf/apple-runner-build-time

thymikee (Member) commented Jun 27, 2026

Summary

Reduce Apple runner cold-build and first-use work while keeping the cache reliability boundary intact.

| Area | Before: global agent-device@0.18.0 | After: PR #900 cumulative |
| --- | --- | --- |
| npm/user runner build | Runtime command plus in-bundle Swift unit-test methods compiled in the same UI-test target | Runtime command only; Swift unit-test methods are behind AGENT_DEVICE_RUNNER_UNIT_TESTS |
| Maintainer Swift coverage | Coupled to build:all | Separate macOS CI compile job for the guarded Swift unit-test surface |
| Asset catalog | actool runs for tiny branding assets on every cold runner build | Assets.xcassets and bundled branding images are removed from repo/npm/project inputs |
| Cache key | Reusable across package version bumps; invalidates on toolchain/build metadata | Same, plus build metadata records unit-test Swift flags without asset-catalog-only settings |
| Maintainer/CI script destination | Defaulted to generic/platform=iOS Simulator, which built arm64 + x86_64 locally | Picks a concrete available iOS/tvOS simulator with a 3s fallback to generic, so build:xcuitest can build one active simulator arch |

Fresh measurements on Xcode 26.2, iPhone 17 Pro Max simulator, alternating runs, fresh DerivedData per run:

| Scenario | Before runs | Before median / mean | After runs | After median / mean | Delta |
| --- | --- | --- | --- | --- | --- |
| xcodebuild build-for-testing runner build | 8.69s, 8.07s, 7.12s, 7.15s, 8.06s | 8.06s / 7.82s | 7.82s, 7.76s, 7.74s, 7.82s, 7.59s | 7.76s / 7.75s | -3.7% median |
| End-to-end first use: open settings then first snapshot -i with no runner cache | 31.5s | 31.5s | 21.4s | 21.4s | -32% wall time |

The build-only number is intentionally modest because Xcode warmup and shared compiler caches dominate after the first run, but the baseline still runs asset catalog work on every clean DerivedData build and the PR branch does not. The first-use CLI comparison is the more representative user path for "runner is not installed yet".
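
For anyone re-checking the figures, the median, mean, and delta values for the build-for-testing row can be reproduced from the listed runs. This is a minimal sketch, not the script used to produce the measurements; the helper names `mean`, `median`, and `percentDelta` are made up for illustration.

```typescript
// Runs copied from the build-for-testing row (seconds).
const beforeRuns = [8.69, 8.07, 7.12, 7.15, 8.06];
const afterRuns = [7.82, 7.76, 7.74, 7.82, 7.59];

// Arithmetic mean of a non-empty list.
function mean(values: number[]): number {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Median of a non-empty list; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Relative change of `after` versus `before` in percent (negative means faster).
function percentDelta(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

const buildDelta = percentDelta(median(beforeRuns), median(afterRuns)); // about -3.7
const firstUseDelta = percentDelta(31.5, 21.4); // about -32
```

With five runs per side the median is less sensitive than the mean to a single slow run such as the 8.69s sample, which is one reason to quote the median delta.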

I tested lazy-loading the screen recording Swift surface after these measurements. It saved about 0.54s median on default clean runner builds, but it required a second runner build variant, feature-specific cache keys, session reuse handling for fixed DerivedData paths, and a more surprising first record path. That tradeoff was not worth shipping, so the lazy-recording commit was reverted in b82124170 and recording remains part of the normal runner.

Latest additional A/B matrix for low-complexity build-setting levers, all run sequentially with fresh DerivedData per build on Xcode 26.2 / iPhone simulator:

| Variant | Runs | Wall median / mean | Result |
| --- | --- | --- | --- |
| Current PR baseline | 5 | 6.128s / 6.632s | Baseline for this run |
| ENABLE_TESTABILITY=NO | 5 | 5.888s / 5.904s | Reject for runtime: only ~0.24s median and changes Swift testability semantics |
| SWIFT_SERIALIZE_DIAGNOSTICS=NO | 5 | 6.042s / 6.073s | Reject: command-line override did not remove -serialize-diagnostics |
| SWIFT_EMIT_MODULE_SEPARATELY=NO | 5 | 6.016s / 5.971s | Reject: command-line override did not remove -experimental-emit-module-separately |
| SWIFT_ENABLE_INCREMENTAL_COMPILATION=NO | 5 | 6.110s / 6.023s | Reject: removed -incremental, no reliable wall-time win |
| Testability off + diagnostics off | 5 | 5.922s / 6.067s | Reject |
| Testability off + emit-module off | 5 | 5.975s / 5.976s | Reject |
| Testability off + incremental off | 5 | 5.876s / 5.897s | Reject: best median here, still only ~0.25s and changes testability semantics |
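
A matrix like the one above can be driven by passing build settings as trailing `SETTING=VALUE` overrides to `xcodebuild build-for-testing`. The sketch below only assembles argument lists and is not the harness used for these runs. The project name, scheme name, DerivedData paths, and the `buildForTestingArgs` helper are hypothetical placeholders.

```typescript
// Hypothetical options; the real project, scheme, and paths are not stated in this PR.
interface RunnerBuildOptions {
  project: string;
  scheme: string;
  destination: string;
  derivedDataPath: string;
  // Build-setting overrides, for example { ENABLE_TESTABILITY: "NO" }.
  overrides?: Record<string, string>;
}

// Assemble the xcodebuild argument list for one variant.
function buildForTestingArgs(options: RunnerBuildOptions): string[] {
  const args = [
    "build-for-testing",
    "-project", options.project,
    "-scheme", options.scheme,
    "-destination", options.destination,
    "-derivedDataPath", options.derivedDataPath,
  ];
  // xcodebuild accepts SETTING=VALUE pairs as command-line overrides.
  for (const [name, value] of Object.entries(options.overrides ?? {})) {
    args.push(`${name}=${value}`);
  }
  return args;
}

// Three of the variants from the table, keyed by an illustrative label.
const variants: Record<string, Record<string, string>> = {
  baseline: {},
  testabilityOff: { ENABLE_TESTABILITY: "NO" },
  testabilityAndIncrementalOff: {
    ENABLE_TESTABILITY: "NO",
    SWIFT_ENABLE_INCREMENTAL_COMPILATION: "NO",
  },
};

const matrix = Object.entries(variants).map(([name, overrides]) => ({
  name,
  args: buildForTestingArgs({
    project: "Runner.xcodeproj", // hypothetical
    scheme: "Runner", // hypothetical
    destination: "platform=iOS Simulator,id=<UDID>", // placeholder UDID
    derivedDataPath: `/tmp/derived-${name}`, // fresh DerivedData per variant
    overrides,
  }),
}));
```

As the table shows for the diagnostics and emit-module settings, an override on the command line is not guaranteed to change the compiler invocation, so each variant still needs its build log checked for the flag in question.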

One missed lever did pay off for maintainer/CI script builds:

| Script destination | Runs | Wall median / mean | Archs compiled | Result |
| --- | --- | --- | --- | --- |
| generic/platform=iOS Simulator | 5 | 8.210s / 8.042s | arm64 + x86_64 | Old script default |
| Concrete iOS simulator UDID | 5 | 5.767s / 5.825s | arm64 only | -2.443s median / -29.8% |

Runtime/user runner builds already use concrete simulator destinations, so this does not change the npm ios-prepare path. It fixes scripts/build-xcuitest-apple.sh / pnpm build:xcuitest by probing CoreSimulator for an available concrete iOS/tvOS simulator with a 3s timeout, and falling back to the generic destination if discovery fails.
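
The selection logic described above could look roughly like the following. This is a sketch, not the contents of scripts/build-xcuitest-apple.sh: it assumes the JSON shape printed by `xcrun simctl list devices available --json`, models a failed or timed-out probe as `null`, and the `pickDestination` name and the preference for a booted simulator are assumptions of the example.

```typescript
// Fields used from `xcrun simctl list devices available --json`.
interface SimDevice {
  udid: string;
  name: string;
  state: string;
  isAvailable?: boolean;
}

interface SimctlDevices {
  // Keyed by runtime identifier, e.g. "com.apple.CoreSimulator.SimRuntime.iOS-26-2".
  devices: Record<string, SimDevice[]>;
}

type Platform = "iOS" | "tvOS";

function genericDestination(platform: Platform): string {
  return `generic/platform=${platform} Simulator`;
}

// Returns a concrete destination when discovery found a usable device,
// otherwise the generic one. `listing` is null when the probe failed or timed out.
function pickDestination(platform: Platform, listing: SimctlDevices | null): string {
  if (!listing) return genericDestination(platform);
  const marker = `SimRuntime.${platform}-`;
  const candidates = Object.entries(listing.devices)
    .filter(([runtime]) => runtime.includes(marker))
    .flatMap(([, devices]) => devices)
    .filter((device) => device.isAvailable !== false);
  // Preferring a booted simulator is an assumption of this sketch.
  const chosen = candidates.find((device) => device.state === "Booted") ?? candidates[0];
  return chosen
    ? `platform=${platform} Simulator,id=${chosen.udid}`
    : genericDestination(platform);
}
```

Keeping the fallback inside the same function means a slow or broken CoreSimulator service degrades to the old two-arch build rather than failing the script.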

Earlier profiling experiments that informed the patch:

| Experiment | Wall time | SwiftCompile | Asset catalog | Result |
| --- | --- | --- | --- | --- |
| Default single-file cold-ish | 10.62s | 7.22s | 5.57s | First actool run is noisy |
| Wholemodule after warmup | 6.43s | 1.53s | 0.67s | Less repeated Swift work, but not lower warm wall time |
| Single-file after warmup | 5.61s | 7.30s | 0.68s | Kept default; parallelism wins wall-clock |
| Asset baseline | 7.93s | 6.91s | 4.09s | actool still noisy |
| Asset catalog removed | 5.58s | 7.34s | n/a | Removes the unstable actool source input and drops packaged branding assets |

I tried running the guarded Swift unit tests via xcodebuild test-without-building on the macOS UI-test target. Even an allowlist of three device-free methods still launched the UI-test host and was slow/problematic locally, so this PR only compiles that surface in a separate CI job. Actually running those cheaply requires a future target split away from the UI-test runner.

Validation

  • pnpm build: rebuilt local dist before CLI measurements.
  • pnpm exec vitest run src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-xctestrun.test.ts src/platforms/ios/__tests__/runner-icon.test.ts: 79 tests passed after reverting lazy recording.
  • pnpm check:quick: lint and typecheck passed after reverting lazy recording.
  • pnpm build:xcuitest: passed for iOS and macOS after reverting lazy recording.
  • node ./node_modules/oxfmt/bin/oxfmt --write ...: completed cleanly earlier; direct invocation used because pnpm format tried to verify/fetch pnpm@11.1.2 without network in this sandbox.
  • npm pack --dry-run --ignore-scripts --json --cache /private/tmp/agent-device-npm-cache: package has 161 files, 568 KB packed / 1.97 MB unpacked, and no Assets.xcassets, logo.jpg, or powered-by.png entries.
  • xcodebuild build-for-testing benchmark: 5 alternating clean DerivedData runs for global 0.18 and PR #900; baseline logs contain Assets.xcassets, PR logs do not.
  • End-to-end first-use CLI benchmark: global 0.18 31.5s, PR #900 21.4s with isolated runner DerivedData/state.
  • Local tweet video artifacts generated from the measured first-use timings:
    • .tmp/runner-install-comparison-20260627/videos/agent-device-0.18-first-snapshot.mp4
    • .tmp/runner-install-comparison-20260627/videos/agent-device-pr900-first-snapshot.mp4
    • .tmp/runner-install-comparison-20260627/videos/agent-device-runner-first-snapshot-comparison.mp4
  • AGENT_DEVICE_XCUITEST_INCLUDE_UNIT_TESTS=1 AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH=/private/tmp/agent-device-swift-unit-compile-derived pnpm build:xcuitest:macos: passed earlier; command line showed -D AGENT_DEVICE_RUNNER_UNIT_TESTS.
  • Latest pushed SHA bb032737a; CI is re-running for this push.
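
The npm pack check in the list above can be automated against the JSON that `npm pack --dry-run --json` prints, which is an array of reports each carrying a `files` list. A minimal sketch follows, assuming only the `path` and `size` fields; the `findForbiddenEntries` helper is hypothetical and not part of this PR.

```typescript
// Subset of one report object from `npm pack --dry-run --json`.
interface PackReport {
  files: { path: string; size: number }[];
}

// Entries that should no longer ship, per the validation note.
const forbiddenPatterns = [/Assets\.xcassets/, /logo\.jpg$/, /powered-by\.png$/];

// Returns every packed path that matches one of the forbidden patterns.
function findForbiddenEntries(
  report: PackReport,
  patterns: RegExp[] = forbiddenPatterns,
): string[] {
  return report.files
    .map((file) => file.path)
    .filter((path) => patterns.some((pattern) => pattern.test(path)));
}
```

An empty result corresponds to the "no Assets.xcassets, logo.jpg, or powered-by.png entries" outcome reported above.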

github-actions (Bot) commented Jun 27, 2026

Size Report

| Metric | Base | Current | Diff |
| --- | --- | --- | --- |
| JS raw | 1.4 MB | 1.4 MB | +167 B |
| JS gzip | 445.5 kB | 445.6 kB | +52 B |
| npm tarball | 584.7 kB | 545.9 kB | -38.8 kB |
| npm unpacked | 2.0 MB | 1.9 MB | -38.9 kB |

Startup median (7 runs, lower is better):

| Scenario | Base | Current | Diff |
| --- | --- | --- | --- |
| CLI --version | 23.5 ms | 24.8 ms | +1.3 ms |
| CLI --help | 42.8 ms | 43.4 ms | +0.6 ms |

Top changed chunks:

| Chunk | Raw diff | Gzip diff |
| --- | --- | --- |
| dist/src/9722.js | +167 B | +52 B |

@github-actions

PR Preview Action v1.8.1

🚀 View preview at
https://callstack.github.io/agent-device/pr-preview/pr-900/

Built to branch gh-pages at 2026-06-27 12:53 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

thymikee force-pushed the perf/apple-runner-build-time branch 2 times, most recently from e2897b2 to 1799d0d (June 27, 2026 13:37)
thymikee force-pushed the perf/apple-runner-build-time branch from b821241 to 10ef1ef (June 27, 2026 19:05)
@thymikee (Member, Author)

Reviewed the latest head, including the new concrete-simulator destination commit.

I do not see a blocker. The default iOS/tvOS script path now prefers an available concrete simulator id, but keeps the generic simulator fallback when simctl lookup fails. Cache metadata still normalizes the destination back to the simulator family, so choosing a specific UDID should not churn runner cache keys. The unit-test Swift flag is now explicitly represented in both the shell build path and metadata comparison, so runtime and unit-test runner variants stay separated. Current CI is green, including Swift Runner Unit Compile, typecheck, unit, integration, smoke, and iOS runner compatibility.
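
To make the cache-key point concrete, here is a rough sketch of how metadata could normalize a concrete destination back to its simulator family while keeping the unit-test flag in the comparison. It is an illustration only: the `RunnerBuildMetadata` shape, field names, and helpers are hypothetical and do not mirror the repository's actual metadata format.

```typescript
// Hypothetical metadata shape; the real metadata fields are not shown in this PR.
interface RunnerBuildMetadata {
  xcodeVersion: string;
  destinationFamily: string;
  swiftFlags: string[];
}

// Collapse "platform=iOS Simulator,id=<UDID>" or "generic/platform=iOS Simulator"
// to "iOS Simulator" so the chosen UDID never enters the key.
function destinationFamily(destination: string): string {
  const match = /platform=([^,]+)/.exec(destination);
  return match ? match[1].trim() : destination;
}

function buildMetadata(input: {
  xcodeVersion: string;
  destination: string;
  unitTests: boolean;
  packageVersion: string;
}): RunnerBuildMetadata {
  return {
    xcodeVersion: input.xcodeVersion,
    destinationFamily: destinationFamily(input.destination),
    // The unit-test variant is kept distinct from the runtime variant.
    swiftFlags: input.unitTests ? ["-D", "AGENT_DEVICE_RUNNER_UNIT_TESTS"] : [],
    // packageVersion is deliberately left out so version bumps reuse the cache.
  };
}

// Stable string key derived from the metadata fields.
function cacheKey(metadata: RunnerBuildMetadata): string {
  return JSON.stringify([metadata.xcodeVersion, metadata.destinationFamily, metadata.swiftFlags]);
}
```

Under these assumptions, two builds that differ only in simulator UDID or package version produce the same key, while toggling the unit-test flag produces a different one.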

Residual risk: this is still Apple runner build/cache behavior, so I would treat the PR body’s local Xcode measurements and first-use validation as the device-facing evidence rather than relying on fixture tests alone. With that evidence plus green CI, this is ready for maintainer merge judgment.

thymikee added the ready-for-human label (Valid work that needs human implementation, judgment, or maintainer merge) Jun 28, 2026