feat: add cross-platform audio probe#880
Conversation
Size Report
Startup median (7 runs, lower is better):
Top changed chunks:
|
|
Fallow Code Quality is failing on new complexity in the audio path. Please split/flatten these before review:
After updating, please rerun |
|
Review pass complete: I did not find actionable blockers. What I checked:
Residual risk is the stated product scope: this samples capturable HTML media elements exposed to Web Audio, not whole-tab/OS/device audio. That tradeoff is documented in the PR body and CLI guidance, so I would call this merge-ready once maintainers are comfortable with that first slice. |
422287d to
f4f4d59
Compare
|
cbe14d0 to
6f81454
Compare
|
Review finding on the latest green #880 head: Blocking: normal close does not stop an active host audio probe. session-close.ts defines stopSessionAudioProbe at lines 89-94 and teardownSessionResources calls it, but handleCloseCommand starts cleanup at app logs/perf/snapshot helper and never calls stopSessionAudioProbe before deleting the session. For macOS/iOS simulator/Android emulator audio probes, start stores a long-lived helper child in session.audioProbe; closing the session while a probe is running drops the only handle and lets the ScreenCaptureKit helper continue until its duration expires. Please call the audio cleanup helper from handleCloseCommand and add a close-with-active-audio-probe regression test. |
6f81454 to
f910aaa
Compare
Thermo-Nuclear Code Quality ReviewFeature works and the command/capability/CLI wiring is clean and consistent with the existing observability family. But there's one structural regression worth fixing before merge, plus several duplication/indirection smells that the codebase's own conventions argue against. No file crosses the 1000-line threshold (largest is 1. Blocker — the ~190-line browser probe is a magic string, not code (
|
|
Follow-up review on c83fe7c: the blockers I checked are addressed. Normal close now stops an active host audio probe and has a regression test, the browser probe script moved out of the provider into a checked TS module, the host-audio capability predicate has a single source, the identity wrapper is gone, fallback result construction is centralized, and the web result normalizer no longer relies on a cast. CI is green. With the PR body documenting live web evidence and the macOS Screen Recording permission blocker for host capture, I do not see remaining blockers from this review pass. |
|
Latest-head follow-up on d46e596: I rechecked the post-review delta. It removes the leftover host-audio indirection, routes host audio through the canonical capability predicate, and allowlists/sanitizes the browser eval options before interpolation. I do not see new blockers, and CI is green. The prior residual risk still applies: host capture live verification was blocked by macOS Screen Recording permission, while the web path has live evidence in the PR body. |
d46e596 to
e92ccdd
Compare
|
Latest-head follow-up on e92ccdd after the branch rebased over #890/#891: I still do not see a blocker. Range-diff matches the previously reviewed audio-probe series, with the expected reconciliation against the new session-teardown extraction and web recording help text. CI is green. Same residual risk as before: host audio live capture remains blocked by macOS Screen Recording permission; the web path has live evidence in the PR body. |
|
Latest-head follow-up on b982e4b: I checked the new eval-options delta. The action is still generated from a closed switch over start/stop/status, and duration/bucket values still pass through finite-number truncation before interpolation. I do not see a new blocker; CI is green. Residual risk unchanged: host audio live capture remains blocked by macOS Screen Recording permission; the web path has live evidence in the PR body. |
Summary
Adds
audio probe start|status|stopfor agent-device sessions. Agents can poll compact RMS/peak dBFS buckets while continuing other actions.Supported backends now include:
<audio>/<video>media elements through Web Audio.The Expo test app Audio tab now dogfoods both web and native playback with a generated 6-second 440 Hz sample. Docs/help are updated across CLI help, README, website command/debugging/client API/setup/security pages with platform support, timing units, polling flow, and Screen Recording requirements.
Tradeoffs: web does not capture whole-tab/system audio; host capture requires macOS Screen Recording permission and can include other audible host apps. Physical iOS and Android devices remain unsupported because their audio is not rendered through the Mac host.
Scope: 36 files overall, covering the audio command family, web provider, macOS helper/backend, iOS simulator and Android emulator host-routing, observability tests, docs/help, and the Expo test app.
Validation
Passed locally: focused unit tests for
session-observability,capabilities, observability command metadata, and CLI args/help;CI=true pnpm typecheck;CI=true pnpm test-app:typecheck;CI=true pnpm format;CI=true pnpm build;CI=true pnpm check:fallow --base origin/main;swift build -c release --package-path macos-helper; iOS native test app build/install on iPhone 17 Pro and iPhone 17 Pro Max simulators; Android native test app build/install on Pixel 9 Pro XL emulator.Focused test command:
CI=true pnpm exec vitest run --project unit src/daemon/handlers/__tests__/session-observability.test.ts src/core/__tests__/capabilities.test.ts src/commands/observability/index.test.ts src/utils/__tests__/args.test.tsResult: 4 files passed, 174 tests passed.
Latest GitHub checks after the simulator support fix: Unit Tests, Coverage, Typecheck, Lint & Format, Fallow Code Quality, Integration Tests, Web Platform Smoke, Layering Guard, No test-only DI seams, iOS Runner Swift Compatibility, Bundle Size, CodeQL, and analysis jobs are passing. One duplicate Smoke Tests job was still pending when this description was updated.
Manual web dogfood used a live 20s probe while clicking the sample button:
agent-device audio probe start 20 1000 --platform web --session audio-review-followup --json agent-device click 'label="Start sample"' --platform web --session audio-review-followup --json agent-device audio probe status --platform web --session audio-review-followup --jsonReal web status output excerpt:
{ "audio": "probe", "state": "running", "active": true, "heard": true, "source": "media-elements", "backend": "agent-browser", "durationMs": 20000, "elapsedMs": 9085, "bucketMs": 1000, "sampleCount": 10, "mediaElementCount": 1, "sourceCount": 1, "rmsDbfs": [-90, -90, -11, -11, -11, -11, -11, -90, -90, -90], "peakDbfs": [-90, -90, -9, -9, -9, -9, -9, -90, -90, -90] }Simulator/emulator verification:
8082on iPhone 17 Pro Max, opened the Audio tab, and confirmed the native audio sample UI rendered.expo-audio, loaded the app from Metro8082, opened the Audio tab, and tappedStart sample.audio probe start 10 1000 --platform ios|androidreached the new host-system audio helper path, then failed at the expected local macOS TCC gate because Screen Recording permission is denied on this machine:{ "code": "COMMAND_FAILED", "message": "failed to start host audio probe: {\"error\":{\"details\":{\"error\":\"The user declined TCCs for application, window, display capture\",\"permission\":\"screen-recording\"},\"message\":\"audio probe requires Screen Recording permission on macOS\"},\"ok\":false}" }All manual
agent-devicesessions were closed, Metro was stopped, and the Android emulator booted for verification was shut down.