
perf(producer): skip static render probes #494

Closed
miguel-heygen wants to merge 1 commit into main from perf/render-frame-pipeline


@miguel-heygen (Collaborator) commented Apr 25, 2026

Problem

Renderer speed is mostly won or lost on work that sits before or beside the actual frame loop. This PR removes three pieces of avoidable render-path work:

  • Nested data-composition-src hosts could still require a browser duration probe even when the compiled sub-composition root already declared a static duration.
  • Duplicate remote script tags were fetched once per tag during compilation instead of sharing the same download.
  • The high quality tier spent CPU on the libx264 slow preset, which pushes compression efficiency harder than a render-throughput product should by default.

Current renderer documentation supports this direction: Remotion's local rendering is browser-tab screenshot work with concurrency tradeoffs, and its H.264 x264Preset defaults to medium, not slow. FFmpeg's libx264 encoder exposes preset and crf as separate controls, so keeping CRF low preserves the quality target while the preset moves the speed/size tradeoff back toward throughput.
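A minimal sketch of how the two controls separate in an ffmpeg argument list, assuming the high tier reduces to a preset plus a CRF value. X264Settings, buildX264Args, HIGH_BEFORE and HIGH_AFTER are hypothetical names, not the engine's code; -c:v, -preset and -crf are standard ffmpeg/libx264 options.

```typescript
// Hypothetical sketch: preset (speed vs. compression effort) and CRF (quality
// target) are independent knobs. Names here are illustrative only.
type X264Preset =
  | "ultrafast" | "superfast" | "veryfast" | "faster" | "fast"
  | "medium" | "slow" | "slower" | "veryslow" | "placebo";

interface X264Settings {
  preset: X264Preset;
  crf: number; // lower = higher quality; 0-51 for 8-bit libx264
}

// Before this PR: CRF 15 paired with the CPU-heavy slow preset.
const HIGH_BEFORE: X264Settings = { preset: "slow", crf: 15 };
// After: same CRF (quality target unchanged), cheaper preset.
const HIGH_AFTER: X264Settings = { preset: "medium", crf: 15 };

function buildX264Args(s: X264Settings): string[] {
  return ["-c:v", "libx264", "-preset", s.preset, "-crf", String(s.crf)];
}

console.log(buildX264Args(HIGH_AFTER).join(" "));
// -c:v libx264 -preset medium -crf 15
```

Only the preset string differs between the two settings objects, which is the point of the change: the quality target stays pinned while encode effort drops.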

Sources used while checking the optimization surface:

Root Cause

The producer already has enough static information after parseSubCompositions() to resolve common sub-composition host durations, but it did not use that information before returning unresolvedCompositions to the render orchestrator. That leaves Chrome on the critical path for a value that is often already present in the HTML.
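A minimal sketch of reading a statically declared duration from compiled sub-composition HTML, following the order listed under What Changed (data-duration first, then data-end - data-start, with optional <template> wrapping). rootAttributes and parseDeclaredDuration are hypothetical names; the PR's parseDeclaredDurationFromCompositionHtml may use a real HTML parser instead of these regexes.

```typescript
// Hypothetical sketch. The regexes ignore HTML comments and attribute values
// containing ">", which a real parser would handle.

// Attributes of the first element in `html`, skipping an optional <template> wrapper.
function rootAttributes(html: string): Map<string, string> {
  const body = html.trim().replace(/^<template\b[^>]*>/i, "").trim();
  const attrs = new Map<string, string>();
  const tag = body.match(/^<[a-zA-Z][\w-]*([^>]*)>/);
  if (!tag) return attrs;
  for (const m of (tag[1] ?? "").matchAll(/([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g)) {
    attrs.set((m[1] ?? "").toLowerCase(), m[2] ?? m[3] ?? "");
  }
  return attrs;
}

// Statically declared duration in seconds, or null when the browser probe is still needed.
function parseDeclaredDuration(html: string): number | null {
  const attrs = rootAttributes(html);
  const num = (name: string): number | null => {
    const raw = attrs.get(name);
    if (raw === undefined || raw.trim() === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  // 1. Prefer an explicit positive data-duration on the root.
  const duration = num("data-duration");
  if (duration !== null && duration > 0) return duration;
  // 2. Fall back to data-end - data-start when both are present.
  const start = num("data-start");
  const end = num("data-end");
  if (start !== null && end !== null && end - start > 0) return end - start;
  return null;
}

console.log(parseDeclaredDuration('<template id="t"><div data-start="1" data-end="3.5"></div></template>'));
```

Returning null rather than 0 keeps "unknown" distinct from "zero length", so anything the sketch cannot resolve simply stays on the existing probe path.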

Separately, inlineExternalScripts() treated every matching <script src="https://..."> as an independent network request, and the high encoding tier tied visual quality to the x264 slow preset instead of to the CRF.
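A minimal sketch of a deduplicating download cache. The PR describes keying by URL and the active globalThis.fetch; this sketch assumes a WeakMap from fetcher to a per-URL promise map, which is one possible layout and not necessarily the PR's. Fetcher, scriptCache and fetchScriptOnce are hypothetical names.

```typescript
// Hypothetical sketch; production callers would pass globalThis.fetch as `fetcher`.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// One URL -> promise map per fetcher: a test mock that replaces fetch starts from
// an empty cache, and the WeakMap lets maps for discarded fetchers be collected.
const scriptCache = new WeakMap<Fetcher, Map<string, Promise<string>>>();

function fetchScriptOnce(url: string, fetcher: Fetcher): Promise<string> {
  const byUrl = scriptCache.get(fetcher) ?? new Map<string, Promise<string>>();
  scriptCache.set(fetcher, byUrl);

  // Duplicate <script src> tags share the in-flight or finished request.
  const cached = byUrl.get(url);
  if (cached) return cached;

  const request = fetcher(url).then((res) => {
    if (!res.ok) throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
    return res.text();
  });
  byUrl.set(url, request);

  // Evict failures so one bad response does not poison later lookups.
  request.catch(() => {
    if (byUrl.get(url) === request) byUrl.delete(url);
  });
  return request;
}
```

Caching the promise rather than the text is what lets two tags that start downloading at the same moment share one request. Like the cache the reviewer describes, successful entries are never evicted, which suits a short-lived render process.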

What Changed

  • Resolve statically declared sub-composition durations during compileForRender():
    • reads data-duration from the compiled sub root
    • falls back to data-end - data-start
    • supports template-wrapped sub-compositions
    • injects data-duration / data-end into the host
    • removes resolved hosts from the browser-probe queue
  • Cache external script downloads by URL and the active globalThis.fetch, so duplicate CDN script tags share one in-flight/finished request while tests that replace fetch still stay isolated.
  • Change the high H.264 preset from slow to medium while keeping CRF 15 unchanged.
  • Update the NVENC mapping expectations for the new high preset shape.
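The host bookkeeping in the list above (inject data-duration/data-end into the host, drop it from the browser-probe queue) can be sketched as a partition over the unresolved hosts. UnresolvedHost, StaticResolution and resolveStaticHosts are hypothetical shapes, since compileForRender()'s real types are not shown here. The duration parser is passed in so the block stands alone. Deriving data-end as data-start + duration, with a missing start treated as 0, is an assumption.

```typescript
// Hypothetical shapes: the real compileForRender() types are not shown in this PR.
interface UnresolvedHost {
  id: string; // host element id
  src: string; // data-composition-src value
  attrs: Record<string, string>; // host attributes that may be patched
}

interface StaticResolution {
  resolved: UnresolvedHost[]; // durations injected, no browser probe needed
  needsProbe: UnresolvedHost[]; // still queued for the Chrome duration probe
}

function resolveStaticHosts(
  hosts: UnresolvedHost[],
  subCompositions: Map<string, string>, // src -> compiled sub-composition HTML
  parseDeclaredDuration: (html: string) => number | null,
): StaticResolution {
  const resolved: UnresolvedHost[] = [];
  const needsProbe: UnresolvedHost[] = [];
  for (const host of hosts) {
    const html = subCompositions.get(host.src);
    const duration = html === undefined ? null : parseDeclaredDuration(html);
    if (duration === null || !(duration > 0)) {
      needsProbe.push(host); // unknown duration: keep the existing probe path
      continue;
    }
    // Assumption: a host without a numeric data-start starts at 0.
    const rawStart = Number(host.attrs["data-start"] ?? "0");
    const start = Number.isFinite(rawStart) ? rawStart : 0;
    resolved.push({
      ...host,
      attrs: {
        ...host.attrs,
        "data-duration": String(duration),
        "data-end": String(start + duration),
      },
    });
  }
  return { resolved, needsProbe };
}
```

Only needsProbe would be handed to the render orchestrator, so a composition whose hosts all resolve statically never starts the probe at all.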

Performance

Measured locally against a clean origin/main baseline worktree, using 3-run producer benchmark averages.

| Fixture | Baseline | This PR | Delta | Notes |
|---|---|---|---|---|
| chat | 10150ms | 9097ms | -1053ms / -10.4% | browserProbeMs: 544ms -> 0ms, compileMs: 1477ms -> 956ms |
| missing-host-comp-id | 2283ms | 2107ms | -176ms / -7.7% | encodeMs: 289ms -> 234ms |

Isolated encoder proof on the same 450 captured frames:

| x264 preset | Wall time | Output size |
|---|---|---|
| slow | 2.90s | 981K |
| medium | 1.86s | 989K |

In that isolated encode test, medium cut wall time by 35.9% relative to slow (2.90s -> 1.86s), and the medium output scored 65.37 dB PSNR when compared against the slow output.
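The headline figures follow from the encoder table. The PSNR line assumes 8-bit samples (peak value 255) and simply inverts the standard PSNR definition:

```latex
\begin{aligned}
\text{wall-time reduction} &= \frac{t_{\text{slow}} - t_{\text{medium}}}{t_{\text{slow}}} = \frac{2.90 - 1.86}{2.90} \approx 0.359 \\
\text{throughput ratio} &= \frac{t_{\text{slow}}}{t_{\text{medium}}} = \frac{2.90}{1.86} \approx 1.56 \\
\text{size growth} &= \frac{989 - 981}{981} \approx 0.8\% \\
\mathrm{MSE} &= \frac{255^2}{10^{\mathrm{PSNR}/10}} = \frac{65025}{10^{6.537}} \approx 0.019
\end{aligned}
```

Under that 8-bit assumption, the mean squared difference between the two encodes is a small fraction of one code value per sample. This is consistent with the claim that moving the preset, not the CRF, leaves the quality target intact.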

What I Tested And Did Not Ship

These were real experiments, but they lost on wall clock and were reverted instead of being hidden behind optimistic code paths:

  • Streaming encode by default: slower on the tested fixtures (chat around 20s vs roughly 10s).
  • Shared browser-pool launch path: functionally worked, but regressed chat to roughly 12.66s average.
  • Async frame writes: passed tests, but regressed chat to roughly 10.25s average.
  • CDP Runtime.evaluate seek path: passed tests, but regressed chat to roughly 14.47s average.

Verification

  • bunx oxfmt --check packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts
  • bunx oxlint packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts
  • bun run --filter @hyperframes/producer typecheck
  • bun run --filter @hyperframes/engine typecheck
  • bun test packages/producer/src/services/htmlCompiler.test.ts -> 33 tests pass
  • bun run --filter @hyperframes/engine test -- src/services/chunkEncoder.test.ts src/services/streamingEncoder.test.ts src/utils/gpuEncoder.test.ts src/services/frameCapture.test.ts src/services/frameCapture-namePolyfill.test.ts src/services/parallelCoordinator.test.ts -> 124 tests pass
  • bun run --filter @hyperframes/producer test -- missing-host-comp-id --sequential -> compilation, visual, and audio pass; 0 failed visual frames; audio correlation 1.000
  • ffprobe on the fresh proof render -> 1080x1920, 3.000000s, 72 frames
  • agent-browser playback proof -> recorded local playback, verified readyState: 4, currentTime: 1.75, duration: 3, videoWidth: 1080, videoHeight: 1920
  • Lefthook pre-commit -> lint, format, typecheck pass
  • Lefthook commit-msg -> commitlint pass

Known local harness note: bun run --filter @hyperframes/producer test -- chat --sequential still reports 34 visual frame failures in this checkout, but I reproduced the same 34-frame failure on a clean origin/main baseline worktree before this patch. The perf benchmark for chat is still useful for timing, but that regression-test baseline drift is pre-existing local state rather than a failure introduced here.

A review comment thread on packages/producer/src/services/htmlCompiler.ts was marked fixed.
@miguel-heygen force-pushed the perf/render-frame-pipeline branch from d298baf to 7a6eb96 on April 25, 2026 18:15
@jrusso1020 (Collaborator) left a comment


Three independent, well-targeted optimizations. Each is small enough to evaluate on its own merits, and none of them carries the regression risk profile of #493's streaming-encode default flip.

1. Static sub-composition duration resolution: skips a per-render Chrome roundtrip when the sub-comp's root already declares data-duration. The code path is purely additive: it walks unresolvedCompositions, looks up the sub-comp HTML in the existing subCompositions map, and resolves only when a positive declared duration is found, falling back to the existing probe path otherwise. The parseDeclaredDurationFromCompositionHtml helper handles both bare and <template>-wrapped composition shapes, which is the right call for hyperframes' <template> convention. Trusting the declaration over the observation aligns with the framework contract (data-duration is authoritative, not the GSAP timeline length).

2. External script fetch dedup: a module-level promise cache keyed by URL, with fetcher: globalThis.fetch invalidation so test mocking still works, and proper error eviction so a failed fetch doesn't poison the cache. The test asserts that fetchCount drops from 2 to 1 for two identical script tags. Worth flagging: the cache is unbounded and lives for the whole process, so a long-running studio dev server could in theory hold onto stale script content if a remote URL changes mid-process. For the producer/CLI, which is short-lived per render, this is fine.

3. x264 slow → medium for the high preset: keeps CRF=15 (quality target untouched) and moves the preset off the CPU-intensive slow mode. The cited Remotion medium default is the right reference, and medium is libx264's documented default preset. The NVENC mapping correctly moves from p5 → p4. Net: faster encodes and slightly larger files at equal quality. Tests are updated for both the SW and GPU mapping paths.
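The preset mapping described in point 3 can be sketched as a lookup table. Only the medium → p4 and slow → p5 pairs come from this review. X264_TO_NVENC, nvencPresetFor and nvencArgs are hypothetical names, and rate-control flags are deliberately left out because the PR's NVENC quality settings are not shown.

```typescript
// Hypothetical sketch: only the medium -> p4 and slow -> p5 pairs come from this review.
// A Map avoids accidental hits on Object.prototype keys such as "toString".
const X264_TO_NVENC = new Map<string, string>([
  ["medium", "p4"], // new high-tier preset
  ["slow", "p5"], // previous high-tier preset
]);

function nvencPresetFor(x264Preset: string): string {
  const preset = X264_TO_NVENC.get(x264Preset);
  if (preset === undefined) {
    throw new Error(`no NVENC preset mapped for x264 preset "${x264Preset}"`);
  }
  return preset;
}

// Recent ffmpeg h264_nvenc builds accept presets p1 (fastest) to p7 (slowest).
function nvencArgs(x264Preset: string): string[] {
  return ["-c:v", "h264_nvenc", "-preset", nvencPresetFor(x264Preset)];
}

console.log(nvencArgs("medium").join(" ")); // -c:v h264_nvenc -preset p4
```

Failing loudly on an unmapped preset keeps a future tier change from silently falling back to an arbitrary GPU preset.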

Verified locally:

  • bun run --filter @hyperframes/engine test → 494/494 pass
  • bun test packages/producer/src/services/htmlCompiler.test.ts → 32/32 pass (incl. the new static-sub-duration test and the updated dedup-fetchCount assertion)
  • bun run --filter @hyperframes/producer typecheck → clean
  • 5 regression fixtures pass with visual + audio parity: font-variant-numeric, many-cuts, chat, sub-composition-video, webm-transparency

A/B wall-clock on this Linux/headless-shell VM (single trial each):

| Fixture | main (9b72a87) | PR (d298baf) | Δ |
|---|---|---|---|
| chat | 35.18s | 34.82s | ~parity (within noise) |
| many-cuts | 22.07s | 22.21s | ~parity (within noise) |
| sub-composition-video | 131.10s (cpu 339%) | 126.95s (cpu 279%) | -3% |

The chat/many-cuts numbers are flat as expected — neither has duplicate external scripts and neither has a sub-comp with a declared root duration that the new path catches. sub-composition-video exercises the static-duration path; the CPU drop (339% → 279%) is consistent with skipping a browser probe phase.

The x264 preset change isn't measured here because the regression harness fixtures default to quality: "standard" (which already uses medium); the change only affects the high preset path.

— Rames Jusso
