perf(producer): skip static render probes #494
Branch updated from `d298baf` to `7a6eb96`.
jrusso1020 left a comment
Three independent, well-targeted optimizations. Each is small enough to evaluate on its own merits and none of them carry the regression risk profile of #493's streaming-encode default flip.
1. Static sub-composition duration resolution — skips a per-render Chrome roundtrip when the sub-comp's root already declares data-duration. Code path is purely additive: walks unresolvedCompositions, looks up the sub-comp HTML in the existing subCompositions map, and only resolves when a positive declared duration is found. Falls back to the existing probe path otherwise. The parseDeclaredDurationFromCompositionHtml helper handles both bare and <template>-wrapped composition shapes, which is the right thing for hyperframes' <template> convention. Trusting the declaration over the observation aligns with the framework contract (data-duration is authoritative, not GSAP timeline length).
2. External script fetch dedup — module-level promise cache keyed by URL, with fetcher: globalThis.fetch invalidation so test mocking still works, and proper error eviction so a failed fetch doesn't poison the cache. Test asserts fetchCount drops from 2 to 1 for two identical script tags. Worth flagging that the cache is unbounded and process-lifetime, so a long-running studio dev server could in theory hold onto stale script content if a remote URL changes mid-process — for the producer/CLI which is short-lived per render this is fine.
3. x264 slow → medium for the high preset: keeps CRF=15 (quality target untouched) and moves the preset off the CPU-intensive slow mode. The cited Remotion default of medium is the right reference; libx264 medium is the documented standard-speed preset. The NVENC mapping correctly bumps from p5 → p4. Net: faster encode, slightly larger files at equal quality. Tests are updated for both the SW and GPU mapping paths.
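The dedup in item 2 can be sketched as a module-level promise cache. This is a minimal illustration, not the PR's code: `fetchScriptOnce`, `ScriptFetcher` and `ScriptResponse` are hypothetical names, and the sketch assumes the fetcher returns an object with `ok`, `status` and `text()` (which the `Response` from `globalThis.fetch` satisfies). Passing a different fetcher clears the cache, in the spirit of the `fetcher: globalThis.fetch` invalidation, and a rejected fetch evicts its own entry.

```typescript
// Minimal sketch (hypothetical names): share one fetch per script URL.
interface ScriptResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
type ScriptFetcher = (url: string) => Promise<ScriptResponse>;

// Module-level cache: the fetcher it was filled with, plus one promise per URL.
let cachedFetcher: ScriptFetcher | undefined;
const scriptCache = new Map<string, Promise<string>>();

function fetchScriptOnce(url: string, fetcher: ScriptFetcher): Promise<string> {
  // A different fetcher (e.g. a test that replaced globalThis.fetch) invalidates the cache.
  if (fetcher !== cachedFetcher) {
    scriptCache.clear();
    cachedFetcher = fetcher;
  }
  const cached = scriptCache.get(url);
  if (cached) return cached; // reuse the in-flight or finished request

  const pending: Promise<string> = fetcher(url)
    .then((res) => {
      if (!res.ok) throw new Error(`Failed to fetch ${url}: HTTP ${res.status}`);
      return res.text();
    })
    .catch((err: unknown) => {
      // Evict failures so one bad fetch does not poison later lookups.
      if (scriptCache.get(url) === pending) scriptCache.delete(url);
      throw err;
    });
  scriptCache.set(url, pending);
  return pending;
}
```

Like the cache the review describes, this map is unbounded and lives for the whole process: fine for a short-lived render, worth revisiting for a long-running dev server.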
Verified locally:
- `bun run --filter @hyperframes/engine test` → 494/494 pass
- `bun test packages/producer/src/services/htmlCompiler.test.ts` → 32/32 pass (incl. the new static-sub-duration test and the updated dedup-fetchCount assertion)
- `bun run --filter @hyperframes/producer typecheck` → clean
- 5 regression fixtures pass with visual + audio parity: `font-variant-numeric`, `many-cuts`, `chat`, `sub-composition-video`, `webm-transparency`
A/B wall-clock on this Linux/headless-shell VM (single trial each):

| Fixture | main (9b72a87) | PR (d298baf) | Δ |
|---|---|---|---|
| chat | 35.18s | 34.82s | ~parity (within noise) |
| many-cuts | 22.07s | 22.21s | ~parity (within noise) |
| sub-composition-video | 131.10s (cpu 339%) | 126.95s (cpu 279%) | -3% |
The chat/many-cuts numbers are flat as expected — neither has duplicate external scripts and neither has a sub-comp with a declared root duration that the new path catches. sub-composition-video exercises the static-duration path; the CPU drop (339% → 279%) is consistent with skipping a browser probe phase.
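The static-duration path that sub-composition-video exercises can be sketched as follows. This is a simplified, regex-based illustration under stated assumptions, not the PR's `parseDeclaredDurationFromCompositionHtml`: `resolveStaticDuration` and `readNumberAttr` are hypothetical names, and it assumes the first non-`<template>` opening tag is the composition root. It prefers a positive `data-duration`, falls back to `data-end - data-start`, and returns `undefined` so the caller can keep the browser probe.

```typescript
// Hypothetical sketch; a real implementation would use an HTML parser, not regexes.

// Read a numeric attribute such as data-duration="3" from a single opening tag.
function readNumberAttr(tag: string, name: string): number | undefined {
  const m = tag.match(new RegExp(`\\b${name}\\s*=\\s*["']([^"']+)["']`));
  if (!m) return undefined;
  const n = Number(m[1]);
  return Number.isFinite(n) ? n : undefined;
}

function resolveStaticDuration(html: string): number | undefined {
  // Assumption: the root is the first opening tag that is not <template>,
  // so bare and <template>-wrapped compositions are handled alike.
  const root = html.match(/<(?!template\b)[a-z][^>]*>/i)?.[0];
  if (!root) return undefined;

  // Prefer the declared duration: it is authoritative over observed timeline length.
  const declared = readNumberAttr(root, "data-duration");
  if (declared !== undefined && declared > 0) return declared;

  // Otherwise derive it from data-end - data-start (start defaults to 0).
  const start = readNumberAttr(root, "data-start") ?? 0;
  const end = readNumberAttr(root, "data-end");
  if (end !== undefined && end - start > 0) return end - start;

  return undefined; // unresolved: leave it for the browser probe
}
```

Returning `undefined` rather than a guess keeps the change purely additive: anything the static pass cannot prove still goes through the existing probe.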
The x264 preset change isn't measured here because the regression harness fixtures default to quality: "standard" (which already uses medium); the change only affects the high preset path.
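The preset/CRF split this change relies on can be illustrated with a small mapping from settings to FFmpeg arguments. This is a hypothetical sketch, not the engine's `chunkEncoder` code: `H264Settings`, `HIGH_BEFORE`, `HIGH_AFTER`, `libx264Args` and `nvencArgs` are illustrative names, and only the values (libx264 slow → medium at CRF 15, NVENC p5 → p4) come from this PR. `-preset` and `-crf` are standard libx264 options in FFmpeg and `h264_nvenc` accepts presets `p1` to `p7`; NVENC quality flags are omitted because the PR does not state them.

```typescript
// Hypothetical sketch: the `high` tier before and after this PR.
interface H264Settings {
  x264Preset: string; // CPU spent per frame; slower presets compress harder
  crf: number; // quality target; unchanged by this PR
  nvencPreset: string; // h264_nvenc preset, p1 (fastest) to p7 (slowest)
}

const HIGH_BEFORE: H264Settings = { x264Preset: "slow", crf: 15, nvencPreset: "p5" };
const HIGH_AFTER: H264Settings = { x264Preset: "medium", crf: 15, nvencPreset: "p4" };

// Software path: preset and crf are independent libx264 controls.
function libx264Args(s: H264Settings): string[] {
  return ["-c:v", "libx264", "-preset", s.x264Preset, "-crf", String(s.crf)];
}

// GPU path: only the preset mapping is shown here.
function nvencArgs(s: H264Settings): string[] {
  return ["-c:v", "h264_nvenc", "-preset", s.nvencPreset];
}
```

Holding `crf` fixed while changing only the preset is what keeps the quality target constant and moves just the speed/size tradeoff.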
— Rames Jusso
Problem
Renderer speed is mostly won or lost on work that sits before or beside the actual frame loop. This PR removes three pieces of avoidable render-path work:
- `data-composition-src` hosts could still require a browser duration probe even when the compiled sub-composition root already declared a static duration.
- `high` quality tier spent CPU on libx264 `slow`, which optimizes compression harder than a render-throughput product should by default.

The competitive read from current renderer docs reinforces the direction: Remotion local rendering is browser-tab screenshot work with concurrency tradeoffs, and its H.264 `x264Preset` default is `medium`, not `slow`. FFmpeg/libx264 exposes `preset` and `crf` as separate controls; keeping CRF low preserves the quality target while moving the speed/size tradeoff back toward throughput.

Sources used while checking the optimization surface:
Root Cause
The producer already has enough static information after `parseSubCompositions()` to resolve common sub-composition host durations, but it did not use that information before returning `unresolvedCompositions` to the render orchestrator. That leaves Chrome on the critical path for a value that is often already present in the HTML.

Separately, `inlineExternalScripts()` treated every matching `<script src="https://...">` as an independent network request, and `high` encoding tied visual quality to the x264 `slow` preset instead of to the CRF.

What Changed
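- Added static sub-composition host duration resolution in `compileForRender()`:
  - prefer `data-duration` from the compiled sub root
  - otherwise use `data-end - data-start`
  - write the resolved `data-duration`/`data-end` into the host
- Cached external script fetches per URL and `globalThis.fetch`, so duplicate CDN script tags share one in-flight/finished request while tests that replace `fetch` still stay isolated.
- Changed the `high` H.264 preset from `slow` to `medium` while keeping CRF `15` unchanged.
- Updated tests for the new `high` preset shape.

Performance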
Measured locally against a clean `origin/main` baseline worktree, using 3-run producer benchmark averages.

| Fixture | origin/main | This PR | Δ | Phase detail |
|---|---|---|---|---|
| chat | 10150ms | 9097ms | -1053ms / -10.4% | `browserProbeMs`: 544ms -> 0ms, `compileMs`: 1477ms -> 956ms |
| missing-host-comp-id | 2283ms | 2107ms | -176ms / -7.7% | `encodeMs`: 289ms -> 234ms |

Isolated encoder proof on the same 450 captured frames:

| x264 preset | Encode time | Output size |
|---|---|---|
| slow | 2.90s | 981K |
| medium | 1.86s | 989K |

Medium was 35.9% faster in that isolated encode test, with PSNR 65.37 dB comparing medium output against slow output.

What I Tested And Did Not Ship
These were real experiments, but they lost on wall clock and were reverted instead of being hidden behind optimistic code paths:
- `chat` around 20s vs roughly 10s).
- `chat` to roughly 12.66s average.
- `chat` to roughly 10.25s average.
- `Runtime.evaluate` seek path: passed tests, but regressed `chat` to roughly 14.47s average.

Verification
- `bunx oxfmt --check packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts`
- `bunx oxlint packages/engine/src/services/chunkEncoder.ts packages/engine/src/services/chunkEncoder.test.ts packages/producer/src/services/htmlCompiler.ts packages/producer/src/services/htmlCompiler.test.ts`
- `bun run --filter @hyperframes/producer typecheck`
- `bun run --filter @hyperframes/engine typecheck`
- `bun test packages/producer/src/services/htmlCompiler.test.ts` -> 33 tests pass
- `bun run --filter @hyperframes/engine test -- src/services/chunkEncoder.test.ts src/services/streamingEncoder.test.ts src/utils/gpuEncoder.test.ts src/services/frameCapture.test.ts src/services/frameCapture-namePolyfill.test.ts src/services/parallelCoordinator.test.ts` -> 124 tests pass
- `bun run --filter @hyperframes/producer test -- missing-host-comp-id --sequential` -> compilation, visual, and audio pass; 0 failed visual frames; audio correlation `1.000`
- `ffprobe` on the fresh proof render -> `1080x1920`, `3.000000s`, `72` frames
- `agent-browser` playback proof -> recorded local playback, verified `readyState: 4`, `currentTime: 1.75`, `duration: 3`, `videoWidth: 1080`, `videoHeight: 1920`

Known local harness note:
`bun run --filter @hyperframes/producer test -- chat --sequential` still reports 34 visual frame failures in this checkout, but I reproduced the same 34-frame failure on a clean `origin/main` baseline worktree before this patch. The perf benchmark for `chat` is still useful for timing, but that regression-test baseline drift is pre-existing local state rather than a failure introduced here.