
perf: read each wrapped module from disk once#265

Draft
BridgeAR wants to merge 3 commits into
nodejs:mainfrom
BridgeAR:BridgeAR/2026-06-30-dedup-source-reads

@BridgeAR BridgeAR commented Jun 30, 2026

Member

Wrapping a module currently reads its source twice, and a leaf shared by
several barrels is re-lexed once per barrel. Two changes remove both
redundancies, with no observable behavior change:

  1. The export pass reads a module's source to lex its exports; the wrapper it
     emits then does `import * as namespace from <realUrl>`, so Node reads the
     same file again to compile that import. Stash the source the export pass
     fetched and serve it to the namespace import.
  2. A leaf reached from several `export *` barrels is lexed once per barrel even
     though its export set is intrinsic to the file. Memoize the lexed ESM export
     names by URL so the first lex serves the rest.
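
As a rough illustration of the second change, memoizing export names by URL could look like the sketch below. It is not the code in this PR: `lexExports` is a toy regex stand-in for the real lexer, and `getExportNames`, `exportNamesByUrl`, and the `file:///leaf.mjs` URL are names made up for the example.

```javascript
// Hypothetical stand-in for the real export lexer: counts its calls and
// extracts `export const <name>` declarations with a toy regex.
let lexCalls = 0;
function lexExports(source) {
  lexCalls += 1;
  const names = new Set();
  for (const match of source.matchAll(/export\s+const\s+(\w+)/g)) {
    names.add(match[1]);
  }
  return names;
}

// Memo keyed by module URL; the export set is intrinsic to the file.
const exportNamesByUrl = new Map();

function getExportNames(url, readSource) {
  let names = exportNamesByUrl.get(url);
  if (names === undefined) {
    // First visit: read and lex, then remember the result.
    names = lexExports(readSource(url));
    exportNamesByUrl.set(url, names);
  }
  return names;
}

// In-memory "disk" so the sketch runs without touching the file system.
const files = new Map([
  ['file:///leaf.mjs', 'export const a = 1; export const b = 2;'],
]);
const readSource = (url) => files.get(url);

// Two barrels that both `export *` from the same leaf.
const viaBarrelOne = getExportNames('file:///leaf.mjs', readSource);
const viaBarrelTwo = getExportNames('file:///leaf.mjs', readSource);
```

The second lookup returns the same `Set` instance without calling the lexer again, which is the effect the numbers below measure on a real graph.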

On a 390-module barrel-heavy graph under the synchronous loader: export lexes
588 → 396 (-33%), total module read calls 1970 → 794 (-60%).

The source cache only holds entries whose upstream load supplied a `format`; a
load that returns source without one relies on Node to resolve the format on
the real load, and serving such an entry would throw `ERR_UNKNOWN_MODULE_FORMAT`.
The export memo only caches the pure ESM result, because the CommonJS path
mutates `context.format` and resolves re-exports per call, and built-ins
already memoize via `BUILT_INS`.
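
The format guard described above might be sketched as follows. This is an assumption-laden illustration, not the PR's code: `sourceCache`, `stashSource`, and `loadReal` are hypothetical names, the hook is shown synchronously, and eviction is left out. Only the `format`, `source`, and `shortCircuit` fields mirror the shape of a Node `load` hook result.

```javascript
// Assumed cache shape: url -> { format, source }.
const sourceCache = new Map();

// Called by the (hypothetical) export pass with the upstream load result.
function stashSource(url, loadResult) {
  // Without a format, Node has to resolve it on the real load, so a cached
  // entry could not be served safely. Skip caching in that case.
  if (typeof loadResult.format !== 'string' || loadResult.source == null) {
    return false;
  }
  sourceCache.set(url, { format: loadResult.format, source: loadResult.source });
  return true;
}

// Sketch of the load step for the wrapper's namespace import.
function loadReal(url, nextLoad) {
  const cached = sourceCache.get(url);
  if (cached !== undefined) {
    // Serve the stashed source instead of reading the file again.
    return { ...cached, shortCircuit: true };
  }
  // No usable entry: defer to the rest of the loader chain.
  return nextLoad(url);
}
```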

Comment thread create-hook.mjs Outdated
@BridgeAR BridgeAR force-pushed the BridgeAR/2026-06-30-dedup-source-reads branch 3 times, most recently from 9eace65 to 7bcd78f Compare June 30, 2026 20:36
BridgeAR added 2 commits July 2, 2026 18:42
Wrapping an ESM module reads its source to lex the exports, and the wrapper it
emits then does `import * as namespace from <realUrl>`, which previously made
Node read the same file a second time. A leaf reached through several barrels
(`export *`) was additionally read and lexed once per barrel, though its export
set is intrinsic to the file and stable for the process.

`esmExportsCache` now memoizes both: the first lex serves every later barrel,
and the source it read is served to the wrapper's own namespace import via
`takeCachedSource`, so each wrapped module is read from disk once.

The memo holds source only until that one import consumes it, then drops it and
keeps the export names. Source is cached solely for `module` /
`module-typescript`: a wrapped CJS module's real load must return
`source: undefined` (else `require()` of a "module-sync" package fails with
ERR_VM_MODULE_LINK_FAILURE), and a load without a usable format has to be
resolved by Node, which serving a cached entry would short-circuit into
ERR_UNKNOWN_MODULE_FORMAT. The TypeScript source is cached before type-stripping
because Node strips again on the real load.
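
The consume-once lifecycle in this commit message could be sketched like this. The names `esmExportsCache` and `takeCachedSource` come from the message, but their real signatures are not shown here, so the entry shape, the `rememberModule` helper, and the example URLs are assumptions made for illustration.

```javascript
// Formats whose source may be served to the wrapper's namespace import,
// per the commit message.
const CACHEABLE_FORMATS = new Set(['module', 'module-typescript']);

// Assumed shape: url -> { exportNames: Set<string>, source?: string }.
const esmExportsCache = new Map();

// Hypothetical helper: record the lexed names, and the source only when
// the format allows serving it later.
function rememberModule(url, format, source, exportNames) {
  const entry = { exportNames };
  if (CACHEABLE_FORMATS.has(format)) {
    entry.source = source;
  }
  esmExportsCache.set(url, entry);
}

// Returns the stashed source at most once, then drops it while keeping
// the export names for later barrels.
function takeCachedSource(url) {
  const entry = esmExportsCache.get(url);
  if (entry === undefined || entry.source === undefined) return undefined;
  const { source } = entry;
  delete entry.source;
  return source;
}

rememberModule('file:///leaf.mjs', 'module', 'export const a = 1;', new Set(['a']));
rememberModule('file:///no-format.mjs', undefined, 'export const b = 2;', new Set(['b']));

const firstTake = takeCachedSource('file:///leaf.mjs');
const secondTake = takeCachedSource('file:///leaf.mjs');
const noFormatTake = takeCachedSource('file:///no-format.mjs');
```

Here `firstTake` holds the source, `secondTake` and `noFormatTake` are `undefined`, and the export names for both URLs stay in the memo.
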

A leaf reached through two barrels and also imported directly exercises both
the export memo and the served-once source: the second barrel reaches the leaf
through the memo. Importing the leaf's bindings through both barrels makes the
memoized re-export set observable, so a memo that returned the wrong set for
the second barrel would drop those bindings and fail here.
@BridgeAR BridgeAR force-pushed the BridgeAR/2026-06-30-dedup-source-reads branch from 7bcd78f to 96e400a Compare July 2, 2026 16:44
Wrapping an ESM module reads its source once to lex the exports; the wrapper
then does `import * as namespace from <realUrl>`. Serving that import the
source cached by the export scan skips the loader chain's second `load()` for
the real module. An upstream loader that returns different source per call (a
non-idempotent transform or codegen loader, or one that keys off `context`)
then has its second load silently dropped: both the executed module and the
hooked namespace see the export-scan source, not the source the real load would
produce.
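
A minimal sketch of that failure mode, assuming a made-up upstream loader whose output changes on every call. `upstreamLoad`, `wrapWithSourceCache`, and `wrapWithFallThrough` are hypothetical names, and the real hooks are reduced to plain synchronous functions:

```javascript
// Hypothetical upstream loader that is not idempotent: each call embeds
// a fresh counter value in the generated source.
let loadCount = 0;
function upstreamLoad(url) {
  loadCount += 1;
  return { format: 'module', source: `export const build = ${loadCount};` };
}

// Variant A: the real import is served the source from the export scan.
function wrapWithSourceCache(url) {
  const scanned = upstreamLoad(url); // export scan
  return scanned.source;             // real import reuses it; no second load
}

// Variant B: the real import falls through to the loader chain.
function wrapWithFallThrough(url) {
  upstreamLoad(url);                 // export scan; source is discarded
  return upstreamLoad(url).source;   // real load runs as the loader expects
}

const executedWithCache = wrapWithSourceCache('file:///gen.mjs');
const loadsWithCache = loadCount;

loadCount = 0;
const executedWithFallThrough = wrapWithFallThrough('file:///gen.mjs');
const loadsWithFallThrough = loadCount;
```

With the cache the loader runs once and the module executes the scan-time source; with fall-through it runs twice and the module executes what the second `load()` produced.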

The wrapper's namespace import now falls through to the loader chain like any
other import. Only the export names stay memoized, so a leaf reached through
several barrels is still lexed once; the second source read per wrapped module,
which the cache had saved, comes back. On a 480-module mixed graph that costs
~7% of import time on Node 24 (sync hooks 113 → 121 ms, n=21, best+worst
dropped) and is within noise on Node 18; the export-name dedup keeps the graph
faster than before the barrel memo landed.
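
For readers who want to reproduce a figure of this kind, one way to aggregate runs with the best and worst dropped is sketched below. It assumes the aggregate is a mean, which the message does not state. `trimmedMean` and `percentChange` are hypothetical helpers, and the sample array is synthetic rather than measured data.

```javascript
// Mean of the samples after dropping the single best and worst run.
function trimmedMean(samplesMs) {
  if (samplesMs.length < 3) throw new RangeError('need at least 3 samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1);
  return kept.reduce((sum, value) => sum + value, 0) / kept.length;
}

// Relative change between two aggregate timings, in percent.
function percentChange(before, after) {
  return ((after - before) / before) * 100;
}

// Synthetic samples, only to exercise the helper.
const illustrative = trimmedMean([5, 1, 3, 9, 2]);

// The two aggregate timings quoted in the commit message.
const regression = percentChange(113, 121);
```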
@BridgeAR BridgeAR force-pushed the BridgeAR/2026-06-30-dedup-source-reads branch from 87210c7 to 77fbe68 Compare July 3, 2026 13:29


2 participants