fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS by Karavil · Pull Request #6000 · rocicorp/mono

Karavil · 2026-05-17T23:08:07Z

Summary

An unauthenticated WebSocket client can open a connection to zero-cache and send an initConnection (or changeDesiredQueries) message whose desiredQueriesPatch[*].ast contains a deeply-nested condition tree:

let cond = { type: 'simple', op: '=', ... };
for (let i = 0; i < N; i++) cond = { type: 'and', conditions: [cond] };

There are two recursion surfaces that overflow under such input:

Valita schema parsing of astSchema / conditionSchema in packages/zero-protocol/src/ast.ts. The schema is structurally recursive via v.lazy(...), so every nesting level burns one JS frame. Overflow lands around depth ~900 (RangeError at parse, ~150 ms).
Downstream AST visitors in transformQuery / transformQueryInternal / transformCondition (packages/zero-cache/src/auth/read-authorizer.ts), simplifyCondition (packages/zql/src/query/expression.ts), bindStaticParameters (packages/zql/src/builder/builder.ts), and hashOfAST / normalizeAST. These overflow slightly earlier (~800) and burn ~6.5 seconds of CPU at the sweet spot because the AND/OR rewrites in simplifyCondition re-walk the tree.

The error is caught and the connection closed, so the worker process does not crash — but per-request CPU at the depth-~800 sweet spot is enormous, and no authentication is required. A trickle of crafted requests would saturate sync workers.

Measured against a production zero-cache (protocol v50) before this patch:

Depth	Server result
≤ 700	Parses + transforms successfully (CPU cost grows with depth)
800	`Internal: Maximum call stack size exceeded` after 6.5 seconds of CPU
900+	`InvalidMessage: RangeError: Maximum call stack size exceeded` at parse, ~150 ms

Found during a SOC 2 audit of an opaque-token deployment.

Fix shape

The depth check is folded into the schema layer: a new depthBoundedAstSchema (chain-wrapped variant of astSchema) is substituted at the wire-entry points that accept client-supplied ASTs. No envelope-specific guard or call-site assert(...) is added — too-deep inputs fail through the same v.parse error path as any other malformed message.

New constant MAX_AST_DEPTH = 50 in packages/zero-protocol/src/ast.ts.
New assertAstDepth(ast, max) — an iterative walker (explicit worklist) over the mutually-recursive Condition ↔ AST surface (handles and / or conditions, correlatedSubquery → ast → where → ..., and AST.related[].subquery). Throws if any branch exceeds max. Iterative because the guard cannot itself stack-overflow the path it is protecting.
New depthBoundedAstSchema export — v.unknown().chain(value => { try { assertAstDepth(value); } catch (e) { return v.err(...); } return astSchema.try(value); }).
Wire-entry sites now reference depthBoundedAstSchema instead of astSchema:
- packages/zero-protocol/src/queries-patch.ts:upPutOpSchema.ast (every desiredQueriesPatch[].ast from a client)
- packages/zero-protocol/src/inspect-up.ts:inspectAnalyzeQueryUpSchema.ast and .value (the inspector's analyze-query op)

upstreamSchema is untouched. connection.ts is untouched. The check rides on top of the existing schema mechanism rather than a parallel guard.

The depth limit is 50. Headroom argument: the deepest realistic permission-rule plus user-where combination is ~6 levels; 50 is roughly 8×. Easy to raise if any consumer reports a real workload that needs more — it's a single constant.

Wire compatibility & legacy data

Server-internal usages of astSchema (CVR row loading in cvr-store.ts, inspector downstream responses, etc.) are deliberately not switched. Existing deployments may have CVR-stored ASTs deeper than 50 from before the cap; those should still load. Callers that want a depth bound at a server-side site can opt in by switching their reference to depthBoundedAstSchema.

protocol-version.test.ts hashes JSON.stringify(schema). Switching upPutOpSchema.ast and the inspect-up AST fields to the chain-wrapped variant changes the schema object's serialized shape, so the hash moves even though the wire grammar is unchanged. The hash is bumped accordingly.

To prove the wire grammar is actually preserved (so reviewers don't have to take that on faith), this PR adds packages/zero-protocol/src/upstream-schema.test.ts — a new regression suite that:

Round-trips a minimal representative payload through v.parse(..., upstreamSchema) for every upstream message type (initConnection, ping, deleteClients, changeDesiredQueries, pull, updateAuth, push, closeConnection, inspect × 4 ops, ackMutationResponses) and asserts the parsed value equals the input.
Asserts that obviously-malformed payloads (wrong tag, missing required fields, wrong types) are rejected.
Asserts that initConnection and changeDesiredQueries with a depth-1000 AST in desiredQueriesPatch are rejected through the standard v.parse error path.

protocol-version.test.ts's comment now points at upstream-schema.test.ts as the actual wire-grammar contract; bumping the hash without bumping PROTOCOL_VERSION is safe iff the wire-grammar suite still passes.

What was intentionally not done

Did not rewrite transformQueryInternal, transformCondition, simplifyCondition, or bindStaticParameters to be iterative. The wire layer caps depth at 50; the downstream visitors will never see deeper input. Doing the rewrite would touch subtle semantics (AND/OR flattening, NOT/EXISTS, static-parameter binding) for no security benefit.
Did not add a per-AST byte-size cap. The existing websocketMaxPayloadBytes (10 MB default) already bounds total request size.

Tests

In packages/zero-protocol/src/ast-depth.test.ts (new):

AND chains, EXISTS chains, related[] chains at the limit, just below, and well above
Custom limit override behavior
Malformed input tolerance (depth check doesn't trip on missing fields)
Wall-clock budget at depth 1000

In packages/zero-protocol/src/upstream-schema.test.ts (new):

Wire-grammar round-trip for every upstream message type
Negative cases for the most common malformed shapes
Depth-guard rejection for initConnection and changeDesiredQueries

Test results

packages/zero-protocol: 85 / 85 passing
packages/zero-cache/no-pg: 1275 / 1275 passing
packages/zql: 1277 / 1277 passing
Type check: clean
Lint: 0 errors

Test plan

Wire-grammar suite covers every upstream message type and proves wire-compatibility under this refactor
Depth-guard suite covers AND, EXISTS, and related[] recursion paths
Existing zero-protocol, zero-cache/no-pg, zql test suites pass

vercel · 2026-05-17T23:08:11Z

@Karavil is attempting to deploy a commit to the Rocicorp Team on Vercel.

A member of the Team first needs to authorize it.

An unauthenticated WebSocket client could open a connection to zero-cache and send an `initConnection` (or `changeDesiredQueries`) message whose `desiredQueriesPatch[*].ast` contains a deeply-nested condition tree, e.g. let cond = { type: 'simple', op: '=', ... }; for (let i = 0; i < N; i++) cond = { type: 'and', conditions: [cond] }; There are two recursion surfaces that overflow under such input: 1. Valita schema parsing of `astSchema` / `conditionSchema` in `packages/zero-protocol/src/ast.ts`. The schema is structurally recursive via `v.lazy(...)`, so every nesting level burns one JS frame. Overflow lands around depth ~900 (RangeError at parse, ~150ms). 2. The downstream AST visitors in `transformQuery` / `transformQueryInternal` / `transformCondition` (`packages/zero-cache/src/auth/read-authorizer.ts`), `simplifyCondition` (`packages/zql/src/query/expression.ts`), `bindStaticParameters` (`packages/zql/src/builder/builder.ts`), and `hashOfAST` / `normalizeAST`. These overflow slightly earlier (~800) and burn ~6.5s of CPU at the sweet spot because the AND/OR rewrites in `simplifyCondition` re-walk the tree. A trickle of such requests, unauthenticated, would saturate sync workers. This change caps AST depth at the wire-entry sites that accept client-supplied ASTs by adding a depth-bounded variant of `astSchema`: - New `MAX_AST_DEPTH = 50` constant plus an iterative `assertAstDepth` helper in `zero-protocol/src/ast.ts`. The walker uses an explicit worklist (no recursion) so it cannot itself overflow the stack it is protecting. It counts depth across the mutually-recursive surface: `Condition.conditions[]`, `CorrelatedSubqueryCondition.related.subquery` (Condition -> AST), `AST.where` (AST -> Condition), and `AST.related[].subquery` (AST -> AST). 50 is ~8x headroom over the deepest realistic permission-rule plus user-where combination. - New `depthBoundedAstSchema` export in `ast.ts`. This is `astSchema` wrapped in `v.unknown().chain(value => { assertAstDepth(value); return astSchema.try(value) })`. Too-deep inputs fail through the same `v.parse` error path as any other malformed message, so no new error shape and no envelope-specific parser is introduced. - `queries-patch.ts:upPutOpSchema.ast` and the two AST fields on `inspect-up.ts:inspectAnalyzeQueryUpSchema` are switched to `depthBoundedAstSchema`. These are every place an untrusted client can put an AST on the wire. `upstreamSchema` itself is unchanged, and `connection.ts` does not need a separate guard call. - Server-internal usages of `astSchema` (CVR row loading in `cvr-store.ts`, inspector downstream responses, etc.) are deliberately NOT switched: legacy stored ASTs from existing deployments that happen to be > 50 deep should still load. Callers that want a depth bound at a server-side site can opt in by switching their reference to `depthBoundedAstSchema`. Scope choices, deliberately omitted from this commit: - The downstream visitors are not rewritten to be iterative. Doing so would be a substantial refactor of code with subtle rewrite semantics (AND/OR flattening, NOT/EXISTS propagation, static-parameter binding) and is unnecessary once the wire layer caps depth. - A separate per-AST byte-size cap is not added. The existing `websocketMaxPayloadBytes` (default 10MB) already bounds total request size. Regression tests: - `packages/zero-protocol/src/ast-depth.test.ts`: covers the iterative `assertAstDepth` directly. Includes the mutually-recursive EXISTS axis, the `AST.related[]` axis, custom limits, tolerance of malformed input, and a wall-clock budget at depth 1000. - `packages/zero-protocol/src/upstream-schema.test.ts`: a wire-grammar contract for `upstreamSchema`. Every upstream message type is round-tripped through `v.parse(...)` with a minimal representative value to prove it is still accepted unchanged; a handful of obviously-invalid payloads are asserted to be rejected; and two depth-guard cases prove `initConnection` / `changeDesiredQueries` with depth=1000 ASTs are rejected through the same `v.parse` error path. Protocol-version hash: - `protocol-version.test.ts` hashes a `JSON.stringify` of the schema object. Switching `upPutOpSchema.ast` and the inspect-up AST fields to the chain-wrapped `depthBoundedAstSchema` changes the runtime schema object's serialized shape, so the hash moves even though the wire grammar is unchanged. The hash is bumped accordingly. The comment in the test now points at `upstream-schema.test.ts` as the actual wire-grammar contract; bumping the hash without bumping `PROTOCOL_VERSION` is safe iff the wire-grammar suite still passes (which it does). Verified against `zero-protocol` (85 tests), `zero-cache/no-pg` (1275 tests), and `zql` (1277 tests).

Karavil force-pushed the fix/ast-depth-limit-cpu-dos branch 2 times, most recently from 3eca25b to b1f57e9 Compare May 17, 2026 23:47

Karavil force-pushed the fix/ast-depth-limit-cpu-dos branch from b1f57e9 to e2c92ec Compare May 18, 2026 00:00

danieljohnmorris mentioned this pull request May 21, 2026

parser: cap AST nesting depth to prevent CPU DoS (#5bo) ilo-lang/ilo#528

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000

fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000
Karavil wants to merge 1 commit into
rocicorp:mainfrom
Karavil:fix/ast-depth-limit-cpu-dos

Karavil commented May 17, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Karavil commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix shape

Wire compatibility & legacy data

What was intentionally not done

Tests

Test results

Test plan

Uh oh!

vercel Bot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Karavil commented May 17, 2026 •

edited

Loading