Skip to content

fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000

Open
Karavil wants to merge 1 commit into
rocicorp:mainfrom
Karavil:fix/ast-depth-limit-cpu-dos
Open

fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000
Karavil wants to merge 1 commit into
rocicorp:mainfrom
Karavil:fix/ast-depth-limit-cpu-dos

Conversation

@Karavil
Copy link
Copy Markdown
Contributor

@Karavil Karavil commented May 17, 2026

Summary

An unauthenticated WebSocket client can open a connection to zero-cache and send an initConnection (or changeDesiredQueries) message whose desiredQueriesPatch[*].ast contains a deeply-nested condition tree:

let cond = { type: 'simple', op: '=', ... };
for (let i = 0; i < N; i++) cond = { type: 'and', conditions: [cond] };

There are two recursion surfaces that overflow under such input:

  1. Valita schema parsing of astSchema / conditionSchema in packages/zero-protocol/src/ast.ts. The schema is structurally recursive via v.lazy(...), so every nesting level burns one JS frame. Overflow lands around depth ~900 (RangeError at parse, ~150 ms).
  2. Downstream AST visitors in transformQuery / transformQueryInternal / transformCondition (packages/zero-cache/src/auth/read-authorizer.ts), simplifyCondition (packages/zql/src/query/expression.ts), bindStaticParameters (packages/zql/src/builder/builder.ts), and hashOfAST / normalizeAST. These overflow slightly earlier (~800) and burn ~6.5 seconds of CPU at the sweet spot because the AND/OR rewrites in simplifyCondition re-walk the tree.

The error is caught and the connection closed, so the worker process does not crash — but per-request CPU at the depth-~800 sweet spot is enormous, and no authentication is required. A trickle of crafted requests would saturate sync workers.

Measured against a production zero-cache (protocol v50) before this patch:

Depth Server result
≤ 700 Parses + transforms successfully (CPU cost grows with depth)
800 Internal: Maximum call stack size exceeded after 6.5 seconds of CPU
900+ InvalidMessage: RangeError: Maximum call stack size exceeded at parse, ~150 ms

Found during a SOC 2 audit of an opaque-token deployment.

Fix shape

The depth check is folded into the schema layer: a new depthBoundedAstSchema (chain-wrapped variant of astSchema) is substituted at the wire-entry points that accept client-supplied ASTs. No envelope-specific guard or call-site assert(...) is added — too-deep inputs fail through the same v.parse error path as any other malformed message.

  1. New constant MAX_AST_DEPTH = 50 in packages/zero-protocol/src/ast.ts.
  2. New assertAstDepth(ast, max) — an iterative walker (explicit worklist) over the mutually-recursive Condition ↔ AST surface (handles and / or conditions, correlatedSubquery → ast → where → ..., and AST.related[].subquery). Throws if any branch exceeds max. Iterative because the guard cannot itself stack-overflow the path it is protecting.
  3. New depthBoundedAstSchema export — v.unknown().chain(value => { try { assertAstDepth(value); } catch (e) { return v.err(...); } return astSchema.try(value); }).
  4. Wire-entry sites now reference depthBoundedAstSchema instead of astSchema:
    • packages/zero-protocol/src/queries-patch.ts:upPutOpSchema.ast (every desiredQueriesPatch[].ast from a client)
    • packages/zero-protocol/src/inspect-up.ts:inspectAnalyzeQueryUpSchema.ast and .value (the inspector's analyze-query op)

upstreamSchema is untouched. connection.ts is untouched. The check rides on top of the existing schema mechanism rather than a parallel guard.

The depth limit is 50. Headroom argument: the deepest realistic permission-rule plus user-where combination is ~6 levels; 50 is roughly 8×. Easy to raise if any consumer reports a real workload that needs more — it's a single constant.

Wire compatibility & legacy data

Server-internal usages of astSchema (CVR row loading in cvr-store.ts, inspector downstream responses, etc.) are deliberately not switched. Existing deployments may have CVR-stored ASTs deeper than 50 from before the cap; those should still load. Callers that want a depth bound at a server-side site can opt in by switching their reference to depthBoundedAstSchema.

protocol-version.test.ts hashes JSON.stringify(schema). Switching upPutOpSchema.ast and the inspect-up AST fields to the chain-wrapped variant changes the schema object's serialized shape, so the hash moves even though the wire grammar is unchanged. The hash is bumped accordingly.

To prove the wire grammar is actually preserved (so reviewers don't have to take that on faith), this PR adds packages/zero-protocol/src/upstream-schema.test.ts — a new regression suite that:

  • Round-trips a minimal representative payload through v.parse(..., upstreamSchema) for every upstream message type (initConnection, ping, deleteClients, changeDesiredQueries, pull, updateAuth, push, closeConnection, inspect × 4 ops, ackMutationResponses) and asserts the parsed value equals the input.
  • Asserts that obviously-malformed payloads (wrong tag, missing required fields, wrong types) are rejected.
  • Asserts that initConnection and changeDesiredQueries with a depth-1000 AST in desiredQueriesPatch are rejected through the standard v.parse error path.

protocol-version.test.ts's comment now points at upstream-schema.test.ts as the actual wire-grammar contract; bumping the hash without bumping PROTOCOL_VERSION is safe iff the wire-grammar suite still passes.

What was intentionally not done

  • Did not rewrite transformQueryInternal, transformCondition, simplifyCondition, or bindStaticParameters to be iterative. The wire layer caps depth at 50; the downstream visitors will never see deeper input. Doing the rewrite would touch subtle semantics (AND/OR flattening, NOT/EXISTS, static-parameter binding) for no security benefit.
  • Did not add a per-AST byte-size cap. The existing websocketMaxPayloadBytes (10 MB default) already bounds total request size.

Tests

In packages/zero-protocol/src/ast-depth.test.ts (new):

  • AND chains, EXISTS chains, related[] chains at the limit, just below, and well above
  • Custom limit override behavior
  • Malformed input tolerance (depth check doesn't trip on missing fields)
  • Wall-clock budget at depth 1000

In packages/zero-protocol/src/upstream-schema.test.ts (new):

  • Wire-grammar round-trip for every upstream message type
  • Negative cases for the most common malformed shapes
  • Depth-guard rejection for initConnection and changeDesiredQueries

Test results

  • packages/zero-protocol: 85 / 85 passing
  • packages/zero-cache/no-pg: 1275 / 1275 passing
  • packages/zql: 1277 / 1277 passing
  • Type check: clean
  • Lint: 0 errors

Test plan

  • Wire-grammar suite covers every upstream message type and proves wire-compatibility under this refactor
  • Depth-guard suite covers AND, EXISTS, and related[] recursion paths
  • Existing zero-protocol, zero-cache/no-pg, zql test suites pass

@vercel
Copy link
Copy Markdown

vercel Bot commented May 17, 2026

@Karavil is attempting to deploy a commit to the Rocicorp Team on Vercel.

A member of the Team first needs to authorize it.

@Karavil Karavil force-pushed the fix/ast-depth-limit-cpu-dos branch 2 times, most recently from 3eca25b to b1f57e9 Compare May 17, 2026 23:47
An unauthenticated WebSocket client could open a connection to zero-cache
and send an `initConnection` (or `changeDesiredQueries`) message whose
`desiredQueriesPatch[*].ast` contains a deeply-nested condition tree, e.g.

    let cond = { type: 'simple', op: '=', ... };
    for (let i = 0; i < N; i++) cond = { type: 'and', conditions: [cond] };

There are two recursion surfaces that overflow under such input:

  1. Valita schema parsing of `astSchema` / `conditionSchema` in
     `packages/zero-protocol/src/ast.ts`. The schema is structurally
     recursive via `v.lazy(...)`, so every nesting level burns one JS
     frame. Overflow lands around depth ~900 (RangeError at parse, ~150ms).

  2. The downstream AST visitors in `transformQuery` /
     `transformQueryInternal` / `transformCondition`
     (`packages/zero-cache/src/auth/read-authorizer.ts`),
     `simplifyCondition` (`packages/zql/src/query/expression.ts`),
     `bindStaticParameters` (`packages/zql/src/builder/builder.ts`), and
     `hashOfAST` / `normalizeAST`. These overflow slightly earlier (~800)
     and burn ~6.5s of CPU at the sweet spot because the AND/OR rewrites
     in `simplifyCondition` re-walk the tree.

A trickle of such requests, unauthenticated, would saturate sync workers.

This change caps AST depth at the wire-entry sites that accept
client-supplied ASTs by adding a depth-bounded variant of `astSchema`:

  - New `MAX_AST_DEPTH = 50` constant plus an iterative `assertAstDepth`
    helper in `zero-protocol/src/ast.ts`. The walker uses an explicit
    worklist (no recursion) so it cannot itself overflow the stack it is
    protecting. It counts depth across the mutually-recursive surface:
    `Condition.conditions[]`, `CorrelatedSubqueryCondition.related.subquery`
    (Condition -> AST), `AST.where` (AST -> Condition), and
    `AST.related[].subquery` (AST -> AST). 50 is ~8x headroom over the
    deepest realistic permission-rule plus user-where combination.

  - New `depthBoundedAstSchema` export in `ast.ts`. This is `astSchema`
    wrapped in `v.unknown().chain(value => { assertAstDepth(value);
    return astSchema.try(value) })`. Too-deep inputs fail through the
    same `v.parse` error path as any other malformed message, so no
    new error shape and no envelope-specific parser is introduced.

  - `queries-patch.ts:upPutOpSchema.ast` and the two AST fields on
    `inspect-up.ts:inspectAnalyzeQueryUpSchema` are switched to
    `depthBoundedAstSchema`. These are every place an untrusted client
    can put an AST on the wire. `upstreamSchema` itself is unchanged,
    and `connection.ts` does not need a separate guard call.

  - Server-internal usages of `astSchema` (CVR row loading in
    `cvr-store.ts`, inspector downstream responses, etc.) are
    deliberately NOT switched: legacy stored ASTs from existing
    deployments that happen to be > 50 deep should still load.
    Callers that want a depth bound at a server-side site can opt in
    by switching their reference to `depthBoundedAstSchema`.

Scope choices, deliberately omitted from this commit:

  - The downstream visitors are not rewritten to be iterative. Doing so
    would be a substantial refactor of code with subtle rewrite semantics
    (AND/OR flattening, NOT/EXISTS propagation, static-parameter binding)
    and is unnecessary once the wire layer caps depth.

  - A separate per-AST byte-size cap is not added. The existing
    `websocketMaxPayloadBytes` (default 10MB) already bounds total
    request size.

Regression tests:

  - `packages/zero-protocol/src/ast-depth.test.ts`: covers the
    iterative `assertAstDepth` directly. Includes the mutually-recursive
    EXISTS axis, the `AST.related[]` axis, custom limits, tolerance of
    malformed input, and a wall-clock budget at depth 1000.

  - `packages/zero-protocol/src/upstream-schema.test.ts`: a wire-grammar
    contract for `upstreamSchema`. Every upstream message type is
    round-tripped through `v.parse(...)` with a minimal representative
    value to prove it is still accepted unchanged; a handful of
    obviously-invalid payloads are asserted to be rejected; and two
    depth-guard cases prove `initConnection` / `changeDesiredQueries`
    with depth=1000 ASTs are rejected through the same `v.parse` error
    path.

Protocol-version hash:

  - `protocol-version.test.ts` hashes a `JSON.stringify` of the schema
    object. Switching `upPutOpSchema.ast` and the inspect-up AST fields
    to the chain-wrapped `depthBoundedAstSchema` changes the runtime
    schema object's serialized shape, so the hash moves even though the
    wire grammar is unchanged. The hash is bumped accordingly. The
    comment in the test now points at `upstream-schema.test.ts` as the
    actual wire-grammar contract; bumping the hash without bumping
    `PROTOCOL_VERSION` is safe iff the wire-grammar suite still passes
    (which it does).

Verified against `zero-protocol` (85 tests), `zero-cache/no-pg`
(1275 tests), and `zql` (1277 tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant