fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000
Open
Karavil wants to merge 1 commit into
Open
fix(zero-cache): cap AST depth to prevent unauthenticated CPU DoS#6000Karavil wants to merge 1 commit into
Karavil wants to merge 1 commit into
Conversation
|
@Karavil is attempting to deploy a commit to the Rocicorp Team on Vercel. A member of the Team first needs to authorize it. |
3eca25b to
b1f57e9
Compare
An unauthenticated WebSocket client could open a connection to zero-cache
and send an `initConnection` (or `changeDesiredQueries`) message whose
`desiredQueriesPatch[*].ast` contains a deeply-nested condition tree, e.g.
let cond = { type: 'simple', op: '=', ... };
for (let i = 0; i < N; i++) cond = { type: 'and', conditions: [cond] };
There are two recursion surfaces that overflow under such input:
1. Valita schema parsing of `astSchema` / `conditionSchema` in
`packages/zero-protocol/src/ast.ts`. The schema is structurally
recursive via `v.lazy(...)`, so every nesting level burns one JS
frame. Overflow lands around depth ~900 (RangeError at parse, ~150ms).
2. The downstream AST visitors in `transformQuery` /
`transformQueryInternal` / `transformCondition`
(`packages/zero-cache/src/auth/read-authorizer.ts`),
`simplifyCondition` (`packages/zql/src/query/expression.ts`),
`bindStaticParameters` (`packages/zql/src/builder/builder.ts`), and
`hashOfAST` / `normalizeAST`. These overflow slightly earlier (~800)
and burn ~6.5s of CPU at the sweet spot because the AND/OR rewrites
in `simplifyCondition` re-walk the tree.
A trickle of such requests, unauthenticated, would saturate sync workers.
This change caps AST depth at the wire-entry sites that accept
client-supplied ASTs by adding a depth-bounded variant of `astSchema`:
- New `MAX_AST_DEPTH = 50` constant plus an iterative `assertAstDepth`
helper in `zero-protocol/src/ast.ts`. The walker uses an explicit
worklist (no recursion) so it cannot itself overflow the stack it is
protecting. It counts depth across the mutually-recursive surface:
`Condition.conditions[]`, `CorrelatedSubqueryCondition.related.subquery`
(Condition -> AST), `AST.where` (AST -> Condition), and
`AST.related[].subquery` (AST -> AST). 50 is ~8x headroom over the
deepest realistic permission-rule plus user-where combination.
- New `depthBoundedAstSchema` export in `ast.ts`. This is `astSchema`
wrapped in `v.unknown().chain(value => { assertAstDepth(value);
return astSchema.try(value) })`. Too-deep inputs fail through the
same `v.parse` error path as any other malformed message, so no
new error shape and no envelope-specific parser is introduced.
- `queries-patch.ts:upPutOpSchema.ast` and the two AST fields on
`inspect-up.ts:inspectAnalyzeQueryUpSchema` are switched to
`depthBoundedAstSchema`. These are every place an untrusted client
can put an AST on the wire. `upstreamSchema` itself is unchanged,
and `connection.ts` does not need a separate guard call.
- Server-internal usages of `astSchema` (CVR row loading in
`cvr-store.ts`, inspector downstream responses, etc.) are
deliberately NOT switched: legacy stored ASTs from existing
deployments that happen to be > 50 deep should still load.
Callers that want a depth bound at a server-side site can opt in
by switching their reference to `depthBoundedAstSchema`.
Scope choices, deliberately omitted from this commit:
- The downstream visitors are not rewritten to be iterative. Doing so
would be a substantial refactor of code with subtle rewrite semantics
(AND/OR flattening, NOT/EXISTS propagation, static-parameter binding)
and is unnecessary once the wire layer caps depth.
- A separate per-AST byte-size cap is not added. The existing
`websocketMaxPayloadBytes` (default 10MB) already bounds total
request size.
Regression tests:
- `packages/zero-protocol/src/ast-depth.test.ts`: covers the
iterative `assertAstDepth` directly. Includes the mutually-recursive
EXISTS axis, the `AST.related[]` axis, custom limits, tolerance of
malformed input, and a wall-clock budget at depth 1000.
- `packages/zero-protocol/src/upstream-schema.test.ts`: a wire-grammar
contract for `upstreamSchema`. Every upstream message type is
round-tripped through `v.parse(...)` with a minimal representative
value to prove it is still accepted unchanged; a handful of
obviously-invalid payloads are asserted to be rejected; and two
depth-guard cases prove `initConnection` / `changeDesiredQueries`
with depth=1000 ASTs are rejected through the same `v.parse` error
path.
Protocol-version hash:
- `protocol-version.test.ts` hashes a `JSON.stringify` of the schema
object. Switching `upPutOpSchema.ast` and the inspect-up AST fields
to the chain-wrapped `depthBoundedAstSchema` changes the runtime
schema object's serialized shape, so the hash moves even though the
wire grammar is unchanged. The hash is bumped accordingly. The
comment in the test now points at `upstream-schema.test.ts` as the
actual wire-grammar contract; bumping the hash without bumping
`PROTOCOL_VERSION` is safe iff the wire-grammar suite still passes
(which it does).
Verified against `zero-protocol` (85 tests), `zero-cache/no-pg`
(1275 tests), and `zql` (1277 tests).
b1f57e9 to
e2c92ec
Compare
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
An unauthenticated WebSocket client can open a connection to zero-cache and send an
initConnection(orchangeDesiredQueries) message whosedesiredQueriesPatch[*].astcontains a deeply-nested condition tree:There are two recursion surfaces that overflow under such input:
astSchema/conditionSchemainpackages/zero-protocol/src/ast.ts. The schema is structurally recursive viav.lazy(...), so every nesting level burns one JS frame. Overflow lands around depth ~900 (RangeErrorat parse, ~150 ms).transformQuery/transformQueryInternal/transformCondition(packages/zero-cache/src/auth/read-authorizer.ts),simplifyCondition(packages/zql/src/query/expression.ts),bindStaticParameters(packages/zql/src/builder/builder.ts), andhashOfAST/normalizeAST. These overflow slightly earlier (~800) and burn ~6.5 seconds of CPU at the sweet spot because the AND/OR rewrites insimplifyConditionre-walk the tree.The error is caught and the connection closed, so the worker process does not crash — but per-request CPU at the depth-~800 sweet spot is enormous, and no authentication is required. A trickle of crafted requests would saturate sync workers.
Measured against a production zero-cache (protocol v50) before this patch:
Internal: Maximum call stack size exceededafter 6.5 seconds of CPUInvalidMessage: RangeError: Maximum call stack size exceededat parse, ~150 msFound during a SOC 2 audit of an opaque-token deployment.
Fix shape
The depth check is folded into the schema layer: a new
depthBoundedAstSchema(chain-wrapped variant ofastSchema) is substituted at the wire-entry points that accept client-supplied ASTs. No envelope-specific guard or call-siteassert(...)is added — too-deep inputs fail through the samev.parseerror path as any other malformed message.MAX_AST_DEPTH = 50inpackages/zero-protocol/src/ast.ts.assertAstDepth(ast, max)— an iterative walker (explicit worklist) over the mutually-recursiveCondition ↔ ASTsurface (handlesand/orconditions,correlatedSubquery → ast → where → ..., andAST.related[].subquery). Throws if any branch exceedsmax. Iterative because the guard cannot itself stack-overflow the path it is protecting.depthBoundedAstSchemaexport —v.unknown().chain(value => { try { assertAstDepth(value); } catch (e) { return v.err(...); } return astSchema.try(value); }).depthBoundedAstSchemainstead ofastSchema:packages/zero-protocol/src/queries-patch.ts:upPutOpSchema.ast(everydesiredQueriesPatch[].astfrom a client)packages/zero-protocol/src/inspect-up.ts:inspectAnalyzeQueryUpSchema.astand.value(the inspector'sanalyze-queryop)upstreamSchemais untouched.connection.tsis untouched. The check rides on top of the existing schema mechanism rather than a parallel guard.The depth limit is 50. Headroom argument: the deepest realistic permission-rule plus user-where combination is ~6 levels; 50 is roughly 8×. Easy to raise if any consumer reports a real workload that needs more — it's a single constant.
Wire compatibility & legacy data
Server-internal usages of
astSchema(CVR row loading incvr-store.ts, inspector downstream responses, etc.) are deliberately not switched. Existing deployments may have CVR-stored ASTs deeper than 50 from before the cap; those should still load. Callers that want a depth bound at a server-side site can opt in by switching their reference todepthBoundedAstSchema.protocol-version.test.tshashesJSON.stringify(schema). SwitchingupPutOpSchema.astand the inspect-up AST fields to the chain-wrapped variant changes the schema object's serialized shape, so the hash moves even though the wire grammar is unchanged. The hash is bumped accordingly.To prove the wire grammar is actually preserved (so reviewers don't have to take that on faith), this PR adds
packages/zero-protocol/src/upstream-schema.test.ts— a new regression suite that:v.parse(..., upstreamSchema)for every upstream message type (initConnection,ping,deleteClients,changeDesiredQueries,pull,updateAuth,push,closeConnection,inspect× 4 ops,ackMutationResponses) and asserts the parsed value equals the input.initConnectionandchangeDesiredQuerieswith a depth-1000 AST indesiredQueriesPatchare rejected through the standardv.parseerror path.protocol-version.test.ts's comment now points atupstream-schema.test.tsas the actual wire-grammar contract; bumping the hash without bumpingPROTOCOL_VERSIONis safe iff the wire-grammar suite still passes.What was intentionally not done
transformQueryInternal,transformCondition,simplifyCondition, orbindStaticParametersto be iterative. The wire layer caps depth at 50; the downstream visitors will never see deeper input. Doing the rewrite would touch subtle semantics (AND/OR flattening, NOT/EXISTS, static-parameter binding) for no security benefit.websocketMaxPayloadBytes(10 MB default) already bounds total request size.Tests
In
packages/zero-protocol/src/ast-depth.test.ts(new):related[]chains at the limit, just below, and well aboveIn
packages/zero-protocol/src/upstream-schema.test.ts(new):initConnectionandchangeDesiredQueriesTest results
packages/zero-protocol: 85 / 85 passingpackages/zero-cache/no-pg: 1275 / 1275 passingpackages/zql: 1277 / 1277 passingTest plan
related[]recursion pathszero-protocol,zero-cache/no-pg,zqltest suites pass