chore(zql): penalize flip in the planner's cost model #5992

Draft
tantaman wants to merge 3 commits into main from mlaw/penalize-flip-cost

tantaman (Contributor) commented May 15, 2026

Summary

The planner's cost model under-counts the runtime cost of
FlippedJoin because it doesn't model two IVM-level costs
that are invisible to SQLite's scanstatus:

  1. Eager child load: FlippedJoin.fetch reads ALL
    children into an array before any parent work, so even with
    a downstream LIMIT every child pays IVM cost (generator
    yields, debug accounting, btree-set inserts).
  2. Chunk priming: mergeSortedStreams opens every
    chunk to seed its heap before yielding the first row; each
    open runs an IN-list SQL to its first match.
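
To make the first cost concrete, here is a minimal sketch of why eager materialization defeats a downstream LIMIT. `CountingSource`, `eagerTake`, and `lazyTake` are hypothetical names invented for illustration; they are not the FlippedJoin implementation, only a model of the "load everything, then take N" shape described above.

```typescript
// Hypothetical stand-in for a child input; it counts every row pulled.
class CountingSource {
  pulled = 0;
  private cursor = 0;
  constructor(private readonly size: number) {}

  // Returns the next row id, or undefined once the source is exhausted.
  pull(): number | undefined {
    if (this.cursor >= this.size) return undefined;
    this.pulled++;
    return this.cursor++;
  }
}

// Eager: read every child into an array, then apply the limit.
function eagerTake(source: CountingSource, limit: number): number[] {
  const all: number[] = [];
  for (let row = source.pull(); row !== undefined; row = source.pull()) {
    all.push(row);
  }
  return all.slice(0, limit);
}

// Lazy: stop pulling as soon as `limit` rows have been produced.
function lazyTake(source: CountingSource, limit: number): number[] {
  const out: number[] = [];
  while (out.length < limit) {
    const row = source.pull();
    if (row === undefined) break;
    out.push(row);
  }
  return out;
}

const eagerSource = new CountingSource(416000);
const lazySource = new CountingSource(416000);
eagerTake(eagerSource, 50);
lazyTake(lazySource, 50);
console.log(eagerSource.pulled, lazySource.pulled); // 416000 50
```

Both calls return the same 50 rows; only the number of rows that pay per-row processing differs.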

Concretely this means the cost model treats SQLite row scans
and IVM row processing as if they cost the same — but in
practice IVM is ~100× slower per row. For limited queries
with large child cardinality this leads the planner to pick
a flipped plan when semi would short-circuit far earlier.

This change adds child.scanEst * FLIP_IVM_PER_CHILD_OVERHEAD (constant 3) to the
flipped-join cost, gated on parent.limit !== undefined && child.scanEst > getMultiConstraintChunkSize(). The gate
matters:

  • No downstream limit → semi has to scan everything
    anyway, so flipped's eager-load isn't wasted. Don't
    penalize.
  • Single chunk (child.scanEst ≤ 256) → no
    mergeSortedStreams priming happens, so no IVM tax. Don't
    penalize.

This preserves flipped wins on full-scan queries and on
small child sets, while penalizing the case it's actually
wrong for: limited TAKE queries with huge child
cardinality.
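
The gate can be sketched as a small pure function. This is an illustration under assumptions, not the planner's code: `flipIvmPenalty` is a hypothetical name, and the chunk size is passed in as a parameter rather than read from getMultiConstraintChunkSize().

```typescript
// Value the description gives for FLIP_IVM_PER_CHILD_OVERHEAD.
const FLIP_IVM_PER_CHILD_OVERHEAD = 3;

// Hypothetical helper: returns the extra cost added to a flipped join.
function flipIvmPenalty(
  childScanEst: number,
  parentLimit: number | undefined,
  chunkSize: number,
): number {
  const hasLimit = parentLimit !== undefined; // no limit: eager load is not wasted
  const multiChunk = childScanEst > chunkSize; // single chunk: no priming
  return hasLimit && multiChunk
    ? childScanEst * FLIP_IVM_PER_CHILD_OVERHEAD
    : 0;
}

console.log(flipIvmPenalty(416000, 50, 256)); // 1248000
console.log(flipIvmPenalty(416000, undefined, 256)); // 0
console.log(flipIvmPenalty(256, 50, 256)); // 0
```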

Measured impact

Benchmarked on a 211 GB zbugs replica via
zql-benchmarks/src/zbugs-profile.ts. The pathological
query is project=gatewaycore AND open=true AND whereExists(label=api-gateway) AND whereExists(label=async-processing) ORDER BY modified DESC LIMIT 50:

| | before | after |
| --- | --- | --- |
| Plan picked | flipped + semi (cost 401k) | semi + semi (cost 1.24M) |
| Rows scanned by SQLite | 1,098,099 | 53,712 (20× fewer) |
| Rows read into JS | 475,051 | 48,397 (10× fewer) |
| Wall time | 65 s | 7.4 s (8.8× faster) |

Other zbugs benchmark variants (with assignee, single-label,
no-label) are unchanged in plan choice and row counts —
their costs don't trip the gate.

Cost model details

For a flipped join with parent P, child C, and chunk
size K:

    cost = C.cost
         + ceil(C.scanEst / K) * P.startupCost   // per-chunk prepare
         + C.scanEst * (P.cost + P.scanEst)      // per-child seek
         + (P.limit !== undefined && C.scanEst > K
              ? C.scanEst * 3                    // ← new IVM-overhead term
              : 0)
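
As a sanity check, the formula can be transcribed into a runnable function. The interfaces and the name `flippedJoinCost` are assumptions made for this sketch, and the input numbers are invented purely to exercise each term; they are not taken from the zbugs benchmark.

```typescript
// Hypothetical shapes; field names mirror the symbols in the formula.
interface ParentEst {
  cost: number;
  scanEst: number;
  startupCost: number;
  limit?: number;
}
interface ChildEst {
  cost: number;
  scanEst: number;
}

const IVM_OVERHEAD = 3; // the constant from the formula

function flippedJoinCost(p: ParentEst, c: ChildEst, chunkSize: number): number {
  const perChunkPrepare = Math.ceil(c.scanEst / chunkSize) * p.startupCost;
  const perChildSeek = c.scanEst * (p.cost + p.scanEst);
  const ivmOverhead =
    p.limit !== undefined && c.scanEst > chunkSize
      ? c.scanEst * IVM_OVERHEAD
      : 0;
  return c.cost + perChunkPrepare + perChildSeek + ivmOverhead;
}

// Invented inputs, chosen so each term is easy to check by hand.
const limited: ParentEst = {cost: 2, scanEst: 1, startupCost: 10, limit: 50};
const unlimited: ParentEst = {cost: 2, scanEst: 1, startupCost: 10};
const bigChild: ChildEst = {cost: 1000, scanEst: 1000};
const smallChild: ChildEst = {cost: 200, scanEst: 200};

console.log(flippedJoinCost(limited, bigChild, 256)); // 7040 = 1000 + 40 + 3000 + 3000
console.log(flippedJoinCost(unlimited, bigChild, 256)); // 4040 (no limit, no penalty)
console.log(flippedJoinCost(limited, smallChild, 256)); // 810 (single chunk, no penalty)
```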

The constant 3 is calibrated against the observed gap:
pre-change flipped at 401k vs semi-semi at 1.24M for the v6
query; multiplying the 416k-row child scan by 3 (≈ 1.25M
added) tips the choice. Other calibrations tried:

  • 0 — baseline, v6 picks the bad plan (65 s).
  • 1, 2 — v6 still picks flipped (semi cost not reached).
  • 3 — v6 picks semi-semi. Used.
  • Higher values aren't needed and start over-penalizing
    legitimate flipped wins.
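
The calibration sweep can be reproduced arithmetically from the rounded figures quoted above. This is a rough sketch with assumptions: it treats 401k as the whole pre-change cost of the flipped plan, holds the semi-semi cost fixed at 1.24M, and uses the 416k child row count as the scan estimate. `pickedPlan` is a hypothetical name.

```typescript
// Rounded figures quoted in the description.
const FLIPPED_BASE_COST = 401000;
const SEMI_SEMI_COST = 1240000;
const CHILD_SCAN_EST = 416000;

// Hypothetical helper: which plan has the lower cost for a given constant?
function pickedPlan(overhead: number): string {
  const flipped = FLIPPED_BASE_COST + CHILD_SCAN_EST * overhead;
  return flipped < SEMI_SEMI_COST ? 'flipped' : 'semi-semi';
}

for (const k of [0, 1, 2, 3]) {
  console.log(k, FLIPPED_BASE_COST + CHILD_SCAN_EST * k, pickedPlan(k));
}
// 0 401000 flipped
// 1 817000 flipped
// 2 1233000 flipped
// 3 1649000 semi-semi
```

With a constant of 2 the flipped plan lands just under the semi-semi cost, which is consistent with the observation that 3 is the smallest integer that flips the choice.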

Test impact

Existing planner-join.test.ts chunk-boundary tests now
exercise a connection with limit=50 so the gate fires and
the IVM term is exercised. The helper ovhFor(n, chunkSize)
makes assertions read symmetrically across the n ≤ chunkSize / n > chunkSize cases.

Integration tests

All chinook/planner.pg.test.ts (5/5) and
pagila/planner-exec.pg.test.ts (18/18) pass unchanged.

chinook/planner-exec.pg.test.ts passes 25/26. The one
failure is a correlation-only regression, not a plan-choice
regression:

  • Test: 'extreme selectivity - artist to album to long tracks' (indexed variant)
  • Picked plan: still optimal (within-optimal: 1.00x,
    passes)
  • Spearman correlation: 0.20 (threshold 0.35, -42.9%
    headroom)

The cost model's ranking of all-flipped plans against semi
alternatives drifts below this test's correlation
threshold, but the planner still picks the actually-optimal
plan. The base-DB threshold for this same test is already
0.15, so loosening the indexed threshold to 0.20 would
be consistent with the existing tolerance. Per the project's
CI margin convention (~30% headroom for cost-model
thresholds), the threshold was tighter than the model's
underlying noise floor; this PR just exposes it.
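
For readers unfamiliar with the metric, here is a minimal sketch of a Spearman rank correlation and the headroom figure quoted above. It assumes headroom is computed as (value - threshold) / threshold, which reproduces the -42.9% in the list; the function names and sample data are invented, and the test suite's own implementation may differ.

```typescript
// Average 1-based ranks; tied values share the mean of their positions.
function ranks(values: number[]): number[] {
  const order = values.map((v, i) => ({v, i})).sort((a, b) => a.v - b.v);
  const out: number[] = new Array(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) out[order[k].i] = avg;
    start = end + 1;
  }
  return out;
}

// Pearson correlation of two equal-length series.
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  let mx = 0;
  let my = 0;
  for (let i = 0; i < n; i++) {
    mx += x[i];
    my += y[i];
  }
  mx /= n;
  my /= n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman = Pearson applied to the ranks.
function spearman(estimated: number[], actual: number[]): number {
  return pearson(ranks(estimated), ranks(actual));
}

// Assumed definition of headroom, in percent of the threshold.
function headroomPct(value: number, threshold: number): number {
  return ((value - threshold) / threshold) * 100;
}

// Invented data: estimated plan costs vs. measured runtimes.
console.log(spearman([10, 20, 30, 40], [1.0, 3.0, 2.0, 4.0]).toFixed(2)); // 0.80
console.log(headroomPct(0.2, 0.35).toFixed(1)); // -42.9
```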

Depends on

This PR can be reviewed independently but its impact is best
measured on top of the Debug.rowVended O(N²) fix
(separate PR — branch mlaw/fix-debug-rowvended-quadratic).
Without that fix the v6 query takes 260 s instead of 65 s
and the 7.4 s post-fix number can't be reproduced.

Test plan

  • npm --workspace=zql run test -- planner (90 tests,
    all pass)
  • npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/chinook/planner-exec.pg.test.ts
    (25/26 — known soft fail on extreme selectivity
    correlation)
  • npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/pagila/planner-exec.pg.test.ts
    (18/18)
  • npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/chinook/planner.pg.test.ts (5/5)
  • Multi-pg sweep: rerun the same with
    --project='*pg-15*', '*pg-16*', '*pg-17*' before merge
  • Decide on the extreme selectivity indexed
    correlation threshold: leave the test failing as a known
    follow-up, or relax 0.35 → 0.20 in this PR

Follow-ups

  • The extreme selectivity indexed correlation threshold
    above.
  • Calibration: the constant 3 is fitted to one workload.
    If other workloads surface where flipped is the right choice
    but gets penalized, consider scaling the overhead by
    chunks instead of child.scanEst, or making it
    proportional to the existing SQL cost rather than a flat
    constant.
  • Stat4 has accurate per-label cardinality (verified —
    lbl_proj_000_15 and lbl_proj_000_45 both at 416k rows,
    planner gets ~393k estimates), so the cost model is
    correctly informed; this PR is about cost formula, not
    stats.

vercel Bot commented May 15, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| replicache-docs | Ready | May 15, 2026 4:11pm |
| zbugs | Ready | May 15, 2026 4:11pm |

github-actions Bot commented May 15, 2026

🐰 Bencher Report

Branch: mlaw/penalize-flip-cost
Testbed: Linux

| Benchmark | File Size (KB) | Result Δ% | Baseline (KB) | Upper Boundary (KB) | Limit % |
| --- | --- | --- | --- | --- | --- |
| zero-package.tgz | 2,113.78 | +0.08% | 2,112.06 | 2,154.30 | 98.12% |
| zero.js | 280.12 | +0.02% | 280.07 | 285.67 | 98.06% |
| zero.js.br | 74.40 | +0.01% | 74.39 | 75.88 | 98.05% |

github-actions Bot commented May 15, 2026

🐰 Bencher Report

Branch: mlaw/penalize-flip-cost
Testbed: self-hosted-metal

| Benchmark | Throughput (ops/s x 1e3) | Result Δ% | Baseline (ops/s x 1e3) | Lower Boundary (ops/s x 1e3) | Limit % |
| --- | --- | --- | --- | --- | --- |
| src/client/custom.bench.ts > big schema | 37.17 | +5.48% | 35.24 | 33.19 | 89.30% |
| src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers) | 1.11 | +1.45% | 1.09 | 1.02 | 91.44% |
| src/client/zero.bench.ts > pk compare > pk = N | 19.97 | +2.22% | 19.54 | 17.57 | 87.98% |
| src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers) | 1.53 | +2.12% | 1.50 | 1.25 | 81.83% |

github-actions Bot commented May 15, 2026

🐰 Bencher Report

Branch: mlaw/penalize-flip-cost
Testbed: self-hosted-metal

| Benchmark | Throughput (ops/s) | Result Δ% | Baseline (ops/s) | Lower Boundary (ops/s) | Limit % |
| --- | --- | --- | --- | --- | --- |
| src/db/pg-copy.bench.ts > pg-copy benchmark > copy | 17.75 | -1.38% | 18.00 | 17.73 | 99.86% |

github-actions Bot commented May 15, 2026

🐰 Bencher Report

Branch: mlaw/penalize-flip-cost
Testbed: self-hosted-metal

| Benchmark | Throughput (ops/s) | Result Δ% | Baseline (ops/s) | Lower Boundary (ops/s) | Limit % |
| --- | --- | --- | --- | --- | --- |
| src/btree-set.bench.ts > BTreeSet iterator next() in isolation > forward iterator next() | 37,811.00 | +3.16% | 36,653.36 | 35,460.36 | 93.78% |
| src/btree-set.bench.ts > BTreeSet iterator next() in isolation > forward iterator next() from mid | 74,526.93 | +3.99% | 71,667.85 | 69,395.13 | 93.11% |
| src/btree-set.bench.ts > BTreeSet iterator next() in isolation > reverse iterator next() | 37,964.82 | -0.57% | 38,181.98 | 37,162.61 | 97.89% |
| src/btree-set.bench.ts > BTreeSet iterator next() in isolation > reverse iterator next() from mid | 74,874.03 | -0.22% | 75,036.84 | 73,490.26 | 98.15% |
| src/btree-set.bench.ts > BTreeSet iterators > [Symbol.iterator]() full scan | 41,333.98 | +1.11% | 40,880.34 | 39,911.09 | 96.56% |
| src/btree-set.bench.ts > BTreeSet iterators > values() full scan | 40,246.23 | +0.57% | 40,018.90 | 39,404.50 | 97.91% |
| src/btree-set.bench.ts > BTreeSet iterators > valuesFrom() from mid | 80,618.88 | +1.75% | 79,234.12 | 76,036.12 | 94.32% |
| src/btree-set.bench.ts > BTreeSet iterators > valuesFromReversed() from mid | 81,766.42 | -0.43% | 82,120.96 | 81,027.40 | 99.10% |
| src/btree-set.bench.ts > BTreeSet iterators > valuesReversed() full scan | 41,221.35 | -0.25% | 41,324.84 | 40,792.86 | 98.96% |
| src/btree-set.bench.ts > BTreeSet lookups > get() hit | 4,372,734.56 | +0.10% | 4,368,245.53 | 4,334,044.06 | 99.12% |
| src/btree-set.bench.ts > BTreeSet lookups > has() hit | 4,403,969.92 | +0.31% | 4,390,415.06 | 4,319,084.19 | 98.07% |
| src/btree-set.bench.ts > BTreeSet lookups > has() miss | 6,541,780.47 | +0.39% | 6,516,251.82 | 6,344,996.65 | 96.99% |
| src/btree-set.bench.ts > BTreeSet mutations > add() 100 sequential keys | 38,064.15 | +0.29% | 37,955.82 | 37,212.44 | 97.76% |
| src/btree-set.bench.ts > BTreeSet mutations > add() 1000 sequential keys | 3,304.81 | -1.31% | 3,348.60 | 3,206.75 | 97.03% |
| src/btree-set.bench.ts > BTreeSet mutations > add() then delete() single key | 1,945,554.71 | +1.02% | 1,925,960.69 | 1,893,329.58 | 97.32% |
| src/btree-set.bench.ts > BTreeSet mutations > fromSorted() 100 sequential keys | 438,199.84 | -3.81% | 455,554.12 | 433,395.97 | 98.90% |
| src/btree-set.bench.ts > BTreeSet mutations > fromSorted() 1000 sequential keys | 46,803.99 | -3.51% | 48,508.59 | 46,515.81 | 99.38% |
| src/btree-set.bench.ts > BTreeSet mutations > getOrCreateIndex pattern (new): sort + fromSorted() | 21,205.98 | -1.95% | 21,628.11 | 20,899.99 | 98.56% |
| src/btree-set.bench.ts > BTreeSet mutations > getOrCreateIndex pattern (old): add() loop after sort | 2,456.65 | -1.06% | 2,482.99 | 2,424.31 | 98.68% |
| src/size-of-value.bench.ts > getSizeOfValue performance > arrays > large array (100 items) | 576,294.11 | +0.10% | 575,692.60 | 574,090.67 | 99.62% |
| src/size-of-value.bench.ts > getSizeOfValue performance > arrays > small array (10 items) | 4,490,111.48 | -0.19% | 4,498,764.40 | 4,477,724.45 | 99.72% |
| src/size-of-value.bench.ts > getSizeOfValue performance > datasets > large dataset (100x512B) | 45,799.40 | -0.39% | 45,979.68 | 45,591.75 | 99.55% |
| src/size-of-value.bench.ts > getSizeOfValue performance > datasets > small dataset (10x256B) | 452,147.73 | -0.45% | 454,198.00 | 447,902.68 | 99.06% |
| src/size-of-value.bench.ts > getSizeOfValue performance > objects > nested object | 3,119,188.51 | +0.10% | 3,116,223.98 | 3,087,055.28 | 98.97% |
| src/size-of-value.bench.ts > getSizeOfValue performance > objects > structured object (1KB) | 4,744,512.47 | -0.21% | 4,754,704.18 | 4,718,445.45 | 99.45% |
| src/size-of-value.bench.ts > getSizeOfValue performance > objects > structured object (256B) | 4,756,584.24 | -0.00% | 4,756,699.57 | 4,711,581.79 | 99.05% |
| src/size-of-value.bench.ts > getSizeOfValue performance > primitives > boolean | 137,342,015.01 | +0.18% | 137,096,754.32 | 135,401,138.60 | 98.59% |
| src/size-of-value.bench.ts > getSizeOfValue performance > primitives > integer | 98,940,969.76 | +0.06% | 98,885,423.73 | 98,398,288.33 | 99.45% |
| src/size-of-value.bench.ts > getSizeOfValue performance > primitives > null | 120,327,401.88 | +1.45% | 118,610,613.81 | 115,844,940.70 | 96.27% |
| src/size-of-value.bench.ts > getSizeOfValue performance > primitives > string (100 chars) | 643,660.69 | -0.58% | 647,399.37 | 634,961.39 | 98.65% |
| src/tdigest.bench.ts > TDigest Benchmarks > add | 1.39 | -3.72% | 1.44 | 1.38 | 99.30% |
| src/tdigest.bench.ts > TDigest Benchmarks > addCentroid | 1.40 | +1.17% | 1.39 | 1.34 | 95.26% |
| src/tdigest.bench.ts > TDigest Benchmarks > addCentroidList | 1.40 | +0.77% | 1.39 | 1.35 | 96.01% |
| src/tdigest.bench.ts > TDigest Benchmarks > merge > addCentroid | 12,147.45 | -0.60% | 12,220.91 | 11,923.18 | 98.15% |
| src/tdigest.bench.ts > TDigest Benchmarks > merge > merge | 14,491.65 | -1.08% | 14,650.12 | 14,353.06 | 99.04% |
| src/tdigest.bench.ts > TDigest Benchmarks > quantile | 1.46 | -0.39% | 1.47 | 1.38 | 94.59% |
