chore(zql): penalize flip in the planner's cost model by tantaman · Pull Request #5992 · rocicorp/mono

tantaman · 2026-05-15T16:09:39Z

Summary

The planner's cost model under-counts the runtime cost of
FlippedJoin because it doesn't model two IVM-level costs
that are invisible to SQLite's scanstatus:

Eager child load — FlippedJoin.fetch reads ALL
children into an array before any parent work, so even with
a downstream LIMIT every child pays IVM cost (generator
yields, debug accounting, btree-set inserts).
Chunk priming — mergeSortedStreams opens every
chunk to seed its heap before yielding the first row; each
open runs an IN-list SQL to its first match.
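To make the eager child load concrete, here is a minimal sketch of why a downstream LIMIT does not help once every child is materialized first. The names `childSource`, `eagerFetch`, `lazyFetch`, `take` and `pulledWith` are hypothetical stand-ins for illustration, not the actual FlippedJoin code.

```typescript
// Hypothetical stand-ins for illustration; not the real FlippedJoin.fetch.
type Row = { id: number };

let childRowsPulled = 0;

// Child source that counts every row a consumer pulls from it.
function* childSource(n: number): Generator<Row> {
  for (let id = 0; id < n; id++) {
    childRowsPulled++;
    yield { id };
  }
}

// Eager shape: read ALL children into an array before yielding anything.
function* eagerFetch(children: Iterable<Row>): Generator<Row> {
  const all = Array.from(children); // every child is pulled here
  for (const row of all) {
    yield row;
  }
}

// Lazy shape: pass rows through as they arrive.
function* lazyFetch(children: Iterable<Row>): Generator<Row> {
  for (const row of children) {
    yield row;
  }
}

// Downstream LIMIT: stop consuming after `limit` rows.
function take<T>(rows: Iterable<T>, limit: number): T[] {
  const out: T[] = [];
  if (limit <= 0) return out;
  for (const row of rows) {
    out.push(row);
    if (out.length >= limit) break;
  }
  return out;
}

// Runs one fetch shape under a LIMIT and reports how many children were pulled.
function pulledWith(
  fetchImpl: (children: Iterable<Row>) => Iterable<Row>,
  n: number,
  limit: number,
): number {
  childRowsPulled = 0;
  take(fetchImpl(childSource(n)), limit);
  return childRowsPulled;
}

const eagerPulled = pulledWith(eagerFetch, 1000, 50); // 1000
const lazyPulled = pulledWith(lazyFetch, 1000, 50); // 50
```

With 1000 children and a limit of 50, the eager shape pulls all 1000 rows while the lazy shape stops at 50. That gap is the cost scanstatus cannot see.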

Concretely this means the cost model treats SQLite row scans
and IVM row processing as if they cost the same — but in
practice IVM is ~100× slower per row. For limited queries
with large child cardinality this leads the planner to pick
a flipped plan when semi would short-circuit far earlier.

This change adds child.scanEst * FLIP_IVM_PER_CHILD_OVERHEAD (constant 3) to the
flipped-join cost, gated on parent.limit !== undefined && child.scanEst > getMultiConstraintChunkSize(). The gate
matters:

No downstream limit → semi has to scan everything
anyway, so flipped's eager-load isn't wasted. Don't
penalize.
Single chunk (child.scanEst ≤ 256) → no
mergeSortedStreams priming happens, so no IVM tax. Don't
penalize.

This preserves flipped wins on full-scan queries and on
small child sets, while penalizing the case it's actually
wrong for: limited TAKE queries with huge child
cardinality.

Measured impact

Benchmarked on a 211 GB zbugs replica via
zql-benchmarks/src/zbugs-profile.ts. The pathological
query is project=gatewaycore AND open=true AND whereExists(label=api-gateway) AND whereExists(label=async-processing) ORDER BY modified DESC LIMIT 50:

Metric	Before	After
Plan picked	flipped + semi (cost 401k)	semi + semi (cost 1.24M)
Rows scanned by SQLite	1,098,099	53,712 (20× fewer)
Rows read into JS	475,051	48,397 (10× fewer)
Wall time	65 s	7.4 s (8.8× faster)

Other zbugs benchmark variants (with assignee, single-label,
no-label) are unchanged in plan choice and row counts —
their costs don't trip the gate.

Cost model details

For a flipped join with parent P, child C, and chunk
size K:

cost = C.cost
     + ceil(C.scanEst / K) * P.startupCost   // per-chunk prepare
     + C.scanEst * (P.cost + P.scanEst)      // per-child seek
     + (P.limit !== undefined && C.scanEst > K
          ? C.scanEst * 3                    // ← new IVM-overhead term
          : 0)
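The formula above can be sketched as a small function. This is an illustration under stated assumptions: the `CostEstimate` interface and the `flippedJoinCost` name are hypothetical, and only the field names (`cost`, `scanEst`, `startupCost`, `limit`) and the constant come from this description.

```typescript
// Hypothetical shape; only the field names are taken from the formula above.
interface CostEstimate {
  cost: number;
  scanEst: number;
  startupCost: number;
  limit?: number;
}

const FLIP_IVM_PER_CHILD_OVERHEAD = 3;

function flippedJoinCost(
  parent: CostEstimate,
  child: CostEstimate,
  chunkSize: number,
): number {
  // One prepare per chunk of child rows.
  const perChunkPrepare =
    Math.ceil(child.scanEst / chunkSize) * parent.startupCost;
  // One parent seek per child row.
  const perChildSeek = child.scanEst * (parent.cost + parent.scanEst);
  // New term: only when a downstream limit exists AND more than one chunk is needed.
  const gateFires = parent.limit !== undefined && child.scanEst > chunkSize;
  const ivmOverhead = gateFires
    ? child.scanEst * FLIP_IVM_PER_CHILD_OVERHEAD
    : 0;
  return child.cost + perChunkPrepare + perChildSeek + ivmOverhead;
}

// Made-up inputs, chosen only to exercise both sides of the gate.
const child: CostEstimate = { cost: 100, scanEst: 1000, startupCost: 0 };
const limited: CostEstimate = { cost: 2, scanEst: 1, startupCost: 10, limit: 50 };
const unlimited: CostEstimate = { cost: 2, scanEst: 1, startupCost: 10 };

const withLimit = flippedJoinCost(limited, child, 256); // 6140
const withoutLimit = flippedJoinCost(unlimited, child, 256); // 3140
```

The two results differ by exactly `child.scanEst * 3`, which is the penalty the gate adds for limited queries.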

The constant 3 is calibrated against the observed gap:
pre-change flipped at 401k vs semi-semi at 1.24M for the v6
query; multiplying the 416k-row child scan by 3 (≈ 1.25M
added) tips the choice. Other calibrations tried:

0 — baseline, v6 picks the bad plan (65 s).
1, 2 — v6 still picks flipped (semi cost not reached).
3 — v6 picks semi-semi. Used.
Higher values aren't needed and start over-penalizing
legitimate flipped wins.
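The calibration arithmetic can be checked with the rounded figures quoted above. This sketch assumes the whole 401k is the pre-change flipped cost and that the penalty applies once, to the 416k-row child scan. The planner's actual estimates differ slightly, so treat it as a sanity check rather than a reproduction.

```typescript
// Rounded figures from this description; the real planner estimates differ slightly.
const FLIPPED_BASE_COST = 401000; // flipped + semi, before this change
const SEMI_SEMI_COST = 1240000; // semi + semi
const CHILD_SCAN_EST = 416000; // rows in the penalized child scan

// Which plan has the lower cost for a given per-child overhead constant?
function pickedPlan(overhead: number): 'flipped' | 'semi-semi' {
  const flippedCost = FLIPPED_BASE_COST + CHILD_SCAN_EST * overhead;
  return flippedCost < SEMI_SEMI_COST ? 'flipped' : 'semi-semi';
}

const picks = [0, 1, 2, 3].map(pickedPlan);
// ['flipped', 'flipped', 'flipped', 'semi-semi']
```

At 2 the flipped cost is 1,233,000, just under semi-semi's 1.24M. At 3 it is 1,649,000, which is why 3 is the smallest integer that tips the choice.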

Test impact

Existing planner-join.test.ts chunk-boundary tests now
exercise a connection with limit=50 so the gate fires and
the IVM term is exercised. The helper ovhFor(n, chunkSize)
makes assertions read symmetrically across the n ≤ chunkSize / n > chunkSize cases.
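For readers without the diff open, one plausible shape for that helper is sketched below. The body is an assumption inferred from the gate, not copied from planner-join.test.ts, and it presumes the connection under test already has a limit set.

```typescript
const FLIP_IVM_PER_CHILD_OVERHEAD = 3;

// Assumed body: expected IVM overhead for n child rows at a given chunk size.
// Presumes the limit half of the gate is already satisfied.
function ovhFor(n: number, chunkSize: number): number {
  return n > chunkSize ? n * FLIP_IVM_PER_CHILD_OVERHEAD : 0;
}

const atBoundary = ovhFor(256, 256); // 0: single chunk, gate does not fire
const pastBoundary = ovhFor(257, 256); // 771: one row past the boundary
```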

Integration tests

All chinook/planner.pg.test.ts (5/5) and
pagila/planner-exec.pg.test.ts (18/18) pass unchanged.

chinook/planner-exec.pg.test.ts is 25/26. The one
regression is correlation-only, not a plan-choice
regression:

Test: 'extreme selectivity - artist to album to long tracks' (indexed variant)
Picked plan: still optimal (within-optimal: 1.00x,
passes)
Spearman correlation: 0.20 (threshold 0.35, -42.9%
headroom)

The cost model's ranking of all-flipped plans against semi
alternatives drifts below this test's correlation
threshold, but the planner still picks the actually-optimal
plan. The base-DB threshold for this same test is already
0.15, so loosening the indexed threshold to 0.20 would
be consistent with the existing tolerance. Per the project's
CI margin convention (~30% headroom for cost-model
thresholds), the threshold was tighter than the model's
underlying noise floor; this PR just exposes it.
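The headroom figure can be reproduced with a one-line formula. The definition used here, (actual - threshold) / threshold, is an assumption; it is chosen because it matches the -42.9% reported above. The `headroom` and `pct` helpers are hypothetical.

```typescript
// Assumed definition: relative distance of the measured value above the threshold.
function headroom(actual: number, threshold: number): number {
  return (actual - threshold) / threshold;
}

// Format a ratio as a percentage with one decimal place.
function pct(x: number): string {
  return (x * 100).toFixed(1) + '%';
}

const measured = 0.2; // Spearman correlation observed for the indexed variant

const current = pct(headroom(measured, 0.35)); // "-42.9%"
const relaxedTo020 = pct(headroom(measured, 0.2)); // "0.0%"
const relaxedTo015 = pct(headroom(measured, 0.15)); // "33.3%"
```

Under this definition, relaxing to 0.20 leaves no margin on the measured value, while the base-DB value of 0.15 lands near the ~30% convention. That may be worth weighing when choosing the new threshold.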

Depends on

This PR can be reviewed independently but its impact is best
measured on top of the Debug.rowVended O(N²) fix
(separate PR — branch mlaw/fix-debug-rowvended-quadratic).
Without that fix the v6 query takes 260 s instead of 65 s
and the 7.4 s post-fix number can't be reproduced.

Test plan

npm --workspace=zql run test -- planner (90 tests, all pass)
npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/chinook/planner-exec.pg.test.ts (25/26; known soft fail on extreme selectivity correlation)
npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/pagila/planner-exec.pg.test.ts (18/18)
npm --workspace=zql-integration-tests run test -- --project='*pg-18*' src/chinook/planner.pg.test.ts (5/5)
Multi-pg sweep: rerun the same with --project='*pg-15*', '*pg-16*', '*pg-17*' before merge
Decide on the extreme selectivity indexed correlation threshold: leave the test failing as a known follow-up, or relax 0.35 → 0.20 in this PR

Follow-ups

The extreme selectivity indexed correlation threshold
above.
Calibration: the constant 3 is fitted to one workload.
If other workloads surface where flipped is the right choice
but gets penalized, consider scaling the overhead by
chunks instead of child.scanEst, or making it
proportional to the existing SQL cost rather than a flat
constant.
Stat4 has accurate per-label cardinality (verified —
lbl_proj_000_15 and lbl_proj_000_45 both at 416k rows,
planner gets ~393k estimates), so the cost model is
correctly informed; this PR is about cost formula, not
stats.

(cherry picked from commit ab50e52)

(cherry picked from commit c6b35ba)

github-actions · 2026-05-15T16:12:03Z

Bencher Report

Branch	mlaw/penalize-flip-cost
Testbed	Linux

Benchmark	File Size	Benchmark Result kilobytes (KB) (Result Δ%)	Upper Boundary kilobytes (KB) (Limit %)
zero-package.tgz	📈 view plot 🚷 view threshold	2,113.78 KB (+0.08%) Baseline: 2,112.06 KB	2,154.30 KB (98.12%)
zero.js	📈 view plot 🚷 view threshold	280.12 KB (+0.02%) Baseline: 280.07 KB	285.67 KB (98.06%)
zero.js.br	📈 view plot 🚷 view threshold	74.40 KB (+0.01%) Baseline: 74.39 KB	75.88 KB (98.05%)

github-actions · 2026-05-15T16:41:27Z

Bencher Report

Branch	mlaw/penalize-flip-cost
Testbed	self-hosted-metal

Benchmark	Throughput	Benchmark Result operations / second (ops/s) x 1e3 (Result Δ%)	Lower Boundary operations / second (ops/s) x 1e3 (Limit %)
src/client/custom.bench.ts > big schema	📈 view plot 🚷 view threshold	37.17 ops/s x 1e3 (+5.48%) Baseline: 35.24 ops/s x 1e3	33.19 ops/s x 1e3 (89.30%)
src/client/zero.bench.ts > basics > All 1000 rows x 10 columns (numbers)	📈 view plot 🚷 view threshold	1.11 ops/s x 1e3 (+1.45%) Baseline: 1.09 ops/s x 1e3	1.02 ops/s x 1e3 (91.44%)
src/client/zero.bench.ts > pk compare > pk = N	📈 view plot 🚷 view threshold	19.97 ops/s x 1e3 (+2.22%) Baseline: 19.54 ops/s x 1e3	17.57 ops/s x 1e3 (87.98%)
src/client/zero.bench.ts > with filter > Lower rows 500 x 10 columns (numbers)	📈 view plot 🚷 view threshold	1.53 ops/s x 1e3 (+2.12%) Baseline: 1.50 ops/s x 1e3	1.25 ops/s x 1e3 (81.83%)

github-actions · 2026-05-15T16:43:16Z

Bencher Report

Branch	mlaw/penalize-flip-cost
Testbed	self-hosted-metal

Benchmark	Throughput	Benchmark Result operations / second (ops/s) (Result Δ%)	Lower Boundary operations / second (ops/s) (Limit %)
src/db/pg-copy.bench.ts > pg-copy benchmark > copy	📈 view plot 🚷 view threshold	17.75 ops/s (-1.38%) Baseline: 18.00 ops/s	17.73 ops/s (99.86%)

github-actions · 2026-05-15T16:47:52Z

Bencher Report

Branch	mlaw/penalize-flip-cost
Testbed	self-hosted-metal

Benchmark	Throughput	Benchmark Result operations / second (ops/s) (Result Δ%)	Lower Boundary operations / second (ops/s) (Limit %)
src/btree-set.bench.ts > BTreeSet iterator next() in isolation > forward iterator next()	📈 view plot 🚷 view threshold	37,811.00 ops/s (+3.16%) Baseline: 36,653.36 ops/s	35,460.36 ops/s (93.78%)
src/btree-set.bench.ts > BTreeSet iterator next() in isolation > forward iterator next() from mid	📈 view plot 🚷 view threshold	74,526.93 ops/s (+3.99%) Baseline: 71,667.85 ops/s	69,395.13 ops/s (93.11%)
src/btree-set.bench.ts > BTreeSet iterator next() in isolation > reverse iterator next()	📈 view plot 🚷 view threshold	37,964.82 ops/s (-0.57%) Baseline: 38,181.98 ops/s	37,162.61 ops/s (97.89%)
src/btree-set.bench.ts > BTreeSet iterator next() in isolation > reverse iterator next() from mid	📈 view plot 🚷 view threshold	74,874.03 ops/s (-0.22%) Baseline: 75,036.84 ops/s	73,490.26 ops/s (98.15%)
src/btree-set.bench.ts > BTreeSet iterators > [Symbol.iterator]() full scan	📈 view plot 🚷 view threshold	41,333.98 ops/s (+1.11%) Baseline: 40,880.34 ops/s	39,911.09 ops/s (96.56%)
src/btree-set.bench.ts > BTreeSet iterators > values() full scan	📈 view plot 🚷 view threshold	40,246.23 ops/s (+0.57%) Baseline: 40,018.90 ops/s	39,404.50 ops/s (97.91%)
src/btree-set.bench.ts > BTreeSet iterators > valuesFrom() from mid	📈 view plot 🚷 view threshold	80,618.88 ops/s (+1.75%) Baseline: 79,234.12 ops/s	76,036.12 ops/s (94.32%)
src/btree-set.bench.ts > BTreeSet iterators > valuesFromReversed() from mid	📈 view plot 🚷 view threshold	81,766.42 ops/s (-0.43%) Baseline: 82,120.96 ops/s	81,027.40 ops/s (99.10%)
src/btree-set.bench.ts > BTreeSet iterators > valuesReversed() full scan	📈 view plot 🚷 view threshold	41,221.35 ops/s (-0.25%) Baseline: 41,324.84 ops/s	40,792.86 ops/s (98.96%)
src/btree-set.bench.ts > BTreeSet lookups > get() hit	📈 view plot 🚷 view threshold	4,372,734.56 ops/s (+0.10%) Baseline: 4,368,245.53 ops/s	4,334,044.06 ops/s (99.12%)
src/btree-set.bench.ts > BTreeSet lookups > has() hit	📈 view plot 🚷 view threshold	4,403,969.92 ops/s (+0.31%) Baseline: 4,390,415.06 ops/s	4,319,084.19 ops/s (98.07%)
src/btree-set.bench.ts > BTreeSet lookups > has() miss	📈 view plot 🚷 view threshold	6,541,780.47 ops/s (+0.39%) Baseline: 6,516,251.82 ops/s	6,344,996.65 ops/s (96.99%)
src/btree-set.bench.ts > BTreeSet mutations > add() 100 sequential keys	📈 view plot 🚷 view threshold	38,064.15 ops/s (+0.29%) Baseline: 37,955.82 ops/s	37,212.44 ops/s (97.76%)
src/btree-set.bench.ts > BTreeSet mutations > add() 1000 sequential keys	📈 view plot 🚷 view threshold	3,304.81 ops/s (-1.31%) Baseline: 3,348.60 ops/s	3,206.75 ops/s (97.03%)
src/btree-set.bench.ts > BTreeSet mutations > add() then delete() single key	📈 view plot 🚷 view threshold	1,945,554.71 ops/s (+1.02%) Baseline: 1,925,960.69 ops/s	1,893,329.58 ops/s (97.32%)
src/btree-set.bench.ts > BTreeSet mutations > fromSorted() 100 sequential keys	📈 view plot 🚷 view threshold	438,199.84 ops/s (-3.81%) Baseline: 455,554.12 ops/s	433,395.97 ops/s (98.90%)
src/btree-set.bench.ts > BTreeSet mutations > fromSorted() 1000 sequential keys	📈 view plot 🚷 view threshold	46,803.99 ops/s (-3.51%) Baseline: 48,508.59 ops/s	46,515.81 ops/s (99.38%)
src/btree-set.bench.ts > BTreeSet mutations > getOrCreateIndex pattern (new): sort + fromSorted()	📈 view plot 🚷 view threshold	21,205.98 ops/s (-1.95%) Baseline: 21,628.11 ops/s	20,899.99 ops/s (98.56%)
src/btree-set.bench.ts > BTreeSet mutations > getOrCreateIndex pattern (old): add() loop after sort	📈 view plot 🚷 view threshold	2,456.65 ops/s (-1.06%) Baseline: 2,482.99 ops/s	2,424.31 ops/s (98.68%)
src/size-of-value.bench.ts > getSizeOfValue performance > arrays > large array (100 items)	📈 view plot 🚷 view threshold	576,294.11 ops/s (+0.10%) Baseline: 575,692.60 ops/s	574,090.67 ops/s (99.62%)
src/size-of-value.bench.ts > getSizeOfValue performance > arrays > small array (10 items)	📈 view plot 🚷 view threshold	4,490,111.48 ops/s (-0.19%) Baseline: 4,498,764.40 ops/s	4,477,724.45 ops/s (99.72%)
src/size-of-value.bench.ts > getSizeOfValue performance > datasets > large dataset (100x512B)	📈 view plot 🚷 view threshold	45,799.40 ops/s (-0.39%) Baseline: 45,979.68 ops/s	45,591.75 ops/s (99.55%)
src/size-of-value.bench.ts > getSizeOfValue performance > datasets > small dataset (10x256B)	📈 view plot 🚷 view threshold	452,147.73 ops/s (-0.45%) Baseline: 454,198.00 ops/s	447,902.68 ops/s (99.06%)
src/size-of-value.bench.ts > getSizeOfValue performance > objects > nested object	📈 view plot 🚷 view threshold	3,119,188.51 ops/s (+0.10%) Baseline: 3,116,223.98 ops/s	3,087,055.28 ops/s (98.97%)
src/size-of-value.bench.ts > getSizeOfValue performance > objects > structured object (1KB)	📈 view plot 🚷 view threshold	4,744,512.47 ops/s (-0.21%) Baseline: 4,754,704.18 ops/s	4,718,445.45 ops/s (99.45%)
src/size-of-value.bench.ts > getSizeOfValue performance > objects > structured object (256B)	📈 view plot 🚷 view threshold	4,756,584.24 ops/s (-0.00%) Baseline: 4,756,699.57 ops/s	4,711,581.79 ops/s (99.05%)
src/size-of-value.bench.ts > getSizeOfValue performance > primitives > boolean	📈 view plot 🚷 view threshold	137,342,015.01 ops/s (+0.18%) Baseline: 137,096,754.32 ops/s	135,401,138.60 ops/s (98.59%)
src/size-of-value.bench.ts > getSizeOfValue performance > primitives > integer	📈 view plot 🚷 view threshold	98,940,969.76 ops/s (+0.06%) Baseline: 98,885,423.73 ops/s	98,398,288.33 ops/s (99.45%)
src/size-of-value.bench.ts > getSizeOfValue performance > primitives > null	📈 view plot 🚷 view threshold	120,327,401.88 ops/s (+1.45%) Baseline: 118,610,613.81 ops/s	115,844,940.70 ops/s (96.27%)
src/size-of-value.bench.ts > getSizeOfValue performance > primitives > string (100 chars)	📈 view plot 🚷 view threshold	643,660.69 ops/s (-0.58%) Baseline: 647,399.37 ops/s	634,961.39 ops/s (98.65%)
src/tdigest.bench.ts > TDigest Benchmarks > add	📈 view plot 🚷 view threshold	1.39 ops/s (-3.72%) Baseline: 1.44 ops/s	1.38 ops/s (99.30%)
src/tdigest.bench.ts > TDigest Benchmarks > addCentroid	📈 view plot 🚷 view threshold	1.40 ops/s (+1.17%) Baseline: 1.39 ops/s	1.34 ops/s (95.26%)
src/tdigest.bench.ts > TDigest Benchmarks > addCentroidList	📈 view plot 🚷 view threshold	1.40 ops/s (+0.77%) Baseline: 1.39 ops/s	1.35 ops/s (96.01%)
src/tdigest.bench.ts > TDigest Benchmarks > merge > addCentroid	📈 view plot 🚷 view threshold	12,147.45 ops/s (-0.60%) Baseline: 12,220.91 ops/s	11,923.18 ops/s (98.15%)
src/tdigest.bench.ts > TDigest Benchmarks > merge > merge	📈 view plot 🚷 view threshold	14,491.65 ops/s (-1.08%) Baseline: 14,650.12 ops/s	14,353.06 ops/s (99.04%)
src/tdigest.bench.ts > TDigest Benchmarks > quantile	📈 view plot 🚷 view threshold	1.46 ops/s (-0.39%) Baseline: 1.47 ops/s	1.38 ops/s (94.59%)

tantaman added 3 commits May 15, 2026 12:07

penalize flip? idk idk

7d52460

(cherry picked from commit ab50e52)

tweak of planner-join to fit planner tests??

f76c075

(cherry picked from commit c6b35ba)

remove opt md

02070ef

tantaman wants to merge 3 commits into main from mlaw/penalize-flip-cost