bench(fullhistory): 2026-05-21 cross-machine results report by chowbao · Pull Request #750 · stellar/stellar-rpc

chowbao · 2026-05-21T19:40:17Z

Summary

Adds cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md — a Markdown report comparing bench runs across four AWS instance types (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) on identical data (chunks 5859–5999 cold, chunk 5000 hot, chunk 5999 for ingest).
Sections: machine inventory, peak read throughput, worker scaling, tx-page page-size sweep, tx-hash xdr-views vs round-trip, per-ledger and bulk ingest, cold-vs-hot speedup, x86 vs Graviton2 at matched vCPU.
Each table is paired with a Mermaid xychart-beta block where a chart adds clarity. Source per-iter CSVs and the summary CSVs that back every table live at gs://rpc-full-history/benchmarks/_summary/ and gs://rpc-full-history/benchmarks/<machine-dir>/.

Test plan

Open the file on github.com (or any Mermaid-rendering Markdown viewer) and confirm tables + xychart blocks render.
Optionally cross-check headline numbers against the source CSVs at gs://rpc-full-history/benchmarks/_summary/.

🤖 Generated with Claude Code

Markdown report covering the cross-machine bench run captured under gs://rpc-full-history/benchmarks/{c6id.2xlarge,c6id.4xlarge,c6id.8xlarge,im4gn.4xlarge}-2026-05-21*. Tables + Mermaid xychart-beta blocks for: peak read throughput, worker scaling (cold and hot n=1), tx-page page-size sweep, xdr-views vs round-trip on tx-hash + events-ingest, per-ledger ingest, bulk ingest, cold-vs-hot speedup, and x86 vs Graviton2 at matched vCPU. Source per-iter CSVs and the summary CSVs that back every table here live at gs://rpc-full-history/benchmarks/_summary/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… report New section 11 transposes the cross-machine tables: one consolidated table per machine (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) listing every bench result — full ledger grid sweep, tx-page, tx-hash (hit/miss × xdrviews/roundtrip), per-ledger ingest, and bulk ingest — with p50/p90/p99 and throughput. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New Section 2 ("Internal vs production RPC providers") includes the prior black-box benchmark across 4–6 production RPC providers and juxtaposes their p50s with the internal hot/cold tiers. Adds a Mermaid bar/line chart of the per-workload speedups. Remaining sections renumbered 3–12. Headline: hot/cold full-history is 10×–1773× faster than the average production RPC across ledger-point, ledger-range, tx-page, tx-hash, and the four event-filter scenarios. Note: 'onfinality' and 'sorobanrpc' are absent from tx-hash and events workloads (n=4 instead of 6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both) break Mermaid's xychart-beta parser. Replaced with hyphens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
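A minimal sketch of the label fix described above, assuming the chart labels are generated from a list of workload names. The helper name `sanitize_xychart_label` is hypothetical and not taken from the report tooling.

```python
def sanitize_xychart_label(label: str) -> str:
    """Swap colons for hyphens so the label is safe in an xychart-beta x-axis."""
    return label.replace(":", "-")


# The four labels named in the commit message.
labels = ["ev:nofilt", "ev:contract", "ev:topic", "ev:both"]
safe_labels = [sanitize_xychart_label(name) for name in labels]

# Build the x-axis line of an xychart-beta block from the cleaned labels.
x_axis_line = "x-axis [" + ", ".join(f'"{name}"' for name in safe_labels) + "]"
print(x_axis_line)
```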

New-data-only report over the 2026-06-03 runs (4 machines) on the rewritten rpc-hack bench harness. Notes methodology changes vs 2026-05-21: ops/s is no longer comparable across runs (only single-in-flight p50 latency is), the sweep axis is now query-concurrency 1-16, and ledger/tx-page/tx-hash read coverage narrowed while events query + ingest stage detail broadened. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Condensed two-table view (typical p50 latency + peak throughput) with a full glossary defining every row, column, tier, and variable (n, page, c, p50/p99, ops/s). Links back to the full cross-machine report. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds Table 3 (ingest throughput: hot-ingest ledgers/s, build-txhash-index keys/s) and Table 4 (per-stage ingest cost), plus glossary entries for the ingest workloads and ledgers/s, keys/s, and stage terms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cold-ingest ledgers/s is computed against an estimated wall time of sum(chunk_wall) / chunk-workers (an upper-bound throughput estimate, since the harness records summed per-chunk wall, not true end-to-end wall). Flagged as an estimate; scales with --chunk-workers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
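The estimate can be sketched as follows, assuming the harness exposes one wall-clock duration per chunk and that sum(chunk_wall) / chunk-workers stands in for the end-to-end wall time. The function name and the sample numbers are hypothetical; they are not taken from the report.

```python
def estimate_cold_ingest_ledgers_per_sec(chunk_wall_secs, ledgers_per_chunk, chunk_workers):
    """Upper-bound throughput: assumes the workers overlap perfectly."""
    # Estimated end-to-end wall time if every worker stayed busy the whole run.
    estimated_wall = sum(chunk_wall_secs) / chunk_workers
    total_ledgers = ledgers_per_chunk * len(chunk_wall_secs)
    return total_ledgers / estimated_wall


# Hypothetical run: 4 chunks of 10,000 ledgers, 50 s each, 4 chunk workers.
rate = estimate_cold_ingest_ledgers_per_sec([50.0, 50.0, 50.0, 50.0], 10_000, 4)
print(f"{rate:.0f} ledgers/s (estimate)")
```

Because real workers rarely overlap perfectly, the true end-to-end wall time is at least the estimated one, which is why the resulting ledgers/s figure is an upper bound.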

…1 report Source/summary CSV paths were missing the dated prefix (data lives under .../benchmarks/2026-05-21/, the undated paths don't exist). Also dates the title and forward-links the 2026-06-03 run, noting the harness changed and ops/s is not comparable across runs. Historical 5/21 numbers are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drives the full read + ingest bench suite in bench-fullhistory: builds the binary once, then runs cold+hot ledgers/txpage/txhash/events read benches (each a 1,4,8,16 query-concurrency sweep) plus the hot-ingest, cold-ingest, and build-txhash-index ingest benches. By default the reads use prebuilt fixtures and ingest writes to scratch (independent measurements). INGEST_FIRST=1 instead ingests first and repoints every read bench at the freshly-ingested stores, so the suite is self-contained from a single raw-ledger packfile seed and usable on a fresh machine with no prebuilt data. Paths/sizing knobs are env-overridable for running across different machines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
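A rough sketch of the run matrix this commit describes, written in Python rather than the driver's shell. The `plan_suite` helper is hypothetical, and the default ordering (reads before ingest) is an assumption; only the bench names, the 1,4,8,16 sweep, and the INGEST_FIRST variable come from the commit message.

```python
import os
from itertools import product

TIERS = ("cold", "hot")
READ_BENCHES = ("ledgers", "txpage", "txhash", "events")
CONCURRENCY_SWEEP = (1, 4, 8, 16)
INGEST_BENCHES = ("hot-ingest", "cold-ingest", "build-txhash-index")


def plan_suite(env=None):
    """Return the ordered list of bench runs for one suite invocation."""
    env = os.environ if env is None else env
    ingest_first = env.get("INGEST_FIRST") == "1"
    reads = [
        f"{tier}-{bench} c={c}"
        for tier, bench, c in product(TIERS, READ_BENCHES, CONCURRENCY_SWEEP)
    ]
    ingest = list(INGEST_BENCHES)
    # With INGEST_FIRST=1 the ingest benches run first so reads can use their output.
    return ingest + reads if ingest_first else reads + ingest


default_plan = plan_suite({})
seeded_plan = plan_suite({"INGEST_FIRST": "1"})
print(len(default_plan), seeded_plan[0])
```

The matrix is 2 tiers × 4 read benches × 4 concurrency levels, plus the 3 ingest benches.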

PR #750 review (tamirms) flagged two harness gaps and several execution issues.

Code fixes:
- txpage (hot+cold) previously only touched TransactionHash + ResultPair; it never fetched the page contents, so it measured a tx *count*, not a getTransactions response. New walkPageMaterialize (tx_page_helpers.go) builds a full db.Transaction per tx in the page (envelope, result, meta, events, hash, application order, ledger info).
- txpage (hot+cold) had no --xdr-views flag, so it only measured the slow full-decode path. Added --xdr-views with a single-pass view materializer, mirroring the txhash bench. CSVs suffix -roundtrip / -xdrviews; detail column scan_ns -> materialize_ns (decode_ns stays 0 under views).

Execution (run-all-benches.sh):
- Run the decode-heavy query benches (txpage/txhash/events) once per mode (QUERY_VIEW_MODES = roundtrip + xdrviews) so the report can compare with/without XDR views. Previously every query ran views-off (slow path).
- Events use the worst-case query (EVENTS_BUCKETS=15, max filters/request).
- Ingest runs with --parallel; hot-ingest runs both xdr-views on and off (the views run feeds the reads, the parsed run is kept for its CSVs).

Smoke-tested: 0 errors, pages fully materialized; views 4-8x faster than round-trip (decode_ns=0 confirms the path dispatch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

) Re-ran c6id.8xlarge with the corrected harness and rewrote the report to address the PR #750 review:
- New "c6id.8xlarge — corrected" section: query latency split into hot/cold tables with roundtrip vs xdr-views columns and P50+P99; events use worst-case K=15; ingest shown hot (parsed vs view, --parallel) and cold with the per-stage phase breakdown + per-ledger driver total.
- The other three machines (2xlarge/4xlarge/im4gn) are marked STALE (old harness: tx-page-as-count, views-off) pending a re-run.
- Dropped the per-machine raw-cell dump (§12); the CSVs are on GCS.
- Summary table: same treatment (banner, corrected c6id.8xlarge rows, stale markers on the rest).

Headline corrected numbers: xdr-views cuts tx-page/tx-hash p50 4-9x (hot tx-hash 10.6->1.2ms) and lifts peak throughput 5-10x (hot tx-hash 706->7253 ops/s); events is decode-insensitive (1.1-1.4x). Hot ingest with views is ~2.1x faster than parsed (skips the 8.4ms/ledger UnmarshalBinary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
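As a quick check on how the hot tx-hash figures quoted above map onto the stated ranges, the ratios can be worked out directly. Only the four numbers from the commit message are used; the variable names are illustrative.

```python
# Hot tx-hash figures quoted in the commit message (c6id.8xlarge).
p50_ms = {"roundtrip": 10.6, "xdrviews": 1.2}
peak_ops_per_sec = {"roundtrip": 706, "xdrviews": 7253}

# Latency improves when p50 drops; throughput improves when ops/s rises.
latency_speedup = p50_ms["roundtrip"] / p50_ms["xdrviews"]
throughput_gain = peak_ops_per_sec["xdrviews"] / peak_ops_per_sec["roundtrip"]

print(f"p50 speedup:     {latency_speedup:.1f}x")
print(f"throughput gain: {throughput_gain:.1f}x")
```

Both ratios sit at the top of their respective ranges (4-9x for p50, 5-10x for peak throughput), which is consistent with hot tx-hash being quoted as the headline example.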

…ply-load Adds an `lcm` ledger source and an apply-load-gen.sh driver so the bench-fullhistory suite can run on fully synthetic, density-controlled data instead of real pubnet chunks.
- sources.go: new --source=lcm reader over apply-load's framed-XDR METADATA_OUTPUT_STREAM. Skips setup ledgers (<= --lcm-checkpoint) and decode-free frame-skips to each chunk's 10k-ledger block; reuses the entire cold-ingest/hot-ingest/build-txhash-index pipeline. Wired --lcm-file/--lcm-checkpoint flags into both ingest commands.
- apply-load-gen.sh: drives stellar-core new-db/new-hist/apply-load -> meta.xdr -> cold-ingest --source=lcm -> packfiles -> build-txhash-index. Profiles map to apply-load model txs + target TPS: sac (~10k), token/oz (~9k custom_token), soroswap (~2.5k). Uses the installed core's protocol.
- lcm_source_test.go: unit-tests setup-skip, chunk-block mapping, short-read.
- README: documents the lcm source, the driver, profiles, BUILD_TESTS requirement, and the real cost of full 10k-ledger chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
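The "decode-free frame-skip" idea can be illustrated with a small sketch. It assumes each frame is a 4-byte big-endian header whose low 31 bits carry the payload length (RFC 5531-style record marking, with the high bit as a last-fragment flag); that framing is an assumption here, and the function names are hypothetical rather than the ones in sources.go.

```python
import io
import struct

LENGTH_MASK = 0x7FFFFFFF  # low 31 bits of the header hold the payload length


def skip_frames(stream, count):
    """Seek past up to `count` frames without reading payloads; return frames skipped."""
    skipped = 0
    for _ in range(count):
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) < 4:
            raise EOFError("short read in frame header")
        (word,) = struct.unpack(">I", header)
        # Note: seeking does not verify that the payload bytes are actually present.
        stream.seek(word & LENGTH_MASK, io.SEEK_CUR)
        skipped += 1
    return skipped


def read_frame(stream):
    """Read one frame and return its raw (still encoded) payload."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("short read in frame header")
    (word,) = struct.unpack(">I", header)
    length = word & LENGTH_MASK
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("short read in frame payload")
    return payload


def make_frame(payload):
    """Test helper: wrap a payload in a header with the last-fragment bit set."""
    return struct.pack(">I", 0x80000000 | len(payload)) + payload


stream = io.BytesIO(make_frame(b"setup-1") + make_frame(b"setup-2") + make_frame(b"ledger"))
skipped = skip_frames(stream, 2)  # jump over two setup frames
first_kept = read_frame(stream)
```

Skipping by header length alone means the cost of reaching a chunk's block is proportional to the number of frames, not to the size or complexity of the ledger metadata inside them.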

…4-machine set All four machines now have corrected-harness (PR #750, b712b86) runs in GCS, so this drops the stale/pending framing and regenerates both docs from the complete set. Incorporates the PR #750 review:
- query benches show roundtrip vs xdr-views side by side, with p50 AND p99
- hot and cold presented as separate tables
- events uses the worst-case query (15 filters)
- ingest: hot --parallel in both modes (views on/off) with per-ledger total + per-stage breakdown; cold per-stage + throughput
- per-machine raw-results dump omitted (raw CSVs live on GCS)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ew format) Keeps the compact cross-machine p50 grids as an overview and adds the per-machine stage-row × p50/p90/p99/max tables with run-context headers (chunk, ledger count, --parallel --xdr-views, source, end-to-end wall) that PR #750 review (r3351681282) laid out — for hot and cold ingest, all four machines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…-core apply-load" This reverts commit a8c8295.

Documents the suite driver (run-all-benches.sh), the roundtrip vs xdr-views decode paths for query benches, txpage full-page materialization + --page-size, the events --buckets flag, and points to the results/ reports. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s + report wording
- tx_page_helpers/tx_hash_helpers: materializePageRangeView gathered envelopes via a per-element envAt() that restarts the V1/V2 GeneralizedTransactionSet walk at index 0 each call (O(page²)). Replace with single-pass range collectors (collectEnvelopeRange{FromV0TxSet,FromGeneralized}) that walk the TxSet once. Matters at large --page-size; page=20 numbers unchanged. Added TestMaterializePageRangeViewMatchesRoundtrip (view vs roundtrip, non-zero windows on V1 set).
- report: reword the xdr-views ingest saving (~80% lcm_decode + ~20% per-event UnmarshalView in fan_out) and the events cold/hot speedup (fixed per-event decode as a proportion of each tier's baseline), per reviewer suggestions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
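The difference between the per-element lookup and the single-pass collector can be sketched with a toy nested structure. This is a Python stand-in, not the Go code: `env_at` and `collect_envelope_range` are hypothetical names, and the phases/components nesting is only a simplified model of a generalized transaction set.

```python
from itertools import islice


def env_at(tx_set, index):
    """Per-element lookup: every call restarts the nested walk at index 0."""
    seen = 0
    for phase in tx_set:
        for component in phase:
            for env in component:
                if seen == index:
                    return env
                seen += 1
    raise IndexError(index)


def collect_envelope_range(tx_set, start, count):
    """Single pass: walk the nested set once and keep [start, start + count)."""
    flat = (env for phase in tx_set for component in phase for env in component)
    return list(islice(flat, start, start + count))


# Toy set: 2 phases, each holding components that hold envelopes.
tx_set = [[["a", "b"], ["c"]], [["d", "e", "f"]]]

page_slow = [env_at(tx_set, i) for i in range(2, 5)]  # re-walks from 0 per element
page_fast = collect_envelope_range(tx_set, 2, 3)      # one walk for the whole page
```

Both approaches return the same page, which mirrors the equivalence check the commit adds; the single-pass version simply avoids re-walking the prefix for each element, so the saving grows with page size.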

chowbao and others added 4 commits May 21, 2026 19:37

bench(fullhistory): fix Mermaid parse error in Section 2 RPC chart

ef9351b

leighmcculloch approved these changes May 27, 2026

Simon Chow and others added 7 commits June 3, 2026 17:58

Merge branch 'rpc-hack' into bench/cross-machine-report-2026-05-21

02c4122

bench(fullhistory): add per-machine RAM to summary-table glossary

e542ee9

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tamirms reviewed Jun 3, 2026
