bench(fullhistory): 2026-05-21 cross-machine results report #750
Merged
Markdown report covering the cross-machine bench run captured under
gs://rpc-full-history/benchmarks/{c6id.2xlarge,c6id.4xlarge,c6id.8xlarge,im4gn.4xlarge}-2026-05-21*.
Tables + Mermaid xychart-beta blocks for: peak read throughput,
worker scaling (cold and hot n=1), tx-page page-size sweep,
xdr-views vs round-trip on tx-hash + events-ingest, per-ledger ingest,
bulk ingest, cold-vs-hot speedup, and x86 vs Graviton2 at matched vCPU.
Source per-iter CSVs and the summary CSVs that back every table here
live at gs://rpc-full-history/benchmarks/_summary/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… report
New section 11 transposes the cross-machine tables: one consolidated table per machine (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) listing every bench result (full ledger grid sweep, tx-page, tx-hash (hit/miss × xdrviews/roundtrip), per-ledger ingest, and bulk ingest) with p50/p90/p99 and throughput.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New Section 2 ("Internal vs production RPC providers") includes the
prior black-box benchmark across 4–6 production RPC providers and
juxtaposes their p50s with the internal hot/cold tiers. Adds a
Mermaid bar/line chart of the per-workload speedups. Remaining
sections renumbered 3–12.
Headline: hot/cold full-history is 10×–1773× faster than the average
production RPC across ledger-point, ledger-range, tx-page, tx-hash,
and the four event-filter scenarios. Note: 'onfinality' and 'sorobanrpc'
are absent from tx-hash and events workloads (n=4 instead of 6).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both) break Mermaid's xychart-beta parser. Replaced with hyphens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
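The label fix described above amounts to a one-line substitution. A minimal sketch in Go, assuming the labels are generated programmatically; the function name `sanitizeAxisLabel` is hypothetical and does not come from the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeAxisLabel replaces every colon with a hyphen so the label can be
// used on a Mermaid xychart-beta x-axis. Hypothetical helper, not PR code.
func sanitizeAxisLabel(label string) string {
	return strings.ReplaceAll(label, ":", "-")
}

func main() {
	// The four event-filter labels named in the commit message.
	for _, l := range []string{"ev:nofilt", "ev:contract", "ev:topic", "ev:both"} {
		fmt.Println(sanitizeAxisLabel(l))
	}
}
```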
leighmcculloch
approved these changes
May 27, 2026
New-data-only report over the 2026-06-03 runs (4 machines) on the rewritten rpc-hack bench harness. Notes methodology changes vs 2026-05-21: ops/s is no longer comparable across runs (only single-in-flight p50 latency is), the sweep axis is now query-concurrency 1-16, and ledger/tx-page/tx-hash read coverage narrowed while events query + ingest stage detail broadened.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Condensed two-table view (typical p50 latency + peak throughput) with a full glossary defining every row, column, tier, and variable (n, page, c, p50/p99, ops/s). Links back to the full cross-machine report.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds Table 3 (ingest throughput: hot-ingest ledgers/s, build-txhash-index keys/s) and Table 4 (per-stage ingest cost), plus glossary entries for the ingest workloads and ledgers/s, keys/s, and stage terms.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cold-ingest ledgers/s computed as sum(chunk_wall) / chunk-workers (upper-bound estimate, since the harness records summed per-chunk wall, not true end-to-end wall). Flagged as an estimate; scales with --chunk-workers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1 report
Source/summary CSV paths were missing the dated prefix (data lives under .../benchmarks/2026-05-21/, the undated paths don't exist). Also dates the title and forward-links the 2026-06-03 run, noting the harness changed and ops/s is not comparable across runs. Historical 5/21 numbers are unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tamirms
reviewed
Jun 3, 2026
tamirms
reviewed
Jun 3, 2026
tamirms
reviewed
Jun 3, 2026
tamirms
reviewed
Jun 3, 2026
Drives the full read + ingest bench suite in bench-fullhistory: builds the binary once, then runs cold+hot ledgers/txpage/txhash/events read benches (each a 1,4,8,16 query-concurrency sweep) plus the hot-ingest, cold-ingest, and build-txhash-index ingest benches. By default the reads use prebuilt fixtures and ingest writes to scratch (independent measurements). INGEST_FIRST=1 instead ingests first and repoints every read bench at the freshly-ingested stores, so the suite is self-contained from a single raw-ledger packfile seed and usable on a fresh machine with no prebuilt data. Paths/sizing knobs are env-overridable for running across different machines.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #750 review (tamirms) flagged two harness gaps and several execution issues.
Code fixes:
- txpage (hot+cold) previously only touched TransactionHash + ResultPair and never fetched the page contents, so it measured a tx *count*, not a getTransactions response. New walkPageMaterialize (tx_page_helpers.go) builds a full db.Transaction per tx in the page (envelope, result, meta, events, hash, application order, ledger info).
- txpage (hot+cold) had no --xdr-views flag, so it only measured the slow full-decode path. Added --xdr-views with a single-pass view materializer, mirroring the txhash bench. CSVs suffix -roundtrip / -xdrviews; detail column scan_ns -> materialize_ns (decode_ns stays 0 under views).
Execution (run-all-benches.sh):
- Run the decode-heavy query benches (txpage/txhash/events) once per mode (QUERY_VIEW_MODES = roundtrip + xdrviews) so the report can compare with/without XDR views. Previously every query ran views-off (slow path).
- Events use the worst-case query (EVENTS_BUCKETS=15, max filters/request).
- Ingest runs with --parallel; hot-ingest runs both xdr-views on and off (the views run feeds the reads, the parsed run is kept for its CSVs).
Smoke-tested: 0 errors, pages fully materialized; views 4-8x faster than round-trip (decode_ns=0 confirms the path dispatch).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
)
Re-ran c6id.8xlarge with the corrected harness and rewrote the report to address the PR #750 review:
- New "c6id.8xlarge — corrected" section: query latency split into hot/cold tables with roundtrip vs xdr-views columns and P50+P99; events use worst-case K=15; ingest shown hot (parsed vs view, --parallel) and cold with the per-stage phase breakdown + per-ledger driver total.
- The other three machines (2xlarge/4xlarge/im4gn) are marked STALE (old harness: tx-page-as-count, views-off) pending a re-run.
- Dropped the per-machine raw-cell dump (§12); the CSVs are on GCS.
- Summary table: same treatment (banner, corrected c6id.8xlarge rows, stale markers on the rest).
Headline corrected numbers: xdr-views cuts tx-page/tx-hash p50 4-9x (hot tx-hash 10.6->1.2ms) and lifts peak throughput 5-10x (hot tx-hash 706->7253 ops/s); events is decode-insensitive (1.1-1.4x). Hot ingest with views is ~2.1x faster than parsed (skips the 8.4ms/ledger UnmarshalBinary).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ply-load
Adds an `lcm` ledger source and an apply-load-gen.sh driver so the bench-fullhistory suite can run on fully synthetic, density-controlled data instead of real pubnet chunks.
- sources.go: new --source=lcm reader over apply-load's framed-XDR METADATA_OUTPUT_STREAM. Skips setup ledgers (<= --lcm-checkpoint) and decode-free frame-skips to each chunk's 10k-ledger block; reuses the entire cold-ingest/hot-ingest/build-txhash-index pipeline. Wired --lcm-file/--lcm-checkpoint flags into both ingest commands.
- apply-load-gen.sh: drives stellar-core new-db/new-hist/apply-load -> meta.xdr -> cold-ingest --source=lcm -> packfiles -> build-txhash-index. Profiles map to apply-load model txs + target TPS: sac (~10k), token/oz (~9k custom_token), soroswap (~2.5k). Uses the installed core's protocol.
- lcm_source_test.go: unit-tests setup-skip, chunk-block mapping, short-read.
- README: documents the lcm source, the driver, profiles, BUILD_TESTS requirement, and the real cost of full 10k-ledger chunks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4-machine set
All four machines now have corrected-harness (PR #750, b712b86) runs in GCS, so this drops the stale/pending framing and regenerates both docs from the complete set. Incorporates the PR #750 review:
- query benches show roundtrip vs xdr-views side by side, with p50 AND p99
- hot and cold presented as separate tables
- events uses the worst-case query (15 filters)
- ingest: hot --parallel in both modes (views on/off) with per-ledger total + per-stage breakdown; cold per-stage + throughput
- per-machine raw-results dump omitted (raw CSVs live on GCS)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ew format)
Keeps the compact cross-machine p50 grids as an overview and adds the per-machine stage-row × p50/p90/p99/max tables with run-context headers (chunk, ledger count, --parallel --xdr-views, source, end-to-end wall) that PR #750 review (r3351681282) laid out, for hot and cold ingest on all four machines.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-core apply-load"
This reverts commit a8c8295.
Documents the suite driver (run-all-benches.sh), the roundtrip vs xdr-views decode paths for query benches, txpage full-page materialization + --page-size, the events --buckets flag, and points to the results/ reports.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tamirms
reviewed
Jun 4, 2026
tamirms
reviewed
Jun 4, 2026
tamirms
reviewed
Jun 4, 2026
tamirms
approved these changes
Jun 4, 2026
…s + report wording
- tx_page_helpers/tx_hash_helpers: materializePageRangeView gathered envelopes
via a per-element envAt() that restarts the V1/V2 GeneralizedTransactionSet
walk at index 0 each call (O(page²)). Replace with single-pass range
collectors (collectEnvelopeRange{FromV0TxSet,FromGeneralized}) that walk the
TxSet once. Matters at large --page-size; page=20 numbers unchanged. Added
TestMaterializePageRangeViewMatchesRoundtrip (view vs roundtrip, non-zero
windows on V1 set).
- report: reword the xdr-views ingest saving (~80% lcm_decode + ~20% per-event
UnmarshalView in fan_out) and the events cold/hot speedup (fixed per-event
decode as a proportion of each tier's baseline), per reviewer suggestions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
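The envelope-gathering change in the first bullet can be illustrated with a simplified model. In the sketch below, `txSet` is a hypothetical stand-in (groups of envelope strings) for the nested transaction set, and `envAt` / `collectRange` are illustrative names rather than the PR's functions; the point is only the contrast between restarting the walk for every element and walking the set once per page.

```go
package main

import "fmt"

// txSet is a hypothetical stand-in for a nested transaction set:
// an ordered list of groups, each holding envelopes.
type txSet [][]string

// envAt returns the envelope at flat index i. Every call rescans from the
// first group, so fetching a page element by element repeats the walk.
func (s txSet) envAt(i int) (string, bool) {
	if i < 0 {
		return "", false
	}
	for _, group := range s {
		if i < len(group) {
			return group[i], true
		}
		i -= len(group)
	}
	return "", false
}

// collectRange returns the envelopes with flat indexes in [start, end),
// walking the set a single time and stopping as soon as the window is full.
func (s txSet) collectRange(start, end int) []string {
	if start < 0 {
		start = 0
	}
	if end <= start {
		return nil
	}
	out := make([]string, 0, end-start)
	idx := 0
	for _, group := range s {
		// Groups that end before the window are skipped without iteration.
		if idx+len(group) <= start {
			idx += len(group)
			continue
		}
		for _, env := range group {
			if idx >= end {
				return out
			}
			if idx >= start {
				out = append(out, env)
			}
			idx++
		}
	}
	return out
}

func main() {
	s := txSet{{"a", "b"}, {"c"}, {"d", "e", "f"}}
	fmt.Println(s.collectRange(1, 4))
}
```

A check in the spirit of TestMaterializePageRangeViewMatchesRoundtrip would compare the collector's output against the per-element path over a non-zero window, as the usage in `main` suggests.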
Summary
cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md: a Markdown report comparing bench runs across four AWS instance types (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) on identical data (chunks 5859–5999 cold, chunk 5000 hot, chunk 5999 for ingest).
Tables, plus a Mermaid xychart-beta block where a chart adds clarity.
Source per-iter CSVs and the summary CSVs that back every table live at gs://rpc-full-history/benchmarks/_summary/ and gs://rpc-full-history/benchmarks/<machine-dir>/.
Test plan
gs://rpc-full-history/benchmarks/_summary/.
🤖 Generated with Claude Code