bench(fullhistory): 2026-05-21 cross-machine results report #750

Merged: chowbao merged 20 commits into rpc-hack from bench/cross-machine-report-2026-05-21 on Jun 4, 2026

chowbao (Contributor) commented May 21, 2026

Summary

  • Adds cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-05-21-cross-machine.md — a Markdown report comparing bench runs across four AWS instance types (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge, im4gn.4xlarge) on identical data (chunks 5859–5999 cold, chunk 5000 hot, chunk 5999 for ingest).
  • Sections: machine inventory, peak read throughput, worker scaling, tx-page page-size sweep, tx-hash xdr-views vs round-trip, per-ledger and bulk ingest, cold-vs-hot speedup, x86 vs Graviton2 at matched vCPU.
  • Each table is paired with a Mermaid xychart-beta block where a chart adds clarity. Source per-iter CSVs and the summary CSVs that back every table live at gs://rpc-full-history/benchmarks/_summary/ and gs://rpc-full-history/benchmarks/<machine-dir>/.

Test plan

  • Open the file on github.com (or any Mermaid-rendering Markdown viewer) and confirm tables + xychart blocks render.
  • Optionally cross-check headline numbers against the source CSVs at gs://rpc-full-history/benchmarks/_summary/.

🤖 Generated with Claude Code

chowbao and others added 4 commits May 21, 2026 19:37
Markdown report covering the cross-machine bench run captured under
gs://rpc-full-history/benchmarks/{c6id.2xlarge,c6id.4xlarge,c6id.8xlarge,im4gn.4xlarge}-2026-05-21*.
Tables + Mermaid xychart-beta blocks for: peak read throughput,
worker scaling (cold and hot n=1), tx-page page-size sweep,
xdr-views vs round-trip on tx-hash + events-ingest, per-ledger ingest,
bulk ingest, cold-vs-hot speedup, and x86 vs Graviton2 at matched vCPU.

Source per-iter CSVs and the summary CSVs that back every table here
live at gs://rpc-full-history/benchmarks/_summary/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… report

New section 11 transposes the cross-machine tables: one consolidated
table per machine (c6id.2xlarge, c6id.4xlarge, c6id.8xlarge,
im4gn.4xlarge) listing every bench result — full ledger grid sweep,
tx-page, tx-hash (hit/miss × xdrviews/roundtrip), per-ledger ingest,
and bulk ingest — with p50/p90/p99 and throughput.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New Section 2 ("Internal vs production RPC providers") includes the
prior black-box benchmark across 4–6 production RPC providers and
juxtaposes their p50s with the internal hot/cold tiers. Adds a
Mermaid bar/line chart of the per-workload speedups. Remaining
sections renumbered 3–12.

Headline: hot/cold full-history is 10×–1773× faster than the average
production RPC across ledger-point, ledger-range, tx-page, tx-hash,
and the four event-filter scenarios. Note: 'onfinality' and 'sorobanrpc'
are absent from tx-hash and events workloads (n=4 instead of 6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Colons in x-axis labels (ev:nofilt, ev:contract, ev:topic, ev:both)
break Mermaid's xychart-beta parser. Replaced with hyphens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simon Chow and others added 7 commits June 3, 2026 17:58
New-data-only report over the 2026-06-03 runs (4 machines) on the rewritten
rpc-hack bench harness. Notes methodology changes vs 2026-05-21: ops/s is no
longer comparable across runs (only single-in-flight p50 latency is), the
sweep axis is now query-concurrency 1-16, and ledger/tx-page/tx-hash read
coverage narrowed while events query + ingest stage detail broadened.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Condensed two-table view (typical p50 latency + peak throughput) with a full
glossary defining every row, column, tier, and variable (n, page, c, p50/p99,
ops/s). Links back to the full cross-machine report.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds Table 3 (ingest throughput: hot-ingest ledgers/s, build-txhash-index
keys/s) and Table 4 (per-stage ingest cost), plus glossary entries for the
ingest workloads and ledgers/s, keys/s, and stage terms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cold-ingest ledgers/s is computed from an estimated wall time of
sum(chunk_wall) / chunk-workers. This is an upper-bound throughput estimate,
since the harness records summed per-chunk wall, not true end-to-end wall.
Flagged as an estimate; scales with --chunk-workers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1 report

Source/summary CSV paths were missing the dated prefix (data lives under
.../benchmarks/2026-05-21/; the undated paths don't exist). Also dates the title
and forward-links the 2026-06-03 run, noting the harness changed and ops/s is
not comparable across runs. Historical 5/21 numbers are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4 comment threads on cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md (outdated)
Simon Chow and others added 8 commits June 3, 2026 21:16
Drives the full read + ingest bench suite in bench-fullhistory: builds
the binary once, then runs cold+hot ledgers/txpage/txhash/events read
benches (each a 1,4,8,16 query-concurrency sweep) plus the hot-ingest,
cold-ingest, and build-txhash-index ingest benches.

By default the reads use prebuilt fixtures and ingest writes to scratch
(independent measurements). INGEST_FIRST=1 instead ingests first and
repoints every read bench at the freshly-ingested stores, so the suite
is self-contained from a single raw-ledger packfile seed — usable on a
fresh machine with no prebuilt data. Paths/sizing knobs are env-
overridable for running across different machines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #750 review (tamirms) flagged two harness gaps and several execution
issues. Code fixes:

- txpage (hot+cold) previously only touched TransactionHash + ResultPair —
  it never fetched the page contents, so it measured a tx *count*, not a
  getTransactions response. New walkPageMaterialize (tx_page_helpers.go)
  builds a full db.Transaction per tx in the page (envelope, result, meta,
  events, hash, application order, ledger info).
- txpage (hot+cold) had no --xdr-views flag, so it only measured the slow
  full-decode path. Added --xdr-views with a single-pass view materializer,
  mirroring the txhash bench. CSVs suffix -roundtrip / -xdrviews; detail
  column scan_ns -> materialize_ns (decode_ns stays 0 under views).

Execution (run-all-benches.sh):

- Run the decode-heavy query benches (txpage/txhash/events) once per mode
  (QUERY_VIEW_MODES = roundtrip + xdrviews) so the report can compare with/
  without XDR views. Previously every query ran views-off (slow path).
- Events use the worst-case query (EVENTS_BUCKETS=15, max filters/request).
- Ingest runs with --parallel; hot-ingest runs with xdr-views both on and off
  (the views run feeds the reads; the parsed run is kept for its CSVs).

Smoke-tested: 0 errors, pages fully materialized; views 4-8x faster than
round-trip (decode_ns=0 confirms the path dispatch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Re-ran c6id.8xlarge with the corrected harness and rewrote the report to
address the PR #750 review:

- New "c6id.8xlarge — corrected" section: query latency split into hot/cold
  tables with roundtrip vs xdr-views columns and P50+P99; events use
  worst-case K=15; ingest shown hot (parsed vs view, --parallel) and cold
  with the per-stage phase breakdown + per-ledger driver total.
- The other three machines (2xlarge/4xlarge/im4gn) are marked STALE (old
  harness: tx-page-as-count, views-off) pending a re-run.
- Dropped the per-machine raw-cell dump (§12) — the CSVs are on GCS.
- Summary table: same treatment (banner, corrected c6id.8xlarge rows, stale
  markers on the rest).

Headline corrected numbers: xdr-views cuts tx-page/tx-hash p50 4-9x (hot
tx-hash 10.6->1.2ms) and lifts peak throughput 5-10x (hot tx-hash 706->7253
ops/s); events is decode-insensitive (1.1-1.4x). Hot ingest with views is
~2.1x faster than parsed (skips the 8.4ms/ledger UnmarshalBinary).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ply-load

Adds an `lcm` ledger source and an apply-load-gen.sh driver so the
bench-fullhistory suite can run on fully synthetic, density-controlled data
instead of real pubnet chunks.

- sources.go: new --source=lcm reader over apply-load's framed-XDR
  METADATA_OUTPUT_STREAM. Skips setup ledgers (<= --lcm-checkpoint), then
  reaches each chunk's 10k-ledger block by skipping frames without decoding
  them; reuses the entire cold-ingest/hot-ingest/build-txhash-index pipeline.
  Wired the --lcm-file/--lcm-checkpoint flags into both ingest commands.
- apply-load-gen.sh: drives stellar-core new-db/new-hist/apply-load ->
  meta.xdr -> cold-ingest --source=lcm -> packfiles -> build-txhash-index.
  Profiles map to apply-load model txs + target TPS: sac (~10k), token/oz
  (~9k custom_token), soroswap (~2.5k). Uses the installed core's protocol.
- lcm_source_test.go: unit-tests setup-skip, chunk-block mapping, short-read.
- README: documents the lcm source, the driver, profiles, BUILD_TESTS
  requirement, and the real cost of full 10k-ledger chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4-machine set

All four machines now have corrected-harness (PR #750, b712b86) runs in GCS, so
this drops the stale/pending framing and regenerates both docs from the complete
set. Incorporates the PR #750 review:

- query benches show roundtrip vs xdr-views side by side, with p50 AND p99
- hot and cold presented as separate tables
- events uses the worst-case query (15 filters)
- ingest: hot --parallel in both modes (views on/off) with per-ledger total +
  per-stage breakdown; cold per-stage + throughput
- per-machine raw-results dump omitted (raw CSVs live on GCS)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ew format)

Keeps the compact cross-machine p50 grids as an overview and adds the per-machine
stage-row × p50/p90/p99/max tables with run-context headers (chunk, ledger count,
--parallel --xdr-views, source, end-to-end wall) that PR #750 review (r3351681282)
laid out — for hot and cold ingest, all four machines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documents the suite driver (run-all-benches.sh), the roundtrip vs xdr-views
decode paths for query benches, txpage full-page materialization + --page-size,
the events --buckets flag, and points to the results/ reports.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2 comment threads on cmd/stellar-rpc/scripts/bench-fullhistory/results/2026-06-03-cross-machine.md (outdated)
Comment thread cmd/stellar-rpc/scripts/bench-fullhistory/tx_page_helpers.go Outdated
…s + report wording

- tx_page_helpers/tx_hash_helpers: materializePageRangeView gathered envelopes
  via a per-element envAt() that restarts the V1/V2 GeneralizedTransactionSet
  walk at index 0 each call (O(page²)). Replace with single-pass range
  collectors (collectEnvelopeRange{FromV0TxSet,FromGeneralized}) that walk the
  TxSet once. Matters at large --page-size; page=20 numbers unchanged. Added
  TestMaterializePageRangeViewMatchesRoundtrip (view vs roundtrip, non-zero
  windows on V1 set).
- report: reword the xdr-views ingest saving (~80% lcm_decode + ~20% per-event
  UnmarshalView in fan_out) and the events cold/hot speedup (fixed per-event
  decode as a proportion of each tier's baseline), per reviewer suggestions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chowbao merged commit 31efe8b into rpc-hack on Jun 4, 2026
13 of 15 checks passed
chowbao deleted the bench/cross-machine-report-2026-05-21 branch on June 4, 2026 15:58
tamirms mentioned this pull request on Jun 8, 2026
3 participants