Skip to content

fix(l1): charge snap serving budget on every lookup attempt#6957

Open
ElFantasma wants to merge 5 commits into
mainfrom
fix/l1-charge-snap-serving-budget
Open

fix(l1): charge snap serving budget on every lookup attempt#6957
ElFantasma wants to merge 5 commits into
mainfrom
fix/l1-charge-snap-serving-budget

Conversation

@ElFantasma

Copy link
Copy Markdown
Contributor

Motivation

The snap serving handlers (GetByteCodes, GetStorageRanges, GetTrieNodes) clamp the response to a byte budget, but that budget only advances when a lookup hits. A missed lookup, a duplicate, or an overlong path (which resolves to an empty node) costs nothing, so a request full of misses/empties never trips the budget and the handler walks the entire peer-supplied list (hashes / account_hashes / paths) — O(N) DB/cache probes returning a near-empty response. The per-peer rate limiter counts messages, not items, so it doesn't catch this.

Description

Introduces MIN_LOOKUP_COST (32 B, a hash's wire length) and charges it against the byte budget on every lookup attempt, regardless of hit/miss/empty, in all three handlers:

  • process_byte_codes_request — charge misses.
  • process_storage_ranges_request — charge every account attempt (covers absent accounts).
  • process_trie_nodes_request — charge at least MIN_LOOKUP_COST per pathset (covers overlong/empty-result paths).

With the 512 KiB response cap this bounds an all-miss walk to ~16K probes instead of the full list. This reuses the existing response-byte clamp mechanism; there is no reject/truncate behavior change for well-formed requests.

Adds regression tests (byte_codes_charges_missed_lookups, storage_ranges_charges_missed_accounts): a real hit placed after a miss with a 1-byte budget must not be served, proving misses now consume the budget. The existing *_clamps_response_bytes tests still pass.

Checklist

  • cargo test -p ethrex-test --test ethrex_tests snap_server (17 tests) and cargo clippy -p ethrex-p2p pass.

@ElFantasma ElFantasma requested a review from a team as a code owner July 2, 2026 21:11
@github-actions github-actions Bot added the L1 Ethereum client label Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

⚠️ Known Issues — intentionally skipped tests

Source: docs/known_issues.md

Stateless blockchain EF-tests (zkevm bundle) skipped under Amsterdam v6.1.0

make -C tooling/ef_tests/blockchain test-stateless runs against the
tests-zkevm@v0.4.1 fixture bundle — currently the only published zkevm test
release. Those fixtures were filled against an older glamsterdam devnet and
re-execute every case under the for_amsterdam fork, so they lag the
glamsterdam-devnet v6.1.0 gas accounting this client now implements
(EIP-8037 / EIP-8038 / EIP-2780 / EIP-7976 / EIP-7981 …). Re-executing them
yields ~2790/2864 stale-gas failures ("Transaction execution unexpectedly
failed"), spread pervasively across every fork and through the EIP-8025 proof
suite, so there is no clean passing subset to keep.

Until a v6.1.0-aligned zkevm bundle is published, the entire bundle is skipped
for the stateless run via the fork_Amsterdam entry in the stateless-only
EXTRA_SKIPS (tooling/ef_tests/blockchain/tests/all.rs) — every test in this
Amsterdam-only bundle carries the [fork_Amsterdam-…] parametrization in its
test key. The skip is #[cfg(feature = "stateless")], so it does not touch
the non-stateless test-levm run. Coverage of these EIPs is retained by
test-levm, the engine EF-tests, and the state EF-tests, all of which execute
against the live v6.1.0 fixtures and pass.

Re-enable by removing the "fork_Amsterdam" skip once .fixtures_url_zkevm
points at a zkevm bundle filled for glamsterdam-devnet v6.1.0.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

🤖 Codex Code Review

  1. High: GetTrieNodes is still susceptible to miss-heavy request amplification. The new clamp in crates/networking/p2p/snap/server.rs only charges MIN_LOOKUP_COST once per outer pathset, but crates/storage/store.rs still iterates every storage subpath and only advances its internal budget by node.len(). For missing or overlong paths, Trie::get_node returns vec![] (crates/common/trie/trie.rs, crates/common/trie/trie.rs), so a request like [account_path, miss1, miss2, ...] can still force arbitrary trie walks while consuming only 32 bytes of budget. This means the PR fixes the amplification in GetByteCodes and GetStorageRanges, but not the analogous GetTrieNodes case. The accounting needs to happen per subpath attempt inside Store::get_trie_nodes (and a regression test should cover that path).

No other material correctness issues stood out in the diff.

I could not run the Rust tests in this sandbox because cargo/rustup tried to write under /home/runner/.rustup and failed on the read-only filesystem.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

🤖 Claude Code Review

Now I have enough to write the review.

Review: fix(l1): charge snap serving budget on every lookup attempt (PR 6957)

Summary of intent: Charge a MIN_LOOKUP_COST (32 B) against the byte budget on every lookup attempt in GetByteCodes, GetStorageRanges, and GetTrieNodes handlers, so miss-heavy/empty-result requests trip the existing byte clamp instead of walking the entire peer-supplied list unbounded. Good, well-motivated fix for process_byte_codes_request and process_storage_ranges_request, with solid regression tests proving the miss-then-hit ordering now stops the walk.

Findings

1. process_trie_nodes_request still allows unbounded amplification via a single oversized pathset (crates/networking/p2p/snap/server.rs:174-196)

The fix charges MIN_LOOKUP_COST once per outer request.paths entry (a "pathset": [acc_path, slot_path_1, slot_path_2, ...]), but each pathset can itself contain an unbounded number of storage sub-paths. The actual per-sub-path work happens inside Store::get_trie_nodes (crates/storage/store.rs:2937-2946):

for path in paths.iter().skip(1) {
    if bytes_used >= byte_limit {
        break;
    }
    let node = storage_trie.get_node(path)?;
    bytes_used += node.len() as u64;
    nodes.push(node);
}

bytes_used only advances on an actual hit. A missed/overlong storage path resolves through Trie::get_node to Ok(vec![]) (crates/common/trie/trie.rs:509,545,548, 0-byte cost), so a pathset packed with e.g. tens of thousands of bogus/missing storage paths will run this inner loop to completion — one real trie/DB probe per entry — while the outer handler in server.rs only ever deducts a single MIN_LOOKUP_COST (32 bytes) for the entire pathset once the call returns.

Given P2P_MAX_MESSAGE_SIZE/frame limits (MAX_MESSAGE_SIZE = 0xFFFFFF ≈ 16.7 MiB, crates/networking/p2p/rlpx/connection/codec.rs:26), an attacker can fit roughly 400K+ 32-byte paths into a single GetTrieNodes message inside one pathset, forcing ~400K synchronous trie/DB lookups for a budget cost of 32 bytes. This is exactly the amplification pattern the PR describes ("a request full of misses/empties never trips the budget and the handler walks the entire peer-supplied list"), except it applies to the nested path list rather than the outer one, and this PR leaves it open.

  • Failure scenario: peer sends one GetTrieNodes request with paths = [[valid_or_overlong_account_path, miss_path_1, miss_path_2, ..., miss_path_400000]], bytes: 1. The outer loop in server.rs runs once (single pathset), but get_trie_nodes internally iterates all 400,000 sub-paths performing real trie descents, none of which are charged individually.
  • Suggested fix: charge MIN_LOOKUP_COST per attempted sub-path inside Store::get_trie_nodes's inner loop (or return per-path costs to the caller so server.rs can account for them), not just once per outer pathset. Alternatively, bound paths[1..].len() per pathset before dispatching.

This is the same class of bug the PR fixes elsewhere, so it's worth confirming whether it's in scope for this PR or being tracked separately — the PR's own test-coverage note ("not given a dedicated regression test... requires a committed state trie") suggests process_trie_nodes_request wasn't exercised by a miss-heavy test, which is likely why this gap wasn't caught.

Minor observations (not blocking)

  • crates/networking/p2p/snap/server.rs:112-115: in process_storage_ranges_request, MIN_LOOKUP_COST is charged unconditionally per account, even when the account produced real slots (already charged 64 B/slot). This slightly over-charges hit-heavy responses but is conservative/safe, not a correctness bug.
  • The doc comment on MIN_LOOKUP_COST (lines 15-22) is thorough and clearly explains the rationale — good documentation for a non-obvious constant.
  • Regression tests for byte_codes_charges_missed_lookups and storage_ranges_charges_missed_accounts correctly prove the fix by placing a real hit after a miss with a 1-byte budget — a solid, minimal reproduction of the amplification bug for the two handlers they cover.

Correctness / style

  • No RLP, gas-accounting, or consensus-rule concerns — this is purely a request-serving resource-accounting change, and the byte-clamp mechanism it reuses is unchanged for well-formed/hit-heavy requests.
  • byte_budget.saturating_sub(returned_bytes.max(MIN_LOOKUP_COST)) (line 192) is a clean way to charge "actual or minimum, whichever is greater" without double-charging.

Overall: The fix is correct and well-tested for GetByteCodes and GetStorageRanges. GetTrieNodes (Item 1 above) still has an unaddressed amplification path through nested storage sub-paths within a single pathset — worth deciding whether to fix in this PR or file as a fast-follow before calling this class of bug closed.


Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

Lines of code report

Total lines added: 9
Total lines removed: 0
Total lines changed: 9

Detailed view
+---------------------------------------------+-------+------+
| File                                        | Lines | Diff |
+---------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/server.rs | 163   | +9   |
+---------------------------------------------+-------+------+

@greptile-apps

greptile-apps Bot commented Jul 2, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a request-amplification vulnerability in the snap protocol server where missed lookups, absent accounts, and empty-result trie pathsets consumed no budget, allowing a peer to trigger O(N) DB probes with a near-empty response that the message-rate limiter could not catch. The fix introduces a MIN_LOOKUP_COST of 32 bytes charged against the byte budget on every lookup attempt regardless of outcome.

  • process_byte_codes_request: misses now charge MIN_LOOKUP_COST; hits still charge actual code size.
  • process_storage_ranges_request: every account attempt (hit or miss) charges an additional MIN_LOOKUP_COST on top of slot bytes, bounding all-miss walks.
  • process_trie_nodes_request: each pathset is now charged max(returned_bytes, MIN_LOOKUP_COST), covering overlong/empty-result paths.

Confidence Score: 4/5

Safe to merge; the fix correctly bounds all-miss walks across all three snap handlers and the regression tests verify the key behavior change.

The core security fix is sound and well-tested for byte codes and storage ranges. The storage handler adds MIN_LOOKUP_COST on top of slot bytes for every account (not just misses), which slightly over-debits hit accounts compared to the max-based approach used in the trie-nodes handler — a minor inconsistency that reduces per-request throughput for hit-heavy storage queries but does not break correctness. The trie-nodes miss path has no dedicated regression test, leaving that leg of the fix untested at the unit level.

crates/networking/p2p/snap/server.rs — specifically the storage-ranges accounting around MIN_LOOKUP_COST for hit accounts.

Important Files Changed

Filename Overview
crates/networking/p2p/snap/server.rs Introduces MIN_LOOKUP_COST (32 B) charged per attempt in all three snap handlers. The fix is correct for misses in byte-codes and trie-nodes. The storage-ranges handler charges MIN_LOOKUP_COST additively for every account (hits AND misses), unlike the other two handlers which charge only the larger of actual vs minimum — a minor accounting inconsistency that slightly over-debits hit accounts.
test/tests/p2p/snap_server_tests.rs Adds two focused regression tests (byte_codes_charges_missed_lookups, storage_ranges_charges_missed_accounts) that correctly verify miss charging using a 1-byte budget. No new test for the trie-nodes empty-pathset path, which is acknowledged in a comment.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming snap request] --> B[Clamp budget to MAX_RESPONSE_BYTES]
    B --> C{Next hash / account / pathset}
    C --> D[DB or trie lookup]
    D --> E{Handler type}
    E --> BC[byte codes]
    BC --> BC1{Hit?}
    BC1 -- Yes --> BC2[charge actual code length]
    BC1 -- No --> BC3[charge MIN_LOOKUP_COST=32]
    E --> SR[storage ranges]
    SR --> SR1{Account found?}
    SR1 -- Yes --> SR2[charge 64 x n_slots]
    SR2 --> SR3[charge MIN_LOOKUP_COST=32 - NEW]
    SR1 -- No --> SR3
    E --> TN[trie nodes]
    TN --> TN1[compute returned_bytes]
    TN1 --> TN2[charge max of returned_bytes and MIN_LOOKUP_COST=32 - NEW]
    BC2 --> CHK{Budget exhausted?}
    BC3 --> CHK
    SR3 --> CHK
    TN2 --> CHK
    CHK -- Yes --> STOP[Return partial response]
    CHK -- No --> C
    C -- Done --> DONE[Return full response]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[Incoming snap request] --> B[Clamp budget to MAX_RESPONSE_BYTES]
    B --> C{Next hash / account / pathset}
    C --> D[DB or trie lookup]
    D --> E{Handler type}
    E --> BC[byte codes]
    BC --> BC1{Hit?}
    BC1 -- Yes --> BC2[charge actual code length]
    BC1 -- No --> BC3[charge MIN_LOOKUP_COST=32]
    E --> SR[storage ranges]
    SR --> SR1{Account found?}
    SR1 -- Yes --> SR2[charge 64 x n_slots]
    SR2 --> SR3[charge MIN_LOOKUP_COST=32 - NEW]
    SR1 -- No --> SR3
    E --> TN[trie nodes]
    TN --> TN1[compute returned_bytes]
    TN1 --> TN2[charge max of returned_bytes and MIN_LOOKUP_COST=32 - NEW]
    BC2 --> CHK{Budget exhausted?}
    BC3 --> CHK
    SR3 --> CHK
    TN2 --> CHK
    CHK -- Yes --> STOP[Return partial response]
    CHK -- No --> C
    C -- Done --> DONE[Return full response]
Loading

Comments Outside Diff (2)

  1. crates/networking/p2p/snap/server.rs, line 108-116 (link)

    P2 The storage-ranges handler charges MIN_LOOKUP_COST unconditionally for every account — including those that returned slots — making the effective cost 64 × n_slots + 32. The other two handlers use a max-based approach: byte codes charges code.len() for hits and MIN_LOOKUP_COST only for misses; trie nodes charges max(returned_bytes, MIN_LOOKUP_COST). Applying the same max pattern here avoids the unintentional penalty on non-empty accounts while still protecting against all-miss walks.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/networking/p2p/snap/server.rs
    Line: 108-116
    
    Comment:
    The storage-ranges handler charges `MIN_LOOKUP_COST` unconditionally for every account — including those that returned slots — making the effective cost `64 × n_slots + 32`. The other two handlers use a max-based approach: byte codes charges `code.len()` for hits and `MIN_LOOKUP_COST` only for misses; trie nodes charges `max(returned_bytes, MIN_LOOKUP_COST)`. Applying the same `max` pattern here avoids the unintentional penalty on non-empty accounts while still protecting against all-miss walks.
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. test/tests/p2p/snap_server_tests.rs, line 1013-1017 (link)

    P2 Missing test for trie-nodes empty-pathset charging

    The comment acknowledges that process_trie_nodes_request has no miss/empty-pathset regression test. The max(returned_bytes, MIN_LOOKUP_COST) change is the most structurally different of the three fixes (it also refactors the byte accumulation from usize-fold-then-cast to a u64 fold), so an untested path here is a gap worth tracking. Consider adding a test that sends a pathset whose path exceeds 32 bytes (resolved to an empty node) followed by a real pathset with a 1-byte budget, analogous to the two new tests above.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: test/tests/p2p/snap_server_tests.rs
    Line: 1013-1017
    
    Comment:
    **Missing test for trie-nodes empty-pathset charging**
    
    The comment acknowledges that `process_trie_nodes_request` has no miss/empty-pathset regression test. The `max(returned_bytes, MIN_LOOKUP_COST)` change is the most structurally different of the three fixes (it also refactors the byte accumulation from `usize`-fold-then-cast to a `u64` fold), so an untested path here is a gap worth tracking. Consider adding a test that sends a pathset whose path exceeds 32 bytes (resolved to an empty node) followed by a real pathset with a 1-byte budget, analogous to the two new tests above.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/networking/p2p/snap/server.rs:108-116
The storage-ranges handler charges `MIN_LOOKUP_COST` unconditionally for every account — including those that returned slots — making the effective cost `64 × n_slots + 32`. The other two handlers use a max-based approach: byte codes charges `code.len()` for hits and `MIN_LOOKUP_COST` only for misses; trie nodes charges `max(returned_bytes, MIN_LOOKUP_COST)`. Applying the same `max` pattern here avoids the unintentional penalty on non-empty accounts while still protecting against all-miss walks.

```suggestion
            if !account_slots.is_empty() {
                slots.push(account_slots);
            }

            // Charge every account attempt (even a miss that returned no slots) so an
            // all-miss `account_hashes` list trips the budget instead of opening the storage
            // trie once per entry.  Use max(actual, MIN_LOOKUP_COST) — consistent with the
            // trie-nodes handler — so hit accounts are not over-penalised.
            let account_bytes = (account_slots.len() as u64) * 64;
            bytes_used += account_bytes.max(MIN_LOOKUP_COST);
            if bytes_used >= byte_budget {
```

### Issue 2 of 2
test/tests/p2p/snap_server_tests.rs:1013-1017
**Missing test for trie-nodes empty-pathset charging**

The comment acknowledges that `process_trie_nodes_request` has no miss/empty-pathset regression test. The `max(returned_bytes, MIN_LOOKUP_COST)` change is the most structurally different of the three fixes (it also refactors the byte accumulation from `usize`-fold-then-cast to a `u64` fold), so an untested path here is a gap worth tracking. Consider adding a test that sends a pathset whose path exceeds 32 bytes (resolved to an empty node) followed by a real pathset with a 1-byte budget, analogous to the two new tests above.

Reviews (1): Last reviewed commit: "fix(l1): bound duplicate tiny-code snap ..." | Re-trigger Greptile

@ElFantasma

Copy link
Copy Markdown
Contributor Author

Thanks — the reviews caught a real gap I'd missed. Addressed in 368182a:

GetTrieNodes nested sub-path amplification (Codex High / Claude Finding 1) — fixed. Correct: charging per outer pathset was one level too shallow. The per-sub-path work is inside Store::get_trie_nodes, whose inner loop advanced bytes_used only on a hit (+= node.len()), so a single pathset packed with missing/overlong storage sub-paths (each → Ok(vec![]), 0 B) walked to completion for 32 B of outer budget. Fixed at the source: that loop now charges max(node.len(), MIN_TRIE_NODE_LOOKUP_COST) per sub-path, so it breaks once the per-call byte_limit is exhausted regardless of hits.

Storage-ranges over-charge (greptile P2 #1) — fixed. Now charges the per-probe minimum only when an account produced no slots (absent account / empty range); hit accounts keep paying 64 B/slot and aren't charged extra. Note the literal bytes_used += (n_slots*64).max(MIN) suggestion would double-count — the 64 B/slot is already added in the inner loop — so charging-on-empty is the equivalent that avoids that.

Trie-nodes miss-path test (greptile P2 #2 / Claude) — acknowledged, not added. Exercising get_trie_nodes' storage loop needs a state trie readable via its snapshot-gated open_state_trie, which the direct-write test fixtures don't populate (it goes through the block/layer commit machinery — the same limitation already documented for this handler). The fix mirrors the byte-codes/storage mechanism that is unit-tested; I updated the code comment to reflect the new per-sub-path charge and the standing fixture gap. Happy to add a committed-trie fixture as a fast-follow if preferred.

Minor (greptile): the storage over-charge above was the substantive half of that note; the accounting is now consistent with the other two handlers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L1 Ethereum client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant