Skip to content

migrate: count & report skipped storage slots, not just accounts#6902

Open
0xalizk wants to merge 1 commit into
lambdaclass:eip-7864-planfrom
0xalizk:fix/migrate-count-skipped-storage-slots
Open

migrate: count & report skipped storage slots, not just accounts#6902
0xalizk wants to merge 1 commit into
lambdaclass:eip-7864-planfrom
0xalizk:fix/migrate-count-skipped-storage-slots

Conversation

@0xalizk

@0xalizk 0xalizk commented Jun 23, 2026

Copy link
Copy Markdown

What

ethrex migrate (binary-trie bootstrap from a geth snapshot + preimages) tracks and
reports how many accounts were skipped due to a missing keccak preimage, but
storage slots with a missing preimage are dropped silently — no counter, no warning.
This makes the storage path symmetric with the account path: count skipped slots, warn
(capped), and report the total.

Why

Every leaf migrate drops is an account/slot the resulting binary trie won't contain. The
account skip count is currently the only completeness signal at migration time, and it
says nothing about storage — so a structurally incomplete trie (missing slots) can look
clean. We hit this bootstrapping mainnet: the run reported its account skips, but tens of
millions of storage entries had dropped silently, discoverable only by manually diffing
storage_count against the input. This surfaces it directly.

The subtlety: zero-value slots

In the collect phase a storage entry resolves to None for two different reasons:

  1. a missing preimage (a real coverage gap — what we want to count), or
  2. a legitimately zero-valued slot, which is intentionally not stored.

Counting all Nones would conflate them and overcount. The new code re-checks the
preimages on the None path and counts a slot as skipped only when its address/slot
preimage is genuinely absent; zero-value slots are not counted.

Changes

  • collect_phase returns skipped_accounts and skipped_slots separately (was one skipped).
  • Storage None arm distinguishes missing-preimage from zero-value, increments
    skipped_slots, and emits a capped warning mirroring the account path.
  • Final report: "{skipped_accounts} accounts + {skipped_slots} storage slots skipped …".
  • Live progress line also shows a running {accounts}+{slots} skipped.

Observability only — no change to the trie output. One file (cmd/ethrex/migrate.rs), +37/−11.

Testing

Compiles against eip-7864-plan. Semantics validated on a full mainnet migration
(~2 B entries): the account skip count matches the existing counter, the new slot counter
surfaces a previously-silent storage gap, and zero-value slots are correctly excluded.

@0xalizk 0xalizk requested a review from a team as a code owner June 23, 2026 13:46
@greptile-apps

greptile-apps Bot commented Jun 23, 2026

Copy link
Copy Markdown

Greptile Summary

This PR makes the ethrex migrate collect phase symmetric between accounts and storage: skipped storage slots with missing keccak preimages are now counted, capped-warned (up to 10), and reported alongside the existing account skip counter, while legitimately zero-valued slots are correctly excluded by re-checking the preimage maps on the None path.

  • collect_phase return type expands from a 4-tuple to a 5-tuple (skipped_accounts, skipped_slots are now separate), with the function signature, doc comment, and call site updated accordingly.
  • A new None branch for RawEntry::Storage re-queries both preimage maps to distinguish a missing-preimage skip from a zero-value slot, increments skipped_slots, and emits a mirrored capped warning.
  • Both the live progress line and the final post-collect report now surface the two skip counters.

Confidence Score: 4/5

Observability-only change with no effect on trie output; the zero-value slot exclusion logic is correct and the warning cap mirrors the existing account path.

The core logic — re-checking preimages on the None storage path to distinguish missing preimage from zero-value slots — is sound. The function signature change is mechanical and correctly propagated to the single call site. The only feedback is minor style points around redundant preimage lookups in the warning block and the ambiguous {skipped_accounts}+{skipped_slots} format in the progress line.

No files require special attention; cmd/ethrex/migrate.rs is the only changed file and the logic is straightforward.

Important Files Changed

Filename Overview
cmd/ethrex/migrate.rs Splits the single skipped counter into skipped_accounts + skipped_slots, re-checks preimages on the None storage path to distinguish missing-preimage from zero-value slots, and surfaces both counts in progress and final log lines. Logic is correct; minor redundant preimage lookups exist in the warning path.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RawEntry::Storage from snapshot] --> B{Parallel processing}
    B --> C{get_addr preimage?}
    C -- None --> D[closure returns None]
    C -- Some --> E{get_slot preimage?}
    E -- None --> D
    E -- Some --> F{value_u256 == 0?}
    F -- Yes --> D
    F -- No --> G[ProcessedEntry::Storage]
    G --> H[inserts.push + storage_count++]
    D --> I{Serial re-check preimages}
    I --> J{get_addr.is_none OR get_slot.is_none?}
    J -- Yes: missing preimage --> K[skipped_slots++, maybe warn]
    J -- No: zero-value slot --> L[silently ignored, not counted]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[RawEntry::Storage from snapshot] --> B{Parallel processing}
    B --> C{get_addr preimage?}
    C -- None --> D[closure returns None]
    C -- Some --> E{get_slot preimage?}
    E -- None --> D
    E -- Some --> F{value_u256 == 0?}
    F -- Yes --> D
    F -- No --> G[ProcessedEntry::Storage]
    G --> H[inserts.push + storage_count++]
    D --> I{Serial re-check preimages}
    I --> J{get_addr.is_none OR get_slot.is_none?}
    J -- Yes: missing preimage --> K[skipped_slots++, maybe warn]
    J -- No: zero-value slot --> L[silently ignored, not counted]
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
cmd/ethrex/migrate.rs:541
The progress line uses `{skipped_accounts}+{skipped_slots}` without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the `+` is accounts and which is slots. Since the final summary already uses the clearer `"{skipped_accounts} accounts + {skipped_slots} storage slots"` format, aligning the progress line to a similar short form would make both outputs consistent.

```suggestion
                "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}",
```

### Issue 2 of 2
cmd/ethrex/migrate.rs:500-510
**Redundant preimage lookups in the warning block**

The `missing_preimage` flag is already computed on lines 498–499, so calling `preimages.get_addr(keccak_addr).is_some()` and `preimages.get_slot(keccak_slot).is_some()` inside the `warn!` re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the `missing_preimage` check (`||`), when `get_addr` is `None`, `get_slot` is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.

Reviews (1): Last reviewed commit: "migrate: count & report skipped storage ..." | Re-trigger Greptile

Comment thread cmd/ethrex/migrate.rs Outdated
};
info!(
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {mb_per_sec} MB/s | ETA {eta}",
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts}+{skipped_slots} skipped | {mb_per_sec} MB/s | ETA {eta}",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The progress line uses {skipped_accounts}+{skipped_slots} without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the + is accounts and which is slots. Since the final summary already uses the clearer "{skipped_accounts} accounts + {skipped_slots} storage slots" format, aligning the progress line to a similar short form would make both outputs consistent.

Suggested change
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts}+{skipped_slots} skipped | {mb_per_sec} MB/s | ETA {eta}",
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}",
Prompt To Fix With AI
This is a comment left during a code review.
Path: cmd/ethrex/migrate.rs
Line: 541

Comment:
The progress line uses `{skipped_accounts}+{skipped_slots}` without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the `+` is accounts and which is slots. Since the final summary already uses the clearer `"{skipped_accounts} accounts + {skipped_slots} storage slots"` format, aligning the progress line to a similar short form would make both outputs consistent.

```suggestion
                "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}",
```

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread cmd/ethrex/migrate.rs Outdated
Comment on lines +500 to +510
if missing_preimage {
if skipped_slots < 10 {
warn!(
"Skipped storage slot (addr-keccak {}, slot-keccak {}): has_addr_preimage={}, has_slot_preimage={}",
hex::encode(keccak_addr),
hex::encode(keccak_slot),
preimages.get_addr(keccak_addr).is_some(),
preimages.get_slot(keccak_slot).is_some(),
);
}
skipped_slots += 1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant preimage lookups in the warning block

The missing_preimage flag is already computed on lines 498–499, so calling preimages.get_addr(keccak_addr).is_some() and preimages.get_slot(keccak_slot).is_some() inside the warn! re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the missing_preimage check (||), when get_addr is None, get_slot is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.

Prompt To Fix With AI
This is a comment left during a code review.
Path: cmd/ethrex/migrate.rs
Line: 500-510

Comment:
**Redundant preimage lookups in the warning block**

The `missing_preimage` flag is already computed on lines 498–499, so calling `preimages.get_addr(keccak_addr).is_some()` and `preimages.get_slot(keccak_slot).is_some()` inside the `warn!` re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the `missing_preimage` check (`||`), when `get_addr` is `None`, `get_slot` is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.

How can I resolve this? If you propose a fix, please make it concise.

The collect phase tracked and reported a count of accounts skipped for a missing
keccak preimage, but storage slots with a missing preimage were dropped silently
(no counter, no warning) — so a structurally incomplete trie could look clean.

Make the storage path symmetric: track skipped_slots, warn (capped), and report it
alongside skipped_accounts. A storage entry resolving to None means either a missing
preimage (a real gap) or a legitimately zero-valued slot (intentionally not stored);
re-check the preimages on the None path so zero-value slots are not miscounted.

Observability only; no change to the trie output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@0xalizk 0xalizk force-pushed the fix/migrate-count-skipped-storage-slots branch from af77e56 to eccb463 Compare June 23, 2026 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant