migrate: count & report skipped storage slots, not just accounts#6902
migrate: count & report skipped storage slots, not just accounts#69020xalizk wants to merge 1 commit into
Conversation
Greptile SummaryThis PR makes the
Confidence Score: 4/5Observability-only change with no effect on trie output; the zero-value slot exclusion logic is correct and the warning cap mirrors the existing account path. The core logic — re-checking preimages on the No files require special attention;
|
| Filename | Overview |
|---|---|
| cmd/ethrex/migrate.rs | Splits the single skipped counter into skipped_accounts + skipped_slots, re-checks preimages on the None storage path to distinguish missing-preimage from zero-value slots, and surfaces both counts in progress and final log lines. Logic is correct; minor redundant preimage lookups exist in the warning path. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[RawEntry::Storage from snapshot] --> B{Parallel processing}
B --> C{get_addr preimage?}
C -- None --> D[closure returns None]
C -- Some --> E{get_slot preimage?}
E -- None --> D
E -- Some --> F{value_u256 == 0?}
F -- Yes --> D
F -- No --> G[ProcessedEntry::Storage]
G --> H[inserts.push + storage_count++]
D --> I{Serial re-check preimages}
I --> J{get_addr.is_none OR get_slot.is_none?}
J -- Yes: missing preimage --> K[skipped_slots++, maybe warn]
J -- No: zero-value slot --> L[silently ignored, not counted]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[RawEntry::Storage from snapshot] --> B{Parallel processing}
B --> C{get_addr preimage?}
C -- None --> D[closure returns None]
C -- Some --> E{get_slot preimage?}
E -- None --> D
E -- Some --> F{value_u256 == 0?}
F -- Yes --> D
F -- No --> G[ProcessedEntry::Storage]
G --> H[inserts.push + storage_count++]
D --> I{Serial re-check preimages}
I --> J{get_addr.is_none OR get_slot.is_none?}
J -- Yes: missing preimage --> K[skipped_slots++, maybe warn]
J -- No: zero-value slot --> L[silently ignored, not counted]
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
cmd/ethrex/migrate.rs:541
The progress line uses `{skipped_accounts}+{skipped_slots}` without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the `+` is accounts and which is slots. Since the final summary already uses the clearer `"{skipped_accounts} accounts + {skipped_slots} storage slots"` format, aligning the progress line to a similar short form would make both outputs consistent.
```suggestion
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}",
```
### Issue 2 of 2
cmd/ethrex/migrate.rs:500-510
**Redundant preimage lookups in the warning block**
The `missing_preimage` flag is already computed on lines 498–499, so calling `preimages.get_addr(keccak_addr).is_some()` and `preimages.get_slot(keccak_slot).is_some()` inside the `warn!` re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the `missing_preimage` check (`||`), when `get_addr` is `None`, `get_slot` is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.
Reviews (1): Last reviewed commit: "migrate: count & report skipped storage ..." | Re-trigger Greptile
| }; | ||
| info!( | ||
| "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {mb_per_sec} MB/s | ETA {eta}", | ||
| "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts}+{skipped_slots} skipped | {mb_per_sec} MB/s | ETA {eta}", |
There was a problem hiding this comment.
The progress line uses
{skipped_accounts}+{skipped_slots} without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the + is accounts and which is slots. Since the final summary already uses the clearer "{skipped_accounts} accounts + {skipped_slots} storage slots" format, aligning the progress line to a similar short form would make both outputs consistent.
| "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts}+{skipped_slots} skipped | {mb_per_sec} MB/s | ETA {eta}", | |
| "{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}", |
Prompt To Fix With AI
This is a comment left during a code review.
Path: cmd/ethrex/migrate.rs
Line: 541
Comment:
The progress line uses `{skipped_accounts}+{skipped_slots}` without any label context, which can be hard to parse at a glance — a reader watching live output may not know which side of the `+` is accounts and which is slots. Since the final summary already uses the clearer `"{skipped_accounts} accounts + {skipped_slots} storage slots"` format, aligning the progress line to a similar short form would make both outputs consistent.
```suggestion
"{pct}% ({} MB / {} MB) | {account_count} accounts, {storage_count} storage | {skipped_accounts} acc+{skipped_slots} slot skipped | {mb_per_sec} MB/s | ETA {eta}",
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| if missing_preimage { | ||
| if skipped_slots < 10 { | ||
| warn!( | ||
| "Skipped storage slot (addr-keccak {}, slot-keccak {}): has_addr_preimage={}, has_slot_preimage={}", | ||
| hex::encode(keccak_addr), | ||
| hex::encode(keccak_slot), | ||
| preimages.get_addr(keccak_addr).is_some(), | ||
| preimages.get_slot(keccak_slot).is_some(), | ||
| ); | ||
| } | ||
| skipped_slots += 1; |
There was a problem hiding this comment.
Redundant preimage lookups in the warning block
The missing_preimage flag is already computed on lines 498–499, so calling preimages.get_addr(keccak_addr).is_some() and preimages.get_slot(keccak_slot).is_some() inside the warn! re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the missing_preimage check (||), when get_addr is None, get_slot is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.
Prompt To Fix With AI
This is a comment left during a code review.
Path: cmd/ethrex/migrate.rs
Line: 500-510
Comment:
**Redundant preimage lookups in the warning block**
The `missing_preimage` flag is already computed on lines 498–499, so calling `preimages.get_addr(keccak_addr).is_some()` and `preimages.get_slot(keccak_slot).is_some()` inside the `warn!` re-invokes the preimage lookup. Additionally, due to short-circuit evaluation in the `missing_preimage` check (`||`), when `get_addr` is `None`, `get_slot` is never called there — but it is called in the warning, meaning the warning emits a potentially additional lookup that wasn't part of the gate. These are negligible in practice (capped at 10 warnings), but storing the intermediate booleans would make the code clearer and avoid the redundancy.
How can I resolve this? If you propose a fix, please make it concise.The collect phase tracked and reported a count of accounts skipped for a missing keccak preimage, but storage slots with a missing preimage were dropped silently (no counter, no warning) — so a structurally incomplete trie could look clean. Make the storage path symmetric: track skipped_slots, warn (capped), and report it alongside skipped_accounts. A storage entry resolving to None means either a missing preimage (a real gap) or a legitimately zero-valued slot (intentionally not stored); re-check the preimages on the None path so zero-value slots are not miscounted. Observability only; no change to the trie output. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
af77e56 to
eccb463
Compare
What
ethrex migrate(binary-trie bootstrap from a geth snapshot + preimages) tracks andreports how many accounts were skipped due to a missing keccak preimage, but
storage slots with a missing preimage are dropped silently — no counter, no warning.
This makes the storage path symmetric with the account path: count skipped slots, warn
(capped), and report the total.
Why
Every leaf migrate drops is an account/slot the resulting binary trie won't contain. The
account skip count is currently the only completeness signal at migration time, and it
says nothing about storage — so a structurally incomplete trie (missing slots) can look
clean. We hit this bootstrapping mainnet: the run reported its account skips, but tens of
millions of storage entries had dropped silently, discoverable only by manually diffing
storage_countagainst the input. This surfaces it directly.The subtlety: zero-value slots
In the collect phase a storage entry resolves to
Nonefor two different reasons:Counting all
Nones would conflate them and overcount. The new code re-checks thepreimages on the
Nonepath and counts a slot as skipped only when its address/slotpreimage is genuinely absent; zero-value slots are not counted.
Changes
collect_phasereturnsskipped_accountsandskipped_slotsseparately (was oneskipped).Nonearm distinguishes missing-preimage from zero-value, incrementsskipped_slots, and emits a capped warning mirroring the account path."{skipped_accounts} accounts + {skipped_slots} storage slots skipped …".{accounts}+{slots} skipped.Observability only — no change to the trie output. One file (
cmd/ethrex/migrate.rs), +37/−11.Testing
Compiles against
eip-7864-plan. Semantics validated on a full mainnet migration(~2 B entries): the account skip count matches the existing counter, the new slot counter
surfaces a previously-silent storage gap, and zero-value slots are correctly excluded.