Add optimized secp384r1 (NIST P-384) ECDSA implementation #973
tamashi095 wants to merge 4 commits into
Mirrors the secp256r1 module architecture: p384 crate types for encoding and RFC6979 nonce generation, ark-secp384r1 for field and curve arithmetic, and a WindowedScalarMultiplier with a precomputed generator table for fast fixed-base and double-base scalar multiplication. Unlike secp256r1, high-s signatures are accepted and sign does not normalize s, matching the RustCrypto p384 crate exactly (required for verifying X.509/attestation certificate chains). Gated behind the experimental feature (not yet audited).
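To make the fixed-base and double-base idea concrete, here is a minimal sketch over a toy additive group (integers modulo `M`) standing in for curve points. It is not fastcrypto's `WindowedScalarMultiplier`: the names `precompute`, `mul_windowed`, and `double_mul`, the 4-bit window, and the use of fixed rather than sliding windows are all assumptions chosen for brevity.

```rust
/// Toy additive group: integers modulo M (a stand-in for curve points).
const M: u64 = 1_000_000_007;
/// Window width in bits (illustrative; not the PR's constants).
const WINDOW_BITS: u32 = 4;
const TABLE_SIZE: usize = 1 << WINDOW_BITS;

/// Group operation of the toy group.
fn add(a: u64, b: u64) -> u64 {
    (a + b) % M
}

fn double(a: u64) -> u64 {
    add(a, a)
}

/// Precomputes [0*g, 1*g, ..., (2^w - 1)*g] once per base.
fn precompute(g: u64) -> [u64; TABLE_SIZE] {
    let g = g % M;
    let mut table = [0u64; TABLE_SIZE];
    for i in 1..TABLE_SIZE {
        table[i] = add(table[i - 1], g);
    }
    table
}

/// Extracts the w-th window (digit) of the scalar.
fn digit(k: u64, w: u32) -> usize {
    ((k >> (w * WINDOW_BITS)) & (TABLE_SIZE as u64 - 1)) as usize
}

/// Fixed-window multiplication: w doublings plus one table lookup per window.
fn mul_windowed(table: &[u64; TABLE_SIZE], k: u64) -> u64 {
    let mut acc = 0u64;
    for w in (0..64 / WINDOW_BITS).rev() {
        for _ in 0..WINDOW_BITS {
            acc = double(acc);
        }
        acc = add(acc, table[digit(k, w)]);
    }
    acc
}

/// Straus-style double multiplication k1*G + k2*Q: both scalars share
/// one chain of doublings instead of running two separate ladders.
fn double_mul(tg: &[u64; TABLE_SIZE], k1: u64, tq: &[u64; TABLE_SIZE], k2: u64) -> u64 {
    let mut acc = 0u64;
    for w in (0..64 / WINDOW_BITS).rev() {
        for _ in 0..WINDOW_BITS {
            acc = double(acc);
        }
        acc = add(acc, tg[digit(k1, w)]);
        acc = add(acc, tq[digit(k2, w)]);
    }
    acc
}

/// Reference result for the toy group: k*g mod M computed directly.
fn mul_reference(g: u64, k: u64) -> u64 {
    ((g as u128 * k as u128) % M as u128) as u64
}
```

In the toy group the result can be checked against `mul_reference`; on a real curve the same structure trades table memory for fewer point additions, which is the trade-off the window-width and table-size tuning is about.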
Each iteration signs a random message with both implementations (asserting byte equality) and verifies a signature variant (valid, bit-flipped, edge-case scalars, random bytes, wrong key, malleated) with both, asserting identical accept/reject decisions for SHA-256, SHA-384 and SHA-512 digests. Runs 100 iterations in CI; longer sessions via SECP384R1_FUZZ_ITERATIONS.
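The shape of such a differential test can be sketched as follows. This is an illustration only, not the PR's test: the two parity functions are hypothetical stand-ins for two independent verifiers, `differential` and `iterations_from_env` are invented names, and the way the real test reads `SECP384R1_FUZZ_ITERATIONS` is an assumption.

```rust
/// Tiny deterministic PRNG (xorshift64*), so a failing seed can be replayed.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

/// Feeds the same seeded random inputs to two implementations and returns
/// the first input on which their accept/reject decisions differ.
fn differential<A, B>(seed: u64, iterations: u64, impl_a: A, impl_b: B) -> Option<u64>
where
    A: Fn(u64) -> bool,
    B: Fn(u64) -> bool,
{
    // The xorshift state must be nonzero.
    let mut rng = XorShift64(seed.max(1));
    for _ in 0..iterations {
        let input = rng.next_u64();
        if impl_a(input) != impl_b(input) {
            return Some(input);
        }
    }
    None
}

/// Stand-in "implementation A": parity via the popcount intrinsic.
fn parity_by_count(x: u64) -> bool {
    x.count_ones() % 2 == 1
}

/// Stand-in "implementation B": parity via XOR folding, sharing no code with A.
fn parity_by_folding(mut x: u64) -> bool {
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    x & 1 == 1
}

/// Iteration count: a small default for CI, overridable for long sessions.
/// (Assumed behavior; only the variable name comes from the commit message.)
fn iterations_from_env(default: u64) -> u64 {
    std::env::var("SECP384R1_FUZZ_ITERATIONS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}
```

A call such as `differential(seed, iterations_from_env(100), parity_by_count, parity_by_folding)` returns `None` when the two stand-ins agree on every input; reporting the seed on failure is what makes a disagreement reproducible.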
Added a differential fuzz test ( Ran a 250,000-iteration release-mode session locally (~250k differential sign comparisons, ~1.3M differential verify decisions): zero disagreements, zero panics (seed Since the two pipelines share no math code (arkworks Jacobian + Straus windows here vs fiat-crypto field + complete formulas in
A few additional notes for reviewers, surfaced while double-checking the PR for gaps:
Also added the module to the README scheme list (including the high-s acceptance note, since the README is where the other curves' low-s rules are documented).
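For readers less familiar with the low-s rule: for a group order n, a signature component s is "high" when s > n/2, and replacing s with n - s yields the other member of the pair. The sketch below contrasts the two policies on a toy modulus. The constant `N`, the `SPolicy` enum, and all function names are hypothetical and only illustrate the rule, not any fastcrypto API.

```rust
/// Toy group order: a small odd number standing in for P-384's 384-bit n.
const N: u64 = 1_000_003;

/// A value is "high" when it lies in the upper half of [1, N - 1].
fn is_high_s(s: u64) -> bool {
    s > N / 2
}

/// Maps s to N - s; requires 1 <= s < N.
fn negate_s(s: u64) -> u64 {
    N - s
}

/// What a low-s-normalizing signer would emit.
fn normalize_low_s(s: u64) -> u64 {
    if is_high_s(s) {
        negate_s(s)
    } else {
        s
    }
}

/// The two policies discussed in this PR (names are illustrative).
enum SPolicy {
    /// Accept any s in [1, N - 1], as this module does.
    AcceptAny,
    /// Additionally reject high s, the stricter low-s rule.
    RequireLowS,
}

fn s_is_acceptable(s: u64, policy: SPolicy) -> bool {
    let in_range = s >= 1 && s < N;
    match policy {
        SPolicy::AcceptAny => in_range,
        SPolicy::RequireLowS => in_range && !is_high_s(s),
    }
}
```

Under `AcceptAny` both members of a pair (s, n - s) pass the range check, which is the malleability the README note documents; under `RequireLowS` exactly one of them does.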
Attestation certificate chains carry DER-encoded ECDSA signatures. Parsing accepts exactly the same encodings as the p384 crate, which is asserted per-vector on the wycheproof DER test cases.
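As background, a DER ECDSA signature is a `SEQUENCE` of two `INTEGER`s, r and s. The sketch below is a generic strict-DER parser for that structure, written from the encoding rules rather than from the `p384` crate, so it makes no claim to accept exactly the same inputs; `parse_der_integer` and `parse_der_signature` are hypothetical names. It relies on short-form lengths only, which suffices for P-384 because the content is at most 2 + 49 + 2 + 49 = 102 bytes.

```rust
/// Parses one DER INTEGER (tag 0x02, short-form length) from the front of
/// `input`, returning its content bytes and the remaining input.
fn parse_der_integer(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&tag, rest) = input.split_first()?;
    let (&len, rest) = rest.split_first()?;
    let len = len as usize;
    if tag != 0x02 || len == 0 || len >= 0x80 || rest.len() < len {
        return None;
    }
    let (body, rest) = rest.split_at(len);
    // A set top bit would make the integer negative; r and s never are.
    if body[0] & 0x80 != 0 {
        return None;
    }
    // DER is minimal: a leading 0x00 is allowed only as a sign pad.
    if body.len() > 1 && body[0] == 0 && body[1] & 0x80 == 0 {
        return None;
    }
    Some((body, rest))
}

/// Drops the sign-padding zero byte, if present.
fn strip_pad(bytes: &[u8]) -> Vec<u8> {
    if bytes.len() > 1 && bytes[0] == 0 {
        bytes[1..].to_vec()
    } else {
        bytes.to_vec()
    }
}

/// Parses SEQUENCE { INTEGER r, INTEGER s } and returns the big-endian
/// magnitudes of r and s. Range checks against the curve order are a
/// separate step and are not shown here.
fn parse_der_signature(input: &[u8]) -> Option<(Vec<u8>, Vec<u8>)> {
    let (&tag, rest) = input.split_first()?;
    let (&len, rest) = rest.split_first()?;
    // The declared length must cover the remaining input exactly.
    if tag != 0x30 || len >= 0x80 || rest.len() != len as usize {
        return None;
    }
    let (r, rest) = parse_der_integer(rest)?;
    let (s, rest) = parse_der_integer(rest)?;
    if !rest.is_empty() {
        return None;
    }
    Some((strip_pad(r), strip_pad(s)))
}
```

Rejecting non-minimal integers and trailing bytes is what keeps the encoding of a given (r, s) unique; a more lenient BER-style parser would accept several byte strings for the same signature.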
Resolved note 3 from the previous comment: added Parsing delegates to the
Resolved note 1 (Apple Silicon-only benchmarks): re-ran the same bench on x86_64 — an AMD EPYC dedicated-performance VM (Fly.io, 2 dedicated vCPUs, rust 1.92,
Absolute numbers on the EPYC VM are ~2.8x the Apple Silicon ones across the board (virtualized, conservative clocks; the hypervisor masks the EPYC generation), which is exactly why the ratio, not the absolute latency, is the portable claim. Both implementations were measured back-to-back on the same core in the same run, and the ours/p384 ratio agrees within ~4% across the two ISAs. Final gas numbers should still come from Sui's own reference hardware, but the speedup is not an artifact of ARM codegen.
Description
Adds a `secp384r1` module implementing ECDSA over NIST P-384, mirroring the `secp256r1` module's architecture, with ~3.6x faster verify and ~2.4x faster sign than the RustCrypto `p384` crate on the same inputs.
Closes the alignment questions in #972; please see that issue for the design questions (low-s policy, experimental gating, no-recovery scope) before review.
Motivation: Sui's new `0x2::ecdsa_p384::secp384r1_verify` native (sui#26934, for Apple App Attest / Android Key Attestation cert chains) verifies via the `p384` crate and shipped with a 54,000 gas base priced off its ~532µs verify (~12.8x fastcrypto's secp256r1). With this implementation the ratio drops to ~3.45x secp256r1, supporting a gas base of roughly 15,000.
Benchmarks
Criterion, Apple Silicon, rustc 1.92, identical keys/messages/signatures for both implementations (`cargo bench --features experimental --bench secp384r1`):
What drives it: the `p384` crate uses fully constant-time complete-addition formulas with no precomputation; this PR reuses fastcrypto's `WindowedScalarMultiplier` (256-point precomputed generator table; Straus interleaved double-mul with width-5 sliding window) over arkworks Jacobian arithmetic (`ark-secp384r1` 0.4.0), exactly as `secp256r1` does. Tuning was benchmarked: width 6 measured ~3% slower than width 5, and a 512-point table gained <1% (not worth 2x memory), so the constants match `secp256r1`.
Design notes
- `p384`-crate equivalence: accepts/rejects exactly the same signatures as `p384` (asserted per-vector on both wycheproof sets) and produces byte-identical RFC6979/SHA-384 signatures. Consequently, and unlike `secp256r1`, high-s signatures are accepted and sign does not normalize s: the use case is verifying externally-produced X.509/attestation signatures, so rejecting high-s would break real certificate chains. Documented in the module docs; signatures are therefore malleable (`(r,s) -> (r,n-s)`).
- Gated behind `experimental` with an explicit not-yet-audited note (open question in "Optimized secp384r1 (NIST P-384) ECDSA: ~3.6x faster verify than RustCrypto p384" #972: Sui consuming an experimental module for a production native is a real tension).
- No `recoverable.rs`: public-key recovery isn't needed for certificate verification (v1 scope).
- `Sha384` hash function added to `hash.rs`; sign/verify are generic over digest length (`verify_with_hash::<Sha256, 32>` etc.); `DefaultHash` is SHA-384, matching the `p384` crate.
- As in `secp256r1`, signing uses the same constant-time fixed-window path as secp256r1's signer; verification (public data) uses the same vartime sliding-window double-mul.
- `p384 0.13`, `ark-secp384r1 0.4.0`: `cargo deny check bans licenses sources` is clean.
Test plan
46 new tests, all green (`cargo test --all-features`):
- Wycheproof `EcdsaSecp384r1Sha384` + `EcdsaSecp384r1Sha512` (no SHA-256 set exists upstream); every vector additionally asserted to accept/reject identically to the `p384` crate.
- Byte-identical signing against `p384` (fixed vectors + 64-case proptest incl. mutation rejection), cross-verification in both directions, high-s acceptance test.
- Tests mirroring `secp256r1_tests.rs` (serde/base64/ordering/zeroize-on-drop/display-elision/batch/...), group + conversion + multiplier tests, runnable doctest.
CI gates verified locally: `cargo fmt --check`, `cargo xclippy` (zero warnings), license headers, `cargo deny check bans licenses sources`, `cargo build --benches --features experimental`, `copy_key`, `unsecure_schemes`, `scripts/changed-files.sh`. (The only failing test in the full workspace run is the pre-existing `fastcrypto-zkp` zk_login e2e test, which calls an external rapidsnark service that returned HTTP 500; this is unrelated.)
Honest limits
This does not reach secp256r1's ~42µs: P-384's 6-limb field makes each multiplication ~2.3x dearer with 1.5x more bits to process, so ~3.4x secp256r1's cost is in line with the curve-size scaling. The remaining gap to `p384` was the precomputation/window machinery, which this PR captures. Further gains would need a specialized P-384 field backend (assembly or fiat-crypto-style dedicated reduction), which I deliberately avoided in favor of vetted arkworks arithmetic.
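The scaling estimate above and the gas figure from the motivation can be reproduced with a few lines of arithmetic. The sketch assumes 64-bit limbs (so 4 limbs for a 256-bit field), a schoolbook multiplication cost quadratic in the limb count, and gas that scales linearly with the latency ratio; none of these assumptions, nor the function names, come from the PR itself.

```rust
/// Estimated cost ratio between two curves: field multiplication is assumed
/// to scale with limbs^2 (schoolbook), and the scalar length with the bit size.
fn scaling_estimate(limbs_a: u32, limbs_b: u32, bits_a: u32, bits_b: u32) -> f64 {
    let mul_ratio = (limbs_a as f64 / limbs_b as f64).powi(2);
    let bit_ratio = bits_a as f64 / bits_b as f64;
    mul_ratio * bit_ratio
}

/// Rescales a gas base, assuming gas is proportional to the latency ratio
/// relative to secp256r1.
fn rescale_gas(old_gas: f64, old_ratio: f64, new_ratio: f64) -> f64 {
    old_gas * new_ratio / old_ratio
}
```

With these assumptions, `scaling_estimate(6, 4, 384, 256)` gives 2.25 * 1.5 = 3.375, close to the measured ~3.4x, and `rescale_gas(54_000.0, 12.8, 3.45)` gives about 14,555, which rounds to the ~15,000 gas base suggested in the motivation.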