-
Notifications
You must be signed in to change notification settings - Fork 75
BREAKING: Make random_mod_core platform-independent (take 2)
#1021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1021 +/- ##
==========================================
+ Coverage 79.86% 79.89% +0.02%
==========================================
Files 163 163
Lines 17737 17757 +20
==========================================
+ Hits 14166 14187 +21
+ Misses 3571 3570 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
random_mod_core platform-independent (take 2)
This comment was marked as outdated.
This comment was marked as outdated.
This appears to have fallen afoul of the compiler bug. This reverts commit d0da670.
This comment was marked as outdated.
This comment was marked as outdated.
random_mod_core platform-independent (take 2)random_mod_core platform-independent (take 2)
Justification: `random_mod` is already variable-time due to the rejection sampling. This allows us to optimize the loop without running afoul of the compiler bug.
|
Current benchmarks show equivocal performance compared with master, slower in some cases, faster in some cases: |
|
@tarcieri any thoughts on this? I could see the switch from |
|
Sorry I haven't looked at this, it was a holiday weekend here in the US, and I got a bit lost following your other PR with supposed miscompilation? (did you ever get that into some sort of outside-of-CI reproducible state? If so we should report it upstream) I feel like this is fundamentally trying to something quite similar to #285, but I find the implementation there a bit more comprehensible. cc @fjarri |
|
No worries; miscompilation bug is here: E.g.: |
Yes; this implementation is kinda wonky for two reasons:
|
|
Yeah actually re-reading #285, I prefer it myself - difference (in terms of values produced / rng state) is that this is (64-bit) word-aligned whereas that is byte-aligned; this wastes more entropy but may provide more opportunities for optimization, whereas that wastes less entropy (but still some; we are not going down to the bit level after all.) |
|
And |
This introduces a fairly robust platform-independence test for
random_mod_core. The test exercises the rejection sampling and also the way multi-wordUints are constructed, which changed in this PR, as you can see in commit d2f4b89:The difference, under most circumstances, is just that word order will be rotated; the old entropy gathering approach independently rejection-sampled the high word first, then built up the rest, so the resulting
Uintwas sort of like(MSW, LSW, LSW+1, …, MSW-1)over the raw entropy stream, except in cases where the high-word rejection sample triggered, in which case it was a trickier transform. The new interpretation is simply little-endian, both bytewise and wordwise, aligned to 64-bit word boundaries.I elected to make this change for two reasons:
random_mod, particularly in rare cases of multiple rejection samples over the most significant word. Rather than make this a rare, subtle, difficult-to-detect change that will bite people unexpectedly, I elected to make this a big, obvious, easy-to-detect change that people will usually notice.To work around the apparent compiler bug that required #1010 to be reverted (See #1018 for exploration of the bug), I also elected to make the comparison loop variable-time rather than using
ct_lt, since the latter is what seems to trigger the bug. My defense is thatrandom_modis already variable-time; it leaks RNG state via timing information every time it rejection-samples. (It might be a good idea to rename it torandom_mod_vartime, since AFAICT leaking this timing information is unavoidable in the general case.)By making this change, we are able to carry through the prior copy-elision optimization from master, resulting in an implementation that performs about at parity with master in the benchmarks I ran.
One other note: I elected to take 64-bit rather than 32-bit words as primary here, i.e., 32-bit platforms need to adjust by sampling and discarding an extra word in odd cases rather than 64-bit platforms needing to adjust by only sampling half a word in those cases. Both are doable, but operating in terms of 64-bit word sizes frees us from the constraint of our RNG needing to be "4-byte sequential."
Fixes #1009