
Conversation

@wrongu (Contributor) commented May 26, 2023

Add conftest.py and use it to set a random seed for all pytest tests.

This PR makes the pytest suite deterministic and solves the problem of tests randomly failing due to unlucky RNG draws. There are pros and cons to this, of course; I find that it tends to help in CI settings.
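Roughly, the idea is an autouse fixture along these lines (the fixture name and seed value here are illustrative, not necessarily what the PR's conftest.py uses):

```python
# conftest.py -- minimal sketch of seeding every test; the fixture name and
# seed value are placeholders.
import random

import numpy as np
import pytest


@pytest.fixture(autouse=True)
def _fixed_seed():
    """Seed the stdlib and NumPy RNGs before each test runs."""
    random.seed(1234)
    np.random.seed(1234)
```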

@WardBrian (Collaborator)

We need to work on this issue to make CI less of a hassle, but I'm not convinced fixing a seed is the approach we should take.

@codecov-commenter

Codecov Report

Merging #31 (ccee362) into main (f845628) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #31   +/-   ##
=======================================
  Coverage   95.76%   95.76%           
=======================================
  Files           9        9           
  Lines         260      260           
=======================================
  Hits          249      249           
  Misses         11       11           


@wrongu (Contributor, Author) commented May 26, 2023

This could alternatively be configured to run each test twice, once with the fixed seed and once without. Or n times with n seeds, or any further combination.

Ultimately up to you of course. Just a suggestion.
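As a sketch of the n-seeds variant (the seed values and test body are placeholders; a real test would call the sampler rather than draw plain normals):

```python
# Parametrize a fixture over several seeds; include None to also get an
# unseeded run. Seeds and the test body are placeholders.
import numpy as np
import pytest


@pytest.fixture(params=[1234, 5678, None])  # None leaves the RNG unseeded
def rng(request):
    return np.random.default_rng(request.param)


def test_sample_mean(rng):
    draws = rng.normal(size=1000)
    assert abs(draws.mean()) < 0.15  # ~4.7 standard errors, so rarely fails
```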

@bob-carpenter (Collaborator)

I didn't see a related issue this solves, but the PR description indicates it addresses failures under some RNG seeds. So the obvious question is why there are failures. Is it just that a test interval is too strict, e.g., testing that 95% intervals cover the true results, which causes failures 5% of the time? In that case, we can either (a) widen the test interval, (b) do repeated tests (if we do 20 tests, we expect Binomial(20, 0.05) failures and can test that expectation), or (c) fix seeds.

The downside of (a) is that it reduces test power, the downside of (b) is a lot more computation, and the downside of (c) is brittle tests. I don't think there's an optimal solution, so we just need to pick something.
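For concreteness, here is a sketch of what (b) could look like, with the sampler call stubbed out by a plain normal draw and the allowed number of failures taken from the binomial quantile:

```python
# Option (b) sketch: repeat a 95%-interval check and bound the number of
# misses by a Binomial(20, 0.05) quantile instead of requiring every run
# to pass. The "sampler" here is a stand-in normal draw.
import numpy as np
from scipy.stats import binom


def interval_covers_truth(seed, n=1000, true_mean=0.0):
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=true_mean, size=n)
    half_width = 1.96 * draws.std(ddof=1) / np.sqrt(n)
    return abs(draws.mean() - true_mean) < half_width


def test_repeated_interval_coverage():
    misses = sum(not interval_covers_truth(seed) for seed in range(20))
    # binom.ppf(0.99, 20, 0.05) == 4, so allowing up to 4 misses keeps this
    # test's false-alarm rate below 1%.
    assert misses <= binom.ppf(0.99, 20, 0.05)
```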

@WardBrian (Collaborator)

Yes, the tests for e.g. HMC have tolerances tight enough that they still fail on some runs.

Our current test suite runs in ~12 seconds, so we have a lot of headroom to run more iterations/tests, but we need to keep an eye on this as the number of algorithms increases. At the moment, for the purposes of reviewing, I've been doing a sort of hand-wavey equivalent of (b) by checking that only one or two test runs (out of the 12 we do in our matrix of Python versions and OSes) failed.

WardBrian mentioned this pull request on Jun 14, 2023
@WardBrian (Collaborator)

I recently stumbled on https://github.com/pytest-dev/pytest-rerunfailures, which could be another approach we use (maybe only as a stopgap).
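For example (the rerun count and test name below are placeholders), the plugin can be applied per test with its flaky marker, or to the whole suite by running pytest --reruns 2:

```python
# Sketch of pytest-rerunfailures usage; rerun count and test name are
# placeholders.
import pytest


@pytest.mark.flaky(reruns=2)
def test_hmc_posterior_mean():
    ...
```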

@roualdes (Collaborator)

I see the appeal of pytest-rerunfailures, especially as a stopgap.

Brian's "hand-wavey" strategy seems good to me, especially if we just couch it in stats-speak. Modify Bob's suggestion (b) to test for 90% coverage (or some such number not too much smaller than 95%), e.g. at least 18 of 20 repeated tests pass, instead of testing the exact binomial expectation. I think this would allow more wiggle room while still protecting test power.

Then add pytest-rerunfailures with maybe one or two reruns, in case we didn't build in enough wiggle room and find ourselves frustrated by the tests.
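A rough sketch of the 18-of-20 idea, where the per-run check (sample mean within a 95% z-interval) stands in for a real sampler test:

```python
# "At least 18 of 20 repeated runs pass" sketch; one_run_passes is a
# stand-in for one run of an actual sampler test.
import numpy as np


def one_run_passes(seed, n=1000):
    rng = np.random.default_rng(seed)
    draws = rng.normal(size=n)
    return abs(draws.mean()) < 1.96 / np.sqrt(n)  # passes ~95% of the time


def test_coverage_at_least_18_of_20():
    passes = sum(one_run_passes(seed) for seed in range(20))
    assert passes >= 18
```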

