We already use CADISHI-style schemes (https://doi.org/10.1016/j.cpc.2018.10.018, https://github.com/bio-phys/cadishi) on the GPU, but we could probably apply them on the CPU as well, with substantial gains. I believe we already have good round-robin threading and SIMD in the inner loop, but we should check whether their style of calculation lets us optimize further.
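
For discussion, here is a rough sketch of what a tiled CPU pass might look like. It assumes that "their style of calculation" refers to cache-blocked pair-distance histogramming, where coordinates are processed one tile pair at a time so both tiles stay cache-resident. This is not CADISHI's code or ours: the function name `tiled_distance_histogram` and the `tile` parameter are hypothetical, and plain Python is used only to show the loop structure, not the performance.

```python
import math


def tiled_distance_histogram(coords, n_bins, r_max, tile=64):
    """Histogram all unique pair distances below r_max, one tile pair at a time.

    Hypothetical illustration of cache blocking, not an existing API.
    """
    hist = [0] * n_bins
    scale = n_bins / r_max  # converts a distance into a bin index
    n = len(coords)
    for i0 in range(0, n, tile):            # outer tile of particles
        i1 = min(i0 + tile, n)
        for j0 in range(i0, n, tile):       # partner tile, upper triangle only
            j1 = min(j0 + tile, n)
            for i in range(i0, i1):
                xi, yi, zi = coords[i]
                # On the diagonal tile, start past i so each pair is counted once.
                j_start = i + 1 if j0 == i0 else j0
                for j in range(j_start, j1):
                    xj, yj, zj = coords[j]
                    d = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
                    if d < r_max:
                        # Clamp guards against rounding up to n_bins.
                        hist[min(int(d * scale), n_bins - 1)] += 1
    return hist
```

In a real kernel the innermost loop over `j` is where SIMD would apply, and the tile pairs are the natural unit to hand out to threads, so this should in principle compose with the round-robin threading we already have. The tile size would need tuning against the cache sizes of the target CPU.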