@Sentimentron
Contributor

Results from silicon (bpp=3)

| CPU | Baseline | Result | Speedup |
| --- | --- | --- | --- |
| Arm Cortex A520 | 910.9 MiB/s | 2025.9 MiB/s | 122.40% |
| Arm Cortex X4 | 6551.8 MiB/s | 6313.0 MiB/s | -3.64% |
| Apple Silicon M2 | 5839.2 MiB/s | 5751.8 MiB/s | -1.33% |
| AMD EPYC 7B13 | 3830.6 MiB/s | 5472.9 MiB/s | 42.87% |

Results from silicon (bpp=4)

| CPU | Baseline | Result | Speedup |
| --- | --- | --- | --- |
| Arm Cortex A520 | 607.0 MiB/s | 3226.8 MiB/s | 431.62% |
| Arm Cortex X4 | 6551.8 MiB/s | 6313.0 MiB/s | 105.71% |
| Apple Silicon M2 | 5800.3 MiB/s | 10616.0 MiB/s | 83.03% |
| AMD EPYC 7B13 | 10796.0 MiB/s | 15268.0 MiB/s | 41.42% |

Opened as a draft until #632 is resolved.

For bpp=3, the prefix sum improves throughput on the Cortex-A520 and EPYC by 122.40% and 42.87% respectively.
For bpp=4, it improves throughput by around 41% on the EPYC system and 431% on the Cortex-A520.
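
For readers unfamiliar with the term, a prefix sum with a stride of bpp can be sketched as below. This is an illustration only, not the code in this pull request. It assumes that bpp denotes bytes per pixel and that each byte of a row is stored as the difference from the byte bpp positions earlier. The function name `prefix_sum_stride` is hypothetical, and the scalar loop shows only the arithmetic, not an optimized implementation.

```rust
/// Hypothetical helper (not taken from this PR): in-place running sum
/// with a stride of `bpp` bytes. Each byte becomes the wrapping (mod 256)
/// sum of itself and the already-summed byte `bpp` positions earlier,
/// so every channel accumulates independently of the others.
/// Assumption: `bpp` means bytes per pixel.
pub fn prefix_sum_stride(row: &mut [u8], bpp: usize) {
    assert!(bpp > 0, "bpp must be non-zero");
    // The first `bpp` bytes have no predecessor and are left unchanged.
    for i in bpp..row.len() {
        row[i] = row[i].wrapping_add(row[i - bpp]);
    }
}
```

With bpp = 3, the row `[10, 20, 30, 1, 2, 3]` becomes `[10, 20, 30, 11, 22, 33]`. Each output byte depends on an earlier output byte, which is the serial dependency that a faster prefix-sum formulation would aim to shorten.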