@mr-c (Collaborator) commented Oct 17, 2025

As inspired by @victorshoup in #1349 #1348

@mr-c mr-c force-pushed the clmul_arm64_clang branch from c0e656d to b64f00a Compare October 17, 2025 14:54
@mr-c mr-c force-pushed the clmul_arm64_clang branch from b64f00a to 449290c Compare October 17, 2025 15:02
@victorshoup

I tried this PR on my MacBook. The FMA code works perfectly now.
But the PCLMUL code still doesn't work: it seems SIMDE_DETECT_CLANG_VERSION_NOT(22,0,0) is true on my platform, so I'm not getting the good stuff...

I'm not sure it's really possible to update my compiler, and I wouldn't expect others to do so either.
I've just recently updated the developer tools on my MacBook, so I don't think there is much more I can do.
Any thoughts?

$ gcc --version
Apple clang version 17.0.0 (clang-1700.3.19.1)
Target: arm64-apple-darwin24.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
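
For context on the version check: on macOS, `gcc` is typically a front end for Apple clang, and Apple clang's version numbers follow Xcode releases rather than upstream LLVM releases, so "17.0.0" here does not mean upstream LLVM 17. The sketch below uses only standard predefined macros to report which compiler family is in use. The function names are hypothetical, and this is not how SIMDe itself detects compiler versions.

```c
#include <stdio.h>

/* Hypothetical helper: names the compiler family from predefined macros.
   Apple clang is checked first because it also defines __clang__ and __GNUC__. */
static const char *compiler_family(void) {
#if defined(__apple_build_version__)
  return "apple-clang";
#elif defined(__clang__)
  return "clang";
#elif defined(__GNUC__)
  return "gcc";
#else
  return "unknown";
#endif
}

/* Hypothetical helper: the major version the compiler reports about itself.
   For Apple clang this is the Apple (Xcode-aligned) number, not the LLVM one. */
static int compiler_major(void) {
#if defined(__clang__)
  return __clang_major__;
#elif defined(__GNUC__)
  return __GNUC__;
#else
  return 0;
#endif
}

/* Prints one line such as "<family> <major>". */
static void print_compiler_info(void) {
  printf("%s %d\n", compiler_family(), compiler_major());
}
```

Calling `print_compiler_info()` from `main` shows the family and the self-reported major version, which is enough to tell an Apple toolchain apart from an upstream clang or a Homebrew GCC.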

@victorshoup

I suppose I could "brew install gcc"...I think I've done this in the past on another machine...I always get a bit nervous doing that...or do you think I'm generally better off this way?

@victorshoup

More feedback... I installed gcc-15 using homebrew. My PCLMUL code now fails to compile:

/opt/homebrew/Cellar/gcc/15.2.0/lib/gcc/current/gcc/aarch64-apple-darwin24/15/include/arm_neon.h: In function 'void pclmul_mul1(long unsigned int*, long unsigned int, long unsigned int)':
/opt/homebrew/Cellar/gcc/15.2.0/lib/gcc/current/gcc/aarch64-apple-darwin24/15/include/arm_neon.h:7297:1: error: inlining failed in call to 'always_inline' 'poly128_t vmull_p64(poly64_t, poly64_t)': target specific option mismatch
7297 | vmull_p64 (poly64_t __a, poly64_t __b)
| ^~~~~~~~~
In file included from /Users/shoup/repos/simde/x86/avx512/../sse3.h:30,
from /Users/shoup/repos/simde/x86/avx512/../ssse3.h:30,
from /Users/shoup/repos/simde/x86/avx512/../sse4.1.h:31,
from /Users/shoup/repos/simde/x86/avx512/../sse4.2.h:31,
from /Users/shoup/repos/simde/x86/avx512/../avx.h:32:
/Users/shoup/repos/simde/x86/clmul.h:210:18: note: called from here
210 | vmull_p64(
| ~~~~~~~~~^~~
211 | vgetq_lane_p64(vreinterpretq_p64_u64(simde__m128i_to_neon_u64(a)), (imm8 ) & 1),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
212 | vgetq_lane_p64(vreinterpretq_p64_u64(simde__m128i_to_neon_u64(b)), (imm8 >> 4) & 1)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213 | )
| ~

Note that the calling code is just this:

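void
pclmul_mul1 (unsigned long *c, unsigned long a, unsigned long b)
{
   /* Place each 64-bit operand in the low lane; the high lane is zero. */
   __m128i aa = _mm_setr_epi64( _mm_cvtsi64_m64(a), _mm_cvtsi64_m64(0));
   __m128i bb = _mm_setr_epi64( _mm_cvtsi64_m64(b), _mm_cvtsi64_m64(0));
   /* imm8 = 0 selects the low lanes of aa and bb; the 128-bit product goes to c[0..1]. */
   _mm_storeu_si128((__m128i*)c, _mm_clmulepi64_si128(aa, bb, 0));
}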

Not sure what to make of this...
So for now, my asm hack is still the only working solution for me...

I'm not in a rush, though...I will be happy to help in any way I can...
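
One possible reading of the `target specific option mismatch` error above: GCC only inlines `vmull_p64` into functions compiled with the AES/PMULL extension enabled, and that extension was not enabled for the calling function. Below is a minimal sketch of guarding the intrinsic on the ACLE macro `__ARM_FEATURE_AES`, with a bit-by-bit fallback. The names `clmul64`, `clmul64_portable`, and `CLMUL64_HAVE_PMULL` are hypothetical, and this is not SIMDe's implementation. Enabling the extension with a flag such as `-march=armv8-a+crypto` is an assumption that should be checked against the compiler's documentation.

```c
#include <stdint.h>

/* Use the PMULL intrinsic only when the compiler says the extension is enabled. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_AES)
#include <arm_neon.h>
#define CLMUL64_HAVE_PMULL 1
#else
#define CLMUL64_HAVE_PMULL 0
#endif

/* Portable carry-less multiply of two 64-bit values.
   out[0] receives the low 64 bits of the product, out[1] the high 64 bits. */
static void clmul64_portable(uint64_t a, uint64_t b, uint64_t out[2]) {
  uint64_t lo = 0, hi = 0;
  for (int i = 0; i < 64; i++) {
    if ((b >> i) & 1) {
      lo ^= a << i;
      if (i != 0) {
        hi ^= a >> (64 - i); /* bits shifted out of the low word */
      }
    }
  }
  out[0] = lo;
  out[1] = hi;
}

/* Same contract as clmul64_portable; picks the intrinsic when available. */
static void clmul64(uint64_t a, uint64_t b, uint64_t out[2]) {
#if CLMUL64_HAVE_PMULL
  uint64x2_t r = vreinterpretq_u64_p128(vmull_p64((poly64_t)a, (poly64_t)b));
  out[0] = vgetq_lane_u64(r, 0);
  out[1] = vgetq_lane_u64(r, 1);
#else
  clmul64_portable(a, b, out);
#endif
}
```

For example, `clmul64(3, 3, r)` leaves `r[0] == 5` and `r[1] == 0`, since (x + 1)^2 = x^2 + 1 over GF(2). Both paths should agree on every input, which makes the portable version a convenient reference when testing.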
