-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[AArch64] Add support for C1 CPUs #171124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-backend-aarch64 Author: None (dcandler) ChangesThis patch adds initial support for the Arm v9.3 C1 processors:
For more information on each, see: Technical Reference Manual for C1-Nano: Technical Reference Manual for C1-Pro: Technical Reference Manual for C1-Premium: Technical Reference Manual for C1-Ultra: Patch is 49.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/171124.diff 13 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index acca997e0ff64..e36a4c64965cb 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -631,6 +631,11 @@ X86 Support
Arm and AArch64 Support
^^^^^^^^^^^^^^^^^^^^^^^
+- Support has been added for the following processors (command-line identifiers in parentheses):
+ - Arm C1-Nano (``c1-nano``)
+ - Arm C1-Pro (``c1-pro``)
+ - Arm C1-Premium (``c1-premium``)
+ - Arm C1-Ultra (``c1-ultra``)
- More intrinsics for the following AArch64 instructions:
FCVTZ[US], FCVTN[US], FCVTM[US], FCVTP[US], FCVTA[US]
diff --git a/clang/test/Driver/aarch64-mcpu.c b/clang/test/Driver/aarch64-mcpu.c
index 447ee4bd3a6f9..fdf2e4011487a 100644
--- a/clang/test/Driver/aarch64-mcpu.c
+++ b/clang/test/Driver/aarch64-mcpu.c
@@ -84,6 +84,14 @@
// CORTEX-A520: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "cortex-a520"
// RUN: %clang --target=aarch64 -mcpu=cortex-a520ae -### -c %s 2>&1 | FileCheck -check-prefix=CORTEX-A520AE %s
// CORTEX-A520AE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "cortex-a520ae"
+// RUN: %clang --target=aarch64 -mcpu=c1-nano -### -c %s 2>&1 | FileCheck -check-prefix=C1-NANO %s
+// C1-NANO: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "c1-nano"
+// RUN: %clang --target=aarch64 -mcpu=c1-pro -### -c %s 2>&1 | FileCheck -check-prefix=C1-PRO %s
+// C1-PRO: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "c1-pro"
+// RUN: %clang --target=aarch64 -mcpu=c1-premium -### -c %s 2>&1 | FileCheck -check-prefix=C1-PREMIUM %s
+// C1-PREMIUM: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "c1-premium"
+// RUN: %clang --target=aarch64 -mcpu=c1-ultra -### -c %s 2>&1 | FileCheck -check-prefix=C1-ULTRA %s
+// C1-ULTRA: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "c1-ultra"
// RUN: %clang --target=aarch64 -mcpu=cortex-r82 -### -c %s 2>&1 | FileCheck -check-prefix=CORTEXR82 %s
// CORTEXR82: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "cortex-r82"
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-c1-nano.c b/clang/test/Driver/print-enabled-extensions/aarch64-c1-nano.c
new file mode 100644
index 0000000000000..33112527c9add
--- /dev/null
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-c1-nano.c
@@ -0,0 +1,69 @@
+// REQUIRES: aarch64-registered-target
+// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=c1-nano | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s
+
+// CHECK: Extensions enabled for the given AArch64 target
+// CHECK-EMPTY:
+// CHECK-NEXT: Architecture Feature(s) Description
+// CHECK-NEXT: FEAT_AMUv1 Enable Armv8.4-A Activity Monitors extension
+// CHECK-NEXT: FEAT_AMUv1p1 Enable Armv8.6-A Activity Monitors Virtualization support
+// CHECK-NEXT: FEAT_AdvSIMD Enable Advanced SIMD instructions
+// CHECK-NEXT: FEAT_BF16 Enable BFloat16 Extension
+// CHECK-NEXT: FEAT_BTI Enable Branch Target Identification
+// CHECK-NEXT: FEAT_CCIDX Enable Armv8.3-A Extend of the CCSIDR number of sets
+// CHECK-NEXT: FEAT_CHK Enable Armv8.0-A Check Feature Status Extension
+// CHECK-NEXT: FEAT_CLRBHB Enable Clear BHB instruction
+// CHECK-NEXT: FEAT_CRC32 Enable Armv8.0-A CRC-32 checksum instructions
+// CHECK-NEXT: FEAT_CSV2_2 Enable architectural speculation restriction
+// CHECK-NEXT: FEAT_DIT Enable Armv8.4-A Data Independent Timing instructions
+// CHECK-NEXT: FEAT_DPB Enable Armv8.2-A data Cache Clean to Point of Persistence
+// CHECK-NEXT: FEAT_DPB2 Enable Armv8.5-A Cache Clean to Point of Deep Persistence
+// CHECK-NEXT: FEAT_DotProd Enable dot product support
+// CHECK-NEXT: FEAT_ECV Enable enhanced counter virtualization extension
+// CHECK-NEXT: FEAT_ETE Enable Embedded Trace Extension
+// CHECK-NEXT: FEAT_FCMA Enable Armv8.3-A Floating-point complex number support
+// CHECK-NEXT: FEAT_FGT Enable fine grained virtualization traps extension
+// CHECK-NEXT: FEAT_FHM Enable FP16 FML instructions
+// CHECK-NEXT: FEAT_FP Enable Armv8.0-A Floating Point Extensions
+// CHECK-NEXT: FEAT_FP16 Enable half-precision floating-point data processing
+// CHECK-NEXT: FEAT_FPAC Enable Armv8.3-A Pointer Authentication Faulting enhancement
+// CHECK-NEXT: FEAT_FRINTTS Enable FRInt[32|64][Z|X] instructions that round a floating-point number to an integer (in FP format) forcing it to fit into a 32- or 64-bit int
+// CHECK-NEXT: FEAT_FlagM Enable Armv8.4-A Flag Manipulation instructions
+// CHECK-NEXT: FEAT_FlagM2 Enable alternative NZCV format for floating point comparisons
+// CHECK-NEXT: FEAT_HBC Enable Armv8.8-A Hinted Conditional Branches Extension
+// CHECK-NEXT: FEAT_HCX Enable Armv8.7-A HCRX_EL2 system register
+// CHECK-NEXT: FEAT_I8MM Enable Matrix Multiply Int8 Extension
+// CHECK-NEXT: FEAT_JSCVT Enable Armv8.3-A JavaScript FP conversion instructions
+// CHECK-NEXT: FEAT_LOR Enable Armv8.1-A Limited Ordering Regions extension
+// CHECK-NEXT: FEAT_LRCPC Enable support for RCPC extension
+// CHECK-NEXT: FEAT_LRCPC2 Enable Armv8.4-A RCPC instructions with Immediate Offsets
+// CHECK-NEXT: FEAT_LRCPC3 Enable Armv8.9-A RCPC instructions for A64 and Advanced SIMD and floating-point instruction set
+// CHECK-NEXT: FEAT_LSE Enable Armv8.1-A Large System Extension (LSE) atomic instructions
+// CHECK-NEXT: FEAT_LSE2 Enable Armv8.4-A Large System Extension 2 (LSE2) atomicity rules
+// CHECK-NEXT: FEAT_MOPS Enable Armv8.8-A memcpy and memset acceleration instructions
+// CHECK-NEXT: FEAT_MPAM Enable Armv8.4-A Memory system Partitioning and Monitoring extension
+// CHECK-NEXT: FEAT_MTE, FEAT_MTE2 Enable Memory Tagging Extension
+// CHECK-NEXT: FEAT_NMI, FEAT_GICv3_NMI Enable Armv8.8-A Non-maskable Interrupts
+// CHECK-NEXT: FEAT_NV, FEAT_NV2 Enable Armv8.4-A Nested Virtualization Enchancement
+// CHECK-NEXT: FEAT_PAN Enable Armv8.1-A Privileged Access-Never extension
+// CHECK-NEXT: FEAT_PAN2 Enable Armv8.2-A PAN s1e1R and s1e1W Variants
+// CHECK-NEXT: FEAT_PAuth Enable Armv8.3-A Pointer Authentication extension
+// CHECK-NEXT: FEAT_PMUv3 Enable Armv8.0-A PMUv3 Performance Monitors extension
+// CHECK-NEXT: FEAT_RAS, FEAT_RASv1p1 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
+// CHECK-NEXT: FEAT_RDM Enable Armv8.1-A Rounding Double Multiply Add/Subtract instructions
+// CHECK-NEXT: FEAT_SB Enable Armv8.5-A Speculation Barrier
+// CHECK-NEXT: FEAT_SEL2 Enable Armv8.4-A Secure Exception Level 2 extension
+// CHECK-NEXT: FEAT_SME Enable Scalable Matrix Extension (SME)
+// CHECK-NEXT: FEAT_SME2 Enable Scalable Matrix Extension 2 (SME2) instructions
+// CHECK-NEXT: FEAT_SPECRES Enable Armv8.5-A execution and data prediction invalidation instructions
+// CHECK-NEXT: FEAT_SPECRES2 Enable Speculation Restriction Instruction
+// CHECK-NEXT: FEAT_SSBS, FEAT_SSBS2 Enable Speculative Store Bypass Safe bit
+// CHECK-NEXT: FEAT_SVE Enable Scalable Vector Extension (SVE) instructions
+// CHECK-NEXT: FEAT_SVE2 Enable Scalable Vector Extension 2 (SVE2) instructions
+// CHECK-NEXT: FEAT_SVE_BitPerm Enable bit permutation SVE2 instructions
+// CHECK-NEXT: FEAT_TLBIOS, FEAT_TLBIRANGE Enable Armv8.4-A TLB Range and Maintenance instructions
+// CHECK-NEXT: FEAT_TRBE Enable Trace Buffer Extension
+// CHECK-NEXT: FEAT_TRF Enable Armv8.4-A Trace extension
+// CHECK-NEXT: FEAT_UAO Enable Armv8.2-A UAO PState
+// CHECK-NEXT: FEAT_VHE Enable Armv8.1-A Virtual Host extension
+// CHECK-NEXT: FEAT_WFxT Enable Armv8.7-A WFET and WFIT instruction
+// CHECK-NEXT: FEAT_XS Enable Armv8.7-A limited-TLB-maintenance instruction
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-c1-premium.c b/clang/test/Driver/print-enabled-extensions/aarch64-c1-premium.c
new file mode 100644
index 0000000000000..3b146e36f81a2
--- /dev/null
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-c1-premium.c
@@ -0,0 +1,71 @@
+// REQUIRES: aarch64-registered-target
+// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=c1-premium | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s
+
+// CHECK: Extensions enabled for the given AArch64 target
+// CHECK-EMPTY:
+// CHECK-NEXT: Architecture Feature(s) Description
+// CHECK-NEXT: FEAT_AMUv1 Enable Armv8.4-A Activity Monitors extension
+// CHECK-NEXT: FEAT_AMUv1p1 Enable Armv8.6-A Activity Monitors Virtualization support
+// CHECK-NEXT: FEAT_AdvSIMD Enable Advanced SIMD instructions
+// CHECK-NEXT: FEAT_BF16 Enable BFloat16 Extension
+// CHECK-NEXT: FEAT_BTI Enable Branch Target Identification
+// CHECK-NEXT: FEAT_CCIDX Enable Armv8.3-A Extend of the CCSIDR number of sets
+// CHECK-NEXT: FEAT_CHK Enable Armv8.0-A Check Feature Status Extension
+// CHECK-NEXT: FEAT_CLRBHB Enable Clear BHB instruction
+// CHECK-NEXT: FEAT_CRC32 Enable Armv8.0-A CRC-32 checksum instructions
+// CHECK-NEXT: FEAT_CSV2_2 Enable architectural speculation restriction
+// CHECK-NEXT: FEAT_DIT Enable Armv8.4-A Data Independent Timing instructions
+// CHECK-NEXT: FEAT_DPB Enable Armv8.2-A data Cache Clean to Point of Persistence
+// CHECK-NEXT: FEAT_DPB2 Enable Armv8.5-A Cache Clean to Point of Deep Persistence
+// CHECK-NEXT: FEAT_DotProd Enable dot product support
+// CHECK-NEXT: FEAT_ECV Enable enhanced counter virtualization extension
+// CHECK-NEXT: FEAT_ETE Enable Embedded Trace Extension
+// CHECK-NEXT: FEAT_FCMA Enable Armv8.3-A Floating-point complex number support
+// CHECK-NEXT: FEAT_FGT Enable fine grained virtualization traps extension
+// CHECK-NEXT: FEAT_FHM Enable FP16 FML instructions
+// CHECK-NEXT: FEAT_FP Enable Armv8.0-A Floating Point Extensions
+// CHECK-NEXT: FEAT_FP16 Enable half-precision floating-point data processing
+// CHECK-NEXT: FEAT_FPAC Enable Armv8.3-A Pointer Authentication Faulting enhancement
+// CHECK-NEXT: FEAT_FRINTTS Enable FRInt[32|64][Z|X] instructions that round a floating-point number to an integer (in FP format) forcing it to fit into a 32- or 64-bit int
+// CHECK-NEXT: FEAT_FlagM Enable Armv8.4-A Flag Manipulation instructions
+// CHECK-NEXT: FEAT_FlagM2 Enable alternative NZCV format for floating point comparisons
+// CHECK-NEXT: FEAT_HBC Enable Armv8.8-A Hinted Conditional Branches Extension
+// CHECK-NEXT: FEAT_HCX Enable Armv8.7-A HCRX_EL2 system register
+// CHECK-NEXT: FEAT_I8MM Enable Matrix Multiply Int8 Extension
+// CHECK-NEXT: FEAT_JSCVT Enable Armv8.3-A JavaScript FP conversion instructions
+// CHECK-NEXT: FEAT_LOR Enable Armv8.1-A Limited Ordering Regions extension
+// CHECK-NEXT: FEAT_LRCPC Enable support for RCPC extension
+// CHECK-NEXT: FEAT_LRCPC2 Enable Armv8.4-A RCPC instructions with Immediate Offsets
+// CHECK-NEXT: FEAT_LRCPC3 Enable Armv8.9-A RCPC instructions for A64 and Advanced SIMD and floating-point instruction set
+// CHECK-NEXT: FEAT_LSE Enable Armv8.1-A Large System Extension (LSE) atomic instructions
+// CHECK-NEXT: FEAT_LSE2 Enable Armv8.4-A Large System Extension 2 (LSE2) atomicity rules
+// CHECK-NEXT: FEAT_MOPS Enable Armv8.8-A memcpy and memset acceleration instructions
+// CHECK-NEXT: FEAT_MPAM Enable Armv8.4-A Memory system Partitioning and Monitoring extension
+// CHECK-NEXT: FEAT_MTE, FEAT_MTE2 Enable Memory Tagging Extension
+// CHECK-NEXT: FEAT_NMI, FEAT_GICv3_NMI Enable Armv8.8-A Non-maskable Interrupts
+// CHECK-NEXT: FEAT_NV, FEAT_NV2 Enable Armv8.4-A Nested Virtualization Enchancement
+// CHECK-NEXT: FEAT_PAN Enable Armv8.1-A Privileged Access-Never extension
+// CHECK-NEXT: FEAT_PAN2 Enable Armv8.2-A PAN s1e1R and s1e1W Variants
+// CHECK-NEXT: FEAT_PAuth Enable Armv8.3-A Pointer Authentication extension
+// CHECK-NEXT: FEAT_PMUv3 Enable Armv8.0-A PMUv3 Performance Monitors extension
+// CHECK-NEXT: FEAT_RAS, FEAT_RASv1p1 Enable Armv8.0-A Reliability, Availability and Serviceability Extensions
+// CHECK-NEXT: FEAT_RDM Enable Armv8.1-A Rounding Double Multiply Add/Subtract instructions
+// CHECK-NEXT: FEAT_SB Enable Armv8.5-A Speculation Barrier
+// CHECK-NEXT: FEAT_SEL2 Enable Armv8.4-A Secure Exception Level 2 extension
+// CHECK-NEXT: FEAT_SME Enable Scalable Matrix Extension (SME)
+// CHECK-NEXT: FEAT_SME2 Enable Scalable Matrix Extension 2 (SME2) instructions
+// CHECK-NEXT: FEAT_SPE Enable Statistical Profiling extension
+// CHECK-NEXT: FEAT_SPECRES Enable Armv8.5-A execution and data prediction invalidation instructions
+// CHECK-NEXT: FEAT_SPECRES2 Enable Speculation Restriction Instruction
+// CHECK-NEXT: FEAT_SPEv1p2 Enable extra register in the Statistical Profiling Extension
+// CHECK-NEXT: FEAT_SSBS, FEAT_SSBS2 Enable Speculative Store Bypass Safe bit
+// CHECK-NEXT: FEAT_SVE Enable Scalable Vector Extension (SVE) instructions
+// CHECK-NEXT: FEAT_SVE2 Enable Scalable Vector Extension 2 (SVE2) instructions
+// CHECK-NEXT: FEAT_SVE_BitPerm Enable bit permutation SVE2 instructions
+// CHECK-NEXT: FEAT_TLBIOS, FEAT_TLBIRANGE Enable Armv8.4-A TLB Range and Maintenance instructions
+// CHECK-NEXT: FEAT_TRBE Enable Trace Buffer Extension
+// CHECK-NEXT: FEAT_TRF Enable Armv8.4-A Trace extension
+// CHECK-NEXT: FEAT_UAO Enable Armv8.2-A UAO PState
+// CHECK-NEXT: FEAT_VHE Enable Armv8.1-A Virtual Host extension
+// CHECK-NEXT: FEAT_WFxT Enable Armv8.7-A WFET and WFIT instruction
+// CHECK-NEXT: FEAT_XS Enable Armv8.7-A limited-TLB-maintenance instruction
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-c1-pro.c b/clang/test/Driver/print-enabled-extensions/aarch64-c1-pro.c
new file mode 100644
index 0000000000000..d31a598463267
--- /dev/null
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-c1-pro.c
@@ -0,0 +1,71 @@
+// REQUIRES: aarch64-registered-target
+// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=c1-pro | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s
+
+// CHECK: Extensions enabled for the given AArch64 target
+// CHECK-EMPTY:
+// CHECK-NEXT: Architecture Feature(s) Description
+// CHECK-NEXT: FEAT_AMUv1 Enable Armv8.4-A Activity Monitors extension
+// CHECK-NEXT: FEAT_AMUv1p1 Enable Armv8.6-A Activity Monitors Virtualization support
+// CHECK-NEXT: FEAT_AdvSIMD Enable Advanced SIMD instructions
+// CHECK-NEXT: FEAT_BF16 Enable BFloat16 Extension
+// CHECK-NEXT: FEAT_BTI ...
[truncated]
|
|
It looks like all C1 series enabled SME and MOPS unconditionally. But according to Arm, only C1-Ultra/Premium are forced with SME support, and MOPS may cause performance degradation. https://developer.arm.com/documentation/111076/0100
https://developer.arm.com/documentation/111077/8-0
|
My understanding is that the convention of the |
jthackray
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes. LGTM now.
As far as I know, -mcpu=c1-nano did 't enable optional crypto by default unless +crypto is appended. While SME is also optional for C1-Nano, but is always enabled, which feels very inconsistent. And if LLVM implemented SME auto-vectorization in the future, binaries compiled with -mcpu=c1-nano won't be able to execute on non-SME-configured C1-Nano. As for MOPS, I'm just concerned that -mcpu=c1-* will result in memcpy and memset to be inlined unconditionally, which according to Arm's errata will result in performance degradation, and thus be inconsistent with the user's intended purpose of maximum performance. |
On the other hand, if someone explicitly writes a MOPS instruction in assembly language, it shouldn't fail to assemble. The questions of "does the CPU understand this instruction?" and "is it a good idea to use it in code generation?" are conceptually separate. |
|
We do have a convention for CPUs, agreed with the Arm GNU team as we want CPU features to be consistent as possible across toolchains, that for each Arm CPU we enable all optional features (like SME), with crypto being the exception of always being opt-in. This is partly historical as GCC has always modelled it that way and crypto extensions are subject to export control. I understand that a lot of this will look inconsistent externally. As statham-arm mentions we want to separate what features the CPU supports from whether it is the right thing to do to make use of them. |
But at the moment llvm doesn't seem to be able to distinguish between the two. I think a tune option could be added to avoid preferring MOPS in code generation, similar to x86's “FeatureFSRM”.
Well, that makes sense. And it seems unlikely that there will be actual products that indeed don't have C1-SME2. But I'm still concerned that this will cause people to avoid using |
davemgreen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the features and they look OK to me, using a comparison to the previous generation. RPRFM should be added to these cpus too, but that is only being added in #170490.
This patch adds initial support for the Arm v9.3 C1 processors: * C1-Nano * C1-Pro * C1-Premium * C1-Ultra For more information on each, see: https://developer.arm.com/Processors/C1-Nano https://developer.arm.com/Processors/C1-Pro https://developer.arm.com/Processors/C1-Premium https://developer.arm.com/Processors/C1-Ultra Technical Reference Manual for C1-Nano: https://developer.arm.com/documentation/107753/latest/ Technical Reference Manual for C1-Pro: https://developer.arm.com/documentation/107771/latest/ Technical Reference Manual for C1-Premium: https://developer.arm.com/documentation/109416/latest/ Technical Reference Manual for C1-Ultra: https://developer.arm.com/documentation/108014/latest/
This patch adds initial support for the Arm v9.3 C1 processors:
For more information on each, see:
https://developer.arm.com/Processors/C1-Nano
https://developer.arm.com/Processors/C1-Pro
https://developer.arm.com/Processors/C1-Premium
https://developer.arm.com/Processors/C1-Ultra
Technical Reference Manual for C1-Nano:
https://developer.arm.com/documentation/107753/latest/
Technical Reference Manual for C1-Pro:
https://developer.arm.com/documentation/107771/latest/
Technical Reference Manual for C1-Premium:
https://developer.arm.com/documentation/109416/latest/
Technical Reference Manual for C1-Ultra:
https://developer.arm.com/documentation/108014/latest/