[AArch64] Update zero latency instructions in Neoverse scheduling tables #165690
…eoverse cores
NeoverseZeroMove was introduced for Neoverse-V2 and was added to V3 and V3AE.
Use NeoverseZeroMove for Neoverse-V1, N2, N3 in the same way, including these instructions:
MOV Xd|Wd, #0|XZR|WZR
For all Neoverse targets, the following instructions are also decoded as not utilizing the scheduling and execution resources of the machine:
MOV Wd,Wn
MOV Xd,Xn
For Neoverse-N3 only, these instructions also have zero latency:
FMOV Dd, Dn
FMOV Sd, Sn
Change-Id: I1a5f86e049798582d33d96ba99389e4b2ffb210e
@llvm/pr-subscribers-backend-aarch64
Author: Simon Wallis (simonwallis2)
Changes
NeoverseZeroMove was introduced for Neoverse-V2 and was added to V3 and V3AE.
Use NeoverseZeroMove for Neoverse-V1, N2, N3 in the same way, including these instructions:
MOV Xd|Wd, #0|XZR|WZR
For all the above Neoverse targets, the following instructions are also decoded as not utilizing the scheduling and execution resources of the machine:
MOV Wd,Wn
MOV Xd,Xn
For Neoverse-N3 only, these instructions also have zero latency:
FMOV Dd, Dn
FMOV Sd, Sn
Patch is 43.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/165690.diff
16 Files Affected:
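The predicate-controlled types in the diff that follows (for example N2Write_0or1c_1I) pick a zero-cycle write when NeoverseZeroMove holds and fall back to the regular write otherwise. Below is a rough Python model of that first-match selection. It is only a sketch: the `Inst` record, `is_zero_move` and its operand checks are hypothetical stand-ins inferred from the instruction list in the description and from the updated llvm-mca expectations, not the actual TableGen predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical, simplified view of a decoded instruction."""
    opcode: str                          # LLVM opcode name, e.g. "ORRXrs"
    first_src_is_zero_reg: bool = False  # first source operand is WZR/XZR
    imm: int = 0                         # immediate payload (MOVZ only)
    shift: int = 0                       # shift amount


def is_zero_move(inst: Inst) -> bool:
    """Assumed approximation of NeoverseZeroMove (not LLVM's definition)."""
    if inst.opcode in ("MOVZWi", "MOVZXi"):
        # MOV Xd|Wd, #0 is MOVZ with a zero immediate and no shift.
        return inst.imm == 0 and inst.shift == 0
    if inst.opcode in ("ORRWrs", "ORRXrs"):
        # MOV Xd, Xn is an alias of ORR Xd, XZR, Xn with no shift.
        return inst.first_src_is_zero_reg and inst.shift == 0
    return False


def no_sched_pred(inst: Inst) -> bool:
    """Catch-all, playing the role of NoSchedPred."""
    return True


# Ordered (predicate, write) pairs, mirroring the SchedVar list in the diff.
N2WRITE_0OR1C_1I = [
    (is_zero_move, "N2Write_0c"),
    (no_sched_pred, "N2Write_1c_1I"),
]


def select_write(inst: Inst, variants) -> str:
    """Return the write of the first variant whose predicate accepts inst."""
    for predicate, write in variants:
        if predicate(inst):
            return write
    raise ValueError("no variant matched")


mov_reg = Inst("ORRXrs", first_src_is_zero_reg=True)   # mov x3, x6
mov_zero = Inst("MOVZXi", imm=0, shift=0)              # mov x2, #0
movz_shifted = Inst("MOVZWi", imm=0, shift=16)         # movz w2, #0, lsl #16
orr_general = Inst("ORRXrs", first_src_is_zero_reg=False)
```

Under these assumptions, `select_write(mov_reg, N2WRITE_0OR1C_1I)` picks the zero-cycle write while `movz_shifted` and `orr_general` fall through to the one-cycle write, which matches the shape of the llvm-mca expectations in the N2 test changes.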
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
index 50f10114989d0..d1ce5a13d0510 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN2.td
@@ -72,6 +72,13 @@ def : WriteRes<WriteLDHi, []> { let Latency = 4; }
// Define customized scheduler read/write types specific to the Neoverse N2.
//===----------------------------------------------------------------------===//
+
+// Define generic 0 micro-op types
+def N2Write_0c : SchedWriteRes<[]> {
+ let Latency = 0;
+ let NumMicroOps = 0;
+}
+
// Define generic 1 micro-op types
def N2Write_1c_1B : SchedWriteRes<[N2UnitB]> { let Latency = 1; }
@@ -645,6 +652,21 @@ def N2Write_11c_9L01_9S_9V : SchedWriteRes<[N2UnitL01, N2UnitL01, N2UnitL01,
let NumMicroOps = 27;
}
+//===----------------------------------------------------------------------===//
+// Define predicate-controlled types
+
+def N2Write_0or1c_1I : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N2Write_0c]>,
+ SchedVar<NoSchedPred, [N2Write_1c_1I]>]>;
+
+def N2Write_0or2c_1V : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N2Write_0c]>,
+ SchedVar<NoSchedPred, [N2Write_2c_1V]>]>;
+
+def N2Write_0or3c_1M0 : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N2Write_0c]>,
+ SchedVar<NoSchedPred, [N2Write_3c_1M0]>]>;
+
//===----------------------------------------------------------------------===//
// Define types for arithmetic and logical ops with short shifts
def N2Write_Arith : SchedWriteVariant<[
@@ -680,6 +702,7 @@ def : InstRW<[N2Write_1c_1B_1S], (instrs BL, BLR)>;
// ALU, basic
// ALU, basic, flagset
def : SchedAlias<WriteI, N2Write_1c_1I>;
+def : InstRW<[N2Write_0or1c_1I], (instregex "^MOVZ[WX]i$")>;
// ALU, extend and shift
def : SchedAlias<WriteIEReg, N2Write_2c_1M>;
@@ -691,7 +714,8 @@ def : SchedAlias<WriteISReg, N2Write_Arith>;
// Logical, shift, no flagset
def : InstRW<[N2Write_1c_1I],
- (instregex "^(AND|BIC|EON|EOR|ORN|ORR)[WX]rs$")>;
+ (instregex "^(AND|BIC|EON|EOR|ORN)[WX]rs$")>;
+def : InstRW<[N2Write_0or1c_1I], (instregex "^ORR[WX]rs$")>;
// Logical, shift, flagset
def : InstRW<[N2Write_Logical], (instregex "^(AND|BIC)S[WX]rs$")>;
@@ -882,7 +906,7 @@ def : SchedAlias<WriteFImm, N2Write_2c_1V>;
def : InstRW<[N2Write_2c_1V], (instrs FMOVHr, FMOVSr, FMOVDr)>;
// FP transfer, from gen to low half of vec reg
-def : InstRW<[N2Write_3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr,
+def : InstRW<[N2Write_0or3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr,
FMOVHWr, FMOVHXr, FMOVSWr, FMOVDXr)>;
// FP transfer, from gen to high half of vec reg
@@ -1225,6 +1249,8 @@ def : InstRW<[N2Write_3c_1V0], (instrs BFCVT)>;
// ASIMD unzip/zip
// Handled by SchedAlias<WriteV[dq], ...>
+def : InstRW<[N2Write_0or2c_1V], (instrs MOVID, MOVIv2d_ns)>;
+
// ASIMD duplicate, gen reg
def : InstRW<[N2Write_3c_1M0], (instregex "^DUPv.+gpr")>;
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
index 411b372a3f533..32d48ca66ee2d 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
@@ -553,6 +553,22 @@ def N3Write_16c_16V0 : SchedWriteRes<[N3UnitV0, N3UnitV0, N3UnitV0, N3UnitV0,
let NumMicroOps = 16;
}
+
+//===----------------------------------------------------------------------===//
+// Define predicate-controlled types
+
+def N3Write_0or1c_1I : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N3Write_0c]>,
+ SchedVar<NoSchedPred, [N3Write_1c_1I]>]>;
+
+def N3Write_0or2c_1V : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N3Write_0c]>,
+ SchedVar<NoSchedPred, [N3Write_2c_1V]>]>;
+
+def N3Write_0or3c_1M0 : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [N3Write_0c]>,
+ SchedVar<NoSchedPred, [N3Write_3c_1M0]>]>;
+
// Miscellaneous
// -----------------------------------------------------------------------------
@@ -581,6 +597,7 @@ def : InstRW<[N3Write_1c_1B_1S], (instrs BL, BLR)>;
// Conditional compare
// Conditional select
def : SchedAlias<WriteI, N3Write_1c_1I>;
+def : InstRW<[N3Write_0or1c_1I], (instregex "^MOVZ[WX]i$")>;
// ALU, extend and shift
def : SchedAlias<WriteIEReg, N3Write_2c_1M>;
@@ -610,7 +627,8 @@ def : InstRW<[N3Write_1c_1I], (instrs GMI, SUBP, SUBPS)>;
// Logical, shift, no flagset
def : InstRW<[N3Write_1c_1I],
- (instregex "^(AND|BIC|EON|EOR|ORN|ORR)[WX]rs$")>;
+ (instregex "^(AND|BIC|EON|EOR|ORN)[WX]rs$")>;
+def : InstRW<[N3Write_0or1c_1I], (instregex "^ORR[WX]rs$")>;
// Logical, shift, flagset
def : InstRW<[N3Write_2c_1M], (instregex "^(AND|BIC)S[WX]rs$")>;
@@ -855,10 +873,11 @@ def : SchedAlias<WriteFCvt, N3Write_3c_1V0>;
def : SchedAlias<WriteFImm, N3Write_2c_1V>;
// FP move, register
-def : InstRW<[N3Write_2c_1V], (instrs FMOVHr, FMOVSr, FMOVDr)>;
+def : InstRW<[N3Write_2c_1V], (instrs FMOVHr)>;
+def : InstRW<[N3Write_0c], (instrs FMOVSr, FMOVDr)>;
// FP transfer, from gen to low half of vec reg
-def : InstRW<[N3Write_3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr)>;
+def : InstRW<[N3Write_0or3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr)>;
// FP transfer, from gen to high half of vec reg
def : InstRW<[N3Write_5c_1M0_1V], (instrs FMOVXDHighr)>;
@@ -1186,6 +1205,7 @@ def : InstRW<[N3Write_3c_1V0], (instrs BFCVT)>;
// ASIMD transpose
// ASIMD unzip/zip
// Covered by WriteV[dq]
+def : InstRW<[N3Write_0or2c_1V], (instrs MOVID, MOVIv2d_ns)>;
// ASIMD duplicate, gen reg
def : InstRW<[N3Write_3c_1M0], (instregex "^DUPv.+gpr")>;
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
index 3cbfc59423c9a..8d33ca22616c2 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
@@ -472,6 +472,21 @@ def V1Write_11c_9L01_9S_9V : SchedWriteRes<[V1UnitL01, V1UnitL01, V1UnitL01,
V1UnitV, V1UnitV, V1UnitV,
V1UnitV, V1UnitV, V1UnitV]>;
+//===----------------------------------------------------------------------===//
+// Define predicate-controlled types
+
+def V1Write_0or1c_1I : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [V1Write_0c_0Z]>,
+ SchedVar<NoSchedPred, [V1Write_1c_1I]>]>;
+
+def V1Write_0or2c_1V : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [V1Write_0c_0Z]>,
+ SchedVar<NoSchedPred, [V1Write_2c_1V]>]>;
+
+def V1Write_0or3c_1M0 : SchedWriteVariant<[
+ SchedVar<NeoverseZeroMove, [V1Write_0c_0Z]>,
+ SchedVar<NoSchedPred, [V1Write_3c_1M0]>]>;
+
//===----------------------------------------------------------------------===//
// Define forwarded types
@@ -603,6 +618,7 @@ def : InstRW<[V1Write_1c_1I_1Flg],
"^(ADC|SBC)S[WX]r$",
"^ANDS[WX]ri$",
"^(AND|BIC)S[WX]rr$")>;
+def : InstRW<[V1Write_0or1c_1I], (instregex "^MOVZ[WX]i$")>;
// ALU, extend and shift
def : SchedAlias<WriteIEReg, V1Write_2c_1M>;
@@ -623,7 +639,8 @@ def : InstRW<[V1WriteISRegS],
(instregex "^(ADD|SUB)S(([WX]r[sx])|Xrx64)$")>;
// Logical, shift, no flagset
-def : InstRW<[V1Write_1c_1I], (instregex "^(AND|BIC|EON|EOR|ORN|ORR)[WX]rs$")>;
+def : InstRW<[V1Write_1c_1I], (instregex "^(AND|BIC|EON|EOR|ORN)[WX]rs$")>;
+def : InstRW<[V1Write_0or1c_1I], (instregex "^ORR[WX]rs$")>;
// Logical, shift, flagset
def : InstRW<[V1Write_2c_1M_1Flg], (instregex "^(AND|BIC)S[WX]rs$")>;
@@ -805,7 +822,7 @@ def : SchedAlias<WriteFImm, V1Write_2c_1V>;
def : InstRW<[V1Write_2c_1V], (instrs FMOVHr, FMOVSr, FMOVDr)>;
// FP transfer, from gen to low half of vec reg
-def : InstRW<[V1Write_3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr)>;
+def : InstRW<[V1Write_0or3c_1M0], (instrs FMOVWHr, FMOVXHr, FMOVWSr, FMOVXDr)>;
// FP transfer, from gen to high half of vec reg
def : InstRW<[V1Write_5c_1M0_1V], (instrs FMOVXDHighr)>;
@@ -1122,6 +1139,7 @@ def : InstRW<[V1Write_3c_1V02], (instrs BFCVT)>;
// ASIMD transpose
// ASIMD unzip/zip
// Covered by "SchedAlias (WriteV[dq]...)" above
+def : InstRW<[V1Write_0or2c_1V], (instrs MOVID, MOVIv2d_ns)>;
// ASIMD duplicate, gen reg
def : InstRW<[V1Write_3c_1M0],
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
index 2387f176f3051..1ef087f07022d 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td
@@ -94,7 +94,10 @@ def : WriteRes<WriteLDHi, []> { let Latency = 4; }
//===----------------------------------------------------------------------===//
// Define generic 0 micro-op types
-def V2Write_0c : SchedWriteRes<[]> { let Latency = 0; }
+def V2Write_0c : SchedWriteRes<[]> {
+ let Latency = 0;
+ let NumMicroOps = 0;
+}
// Define generic 1 micro-op types
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3.td
index e23576a20d277..3dd2988088f0b 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3.td
@@ -94,7 +94,10 @@ def : WriteRes<WriteLDHi, []> { let Latency = 4; }
//===----------------------------------------------------------------------===//
// Define generic 0 micro-op types
-def V3Write_0c : SchedWriteRes<[]> { let Latency = 0; }
+def V3Write_0c : SchedWriteRes<[]> {
+ let Latency = 0;
+ let NumMicroOps = 0;
+}
// Define generic 1 micro-op types
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3AE.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3AE.td
index 0f1ec669a4e5e..19b56260387e1 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3AE.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3AE.td
@@ -89,7 +89,10 @@ def : WriteRes<WriteLDHi, []> { let Latency = 4; }
//===----------------------------------------------------------------------===//
// Define generic 0 micro-op types
-def V3AEWrite_0c : SchedWriteRes<[]> { let Latency = 0; }
+def V3AEWrite_0c : SchedWriteRes<[]> {
+ let Latency = 0;
+ let NumMicroOps = 0;
+}
// Define generic 1 micro-op types
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
index cf1cf0e98c801..d3343ab055887 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N2-basic-instructions.s
@@ -2508,14 +2508,14 @@ drps
# CHECK-NEXT: 1 2 0.50 bics x3, xzr, x3, lsl #1
# CHECK-NEXT: 1 2 0.50 tst w3, w7, lsl #31
# CHECK-NEXT: 1 2 0.50 tst x2, x20, asr #2
-# CHECK-NEXT: 1 1 0.25 mov x3, x6
-# CHECK-NEXT: 1 1 0.25 mov x3, xzr
-# CHECK-NEXT: 1 1 0.25 mov wzr, w2
-# CHECK-NEXT: 1 1 0.25 mov w3, w5
+# CHECK-NEXT: 0 0 0.00 mov x3, x6
+# CHECK-NEXT: 0 0 0.00 mov x3, xzr
+# CHECK-NEXT: 0 0 0.00 mov wzr, w2
+# CHECK-NEXT: 0 0 0.00 mov w3, w5
# CHECK-NEXT: 1 1 0.25 movz w2, #0, lsl #16
# CHECK-NEXT: 1 1 0.25 mov w2, #-1235
# CHECK-NEXT: 1 1 0.25 mov x2, #5299989643264
-# CHECK-NEXT: 1 1 0.25 mov x2, #0
+# CHECK-NEXT: 0 0 0.00 mov x2, #0
# CHECK-NEXT: 1 1 0.25 movk w3, #0
# CHECK-NEXT: 1 1 0.25 movz x4, #0, lsl #16
# CHECK-NEXT: 1 1 0.25 movk w5, #0, lsl #16
@@ -2557,7 +2557,7 @@ drps
# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8]
-# CHECK-NEXT: 11.00 11.00 33.00 33.00 87.33 151.33 151.33 517.00 251.00 162.50 162.50 215.50 85.50
+# CHECK-NEXT: 11.00 11.00 33.00 33.00 87.33 151.33 151.33 515.75 249.75 161.25 161.25 215.50 85.50
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8] Instructions:
@@ -3692,14 +3692,14 @@ drps
# CHECK-NEXT: - - - - - - - 0.50 0.50 - - - - bics x3, xzr, x3, lsl #1
# CHECK-NEXT: - - - - - - - 0.50 0.50 - - - - tst w3, w7, lsl #31
# CHECK-NEXT: - - - - - - - 0.50 0.50 - - - - tst x2, x20, asr #2
-# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov x3, x6
-# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov x3, xzr
-# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov wzr, w2
-# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov w3, w5
+# CHECK-NEXT: - - - - - - - - - - - - - mov x3, x6
+# CHECK-NEXT: - - - - - - - - - - - - - mov x3, xzr
+# CHECK-NEXT: - - - - - - - - - - - - - mov wzr, w2
+# CHECK-NEXT: - - - - - - - - - - - - - mov w3, w5
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - movz w2, #0, lsl #16
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov w2, #-1235
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov x2, #5299989643264
-# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - mov x2, #0
+# CHECK-NEXT: - - - - - - - - - - - - - mov x2, #0
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - movk w3, #0
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - movz x4, #0, lsl #16
# CHECK-NEXT: - - - - - - - 0.25 0.25 0.25 0.25 - - movk w5, #0, lsl #16
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-basic-instructions.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-basic-instructions.s
index b9758280e2491..f7311b5e41b2e 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-basic-instructions.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-basic-instructions.s
@@ -1888,7 +1888,7 @@ drps
# CHECK-NEXT: 1 2 0.50 fccmpe d31, d5, #7, ne
# CHECK-NEXT: 1 2 0.50 fcsel s3, s20, s9, pl
# CHECK-NEXT: 1 2 0.50 fcsel d9, d10, d11, mi
-# CHECK-NEXT: 1 2 0.50 fmov s0, s1
+# CHECK-NEXT: 0 0 0.00 fmov s0, s1
# CHECK-NEXT: 1 2 0.50 fabs s2, s3
# CHECK-NEXT: 1 2 0.50 fneg s4, s5
# CHECK-NEXT: 1 7 1.00 fsqrt s6, s7
@@ -1901,7 +1901,7 @@ drps
# CHECK-NEXT: 1 3 1.00 frinta s20, s21
# CHECK-NEXT: 1 3 1.00 frintx s22, s23
# CHECK-NEXT: 1 3 1.00 frinti s24, s25
-# CHECK-NEXT: 1 2 0.50 fmov d0, d1
+# CHECK-NEXT: 0 0 0.00 fmov d0, d1
# CHECK-NEXT: 1 2 0.50 fabs d2, d3
# CHECK-NEXT: 1 2 0.50 fneg d4, d5
# CHECK-NEXT: 1 12 1.00 fsqrt d6, d7
@@ -2508,14 +2508,14 @@ drps
# CHECK-NEXT: 1 2 0.50 bics x3, xzr, x3, lsl #1
# CHECK-NEXT: 1 2 0.50 tst w3, w7, lsl #31
# CHECK-NEXT: 1 2 0.50 tst x2, x20, asr #2
-# CHECK-NEXT: 1 1 0.25 mov x3, x6
-# CHECK-NEXT: 1 1 0.25 mov x3, xzr
-# CHECK-NEXT: 1 1 0.25 mov wzr, w2
-# CHECK-NEXT: 1 1 0.25 mov w3, w5
+# CHECK-NEXT: 0 0 0.00 mov x3, x6
+# CHECK-NEXT: 0 0 0.00 mov x3, xzr
+# CHECK-NEXT: 0 0 0.00 mov wzr, w2
+# CHECK-NEXT: 0 0 0.00 mov w3, w5
# CHECK-NEXT: 1 1 0.25 movz w2, #0, lsl #16
# CHECK-NEXT: 1 1 0.25 mov w2, #-1235
# CHECK-NEXT: 1 1 0.25 mov x2, #5299989643264
-# CHECK-NEXT: 1 1 0.25 mov x2, #0
+# CHECK-NEXT: 0 0 0.00 mov x2, #0
# CHECK-NEXT: 1 1 0.25 movk w3, #0
# CHECK-NEXT: 1 1 0.25 movz x4, #0, lsl #16
# CHECK-NEXT: 1 1 0.25 movk w5, #0, lsl #16
@@ -2557,7 +2557,7 @@ drps
# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8]
-# CHECK-NEXT: 11.00 11.00 33.00 33.00 99.33 163.33 163.33 357.75 212.75 156.25 156.25 184.50 64.50
+# CHECK-NEXT: 11.00 11.00 33.00 33.00 99.33 163.33 163.33 356.50 211.50 155.00 155.00 183.50 63.50
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0.0] [0.1] [1.0] [1.1] [2] [3.0] [3.1] [4] [5] [6.0] [6.1] [7] [8] Instructions:
@@ -3072,7 +3072,7 @@ drps
# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fccmpe d31, d5, #7, ne
# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fcsel s3, s20, s9, pl
# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fcsel d9, d10, d11, mi
-# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fmov s0, s1
+# CHECK-NEXT: - - - - - - - - - - - - - fmov s0, s1
# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fabs s2, s3
# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fneg s4, s5
# CHECK-NEXT: - - - - - - - - - - - 1.00 - fsqrt s6, s7
@@ -3085,7 +3085,7 @@ drps
# CHECK-NEXT: - - - - - - - - - - - 1.00 - frinta s20, s21
# CHECK-NEXT: - - - - - - - - - - - 1.00 - frintx s22, s23
# CHECK-NEXT: - - - - - - - - - - - 1.00 - frinti s24, s25
-# CHECK-NEXT: - - - - - - - - - - - 0.50 0.50 fmov d0, d1
+# CHECK-NEXT: - ...
[truncated]
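The diff above splits ORR out of the "Logical, shift, no flagset" regex so that it can get the predicate-controlled write. A quick way to convince yourself that the two new patterns cover exactly the opcodes of the old one, with no overlap, is to replay them in Python. This is a sketch: it assumes Python's `re` treats these anchored patterns the same way TableGen's `instregex` does, and the opcode list is built by hand for illustration rather than taken from LLVM.

```python
import re

# Patterns copied from the diff.
OLD = re.compile(r"^(AND|BIC|EON|EOR|ORN|ORR)[WX]rs$")
NEW_GENERIC = re.compile(r"^(AND|BIC|EON|EOR|ORN)[WX]rs$")
NEW_ORR = re.compile(r"^ORR[WX]rs$")

# Hand-built opcode names for illustration (assumption: not read from LLVM).
OPCODES = [
    f"{op}{width}rs"
    for op in ("AND", "BIC", "EON", "EOR", "ORN", "ORR")
    for width in "WX"
]


def classify(opcode: str):
    """Return which of the new InstRW entries would claim this opcode."""
    if NEW_ORR.match(opcode):
        return "0or1c_1I"
    if NEW_GENERIC.match(opcode):
        return "1c_1I"
    return None


def is_partition(opcodes) -> bool:
    """True if every opcode the old regex matched is matched by exactly one new regex."""
    for opcode in opcodes:
        old_hit = bool(OLD.match(opcode))
        new_hits = int(bool(NEW_GENERIC.match(opcode))) + int(bool(NEW_ORR.match(opcode)))
        if old_hit and new_hits != 1:
            return False
        if not old_hit and new_hits != 0:
            return False
    return True
```

Flag-setting forms such as ANDSWrs are matched by none of the three patterns, so they stay with the separate "Logical, shift, flagset" entry.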
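The changed "Resource pressure per iteration" totals in the N2 test can be sanity-checked by hand: five moves that each put 0.25 on resources [4], [5], [6.0] and [6.1] now contribute nothing, so each of those four totals should drop by 1.25. A small Python check of that arithmetic, using only numbers visible in the diff (the helper name is made up for illustration):

```python
from decimal import Decimal

# N2 totals for resources [4], [5], [6.0], [6.1], copied from the diff.
BEFORE = [Decimal("517.00"), Decimal("251.00"), Decimal("162.50"), Decimal("162.50")]
AFTER = [Decimal("515.75"), Decimal("249.75"), Decimal("161.25"), Decimal("161.25")]

# mov x3, x6; mov x3, xzr; mov wzr, w2; mov w3, w5; mov x2, #0
ZEROED_MOVES = 5
# Pressure each of those moves previously put on each of the four resources.
PER_MOVE = Decimal("0.25")


def expected_after(before, zeroed, per_move):
    """Subtract the pressure of the now resource-free moves from each total."""
    return [total - zeroed * per_move for total in before]
```

The same reasoning applies to the visible N3 totals, where columns [7] and [8] additionally drop by 1.00 because the two FMOV register moves (0.50 each) no longer use a resource.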
rj-jesus
left a comment
Hi, the changes generally look good, but I'm not sure we should be modelling zero-latency moves with zero micro-ops? AFAIU these instructions still count as a micro-op (towards decode bandwidth, for example) despite not using execution resources. If you can share any references to the contrary, that would be much appreciated.
The categories of instructions with zero micro-ops were previously added to the .td files for V1 (in 2023) and for N3 (in 2024). This patch extends this to N2, V2, V3, V3AE.
I based this patch solely on my own reading of the Neoverse SWOGs, in particular section 4.15 (variously 4.12, 4.11) about zero-latency instructions not using the scheduling and execution resources, and section 2.1 about macro-ops proceeding through register renaming and dispatch.
It looks like these zero-latency instructions still take up decode resources, which we don’t currently describe explicitly.
I believe we implicitly model decode constraints in the "IssueWidth". My main concern with modelling these instructions with zero micro-ops is that it might trick the machine scheduler into assuming they can be scheduled freely, which AFAIU isn't true. Also, OP_RETIRED suggests these instructions do count as a micro-op. Do you have any performance data that suggests this is preferable for performance or at least neutral?
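To make the concern concrete, here is a deliberately simplified bound: if the issue width were the only limit, the best achievable IPC would be the instruction count divided by (total micro-ops / width). This is a toy model, not how llvm-mca or the machine scheduler compute anything. The width of 6 and the loop shape of 20 moves plus 2 other instructions are illustrative assumptions, and `issue_bound_ipc` is a made-up helper.

```python
from fractions import Fraction


def issue_bound_ipc(uops_per_inst, issue_width):
    """Upper bound on IPC when issue width is the only modelled limit.

    Returns None when no micro-ops are issued at all, since the bound
    is then undefined (nothing constrains the loop in this toy model).
    """
    total_uops = sum(uops_per_inst)
    if total_uops == 0:
        return None
    cycles_per_iteration = Fraction(total_uops, issue_width)
    return Fraction(len(uops_per_inst)) / cycles_per_iteration


ISSUE_WIDTH = 6  # assumed for illustration

# 20 moves plus 2 other instructions, every instruction one micro-op.
one_uop_moves = [1] * 20 + [1] * 2
# Same loop, but the 20 moves are modelled with zero micro-ops.
zero_uop_moves = [0] * 20 + [1] * 2

bound_one = issue_bound_ipc(one_uop_moves, ISSUE_WIDTH)
bound_zero = issue_bound_ipc(zero_uop_moves, ISSUE_WIDTH)
```

With one micro-op per move the bound equals the issue width; with zero micro-op moves the issue width stops constraining the moves at all, so some other limit has to take over, which is the "scheduled freely" risk described above.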
On Neoverse-V2 the benchmarks that I ran reported zero change in performance.
Consider this example, which issues 20
If you normalise the IPC that perf computes by 21/22 (because the CMP+B is fused), you get exactly 6.
If we make zero-latency moves zero micro-ops, then we'll have:
(The IPC becomes bottlenecked on the MicroOpBufferSize.)
It does seem to me that these instructions should be modelled as one micro-op. Unless I'm missing something, or unless there's a compelling reason for us to make this change, I believe it would be better if we left it as is. What do you think?
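The 21/22 normalisation mentioned above can be written out explicitly: perf counts 22 architectural instructions per iteration, but the fused CMP+B pair leaves 21 macro-ops. The sketch below only reproduces that arithmetic. The raw IPC used here is back-computed from the stated result of 6 and is not a measurement taken from this thread; `normalise_ipc` is a hypothetical helper.

```python
from fractions import Fraction


def normalise_ipc(raw_ipc, instructions, fused_pairs):
    """Scale an instruction-based IPC down to a macro-op-based one.

    Each fused pair counts as two instructions but one macro-op.
    """
    macro_ops = instructions - fused_pairs
    return raw_ipc * Fraction(macro_ops, instructions)


INSTRUCTIONS_PER_ITERATION = 22  # from the 21/22 ratio in the comment
FUSED_PAIRS = 1                  # the CMP+B pair

# Back-computed so that the normalised value is exactly 6 (assumption,
# not a number reported in the discussion).
RAW_IPC = Fraction(6) * Fraction(22, 21)

normalised = normalise_ipc(RAW_IPC, INSTRUCTIONS_PER_ITERATION, FUSED_PAIRS)
```

A normalised value equal to a whole number such as 6 is what you would expect if every move, zero-latency or not, still occupies one slot per cycle.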
I see that making this zero micro-op change in isolation would be unhelpful if it leads llvm-mca to report unrealistic cycle counts. We still do not model all of the dispatch constraints described in the SWOG Special Considerations section 4.1. Modelling the MOPS per cycle limit would be key to reporting sensible numbers for an instruction with zero uOPs. Thanks for supplying these examples, very useful. I tried modifying them to clarify for myself how the IPC is affected by MOPS limits and uOPs limits, but was not able to reach a definite answer. I propose to update the patch and remove the zero-micro-op change.
I reverted my changes for V2, V3, V3AE and their corresponding tests, and updated the changes and tests for N2, N3, V1 so they match V2, V3, V3AE.
It would certainly be good to model those dispatch constraints (FYI, in case you haven't noticed them, this patch mentions slightly different dispatch constraints for the Neoverse V2 than those in the latest SWOG). There are a few other attributes that are currently not modelled in the Neoverse scheduling tables that could be worth looking at, for example
rj-jesus
left a comment
A few more suggestions but otherwise this is almost good to go.
Thanks for the patience. :)
// Predicate set/initialize, set flags
-def : InstRW<[N3Write_2c_1M], (instregex "^PTRUES_[BHSD]")>;
+def : InstRW<[N3Write_0or2c_1M], (instregex "^PTRUES_[BHSD]")>;
PTRUES isn't listed in Section 4.11, but it is mentioned in Table 2-23, so I'll assume the latter to be correct (matching what you've implemented).
rj-jesus
left a comment
LGTM, cheers. Just please fix the conflict in llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td and let the CI tests run. :)
NeoverseZeroMove was introduced for Neoverse-V2 and was added to V3 and V3AE.
Use NeoverseZeroMove for Neoverse-V1, N2, N3 in the same way, including these instructions:
MOV Xd|Wd, #0|XZR|WZR
For all the above Neoverse targets, the following instructions are also decoded as not utilizing the scheduling and execution resources of the machine:
MOV Wd,Wn
MOV Xd,Xn
For Neoverse-N3 only, these instructions also have zero latency:
FMOV Dd, Dn
FMOV Sd, Sn
MOV Vd, Vn (vector)
MOV Zd.D, Zn.D
PTRUE
PFALSE