Commit c9b595d
Optimize AArch64 memset to use NEON DUP instruction for small sizes
This change improves memset code generation for non-zero values on AArch64
for sizes of 4, 8, and 16 bytes by using the NEON DUP instruction instead of
the less efficient multiplication by the 0x01010101 pattern.
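For illustration, the two splat strategies can be compared in portable C++. This is a minimal sketch, not code from the patch: the helper names (splatByteMul32, splatByteMul64, splatByteLanes, sameBytes32) are hypothetical, and the lane-wise fill only models what a DUP-style broadcast produces without using any NEON instruction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar splat: multiplying a zero-extended byte by 0x01010101 copies it
// into all four byte lanes of a 32-bit word.
constexpr std::uint32_t splatByteMul32(std::uint8_t b) {
  return static_cast<std::uint32_t>(b) * 0x01010101u;
}

// Same idea for a 64-bit word, using the widened pattern.
constexpr std::uint64_t splatByteMul64(std::uint8_t b) {
  return static_cast<std::uint64_t>(b) * 0x0101010101010101ull;
}

// Lane-wise splat: a portable model of a 16-lane byte broadcast
// (hypothetical stand-in for a vector DUP, not the instruction itself).
inline std::array<std::uint8_t, 16> splatByteLanes(std::uint8_t b) {
  std::array<std::uint8_t, 16> lanes;
  lanes.fill(b);
  return lanes;
}

// Checks that the first four bytes of the lane-wise splat match the
// in-memory bytes of the multiplication-based splat.
inline bool sameBytes32(std::uint8_t b) {
  const std::uint32_t word = splatByteMul32(b);
  const std::array<std::uint8_t, 16> lanes = splatByteLanes(b);
  return std::memcmp(&word, lanes.data(), sizeof word) == 0;
}
```

Both strategies yield the same bytes in memory; the difference this commit targets is the instruction sequence needed to materialize them.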
Changes:
1. In SelectionDAG.cpp: For AArch64 targets, generate vector splats for
   scalar i32/i64 memset operations, which are then efficiently lowered to
   DUP instructions.
2. In AArch64ISelLowering.cpp: Modify getOptimalMemOpType and
   getOptimalMemOpLLT to return v16i8 for non-zero memset operations of
   any size when NEON is available (previously only for sizes >= 32 bytes).
3. Update test expectations to verify the new DUP-based code generation
   for both NEON and GPR code paths.
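The type-selection change in item 2 can be modeled with a small standalone sketch. This is a simplification under stated assumptions, not the patch itself: MemOpTy, MemsetQuery, pickScalar, pickTypeBefore, and pickTypeAfter are hypothetical names, alignment and floating-point checks are omitted, the scalar i64/i32 fallback is assumed, and zero-valued memsets are assumed to keep the old 32-byte threshold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the value types named in the commit message.
enum class MemOpTy { V16I8, I64, I32, Other };

// Hypothetical description of a memset request.
struct MemsetQuery {
  std::uint64_t size; // number of bytes to set
  bool zeroValue;     // true when the fill byte is known to be zero
  bool hasNEON;       // true when NEON is available
};

// Assumed scalar fallback: the widest integer type that fits the size.
inline MemOpTy pickScalar(std::uint64_t size) {
  if (size >= 8) return MemOpTy::I64;
  if (size >= 4) return MemOpTy::I32;
  return MemOpTy::Other;
}

// Before the change: v16i8 only for memsets of 32 bytes or more.
inline MemOpTy pickTypeBefore(const MemsetQuery &q) {
  if (q.hasNEON && q.size >= 32) return MemOpTy::V16I8;
  return pickScalar(q.size);
}

// After the change: non-zero memsets of any size also get v16i8 with NEON.
inline MemOpTy pickTypeAfter(const MemsetQuery &q) {
  const bool small = q.size < 32;
  if (q.hasNEON && (!small || !q.zeroValue)) return MemOpTy::V16I8;
  return pickScalar(q.size);
}
```

Under this model, a 4-byte non-zero memset with NEON moves from the scalar i32 path to v16i8, while a small zero-valued memset keeps its scalar type.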
The optimization is restricted to AArch64 to avoid breaking RISC-V
and X86 tests.
Signed-off-by: Osama Abdelkader <[email protected]>
1 parent 269f264 commit c9b595d
File tree
4 files changed, +113 -41 lines changed
- llvm
  - lib
    - CodeGen/SelectionDAG
    - Target/AArch64
  - test/CodeGen/AArch64