merge main into amd-staging #585

ronlieb · 2025-11-13T23:54:23Z

No description provided.

Tracing requires liboffload to be initialized, so calling isTracingEnabled() before olInit always returns false. This caused the first trace log to look like:

```
-> OL_SUCCESS
```

instead of:

```
---> olInit() -> OL_SUCCESS
```

This patch moves the pre-call trace print for olInit so it is emitted only after initialization. It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second olInit call.

---------

Co-authored-by: Joseph Huber <[email protected]>
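The ordering problem described above can be illustrated with a small sketch. `ToyRuntime` and its members are hypothetical stand-ins, not liboffload code or its generated tablegen output; the sketch only assumes that the tracing check cannot succeed before initialization.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a runtime whose tracing state is only readable
// after initialization. This is not the liboffload implementation.
struct ToyRuntime {
  bool initialized = false;
  bool traceRequested = true;
  std::string log;

  // Mirrors the constraint from the description: tracing cannot be
  // reported as enabled until the runtime has been initialized.
  bool isTracingEnabled() const { return initialized && traceRequested; }

  // Pre-call print placed before initialization: the check is always false
  // at that point, so only the result part reaches the log.
  int initEarlyPrint() {
    if (isTracingEnabled()) log += "---> init()";
    initialized = true;
    if (isTracingEnabled()) log += " -> SUCCESS\n";
    return 0;
  }

  // Pre-call print deferred until initialization has completed.
  int initDeferredPrint() {
    assert(!initialized && "sketch only models the first init call");
    initialized = true;
    if (isTracingEnabled()) log += "---> init()";
    if (isTracingEnabled()) log += " -> SUCCESS\n";
    return 0;
  }
};
```

With the early variant the log holds only the result part; with the deferred variant the full line is recorded.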

Only the Fortran source files in flang/test/Intrinsics have been modified. The other files in flang/test will be cleaned up in subsequent commits.

- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code into individual helper functions.

…lvm#167896) Reverts llvm#163860

Prepare a 'this' for CXXDefaultInitExprs

…e reduction plans (llvm#165913) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes llvm#165359.

This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Also, extensive new tests are added for multiplication and division. These implementations can be removed from the build by defining the cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF. Outlying parts of the functionality which are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in future.

…ve (llvm#167875) This prevents the backend from crashing for basic uses of __SVCount_t type (e.g., as function arguments), without +sve2p1 or +sme2. Fixes llvm#167462

Without this patch, SmallDenseMap::grow has two separate code paths to grow the bucket array. The code path to handle the small mode has its own traversal over the bucket array.

This patch simplifies this logic as follows:

1. Allocate a temporary instance of SmallDenseMap.
2. Move valid key/value pairs to the temporary instance.
3. Move LargeRep to *this.

Remarks:

- This patch adds moveFromImpl to move key/value pairs. moveFromOldBuckets is updated to use the new helper function.
- This patch adds a private constructor to SmallDenseMap that takes an exact number of buckets, accompanied by tag ExactBucketCount.
- This patch adds a fast path to deallocateBuckets in case getLargeRep()->NumBuckets == 0, just like destroyAll. This path is used to destruct zombie instances after moves.
- In somewhat rare cases, we "grow" from the small mode to the small mode when there are many tombstones in the inline storage. This is handled with another call to moveFrom.
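The rebuild-through-a-temporary idea can be sketched on a toy container. `ToySet` below is a hypothetical open-addressing set written for illustration only; it is not `llvm::SmallDenseMap`, has no inline storage, and borrows only the `ExactBucketCount` tag name from the description. It assumes a 3/4 load limit and linear probing.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy open-addressing set of ints with tombstones (illustrative only).
class ToySet {
public:
  ToySet() : buckets(4) {}

  bool contains(int key) const { return find(key) != buckets.size(); }

  bool insert(int key) {
    if (contains(key)) return false;
    // Rebuild when live keys plus tombstones would exceed 3/4 of the buckets.
    if ((live + tombstones + 1) * 4 > buckets.size() * 3) {
      std::size_t n = buckets.size();
      // Mostly tombstones: rebuild at the same size; otherwise double.
      grow(live + 1 > n / 2 ? n * 2 : n);
    }
    place(key);
    return true;
  }

  bool erase(int key) {
    std::size_t i = find(key);
    if (i == buckets.size()) return false;
    buckets[i].state = State::Tombstone;
    --live;
    ++tombstones;
    return true;
  }

  std::size_t size() const { return live; }
  std::size_t bucketCount() const { return buckets.size(); }

private:
  enum class State { Empty, Full, Tombstone };
  struct Bucket { State state = State::Empty; int key = 0; };
  struct ExactBucketCount {};

  // Private constructor taking an exact bucket count, selected by a tag.
  ToySet(std::size_t n, ExactBucketCount) : buckets(n) {}

  std::size_t home(int key) const {
    return static_cast<std::size_t>(key) % buckets.size();
  }

  // Returns the index of `key`, or buckets.size() if it is absent.
  std::size_t find(int key) const {
    for (std::size_t i = home(key); buckets[i].state != State::Empty;
         i = (i + 1) % buckets.size())
      if (buckets[i].state == State::Full && buckets[i].key == key) return i;
    return buckets.size();
  }

  // Stores a key known to be absent; never triggers a rebuild.
  void place(int key) {
    std::size_t i = home(key);
    while (buckets[i].state == State::Full) i = (i + 1) % buckets.size();
    if (buckets[i].state == State::Tombstone) --tombstones;
    buckets[i].state = State::Full;
    buckets[i].key = key;
    ++live;
  }

  // Single growth path: fill a temporary, then move it into *this.
  void grow(std::size_t newCount) {
    ToySet tmp(newCount, ExactBucketCount{});
    for (const Bucket &b : buckets)
      if (b.state == State::Full) tmp.place(b.key);
    *this = std::move(tmp);
  }

  std::vector<Bucket> buckets;
  std::size_t live = 0;
  std::size_t tombstones = 0;
};
```

Because the temporary starts with no tombstones, the same `grow` routine serves both real growth and the same-size cleanup case.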

…okupOrTrackRegister (llvm#167841) The LocID for registers is just the register ID. The getLocID function is supposed to hide this detail, but it wasn't being used consistently. This avoids a bunch of implicit casts from Register or MCRegister to unsigned.

… (NFC) (llvm#155262) CMN also has a function like this, we should do the same with CMP.

This commit adds a new `ValueMatcher` class that can be used in gtest matching contexts to match against `lldb_private::Value` objects. We always match against the value's `value_type` and `context_type`. For HostAddress values we will also match against the expected host buffer contents. For Scalar, FileAddress, and LoadAddress values we match against an expected Scalar value. The matcher is used to improve the quality of the tests in the `DwarfExpressionTest.cpp` file. Previously, the local `Evaluate` function would return an `Expected<Scalar>` value, which makes it hard to verify that we actually get a Value of the expected type without adding custom evaluation code. Now we return an `Expected<Value>` so that we can match against the full value contents. The resulting change improves the quality of the existing checks and in some cases eliminates the need for special code to explicitly check value types. I followed the gtest [guide](https://google.github.io/googletest/gmock_cook_book.html#writing-new-monomorphic-matchers) for writing a new value matcher.

The optimized version of xsgetn for basic_filebuf added in llvm#165223 has an issue where if the reads come from both the buffer and the filesystem it returns the wrong number of characters. This patch should address the issue.
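The class of bug described here (a read served partly from the buffer and partly from the underlying source) can be sketched with a toy reader. `ToyReader` is hypothetical and is not libc++'s `basic_filebuf`; it only shows that the returned count must cover both parts of the read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Toy reader with a read-ahead buffer in front of a backing "file".
// Hypothetical illustration; not the libc++ implementation.
struct ToyReader {
  std::string buffered;  // characters already read ahead
  std::string file;      // data still held by the backing source

  // Copies up to n characters into out and returns how many were copied.
  std::size_t read(char *out, std::size_t n) {
    assert(out != nullptr);
    // Part 1: drain whatever the buffer can supply.
    std::size_t fromBuffer = std::min(n, buffered.size());
    std::copy_n(buffered.begin(), fromBuffer, out);
    buffered.erase(0, fromBuffer);

    // Part 2: take the remainder directly from the backing source.
    std::size_t fromFile = std::min(n - fromBuffer, file.size());
    std::copy_n(file.begin(), fromFile, out + fromBuffer);
    file.erase(0, fromFile);

    // The result must count both parts, not just the last one.
    return fromBuffer + fromFile;
  }
};
```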

…r zvfbfa (llvm#167819)

This proposal adds a `cl::opt` CLI flag `-bpf-allow-misaligned-mem-access` to the BPF target that lets users allow misaligned memory accesses. The motivation behind the proposal is that user space eBPF VMs (interpreters or JITs running in user space) typically run on real CPUs where unaligned memory accesses are acceptable (or handled efficiently) and can be enabled to simplify lowering and improve performance. In contrast, kernel eBPF must obey verifier constraints and platform-specific alignment restrictions. A new CLI option keeps kernel behavior unchanged while giving userspace VMs an explicit opt-in to enable more permissive codegen. It supports both use-cases without diverging codebases.

…7763) As mentioned in comments for llvm#164913, the `if()` statements here can't be externally triggered, since these writeback registers are passed in from the caller. So they should really be `assert()`s so it's obvious we don't need testcases for them, and more optimal.

Reverts llvm#161546

One of the buildbots reported a cmake error I don't understand, and which I didn't get in my own test builds:

```
CMake Error at /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake:23 (try_compile):
  COMPILE_DEFINITIONS specified on a srcdir type TRY_COMPILE
```

My best guess is that the thing I did in `CheckAssemblerFlag.cmake` only works on some versions of cmake. But I don't understand the problem well enough to fix it quickly, so I'm reverting the whole patch and will reland it later.

Implement support for GNUNullExpr

…m#167540) In adopting `[[clang::nonblocking]]` there's been some user confusion. Changes to address `-Wfunction-effects` warnings are often pure annotation, with no runtime effect. Changes to avoid `-Wperf-constraint-implies-noexcept` warnings are risky: adding `noexcept` creates a new potential for the program to crash. In retrospect, `-Wperf-constraint-implies-noexcept` shouldn't have been made part of `-Wall`. --------- Co-authored-by: Doug Wyatt <[email protected]>

This changes muls by `3 << C` from `(X << C + 2) - (X << C)` to `(X << C + 1) + (X << C)`. If Zba is available, the output is not affected as we emit `(shl (sh1add X, X), C)` instead.

There are two advantages:

- ADD is more compressible
- Often a reduced instruction count, by a heuristic that `(X << C + 1)` is more likely to have another use than `(X << C + 2)`
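The equivalence of the expansions follows from 4 - 1 = 2 + 1 = 3. A small check in plain C++, using hypothetical helper names and 64-bit unsigned (wrapping) arithmetic, assuming `C + 2` stays below the bit width:

```cpp
#include <cassert>
#include <cstdint>

// Old expansion: (X << (C + 2)) - (X << C), i.e. X * (4 - 1) * 2^C.
constexpr std::uint64_t mulOld(std::uint64_t x, unsigned c) {
  return (x << (c + 2)) - (x << c);
}

// New expansion: (X << (C + 1)) + (X << C), i.e. X * (2 + 1) * 2^C.
constexpr std::uint64_t mulNew(std::uint64_t x, unsigned c) {
  return (x << (c + 1)) + (x << c);
}

// Shape of the Zba form: sh1add computes (X << 1) + X, then shift by C.
constexpr std::uint64_t mulShiftAdd(std::uint64_t x, unsigned c) {
  return ((x << 1) + x) << c;
}

// All three agree with a direct multiply by 3 << C (modulo 2^64).
inline bool allAgree(std::uint64_t x, unsigned c) {
  assert(c + 2 < 64 && "shift amounts must stay below the bit width");
  std::uint64_t expected = x * (std::uint64_t{3} << c);
  return mulOld(x, c) == expected && mulNew(x, c) == expected &&
         mulShiftAdd(x, c) == expected;
}
```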

…7898) So that changing the type of the container (planned in a future patch) is less intrusive.

Upstream the basic support for the ExtVectorType element expr

…actor in the way of a glue. (llvm#167805) In the new test, we're trying to fold a load and a X86ISD::CALL. The call has a CopyToReg glued to it. The load and the call have different input chains so they need to be merged. This results in a TokenFactor that gets put between the CopyToReg and the final CALLm instruction. The DAG scheduler can't handle that. The load here was created by legalization of the extract_element using a stack temporary store and load. A normal IR load would be chained into call sequence by SelectionDAGBuilder. This would usually have the load chained in before the CopyToReg. The store/load created by legalization don't get chained into the rest of the DAG. Fixes llvm#63790

…llvm#167901)

AMDGPU: Start to use AV classes for unknown vector class Use AGPR+VGPR superclasses for gfx90a+. The type used for the class should be the broadest possible class, to be contextually restricted later. InstrEmitter clamps these to the common subclass of the context use instructions, so we're best off using the broadest possible class for all types. Note this does very little because we only use VGPR classes for FP types (though this doesn't particularly make any sense), and we legalize normal loads and stores to integer.

…agation.cpp (NFC)

The XeGPU and XeVM dialects have assigned maintainers, but the related folders currently lack code owners. Add charithaintc and Jianhui-Li as code owners for XeGPU related folders. Add silee2 as code owner for XeVM related folders. Note: charithaintc is the current maintainer of the XeGPU dialect. silee2 is the current maintainer of the XeVM dialect.

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

Biasing towards the native `sqrt` not returning NaN. Issue llvm#147390

Test case for vectorizing std::find_if with builtin_assume_dereferenceable. Currently not vectorized. https://godbolt.org/z/6jbsd4EjT

…lvm#167788) If a caller has locked memory, then the madvise call will fail. In that case, zero the memory so that we don't return non-zeroed memory for calloc calls since we thought the memory had been released.
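The fallback can be sketched in isolation. In this sketch `release` is a hypothetical hook standing in for a madvise-style call (an assumption for illustration, not the allocator's actual interface); it returns true only when the pages were really dropped and will read back as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Tries to hand a block back to the OS; if that fails (for example because
// the caller has locked the memory), zero it explicitly so a later
// calloc-style reuse still sees zeroed bytes.
// Returns true if the release succeeded, false if the block was zeroed here.
inline bool releaseOrZero(
    void *block, std::size_t size,
    const std::function<bool(void *, std::size_t)> &release) {
  assert(block != nullptr);
  if (release(block, size))
    return true;                 // pages dropped; they read back as zero
  std::memset(block, 0, size);   // release failed: zero manually
  return false;
}
```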

Reviewers: arsenm, RKSimon, paperchalice, phoebewang Reviewed By: arsenm, RKSimon Pull Request: llvm#167911

…67797) This fixes a TOCTOU bug in the code that initializes shadow memory in ASAN: https://github.com/llvm/llvm-project/blob/4b05581bae0e3432cfa514788418fb2fc2144904/compiler-rt/lib/asan/asan_shadow_setup.cpp#L66-L91

1. During initialization, we call `FindDynamicShadowStart` to search the memory mapping for enough space to dynamically allocate shadow memory.
2. We call `MemoryRangeIsAvailable(shadow_start, kHighShadowEnd);`, which goes into `MemoryMappingLayout`.
3. We actually map the shadow with `ReserveShadowMemoryRange`.

In step 2, `MemoryMappingLayout` makes various allocations using the internal allocator. This can cause the allocator to map more memory! In some cases, this can actually allocate memory that overlaps with the shadow region returned by `FindDynamicShadowStart` in step 1. This is not actually fatal, but it is memory corruption: MAP_FIXED is allowed to overlap other regions, and the effect is that any overlapping memory is zeroed.

------

To address this, this PR implements `MemoryRangeIsAvailable` on Darwin without any heap allocations:

- Move `IntervalsAreSeparate` into sanitizer_common.h
- Guard existing sanitizer_posix implementation of `MemoryRangeIsAvailable` behind !SANITIZER_APPLE
- `IsAddressInMappedRegion` in sanitizer_mac becomes `MemoryRangeIsAvailable`, which also checks for overlap with the DYLD shared cache.

After this fix, it should be possible to re-land llvm#166005, which triggered this issue on the x86 iOS simulators. rdar://164208439
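An allocation-free availability check of this kind can be sketched as follows. The names and signatures are hypothetical and do not reproduce the sanitizer_common code; the sketch assumes inclusive address ranges and a caller-provided array of already mapped regions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// An inclusive address range [start, end].
struct Region {
  std::uintptr_t start;
  std::uintptr_t end;
};

// Two inclusive intervals are separate when one ends before the other begins.
constexpr bool intervalsAreSeparate(std::uintptr_t start1, std::uintptr_t end1,
                                    std::uintptr_t start2, std::uintptr_t end2) {
  return end1 < start2 || end2 < start1;
}

// Checks a candidate range against a fixed array of mapped regions.
// Nothing here allocates, so the check cannot change the mappings
// it is inspecting.
inline bool rangeIsAvailable(const Region *mapped, std::size_t count,
                             std::uintptr_t start, std::uintptr_t end) {
  assert(start <= end);
  for (std::size_t i = 0; i < count; ++i)
    if (!intervalsAreSeparate(start, end, mapped[i].start, mapped[i].end))
      return false;
  return true;
}
```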

) The IR could not be round-tripped through mlir-opt. Update the assembly format and add round-trip tests.

```
mlir-opt mlir/test/Target/LLVMIR/nvvm/barrier.mlir | mlir-opt
<stdin>:6:5: error: cannot name an operation with no results
    %0 = nvvm.barrier <and> %arg2 -> i32
```

The change in llvm#166148 caused breaks for some other types. Specifically, this error was seen in a downstream project:

```
_ods_ir.OpOperandList[_ods_ir.IntegerType]:
TypeError: type 'iree.compiler._mlir_libs._mlir.ir.OpOperandList' is not subscriptable
```

This PR tries to make those changes not affect the other types.

---------

Signed-off-by: Nirvedh Meshram <[email protected]>

There's only a single RUN line in the test, use the more compact default CHECK.

…olding. (llvm#149042)" This reverts commit 62d1a08. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221

) rdar://164612831

The minimum supported SWIG version is 4.0 so there's no need for using a separate file anymore.

This PR upstreams the intrinsics `_mm_prefetch`, `_mm_(l|m)fence`, `_mm_pause` and `_mm_clflush` from the incubator repository.

Move utility to helper for re-use in follow-up patches.

) This adds some minimal code to mark locations where handling is needed for Dtor_VectorDeleting type dtors, which were added in llvm#165598.

This is not a comprehensive mark-up of the missing code, as some code will be needed in places where the surrounding function has larger missing pieces in CIR currently.

This fixes a warning for an uncovered switch case that was causing CI builds to fail.

This reverts commit 1a86f0a.

…164458) before

```C++
void foo(int name, // name
float name, // name
int name) // name
{}
```

after

```C++
void foo(int name, // name
float name, // name
int name) // name
{}
```

Fixes llvm#85123. As the bug report explained, the procedure for aligning the function parameters previously failed to update `StartOfTokenColumn`.

z1-cciauto · 2025-11-13T23:55:31Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2798

…lvm#166483)"

bad virtual register:

- clang-313307_simple_arr: Make Failed
- clang-313307_simple_spmd: Make Failed
- clang-337336: Make Failed
- clang-387196: Make Failed
- flang-sollve-bug1: Make Failed
- MasterBarrierO0: Make Failed

This reverts commit c7019c7.

z1-cciauto · 2025-11-14T01:45:34Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2800

ronlieb · 2025-11-14T04:38:36Z

!PSDB

z1-cciauto · 2025-11-14T04:38:53Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2802

lplewa and others added 30 commits November 13, 2025 15:56

[flang][NFC] Strip trailing whitespace from tests (4 of N)

a12600c

Only the Fortran source files in flang/test/Intrinsics have been modified. The other files in flang/test will be cleaned up in subsequent commits.

[NFC][TableGen] Adopt CodeGenHelpers in SubtargetEmitter (llvm#163820)

e5c418f

- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code into individual helper functions.

Revert "[Flang][OpenMP] Update declare mapper lookup via use-module" (l…

e1324a9

…lvm#167896) Reverts llvm#163860

[CIR] Prepare a 'this' for CXXDefaultInitExprs (llvm#165994)

6a0ba8b

Prepare a 'this' for CXXDefaultInitExprs

[LV] Update LoopVectorizationPlanner::emitInvalidCostRemarks to handl…

a04c6b5

…e reduction plans (llvm#165913) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes llvm#165359.

[AArch64][SVE] Allow basic use of target("aarch64.svcount") with +s…

12322b2

…ve (llvm#167875) This prevents the backend from crashing for basic uses of __SVCount_t type (e.g., as function arguments), without +sve2p1 or +sme2. Fixes llvm#167462

[GISel][AArch64] Create emitCMP instead of cloning a virtual register…

d6703bb

… (NFC) (llvm#155262) CMN also has a function like this, we should do the same with CMP.

[libcxx] Fix xsgetn in basic_filebuf (llvm#167779)

ea16f7d

The optimized version of xsgetn for basic_filebuf added in llvm#165223 has an issue where if the reads come from both the buffer and the filesystem it returns the wrong number of characters. This patch should address the issue.

[RISCV][llvm] Handle INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT codegen fo…

e63a47d

…r zvfbfa (llvm#167819)

[CIR] Implement support for GNUNullExpr (llvm#167715)

de3d74a

Implement support for GNUNullExpr

[libc][NFC] Fix warnings in RPC server code

6b49e6a

[CodeGen] Hide SparseSet<LiveRegUnit> behind a typedef (NFC) (llvm#16…

98f9b54

…7898) So that changing the type of the container (planned in a future patch) is less intrusive.

[CIR] Upstream basic support for ExtVector element expr (llvm#167570)

9216e17

Upstream the basic support for the ExtVectorType element expr

[CodeGen] Add TRI::regunits() iterating over all register units (NFC) (…

d1cc137

…llvm#167901)

[bazel] Added ArithToAPFloat library to bazel (llvm#167916)

b49a847

[MLIR] Apply clang-tidy fixes for llvm-qualified-auto in ShardingProp…

965b338

…agation.cpp (NFC)

Add missing LLVM_ABI annotations (llvm#167718)

23f6a8a

This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.

mtrofin and others added 22 commits November 13, 2025 12:46

[PILC][profcheck] Bias branch weights when optimizing sqrt (llvm#167742)

3d41cbb

Biasing towards the native `sqrt` not returning NaN. Issue llvm#147390

[LV] Add early-exit tests, where deref assumes are not in preheader.

6429549

Test case for vectorizing std::find_if with builtin_assume_dereferenceable. Currently not vectorized. https://godbolt.org/z/6jbsd4EjT

[LLDB] Use %clang_host instead of %clang in test (NFC)

e51163c

[LLDB] Use skipIf instead of expectedFail

c40779a

[X86][NewPM] Port X86 FP Stackifier Pass to NewPM

c44bd37

Reviewers: arsenm, RKSimon, paperchalice, phoebewang Reviewed By: arsenm, RKSimon Pull Request: llvm#167911

[LV] Drop verbose check-prefix from partial-reduce-incomplete-chains.ll.

79cd1b7

There's only a single RUN line in the test, use the more compact default CHECK.

merge main into amd-staging

eb8752f

Revert "[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-f…

a6edeed

…olding. (llvm#149042)" This reverts commit 62d1a08. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221

[clang][deps] Track VFS overlay files in file dependencies. (llvm#167824

513232f

) rdar://164612831

[lldb] Remove bindings/python/python-typemaps.h (llvm#167966)

36848a3

The minimum supported SWIG version is 4.0 so there's no need for using a separate file anymore.

[CIR] Upstream X86 builtin clflush, fence and pause (llvm#167401)

3ff3c4e

This PR upstreams the intrinsics `_mm_prefetch`, `_mm_(l|m)fence`, `_mm_pause` and `_mm_clflush` from the incubator repository.

[VPlan] Add findComputeReductionResult helper. (NFC)

4e71530

Move utility to helper for re-use in follow-up patches.

merge main into amd-staging

da1534d

[RegAllocGreedy] Use MCRegister instead of MCPhysReg. NFC (llvm#167974)

388ef61

Revert "[Offload] Add device info for shared memory (llvm#167817)"

67a60f8

This reverts commit 1a86f0a.

merge main into amd-staging

4a54886

ronlieb requested review from a team and dpalermo November 13, 2025 23:54

dpalermo approved these changes Nov 14, 2025

57 participants
