merge main into amd-staging #738

ronlieb · 2025-12-03T01:51:19Z

No description provided.

…lvm#170313) Some of the AVX512 cases are already handled by llvm#170295

…vm#169917) The new pass manager does not use TargetPassConfig. GlobalISel requires TargetPassConfig to reportGISelFailure, and its only actual use is to check if GlobalISelAbort is enabled. TargetPassConfig uses TargetMachine to check if GlobalISelAbort is enabled, but TargetMachine is also available from MachineFunction.

The test changes are mostly GlobalISel specific regressions. GlobalISel is still relying on isUniformMMO, but it doesn't really have an excuse for doing so. These should be avoidable with new regbankselect. There is an additional regression for addrspacecast for cov4. We probably ought to be using a separate PseudoSourceValue for the access of the queue pointer.

`ResourceDirectoryCache::findResourceDir` uses a `std::vector` when a `std::array` would do.
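As a minimal sketch of the kind of change described, assuming a fixed two-element argument list: the function names below are hypothetical and this is not the code from `ResourceDirectoryCache::findResourceDir`. When the element count is known at compile time, a `std::array` holds the same contents without a heap allocation.

```cpp
#include <array>
#include <string_view>
#include <vector>

// Hypothetical "before": heap-allocates although the size never changes.
inline std::vector<std::string_view> makeArgsVector(std::string_view Exe) {
  return {Exe, "-print-resource-dir"};
}

// Hypothetical "after": same contents, no allocation, and the element
// count is part of the type.
inline std::array<std::string_view, 2> makeArgsArray(std::string_view Exe) {
  return {Exe, "-print-resource-dir"};
}
```

Both containers expose `begin()`, `end()`, `data()` and `size()`, so a caller that only reads the elements usually does not need to change.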

This requires an x86 build; otherwise the test will fail with:
```
Error running ThinLTO backend: No available targets are compatible with triple "x86_64-unknown-linux-gnu"
```

This patch addresses issues identified by the static analyzers, which appear to be legitimate problems.
- `FloatLoopCounterCheck.cpp`: "Dereferencing a pointer that might be `nullptr` `FS` when calling `getInc`".
- `ProBoundsAvoidUncheckedContainerAccessCheck.cpp`: "Dereferencing a pointer that might be `nullptr` `Callee` when calling `getBeginLoc`".
- `ExpandModularHeadersPPCallbacks.cpp`: "Non-static class member `CurrentToken.Flags` is not initialized in this constructor nor in any functions that it calls" (line #101).

…lvm#170307) This is a partial fix for the rocm device-libs build. This was most likely broken by 423bdb2

…xtension (llvm#169137)" This reverts commit e7748e9. It broke the Windows build https://github.com/llvm/llvm-project/actions/runs/19842117405/job/56852610863 https://lab.llvm.org/buildbot/#/builders/166/builds/4535 After llvm#170142 fixed another issue, this was also the remaining reason for this buildbot to fail: https://lab.llvm.org/buildbot/#/builders/207/builds/10423

Part of llvm#167765

Vector registers have synthetic values for display purposes. This causes SBValue::GetExpressionPath to dispatch to ValueObjectSynthetic instead of ValueObjectRegister, producing incorrect results. Fixes llvm#147144

…70326) 4394aa6 introduced the test amdgcn_weak_alias, which is failing on the reverse iteration build due to the order of the aliasees being different. This failure is a test issue, not a bug, so the metadata checks are removed.

) Currently we try to hoist the transformed IV increment instruction to the header block to help with generation of postincrement instructions, but this only works if the user instruction is also in the header. We should instead be trying to insert it in the same block as the user.

…#169795) Similar to llvm#169156 again, this is mostly for denormal handling as there is no rounding step in a minnum/maxnum.

Add additional tests where extra no-alias checks are needed, as future extensions of llvm#168771.

This moves a few existing debug info flags that were floating in the general pool of unorganised flags over to the existing groups for debug info flags (so that they are presented together in documentation). As a tiny further tweak, this also fixes the spelling of "DWARF" in the flag docs for consistency with other flags.

Fix a comment post llvm#162167

One of the previous PRs llvm#169267 has reintroduced block count to layout propagation that was removed in llvm#168504. This PR patches the issue.

…llvm#170325) Address comment llvm#169904 (comment)

Adding llvm:Support dep since plugin started using llvm/ADT/...

Fixes part of llvm#168920

) In Debug builds, the names of adjusted pointers have a pointer-specific name prefix which doesn't exist in non-debug builds. This causes differences in output when looking at the output of SROA with a Debug or Release compiler. For most of our ongoing testing, we use what is essentially a Release+Asserts build (basically Release but without NDEBUG defined); however, we ship a Release compiler. Therefore we want to say with reasonable confidence that building a large project with a Release build and a Release+Asserts build gives us the same output when the same compiler version is used. This difference, however, makes it difficult to prove that the output is the same if the only difference is the name when using LTO builds and looking at bitcode. Hence this change is being proposed.

… 63))) (llvm#170128) Follow up to llvm#71844 after shl implementation

I had a case where the frontend was generating a zero elem array in non-shader code so it was just crashing in a release build. Add a real error and make it not crash. --------- Signed-off-by: Nick Sarnie <[email protected]>

…ual (llvm#170191) This change makes StackFrame methods virtual to enable subclass overrides and introduces BorrowedStackFrame, a wrapper that presents an existing StackFrame with a different frame index. This enables creating synthetic frame views or renumbering frames without copying the underlying frame data, which is useful for frame manipulation scenarios. This also adds a new borrowed-info format entity to show what was the original frame index of the borrowed frame. Signed-off-by: Med Ismail Bennani <[email protected]>
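The wrapper idea can be sketched roughly as follows. The class names here (`FrameView`, `RealFrame`, `BorrowedFrame`) are hypothetical, heavily simplified stand-ins and not LLDB's actual `StackFrame` API; the sketch only shows how virtual methods let a wrapper report a new index while forwarding everything else to the frame it borrows.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal frame interface; virtual so wrappers can override.
class FrameView {
public:
  virtual ~FrameView() = default;
  virtual uint32_t getFrameIndex() const = 0;
  virtual const std::string &getFunctionName() const = 0;
};

// A frame that owns its data.
class RealFrame final : public FrameView {
public:
  RealFrame(uint32_t Index, std::string Function)
      : Index(Index), Function(std::move(Function)) {}
  uint32_t getFrameIndex() const override { return Index; }
  const std::string &getFunctionName() const override { return Function; }

private:
  uint32_t Index;
  std::string Function;
};

// Presents an existing frame under a different index without copying it.
class BorrowedFrame final : public FrameView {
public:
  BorrowedFrame(std::shared_ptr<const FrameView> Orig, uint32_t NewIndex)
      : Original(std::move(Orig)), NewIndex(NewIndex) {
    assert(Original && "a borrowed frame needs an underlying frame");
  }
  uint32_t getFrameIndex() const override { return NewIndex; }
  // Index the frame had before it was borrowed.
  uint32_t getOriginalFrameIndex() const { return Original->getFrameIndex(); }
  const std::string &getFunctionName() const override {
    return Original->getFunctionName(); // forwarded, not duplicated
  }

private:
  std::shared_ptr<const FrameView> Original;
  uint32_t NewIndex;
};
```

For example, `BorrowedFrame(std::make_shared<RealFrame>(3, "main"), 0)` reports index 0 while still exposing the original index 3 and the same function name string as the underlying frame.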

This commit modifies how the DWARF expression evaluator handles the deref operation for register and implicit locations on the stack. For a typical memory location, a deref operation reads the value from memory. For register and implicit locations, the deref operation reads the value from the register or its implicit location. In lldb we eagerly read register and implicit values and push them on the stack, so the deref operation for these becomes a "no-op" that leaves the value on the stack and updates the tracked location kind. The motivation for this change is to handle `DW_OP_deref*` operations on location descriptions as described by the heterogeneous debugging [extensions](https://rocm.docs.amd.com/projects/llvm-project/en/latest/LLVM/llvm/html/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#a-2-5-4-4-4-register-location-description-operations). Specifically, for register locations it states:

> These operations obtain a register location. To fetch the contents of a register, it is necessary to use DW_OP_regval_type, use one of the DW_OP_breg* register-based addressing operations, or use DW_OP_deref* on a register location description.

My understanding is that this is the intended behavior from DWARF 5 as well and is not a change in behavior.
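A toy model of the behavior described above, not LLDB's evaluator: all names are hypothetical, and the `Value` kind that entries are switched to after a deref is an assumption of this sketch (the commit message only says the tracked kind is updated).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical location kinds; `Value` is an assumption of this sketch.
enum class LocationKind { Memory, Register, Implicit, Value };

struct StackEntry {
  uint64_t Data; // An address for Memory/Value; already-read contents otherwise.
  LocationKind Kind;
};

// Toy target memory: address -> 64-bit contents.
using ToyMemory = std::map<uint64_t, uint64_t>;

// Models a deref on the top stack entry.
inline void evalDeref(std::vector<StackEntry> &Stack, const ToyMemory &Mem) {
  assert(!Stack.empty() && "deref needs an operand");
  StackEntry &Top = Stack.back();
  switch (Top.Kind) {
  case LocationKind::Memory:
  case LocationKind::Value:
    // Treat the entry as an address and read from (toy) memory.
    Top.Data = Mem.at(Top.Data); // throws std::out_of_range if unmapped
    break;
  case LocationKind::Register:
  case LocationKind::Implicit:
    // Contents were read eagerly when the entry was pushed: nothing to fetch.
    break;
  }
  Top.Kind = LocationKind::Value; // only the tracked kind changes for the no-op case
}
```

In this model, a deref on a register entry holding 7 leaves 7 on the stack, whereas a deref on a memory entry holding address `0x1000` replaces it with whatever the toy memory stores there.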

The 'routine' construct just adds an acc.routine element to the global module, which contains all of the information about the directive. It contains a reference to the function, which also contains a reference to the acc.routine that this patch generates. This handles both the implicit-func version (where the routine is spelled without parens, and just applies to the next function) and the explicit-func version (where the routine is spelled with the func name in parens). The AST stores the directive in an OpenACCRoutineDeclAttr in the implicit case, so we can emit that when we hit the function declaration. The explicit case is held in an OpenACCRoutineAnnotAttr on the function; however, when we emit the function we haven't necessarily seen the construct yet, so we can't depend on that attribute. Instead, we save up the list in Sema so that we can emit them all at the end. This results in the tests getting really hard to read (because ordering is a little awkward based on spelling, with no way to fix it), so we instead split the tests up based on topic. One last thing: Flang spends some time determining if the clause lists of two routines on the same function are identical, and omits the duplicates. However, it seems to do a poor job of this when the ordering isn't the same, or references are slightly different. This patch doesn't bother trying that, and instead emits all of them, trusting the ACC dialect to remove or otherwise handle duplicates gracefully. Note: This doesn't cause emission of functions that would otherwise not be emitted, but DOES emit routine references based on which function they are attached to.

This patch extends the OpenACC PointerLikeType interface with two new methods for generating load and store operations, enabling dialect-agnostic memory access patterns.

New Interface Methods:
- genLoad(builder, loc, srcPtr, valueType): Generates a load operation from a pointer-like value. Returns the loaded value.
- genStore(builder, loc, valueToStore, destPtr): Generates a store operation to a pointer-like value.

Implementations provided for FIR pointer-like types, memref type (rank-0 only), and LLVM pointer types. Extended TestPointerLikeTypeInterface.cpp with 'load' and 'store' test modes.

Reducing spurious diff in an upcoming change.

This moves a call inside an assert to avoid a warning about the result variable being unused in release builds.
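A minimal sketch of this pattern, using hypothetical functions rather than the CIR code touched by the commit. With `NDEBUG` defined, `assert` expands to nothing, so a local that exists only to be asserted on becomes unused; moving the call into the `assert` removes the variable. This is only safe when the call has no side effects that release builds rely on.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Before: in release builds (NDEBUG) the assert vanishes and `IsSorted`
// is never read, which typically triggers an unused-variable warning.
inline bool containsBefore(const std::vector<int> &Sorted, int Key) {
  bool IsSorted = std::is_sorted(Sorted.begin(), Sorted.end());
  assert(IsSorted && "input must be sorted");
  return std::binary_search(Sorted.begin(), Sorted.end(), Key);
}

// After: the side-effect-free check lives inside the assert, so release
// builds neither evaluate it nor see a dead variable.
inline bool containsAfter(const std::vector<int> &Sorted, int Key) {
  assert(std::is_sorted(Sorted.begin(), Sorted.end()) &&
         "input must be sorted");
  return std::binary_search(Sorted.begin(), Sorted.end(), Key);
}
```

When the checked call does have side effects that must still run in release builds, the usual alternative is to keep the local and mark it `[[maybe_unused]]` instead.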

This reverts commit e719e93. Revert this since it caused a regression in our internal CI. Deduction guides with host/device attributes are already used in https://github.com/ROCm/rocm-libraries/blob/develop/projects/rocrand/library/src/rng/utils/cpp_utils.hpp#L249:
```
template<class V>
__host__ __device__ vec_wrapper(V) -> vec_wrapper<V>;
```

…0358) These two are both incredibly similar and simple, basically identical to 'seq'. This patch adds them both together.

Adding the following dependencies to PluginScriptedProcess: - "//lldb:CoreHeaders", - "//lldb:SymbolHeaders", - "//llvm:Support", For c50802c

This upstreams the handler for the BI__builtin_constant_p function.

…160525) Co-authored-by: Alexander Kornienko <[email protected]> Co-authored-by: Louis Dionne <[email protected]>

@rocallahan

Commit b262785 introduced a separate `AnalysisFpExc` target to try to work around the lack of a bazel equivalent of single source file properties. However, this introduces backref errors when `--warn-backrefs` is enabled. This change instead just adds the `-ftrapping-math` copt to the entire `Analysis` target. Fix suggested by @rocallahan.

…del (llvm#168270) The VPlan-based cost model assigns the forced cost once for a whole VPInterleaveRecipe. Update the legacy cost model to match this behavior. This fixes a cost-model divergence, and assigns the cost in a way that matches the generated code more accurately. PR: llvm#168270

This clause is pretty small/trivial and is a simple 'set a bool' value on the IR node, so its implementation is quite simple. We create the Operation with this as 'false', so the 'nohost' marks it as true always.

Remove a redundant duplicated computeCost call. NFC, just skipping an unneeded call.

Shared memory for TMA operations needs to be aligned to 16. Add the ability to set an alignment on the cuf.shared_memory operation.

Add more tests for follow-up to llvm#169576.

…lvm#170350) Updates `InitializeRequestArguments` to correctly follow the spec, see https://microsoft.github.io/debug-adapter-protocol/specification#Requests_Initialize. This should correct which fields are tracked as optional and simplifies some of the types to make sure they're meaningful (e.g. an `optional<bool>` isn't any more helpful than a `bool`, since undefined and false are basically equivalent and it requires us to handle interpreting undefined as the default value in all the places we use the `optional<bool>`).
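The `optional<bool>` point can be illustrated with a small sketch. The struct and function names are hypothetical and are not lldb-dap's actual types; `supportsRunInTerminalRequest` is used only as an example of an optional boolean field whose absence means false.

```cpp
#include <optional>

// Hypothetical "before" shape: every use site has to choose a default.
struct InitArgsBefore {
  std::optional<bool> supportsRunInTerminalRequest;
};

// Hypothetical "after" shape: an absent field simply stays false.
struct InitArgsAfter {
  bool supportsRunInTerminalRequest = false;
};

inline bool canRunInTerminal(const InitArgsBefore &Args) {
  // The "undefined means false" rule is repeated wherever the field is read.
  return Args.supportsRunInTerminalRequest.value_or(false);
}

inline bool canRunInTerminal(const InitArgsAfter &Args) {
  return Args.supportsRunInTerminalRequest;
}
```

With the plain `bool`, the default is stated once, at the field's declaration, instead of at each read.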

This change fixes a couple of issues with static resources:
- Enables assignment to static resource or resource array variables (fixes llvm#166458)
- Initializes static resources and resource arrays with a default constructor that sets the handle to poison

…m#170375) The src and dst pointers need to have an address cast.

z1-cciauto · 2025-12-03T01:52:32Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3056

RKSimon and others added 30 commits December 2, 2025 16:08

[X86] Add tests showing failure to concat fcmp instructions together (l…

47d66bf

…lvm#170313) Some of the AVX512 cases are already handled by llvm#170295

[lldb][shell tests] Properly fix fallout from c8031c3

cdc4147

libclc: Fix build in atomic_def.inc (llvm#170306)

e07e60e

[clang][deps][NFC] Replace a vector with an array (llvm#169555)

c21fd44

`ResourceDirectoryCache::findResourceDir` uses a `std::vector` when a `std::array` would do.

[MemProf] Require x86 for memprof-pgho.cpp test (llvm#170321)

6984f94

This requires an x86 build; otherwise the test will fail with:
```
Error running ThinLTO backend: No available targets are compatible with triple "x86_64-unknown-linux-gnu"
```

[profcheck] update exclude list (llvm#170316)

4ff3d1c

clang/AMDGPU: Add missing __opencl_c_read_write_images feature macro (l…

669683a

…lvm#170307) This is a partial fix for the rocm device-libs build. This was most likely broken by 423bdb2

[CIR][X86] Add support for kunpck builtins (llvm#168757)

2209d33

Part of llvm#167765

[lldb] Fix GetExpressionPath for vector registers (llvm#169210)

5c552c5

Vector registers have synthetic values for display purposes. This causes SBValue::GetExpressionPath to dispatch to ValueObjectSynthetic instead of ValueObjectRegister, producing incorrect results. Fixes llvm#147144

[ARM] Introduce intrinsics for MVE minnm/maxnm under strict-fp. (llvm…

c037128

…#169795) Similar to llvm#169156 again, this is mostly for denormal handling as there is no rounding step in a minnum/maxnum.

[LV] Add predicated store sinking tests requiring further noalias checks

1e6476d

Add additional tests where extra no-alias checks are needed, as future extensions of llvm#168771.

[mlir][vector][test] Fix comment in test (nfc) (llvm#170336)

a8ef3c8

Fix a comment post llvm#162167

[MLIR][XeGPU][Quickfix] Disable block count in propagation (llvm#170304)

ea00593

One of the previous PRs llvm#169267 has reintroduced block count to layout propagation that was removed in llvm#168504. This PR patches the issue.

[ValueTracking] Support scalable vector splats in computeKnownFPClass (…

e0f3302

…llvm#170325) Address comment llvm#169904 (comment)

[bazel] feat: update bazel lldb for llvm:support dep (llvm#170344)

ac66ae4

Adding llvm:Support dep since plugin started using llvm/ADT/...

[LLDB] Add SBFrameExtensions Tests (llvm#169236)

5e5937c

Fixes part of llvm#168920

[NFC][LLVM] Namespace cleanup in ScalarEvolution (llvm#166620)

ef49c92

LTO: Remove unused TargetLibraryInfo include (llvm#170340)

2c38632

[lld] Fix typo in lld manpage, nfc (llvm#170299)

4587fe6

[WebAssembly] Optimize away mask of 63 for sra and srl( zext (and i32…

e0db7f3

… 63))) (llvm#170128) Follow up to llvm#71844 after shl implementation

dmpots and others added 20 commits December 2, 2025 11:13

[SCEV] Regenerate a subset of auto updated tests

49a9787

Reducing spurious diff in an upcoming change.

[CIR][NFC] Fix a release build warning (llvm#170359)

ca3de05

This moves a call inside an assert to avoid a warning about the result variable being unused in release builds.

[OpenACC][CIR] Add worker/vector clause lowering for Routine (llvm#17…

c910d82

…0358) These two are both incredibly similar and simple, basically identical to 'seq'. This patch adds them both together.

[bazel] update bazel build for PluginScriptedProcess (llvm#170364)

c77fe58

Adding the following dependencies to PluginScriptedProcess: - "//lldb:CoreHeaders", - "//lldb:SymbolHeaders", - "//llvm:Support", For c50802c

[CIR] Upstream support for builtin_constant_p (llvm#170354)

12ae727

This upstreams the handler for the BI__builtin_constant_p function.

[libc++] Fix the rest of __gnu_cxx::hash_XXX copy construction (llvm#…

d97746c

…160525) Co-authored-by: Alexander Kornienko <[email protected]> Co-authored-by: Louis Dionne <[email protected]>

[OpenACC][CIR] Implement 'nohost' lowering. (llvm#170369)

4006df9

This clause is pretty small/trivial and is a simple 'set a bool' value on the IR node, so its implementation is quite simple. We create the Operation with this as 'false', so the 'nohost' marks it as true always.

[VPlan] Remove duplicated computeCost call (NFC).

bd5fa63

Remove a redundant duplicated computeCost call. NFC, just skipping an unneeded call.

[flang][cuda] Add alignment to shared memory operation (llvm#170372)

d3256d9

Shared memory for TMA operations needs to be aligned to 16. Add the ability to set an alignment on the cuf.shared_memory operation.

[SCEV] Add UDiv canonicalization tests with nested AddRecs.

41519b3

Add more tests for follow-up to llvm#169576.

[flang][cuda] Add address cast for src and dst in TMA operations (llv…

9885aed

…m#170375) The src and dst pointers need to have an address cast.

merge main into amd-staging

6773883

ronlieb requested review from a team and dpalermo December 3, 2025 01:51

ronlieb requested review from krzysz00 and kuhar as code owners December 3, 2025 01:51

ronlieb removed request for krzysz00 and kuhar December 3, 2025 01:54

dpalermo approved these changes Dec 3, 2025

z1-cciauto merged commit 87c9379 into amd-staging Dec 3, 2025
20 checks passed

z1-cciauto deleted the amd/merge/upstream_merge_20251202185441 branch December 3, 2025 04:30
