merge main into amd-staging #408

ronlieb · 2025-10-27T19:50:36Z

No description provided.

…iling with clang" (llvm#165268) Reverts llvm#152724 The PR was merged with broken pre-commit CI.

…ng (llvm#165030) This PR uses the upstream populateCastAwayVectorLeadingOneDimPatterns to remove leading unit dims from vector ops and then do the unrolling/blocking

Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: llvm#163894

This is important if the first use of a StatusOr (or Status) is in a conditional statement: we need a stable value for `ok` from outside of the conditional statement to make sure we don't use a different variable in every branch. Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: llvm#163898

This tool provides a harness for implementing different strategies that summarize many remarks (possibly from multiple translation units) into new summary remarks. The remark summaries can then be viewed using tools like `opt-viewer`. The first summary strategy is `--inline-callees`, which generates remarks that summarize the per-callee inline statistics for functions that appear in inlining remarks. This is useful for troubleshooting inlining issues/regressions on large codebases. Pull Request: llvm#160549

…Us (llvm#164761) Temps needed for the allocatable reduction/privatization init regions are now allocated on the heap all the time. However, this is a performance killer for GPUs since malloc calls are prohibitively expensive. Therefore, we should do these allocations on the stack for GPU reductions. This is similar to what we do for arrays. Additionally, I am working on getting reductions-by-ref to work on GPUs, which is a bit of a challenge given the many steps involved (e.g. intra-warp and inter-warp reductions, shuffling data from remote lanes, ...). But this is a prerequisite step.

Back in llvm#69493 the `-debug-info-correlate` LLVM flag was deprecated in favor of `-profile-correlate=debug-info`. Update all tests to use this new flag.

The Linux kernel build fails for SystemZ because the output of INLINEASM was a GR32Bit general-purpose register instead of SystemZ::CC. --------- Co-authored-by: anoopkg6 <[email protected]> Co-authored-by: Ulrich Weigand <[email protected]>

Some machines have read-only vtables but this test expects to overwrite them. Use -no_data_const to ensure the vtable is writable

This reverts commit 9a0aa92.

These are the other options used in compiler-rt that we also need to support. Reviewers: arichardson, petrhosek, ilovepi Reviewed By: ilovepi, arichardson Pull Request: llvm#165122

Pre-commit test for PR: llvm#162580

…#164002) Add implementation and encoding tests for: - tlbiep - tlbieio - tlbsyncio - ptesyncio

When lowering spills / restores, we may end up partially lowering the spill via copies and the remaining portion with loads/stores. In this partial lowering case, the implicit-def operands added to the restore load clobber the preceding copies -- telling MachineCopyPropagation to delete them. By also attaching an implicit operand to the load, the COPYs have an artificial use and thus will not be deleted - this is the same strategy taken in llvm#115285.

I'm not sure that we need implicit-def operands on any load restore, but I guess it may make sense if it needs to be split into multiple loads and some have been optimized out as containing undef elements.

These implicit / implicit-def operands continue to cause correctness issues. A previous / ongoing long-term plan to remove them is being addressed via: https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021 llvm#151123 llvm#151124

This PR passes the VFS to LLVM's sanitizer passes from Clang, so that the configuration files can be loaded in the same way all other compiler inputs are.

The options -fbuiltin and -fno-builtin are not valid for Fortran. However, they are accepted by gfortran which emits a warning message but continues to compile the code. Both -fbuiltin and -fno-builtin have been enabled for flang. Specifying either will result in a warning message being shown but no other effects. Compilation will proceed normally after these warnings are shown. This brings flang's behavior in line with gfortran for these options. Fixes llvm#164766

…lvm#164905) Turns out there's a bug in the current lldb sources: if you fork, set the stdio file handles to close on exec, and then exec lldb with some commands and the `--batch` flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed.

NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort; it's just as easy to redirect them to /dev/null, which does work. But I would like to keep lldb from stalling on the way out when this happens.

The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails. We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no-asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls.

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. In all the places where code was requesting termination, I added checks for that failure and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true there. I also didn't turn the test off for Windows. According to the Python docs all the APIs I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.
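The "return a bool and skip the join" pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not lldb's code: `ToyLoop`, `BreakChannel`, and `StopLoop` are hypothetical names, the closed interrupt pipe is modeled by a flag, and detaching the thread is this sketch's own choice (the message only says the join is skipped).

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a main loop; not lldb's MainLoop class.
class ToyLoop {
public:
  // Blocks until a successful Interrupt() has been delivered.
  void Run() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_stop; });
  }

  // Returns true if the wake-up was delivered, false if the wake-up channel
  // is unusable (modeling a write to an already-closed interrupt pipe).
  bool Interrupt() {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_channel_ok)
      return false;
    m_stop = true;
    m_cv.notify_all();
    return true;
  }

  // Simulates some other component closing the loop's interrupt descriptor.
  void BreakChannel() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_channel_ok = false;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cv;
  bool m_stop = false;
  bool m_channel_ok = true;
};

// Requests termination. Joins only when the interrupt succeeded; otherwise
// the join would block forever, so the thread is detached instead
// (assumption of this sketch, see the lead-in).
bool StopLoop(ToyLoop &loop, std::thread &worker) {
  if (loop.Interrupt()) {
    worker.join();
    return true;
  }
  worker.detach();
  return false;
}
```

A caller would hold the loop in a `std::shared_ptr` captured by the worker thread, so that a detached worker never outlives the loop object it is waiting on.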

…#164687)

new

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return; //
};

auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```

old

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return; //
};

auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```

Aligning a line to another line involves keeping track of the tokens' positions. Previously the shift was incorrectly added to some tokens that did not move. Then the comments would end up in the wrong places.

…en size dimension value is 0 (llvm#164878)

Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `tensor.extract_slice` operations where one of the dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `tensor.extract_slice`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> {
  %cst = arith.constant dense<0> : tensor<1xi32>
  %cst_0 = arith.constant dense<-1> : tensor<2xi32>
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor<3xi32>
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32>
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32>
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32>
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32>
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32>
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return %extracted_slice : tensor<?x?x?xf32>
}
```

The issue can be reproduced more simply with the following test case, where `dim_0` is `0`. When the runtime verification pass is applied to this code with `dim_0 = 0`, it generates an assertion that will always fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense<1.0> : tensor<10x4x1xf32>
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will check this and send a separate PR for the issue.

--------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
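The arithmetic behind the fix can be illustrated outside of MLIR. The sketch below is not the pass's code: the function names are hypothetical, and it assumes that "in bounds" means `0 <= pos < dimSize` for both the offset and the last position. With `offset = 0`, `size = 0`, `stride = 1`, the unconditional formula gives `lastPos = 0 + (0 - 1) * 1 = -1`, which can never be in bounds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the old behavior: the last-element position is
// checked even for an empty slice.
bool lastPosInBoundsUnconditional(int64_t offset, int64_t size, int64_t stride,
                                  int64_t dimSize) {
  int64_t lastPos = offset + (size - 1) * stride;
  return lastPos >= 0 && lastPos < dimSize;
}

// Hypothetical model of the fixed behavior for a single dimension.
bool sliceDimInBounds(int64_t offset, int64_t size, int64_t stride,
                      int64_t dimSize) {
  assert(dimSize >= 0 && "source extent must be non-negative");

  // The offset is always verified (assumed condition: 0 <= offset < dimSize).
  if (offset < 0 || offset >= dimSize)
    return false;

  // The endpoint check only runs when the slice has at least one element.
  if (size > 0) {
    int64_t lastPos = offset + (size - 1) * stride;
    if (lastPos < 0 || lastPos >= dimSize)
      return false;
  }
  return true;
}
```

For the reproducer's first dimension (`offset = 0`, `size = 0`, `stride = 1`, extent 10), the unconditional check rejects the slice while the conditional one accepts it.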

…dimension value is 0 (llvm#164897)

Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `memref.subview` operations where one of the dimensions had a size of 0.

The `memref.subview` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `memref.subview`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%5` becomes 0.

```mlir
module {
  memref.global "private" constant @__constant_2xi32 : memref<2xi32> = dense<-1> {alignment = 64 : i64}
  memref.global "private" constant @__constant_1xi32 : memref<1xi32> = dense<0> {alignment = 64 : i64}
  func.func @simpleRepro(%arg0: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> {
    %c2 = arith.constant 2 : index
    %c4 = arith.constant 4 : index
    %c1 = arith.constant 1 : index
    %c10 = arith.constant 10 : index
    %c0 = arith.constant 0 : index
    %c-1 = arith.constant -1 : index
    %0 = memref.get_global @__constant_1xi32 : memref<1xi32>
    %1 = memref.get_global @__constant_2xi32 : memref<2xi32>
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<3xi32>
    %subview = memref.subview %alloca[0] [1] [1] : memref<3xi32> to memref<1xi32, strided<[1]>>
    memref.copy %0, %subview : memref<1xi32> to memref<1xi32, strided<[1]>>
    %subview_0 = memref.subview %alloca[1] [2] [1] : memref<3xi32> to memref<2xi32, strided<[1], offset: 1>>
    memref.copy %1, %subview_0 : memref<2xi32> to memref<2xi32, strided<[1], offset: 1>>
    %2 = memref.load %alloca[%c0] : memref<3xi32>
    %3 = index.casts %2 : i32 to index
    %4 = arith.cmpi eq, %3, %c-1 : index
    %5 = arith.select %4, %c10, %3 : index
    %6 = memref.load %alloca[%c1] : memref<3xi32>
    %7 = index.casts %6 : i32 to index
    %8 = arith.cmpi eq, %7, %c-1 : index
    %9 = arith.select %8, %c4, %7 : index
    %10 = memref.load %alloca[%c2] : memref<3xi32>
    %11 = index.casts %10 : i32 to index
    %12 = arith.cmpi eq, %11, %c-1 : index
    %13 = arith.select %12, %c1, %11 : index
    %subview_1 = memref.subview %arg0[0, 0, 0] [%5, %9, %13] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return %subview_1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
  }
}
```

P.S. This is a similar issue to the one fixed for `tensor.extract_slice` in llvm#164878

--------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>

…LVM IR (llvm#165286) There are a couple of tests like this. This patch series renames these to something more descriptive and adjusts the tests to check IR. Currently the tests check raw assembly output (not even dwarfdump), which most likely hid some bugs around property debug-info.

…164765) These tests are not supported on AIX and z/OS; disable them to get clang-ppc64-aix green.

…lvm#165021) The type sizes of backedge taken counts for two loops can be different and this is to fix the crash in haveSameSD (llvm#165014). --------- Co-authored-by: Shimin Cui <[email protected]>

Changes the test name to something more meaningful, in preparation for refactoring the test to check LLVM IR instead of assembly.

Currently ExecutionEngine tries to dump all functions declared in the module, even those which are "external" (i.e., linked/loaded at runtime). E.g.

```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
  call @printF32(%arg1) : (f32) -> ()
  return
}
```

fails with

```
Could not compile printF32: Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error: Symbols not found: [ __mlir_printF32 ]
```

even though `printF32` can be provided at final build time (i.e., when the object file is linked to some executable or shlib). E.g., if our own `libmlir_c_runner_utils` is linked. So just skip functions which have no bodies during dump (i.e., are decls without defns).
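The "skip declarations" rule can be pictured as a simple filter. The sketch below uses a hypothetical `ToyFunction` record rather than any MLIR or ExecutionEngine type, and `functionsToDump` is an illustrative name, not an existing API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function in a module; not an MLIR type.
struct ToyFunction {
  std::string name;
  bool hasBody; // false for declarations such as @printF32 above
};

// Returns the names of the functions that should be looked up and dumped.
// Declarations without a body are skipped, since their symbols may only be
// provided later, when the object file is linked.
std::vector<std::string>
functionsToDump(const std::vector<ToyFunction> &functions) {
  std::vector<std::string> result;
  for (const ToyFunction &fn : functions) {
    assert(!fn.name.empty() && "every function needs a symbol name");
    if (!fn.hasBody)
      continue; // external declaration: nothing to dump yet
    result.push_back(fn.name);
  }
  return result;
}
```

Applied to the two functions from the example, only `supported_arg_types` would be returned.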

Adds `arm64-apple-darwin` support to `asm.py` matching and removes the now invalidated `target-triple-mismatch` test (I don't have another triple supported by llc but not the autogenerator that would make this test useful).

Don't rely on comparison to singular iterator, it's UB. Fixes bot crashes after llvm#164524.

z1-cciauto · 2025-10-27T19:51:20Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2472

z1-cciauto · 2025-10-27T22:28:05Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/2473

philnik777 and others added 30 commits October 27, 2025 17:08

Revert "[libcxx] Define _LIBCPP_HAS_C8RTOMB_MBRTOC8 to true if comp…

fc1f3f3

…iling with clang" (llvm#165268) Reverts llvm#152724 The PR was merged with broken pre-commit CI.

[MLIR][XeGPU] Remove leading unit dims from vector ops before unrolli…

07372fc

…ng (llvm#165030) This PR uses the upstream populateCastAwayVectorLeadingOneDimPatterns to remove leading unit dims from vector ops and then do the unrolling/blocking

[MLIR][XeGPU] Fix isEvenlyDistributable API in xegpu (llvm#164907)

c431ee7

[FlowSensitive] [StatusOr] [8/N] Support value ctor and assignment

430d0ed

Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: llvm#163894

[LLDB] Disable rosetta test on green dragon

9a0aa92

[InstrProf][NFC] Use -profile-correlate flag in tests (llvm#163299)

defe934

Back in llvm#69493 the `-debug-info-correlate` LLVM flag was deprecated in favor of `-profile-correlate=debug-info`. Update all tests to use this new flag.

Fix Linux kernel build failure for SytemZ. (llvm#165274)

242c716

The Linux kernel build fails for SystemZ because the output of INLINEASM was a GR32Bit general-purpose register instead of SystemZ::CC. --------- Co-authored-by: anoopkg6 <[email protected]> Co-authored-by: Ulrich Weigand <[email protected]>

[lldb] Fix TestVTableValue.py test_overwrite_vtable test (llvm#164910)

e903494

Some machines have read-only vtables but this test expects to overwrite them. Use -no_data_const to ensure the vtable is writable

Revert "[LLDB] Disable rosetta test on green dragon"

a868e7e

This reverts commit 9a0aa92.

[LLDB] Disable rosetta test on green dragon

43f119b

[lit] Support more ulimit options

8f1c72d

These are the other options used in compiler-rt that we also need to support. Reviewers: arichardson, petrhosek, ilovepi Reviewed By: ilovepi, arichardson Pull Request: llvm#165122

[AMDGPU] Precommit test for sinking vector ops PR 162580 (llvm#165050)

bce7f7c

Pre-commit test for PR: llvm#162580

[PowerPC] Add Implementation and test for new eTCE instructions (llvm…

30c3a91

…#164002) Add implementation and encoding tests for: - tlbiep - tlbieio - tlbsyncio - ptesyncio

[llvm][clang] Explicitly pass the VFS to sanitizer passes (llvm#165267)

c1f6528

This PR passes the VFS to LLVM's sanitizer passes from Clang, so that the configuration files can be loaded in the same way all other compiler inputs are.

[LLDB] Add debug output to test to diagnose bot failure

d818434

[clang][DebugInfo] Disable objective-CXX tests on AIX and z/OS (llvm#…

90489ad

…164765) These tests are not supported on AIX and z/OS; disable them to get clang-ppc64-aix green.

[DA] Fix crash when two loops have different type sizes of becount (l…

616f3b5

…lvm#165021) The type sizes of backedge taken counts for two loops can be different and this is to fix the crash in haveSameSD (llvm#165014). --------- Co-authored-by: Shimin Cui <[email protected]>

[clang][DebugInfo][test] Rename Objective-C test

267b5b8

Changes the test name to something more meaningful, in preparation for refactoring the test to check LLVM IR instead of assembly.

[UpdateTestChecks][llc] Support arm64-apple-darwin (llvm#165092)

9abae17

Adds `arm64-apple-darwin` support to `asm.py` matching and removes the now invalidated `target-triple-mismatch` test (I don't have another triple supported by llc but not the autogenerator that would make this test useful).

vitalybuka and others added 2 commits October 27, 2025 12:30

[RadixTree] Use std::optional for Node::Value (llvm#165299)

dce8252

Don't rely on comparison to singular iterator, it's UB. Fixes bot crashes after llvm#164524.
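As a rough illustration of the idea, a node can record "no value" with an empty `std::optional` instead of a default-constructed iterator used as a sentinel. The types below (`Entries`, `ToyNode`, `setValue`) are hypothetical and only loosely modeled on the commit title; they are not the actual RadixTree code.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <optional>
#include <string>
#include <utility>

// Hypothetical storage for key/value pairs referenced by tree nodes.
using Entries = std::list<std::pair<std::string, int>>;

// Hypothetical node type; not the actual llvm::RadixTree node.
struct ToyNode {
  // An empty optional means "this node stores no value". No iterator is
  // ever compared against a singular (default-constructed) one.
  std::optional<Entries::iterator> value;

  bool hasValue() const { return value.has_value(); }
};

// Appends (key, val) to `entries` and makes `node` refer to the new entry.
void setValue(ToyNode &node, Entries &entries, std::string key, int val) {
  entries.emplace_back(std::move(key), val);
  node.value = std::prev(entries.end());
  assert(node.hasValue());
}
```

Because `std::list` iterators stay valid when other elements are inserted, the stored iterator keeps pointing at the same entry as more values are added.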

merge main into amd-staging

7a24a63

ronlieb requested review from a team and dpalermo October 27, 2025 19:50

ronlieb requested review from nicolasvasilache and stellaraccident as code owners October 27, 2025 19:50

dpalermo approved these changes Oct 27, 2025

Regen llvm/test/CodeGen/AMDGPU/spill-restore-partial-copy.mir

b7ba98c

ronlieb merged commit 8b46380 into amd-staging Oct 28, 2025
11 checks passed

ronlieb deleted the amd/merge/upstream_merge_20251027143124 branch October 28, 2025 01:19

27 participants
