forked from llvm/llvm-project
merge main into amd-staging #408
Merged
ronlieb
merged 33 commits into
amd-staging
from
amd/merge/upstream_merge_20251027143124
Oct 28, 2025
Conversation
…iling with clang" (llvm#165268) Reverts llvm#152724. The PR was merged with broken pre-commit CI.
…ng (llvm#165030) This PR uses the upstream populateCastAwayVectorLeadingOneDimPatterns to remove leading unit dims from vector ops and then performs the unrolling/blocking.
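The actual patterns live in MLIR C++; the following is a minimal Python sketch of the shape manipulation they perform, with illustrative helper names (not the real API), assuming shapes are modeled as plain tuples.

```python
# Sketch only: strip leading unit dimensions from a vector shape before
# unrolling, mirroring what populateCastAwayVectorLeadingOneDimPatterns
# does for vector ops. Helper names are hypothetical.

def cast_away_leading_one_dims(shape):
    """Strip leading 1-dims, e.g. (1, 1, 4, 8) -> (4, 8)."""
    i = 0
    while i < len(shape) - 1 and shape[i] == 1:
        i += 1
    return shape[i:]

def unroll_tile_count(shape, tile):
    """Number of tile-sized pieces when unrolling the innermost dim."""
    *outer, inner = cast_away_leading_one_dims(shape)
    count = inner // tile
    for d in outer:
        count *= d
    return count

print(cast_away_leading_one_dims((1, 1, 4, 8)))  # (4, 8)
print(unroll_tile_count((1, 4, 8), 4))           # 8
```

Removing the unit dims first means the unrolling logic only ever sees the meaningful trailing dimensions.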
Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: llvm#163894
This is important: if the first use of a StatusOr (or Status) is in a conditional statement, we need a stable value for `ok` from outside of the conditional statement to make sure we don't use a different variable in every branch. Reviewers: jvoung, Xazax-hun Reviewed By: jvoung Pull Request: llvm#163898
This tool provides a harness for implementing different strategies that summarize many remarks (possibly from multiple translation units) into new summary remarks. The remark summaries can then be viewed using tools like `opt-viewer`. The first summary strategy is `--inline-callees`, which generates remarks that summarize the per-callee inline statistics for functions that appear in inlining remarks. This is useful for troubleshooting inlining issues/regressions on large codebases. Pull Request: llvm#160549
…Us (llvm#164761) Temps needed for the allocatable reduction/privatization init regions are now allocated on the heap all the time. However, this is a performance killer for GPUs since malloc calls are prohibitively expensive. Therefore, we should do these allocations on the stack for GPU reductions. This is similar to what we do for arrays. Additionally, I am working on getting reductions-by-ref to work on GPUs, which is a bit of a challenge given the many involved steps (e.g. intra-warp and inter-warp reductions, shuffling data from remote lanes, ...). But this is a prerequisite step.
Back in llvm#69493 the `-debug-info-correlate` LLVM flag was deprecated in favor of `-profile-correlate=debug-info`. Update all tests to use this new flag.
The Linux kernel build fails for SystemZ because the output of INLINEASM was a GR32Bit general-purpose register instead of SystemZ::CC. --------- Co-authored-by: anoopkg6 <[email protected]> Co-authored-by: Ulrich Weigand <[email protected]>
Some machines have read-only vtables, but this test expects to overwrite them. Use -no_data_const to ensure the vtable is writable.
This reverts commit 9a0aa92.
These are the other options used in compiler-rt that we also need to support. Reviewers: arichardson, petrhosek, ilovepi Reviewed By: ilovepi, arichardson Pull Request: llvm#165122
Pre-commit test for PR: llvm#162580
…#164002) Add implementation and encoding tests for: - tlbiep - tlbieio - tlbsyncio - ptesyncio
When lowering spills / restores, we may end up partially lowering the spill via copies and the remaining portion with loads/stores. In this partial lowering case, the implicit-def operands added to the restore load clobber the preceding copies -- telling MachineCopyPropagation to delete them. By also attaching an implicit operand to the load, the COPYs have an artificial use and thus will not be deleted; this is the same strategy taken in llvm#115285. I'm not sure that we need implicit-def operands on any load restore, but I guess it may make sense if the restore needs to be split into multiple loads and some have been optimized out as containing undef elements. These implicit / implicit-def operands continue to cause correctness issues. A previous / ongoing long-term plan to remove them is being addressed via: https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021 llvm#151123 llvm#151124
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that the configuration files can be loaded in the same way all other compiler inputs are.
The options -fbuiltin and -fno-builtin are not valid for Fortran. However, they are accepted by gfortran, which emits a warning message but continues to compile the code. Both -fbuiltin and -fno-builtin have been enabled for flang. Specifying either will result in a warning message being shown but has no other effect; compilation proceeds normally after the warnings are shown. This brings flang's behavior in line with gfortran for these options. Fixes llvm#164766
…lvm#164905) Turns out there's a bug in the current lldb sources: if you fork, set the stdio file handles to close on exec, and then exec lldb with some commands and the `--batch` flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - thinks 0, 1, and 2 HAVE TO BE the stdio file handles, and opens, closes, and dups them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort; it's just as easy to redirect the handles to /dev/null, which does work. But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails. We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no-asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still-active MainLoop, and that stalls. I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. In all the places where code was requesting termination, I added checks for that failure and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point. I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true there. I also didn't turn the test off for Windows. According to the Python docs, all the APIs I used should work on Windows... If that turns out not to be true, I'll make the test Darwin/Unix only.
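The failure mode hinges on the POSIX lowest-free-descriptor rule: once stdin (fd 0) is closed, the next pipe created gets fd 0 as its read end, so anything that "fixes up" fd 0 will close the interrupt pipe. A minimal Python sketch of that mechanism (not lldb's MainLoop code):

```python
# Sketch: POSIX hands out the lowest available file descriptors, so if
# stdin was closed before a pipe is created, the pipe's read end becomes
# fd 0 -- and code that assumes fd 0 is stdin may then close it.
import os

saved_stdin = os.dup(0)   # keep a copy so we can restore stdin afterwards
os.close(0)               # simulate being exec'd with stdin closed
r, w = os.pipe()          # the "interrupt pipe"...
print(r)                  # ...gets the lowest free descriptor: 0
os.close(w)
os.close(r)
os.dup2(saved_stdin, 0)   # restore stdin
os.close(saved_stdin)
```

This is why the fix makes Interrupt report failure instead of assuming the write to the pipe always succeeds.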
…#164687) new
```cpp
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return; //
};
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```
old
```cpp
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return; //
};
auto aaaaaaaaaaaaaaaaaaaaa = {}; //
auto b = [] { //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```
Aligning a line to another line involves keeping track of the tokens' positions. Previously, the shift was incorrectly added to some tokens that did not move, so the comments ended up in the wrong places.
…en size dimension value is 0 (llvm#164878) Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `tensor.extract_slice` operations where one of the dimensions had a size of 0. The `tensor.extract_slice` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid. This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements. This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `tensor.extract_slice`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%3` becomes 0.
```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> {
  %cst = arith.constant dense<0> : tensor<1xi32>
  %cst_0 = arith.constant dense<-1> : tensor<2xi32>
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor<3xi32>
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32>
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32>
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32>
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32>
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32>
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return %extracted_slice : tensor<?x?x?xf32>
}
```
The issue can be reproduced more simply with the following test case, where `dim_0` is `0`. When the runtime verification pass is applied to this code with `dim_0 = 0`, it generates an assertion that will always fail at runtime.
```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>, %dim_0: index, %dim_1: index, %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return
}
func.func @test_zero_size_extraction() {
  %input = arith.constant dense<1.0> : tensor<10x4x1xf32>
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2) : (tensor<10x4x1xf32>, index, index, index) -> ()
  return
}
```
P.S. We probably have a similar issue with `memref.subview`. I will check this and send a separate PR for the issue. --------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
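The bounds check described above can be sketched in Python (helper names and the exact offset condition are illustrative, not the pass's actual code). The old check always tested the last element's position, which for `size == 0` evaluates `offset + (0 - 1) * stride` and can go negative even for valid ops:

```python
# Sketch of the lastPos fix: skip the endpoint check for empty slices.
# Conditions are illustrative, not the real runtime-verification code.

def old_check(offset, size, stride, dim):
    last_pos = offset + (size - 1) * stride
    return 0 <= offset <= dim and 0 <= last_pos < dim

def new_check(offset, size, stride, dim):
    if not (0 <= offset <= dim):
        return False        # the offset is always verified
    if size > 0:            # endpoint verified only for non-empty slices
        last_pos = offset + (size - 1) * stride
        return 0 <= last_pos < dim
    return True

# Zero-size slice at offset 0 of a dim-10 tensor: valid, but the old
# check computes last_pos = -1 and rejects it.
print(old_check(0, 0, 1, 10), new_check(0, 0, 1, 10))  # False True
```

The same reasoning applies to the `memref.subview` fix in the next commit.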
…dimension value is 0 (llvm#164897) Previously, the runtime verification pass would insert assertion statements with conditions that always evaluate to false for semantically valid `memref.subview` operations where one of the dimensions had a size of 0. The `memref.subview` runtime verification logic was unconditionally generating checks for the position of the last element (`offset + (size - 1) * stride`). When `size` is 0, this causes the assertion condition to always be false, leading to runtime failures even though the operation is semantically valid. This patch fixes the issue by making the `lastPos` check conditional. The offset is always verified, but the endpoint check is only performed when `size > 0` to avoid generating spurious assert statements. This issue was discovered through a LiteRT model, where a dynamic shape calculation resulted in a zero-sized dimension being passed to `memref.subview`. The following is a simplified IR snippet from the model. After running the runtime verification pass, an assertion that always fails is generated because the SSA value `%5` becomes 0. 
```mlir
module {
  memref.global "private" constant @__constant_2xi32 : memref<2xi32> = dense<-1> {alignment = 64 : i64}
  memref.global "private" constant @__constant_1xi32 : memref<1xi32> = dense<0> {alignment = 64 : i64}
  func.func @simpleRepro(%arg0: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> {
    %c2 = arith.constant 2 : index
    %c4 = arith.constant 4 : index
    %c1 = arith.constant 1 : index
    %c10 = arith.constant 10 : index
    %c0 = arith.constant 0 : index
    %c-1 = arith.constant -1 : index
    %0 = memref.get_global @__constant_1xi32 : memref<1xi32>
    %1 = memref.get_global @__constant_2xi32 : memref<2xi32>
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<3xi32>
    %subview = memref.subview %alloca[0] [1] [1] : memref<3xi32> to memref<1xi32, strided<[1]>>
    memref.copy %0, %subview : memref<1xi32> to memref<1xi32, strided<[1]>>
    %subview_0 = memref.subview %alloca[1] [2] [1] : memref<3xi32> to memref<2xi32, strided<[1], offset: 1>>
    memref.copy %1, %subview_0 : memref<2xi32> to memref<2xi32, strided<[1], offset: 1>>
    %2 = memref.load %alloca[%c0] : memref<3xi32>
    %3 = index.casts %2 : i32 to index
    %4 = arith.cmpi eq, %3, %c-1 : index
    %5 = arith.select %4, %c10, %3 : index
    %6 = memref.load %alloca[%c1] : memref<3xi32>
    %7 = index.casts %6 : i32 to index
    %8 = arith.cmpi eq, %7, %c-1 : index
    %9 = arith.select %8, %c4, %7 : index
    %10 = memref.load %alloca[%c2] : memref<3xi32>
    %11 = index.casts %10 : i32 to index
    %12 = arith.cmpi eq, %11, %c-1 : index
    %13 = arith.select %12, %c1, %11 : index
    %subview_1 = memref.subview %arg0[0, 0, 0] [%5, %9, %13] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return %subview_1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
  }
}
```
P.S. This is a similar issue to the one fixed for `tensor.extract_slice` in llvm#164878 --------- Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…LVM IR (llvm#165286) There are a couple of tests like this. This patch series renames them to something more descriptive and adjusts the tests to check IR. Currently the tests check raw assembly output (not even dwarfdump), which most likely hid some bugs around property debug info.
…164765) These tests are not supported on AIX and z/OS; disable them to get clang-ppc64-aix green.
…lvm#165021) The type sizes of the backedge-taken counts for two loops can differ; this fixes the crash in haveSameSD (llvm#165014). --------- Co-authored-by: Shimin Cui <[email protected]>
Changes the test name to something more meaningful, in preparation for refactoring the test to check LLVM IR instead of assembly.
Currently ExecutionEngine tries to dump all functions declared in the
module, even those which are "external" (i.e., linked/loaded at
runtime). E.g.
```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
call @printF32(%arg1) : (f32) -> ()
return
}
```
fails with
```
Could not compile printF32:
Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error:
Symbols not found: [ __mlir_printF32 ]
```
even though `printF32` can be provided at final build time (i.e., when
the object file is linked to some executable or shlib). E.g, if our own
`libmlir_c_runner_utils` is linked.
So just skip functions which have no bodies during dump (i.e., are decls
without defns).
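The dump logic's filter can be sketched in plain Python (the real change is in MLIR's ExecutionEngine C++; names here are illustrative): declarations are modeled as entries with no body and are skipped rather than treated as compilation failures.

```python
# Sketch only: when dumping compiled functions, skip declarations
# (functions with no body), since their symbols are expected to be
# resolved at final link/load time. The module model is hypothetical.

module = {
    "printF32": None,                        # external decl, no body
    "supported_arg_types": "call printF32",  # defined in the module
}

def dump_compiled_functions(mod):
    """Return only the functions that actually have a definition."""
    return [name for name, body in mod.items() if body is not None]

print(dump_compiled_functions(module))  # ['supported_arg_types']
```

With this filter, `printF32` no longer triggers a "Symbols not found" abort during the dump.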
Adds `arm64-apple-darwin` support to `asm.py` matching and removes the now-invalidated `target-triple-mismatch` test (I don't have another triple supported by llc but not the autogenerator that would make this test useful).
Don't rely on comparison to a singular iterator; it's UB. Fixes bot crashes after llvm#164524.
dpalermo (Collaborator) approved these changes on Oct 27, 2025