Skip to content

Conversation

@ronlieb
Copy link
Collaborator

@ronlieb ronlieb commented Oct 27, 2025

No description provided.

philnik777 and others added 30 commits October 27, 2025 17:08
…iling with clang" (llvm#165268)

Reverts llvm#152724

The PR was merged with broken pre-commit CI.
…ng (llvm#165030)

This PR uses the upstream populateCastAwayVectorLeadingOneDimPatterns to
remove leading unit dims from vector ops and then do the
unrolling/blocking
Reviewers: jvoung, Xazax-hun

Reviewed By: jvoung

Pull Request: llvm#163894
This is important if the first use of a StatusOr (or Status) is in a
conditional statement, we need a stable value for `ok` from outside of
the conditional statement to make sure we don't use a different variable
in every branch.

Reviewers: jvoung, Xazax-hun

Reviewed By: jvoung

Pull Request: llvm#163898
This tool provides a harness for implementing different strategies that
summarize many remarks (possibly from multiple translation units) into
new summary remarks. The remark summaries can then be viewed using tools
like `opt-viewer`.

The first summary strategy is `--inline-callees`, which generates
remarks that summarize the per-callee inline statistics for functions
that appear in inling remarks. This is useful for troubleshooting
inlining issues/regressions on large codebases.

Pull Request: llvm#160549
…Us (llvm#164761)

Temps needed for the allocatable reduction/privatization init regions
are now allocated on the heap all the time. However, this is performance
killer for GPUs since malloc calls are prohibitively expensive.
Therefore, we should do these allocations on the stack for GPU
reductions.

This is similar to what we do for arrays. Additionally, I am working on
getting reductions-by-ref to work on GPUs which is a bit of a challenge
given the many involved steps (e.g. intra-warp and inter-warp reuctions,
shuffling data from remote lanes, ...). But this is a prerequisite step.
Back in llvm#69493 the
`-debug-info-correlate` LLVM flag was deprecated in favor of
`-profile-correlate=debug-info`. Update all tests to use this new flag.
Linux kernel build fails for SystemZ as output of INLINEASM was GR32Bit
general-purpose register instead of SystemZ::CC.

---------

Co-authored-by: anoopkg6 <[email protected]>
Co-authored-by: Ulrich Weigand <[email protected]>
Some machines have read-only vtables but this test expects to overwrite
them. Use -no_data_const to ensure the vtable is writable
These are the other options used in compiler-rt that we also need to
support.

Reviewers: arichardson, petrhosek, ilovepi

Reviewed By: ilovepi, arichardson

Pull Request: llvm#165122
…#164002)

Add implementation and encoding tests for:
- tlbiep
-  tlbieio
- tlbsyncio
- ptesyncio
When lowering spills / restores, we may end up partially lowering the
spill via copies and the remaining portion with loads/stores. In this
partial lowering case,the implicit-def operands added to the restore
load clobber the preceding copies -- telling MachineCopyPropagation to
delete them. By also attaching an implicit operand to the load, the
COPYs have an artificial use and thus will not be deleted - this is the
same strategy taken in llvm#115285

I'm not sure that we need implicit-def operands on any load restore, but
I guess it may make sense if it needs to be split into multiple loads
and some have been optimized out as containing undef elements.

These implicit / implicit-def operands continue to cause correctness
issues. A previous / ongoing long term plan to remove them is being
addressed via:


https://discourse.llvm.org/t/llvm-codegen-rfc-add-mo-lanemask-type-and-a-new-copy-lanemask-instruction/88021
llvm#151123
llvm#151124
This PR passes the VFS to LLVM's sanitizer passes from Clang, so that
the configuration files can be loaded in the same way all other compiler
inputs are.
The options -fbuiltin and -fno-builtin are not valid for Fortran. However, they
are accepted by gfortran which emits a warning message but continues to compile
the code. Both -fbuiltin and -fno-builtin have been enabled for flang.
Specifying either will result in a warning message being shown but no other
effects. Compilation will proceed normally after these warnings are shown. This
brings flang's behavior in line with gfortran for these options.

Fixes llvm#164766
…lvm#164905)

Turns out there's a bug in the current lldb sources that if you fork,
set the stdio file handles to close on exec and then exec lldb with some
commands and the `--batch` flag, lldb will stall on exit. The first
cause of the bug is that the Python session handler - and probably other
places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles,
and open and close and dup them as needed. NB: I am NOT trying to fix
that bug. I'm not convinced running the lldb driver headless is worth a
lot of effort, it's just as easy to redirect them to /dev/null, which
does work.

But I would like to keep lldb from stalling on the way out when this
happens. The reason we stall is that we have a MainLoop waiting for
signals, and we try to Interrupt it, but because stdio was closed, the
interrupt pipe for the MainLoop gets the file descriptor 0, which gets
closed by the Python session handler if you run some script command. So
the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in
`llvm::cantFail`, but in a no asserts build that just drops the error on
the floor. So then lldb went on to call std::thread::join on the still
active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool
for "interrupt success" instead. All the places where code was
requesting termination, I added checks for that failure, and skip the
std::thread::join call on the MainLoop thread, since that is almost
certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when
the WSASetEvent call can fail, so I always return true here. I also
didn't turn the test off for Windows. According to the Python docs all
the API's I used should work on Windows... If that turns out not to be
true I'll make the test Darwin/Unix only.
…#164687)

new

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return;                         //
};

auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return aaaaaaaaaaaaaaaaaaaaa;   //
};
```

old

```C++
auto aaaaaaaaaaaaaaaaaaaaa = {};  //
auto b                     = [] { //
  return;     //
};

auto aaaaaaaaaaaaaaaaaaaaa = {};                    //
auto b                     = [] {                   //
  return aaaaaaaaaaaaaaaaaaaaa; //
};
```

Aligning a line to another line involves keeping track of the tokens'
positions. Previously the shift was incorrectly added to some tokens
that did not move. Then the comments would end up in the wrong places.
…en size dimension value is 0 (llvm#164878)

Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `tensor.extract_slice` operations where one of the
dimensions had a size of 0.

The `tensor.extract_slice` runtime verification logic was
unconditionally generating checks for the position of the last element
(`offset + (size - 1) * stride`). When `size` is 0, this causes the
assertion condition to always be false, leading to runtime failures even
though the operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`tensor.extract_slice`.

The following is a simplified IR snippet from the model. After running
the runtime verification pass, an assertion that always fails is
generated because the SSA value `%3` becomes 0.

```mlir
func.func @simple_repro_from_liteRT_model(%arg0: tensor<10x4x1xf32>) -> tensor<?x?x?xf32> {
  %cst = arith.constant dense<0> : tensor<1xi32>
  %cst_0 = arith.constant dense<-1> : tensor<2xi32>
  %c-1 = arith.constant -1 : index
  %c0 = arith.constant 0 : index
  %c10 = arith.constant 10 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %c2 = arith.constant 2 : index
  %0 = tensor.empty() : tensor<3xi32>
  %inserted_slice = tensor.insert_slice %cst into %0[0] [1] [1] : tensor<1xi32> into tensor<3xi32>
  %inserted_slice_1 = tensor.insert_slice %cst_0 into %inserted_slice[1] [2] [1] : tensor<2xi32> into tensor<3xi32>
  %extracted = tensor.extract %inserted_slice_1[%c0] : tensor<3xi32>
  %1 = index.casts %extracted : i32 to index
  %2 = arith.cmpi eq, %1, %c-1 : index
  %3 = arith.select %2, %c10, %1 : index
  %extracted_2 = tensor.extract %inserted_slice_1[%c1] : tensor<3xi32>
  %4 = index.casts %extracted_2 : i32 to index
  %5 = arith.cmpi eq, %4, %c-1 : index
  %6 = arith.select %5, %c4, %4 : index
  %extracted_3 = tensor.extract %inserted_slice_1[%c2] : tensor<3xi32>
  %7 = index.casts %extracted_3 : i32 to index
  %8 = arith.cmpi eq, %7, %c-1 : index
  %9 = arith.select %8, %c1, %7 : index
  %extracted_slice = tensor.extract_slice %arg0[0, 0, 0] [%3, %6, %9] [1, 1, 1] : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return %extracted_slice : tensor<?x?x?xf32>
}
```

The issue can be reproduced more simply with the following test case,
where `dim_0` is `0`. When the runtime verification pass is applied to
this code with `dim_0 = 0`, it generates an assertion that will always
fail at runtime.

```mlir
func.func @extract_slice_zero_size_dim(%arg0: tensor<10x4x1xf32>,
                                      %dim_0: index,
                                      %dim_1: index,
                                      %dim_2: index) {
  %slice = tensor.extract_slice %arg0[0, 0, 0] [%dim_0, %dim_1, %dim_2] [1, 1, 1]
    : tensor<10x4x1xf32> to tensor<?x?x?xf32>
  return
}

func.func @test_zero_size_extraction() {
  %input = arith.constant dense<1.0> : tensor<10x4x1xf32>
  // Define slice dimensions: 0x4x1 (zero-size in first dimension)
  %dim_0 = arith.constant 0 : index
  %dim_1 = arith.constant 4 : index
  %dim_2 = arith.constant 1 : index
  func.call @extract_slice_zero_size_dim(%input, %dim_0, %dim_1, %dim_2)
    : (tensor<10x4x1xf32>, index, index, index) -> ()
  return
}
```

P.S. We probably have a similar issue with `memref.subview`. I will
check this and send a separate PR for the issue.

---------

Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…dimension value is 0 (llvm#164897)

Previously, the runtime verification pass would insert assertion
statements with conditions that always evaluate to false for
semantically valid `memref.subview` operations where one of the
dimensions had a size of 0.

The `memref.subview` runtime verification logic was unconditionally
generating checks for the position of the last element (`offset + (size
- 1) * stride`). When `size` is 0, this causes the assertion condition
to always be false, leading to runtime failures even though the
operation is semantically valid.

This patch fixes the issue by making the `lastPos` check conditional.
The offset is always verified, but the endpoint check is only performed
when `size > 0` to avoid generating spurious assert statements.

This issue was discovered through a LiteRT model, where a dynamic shape
calculation resulted in a zero-sized dimension being passed to
`memref.subview`. The following is a simplified IR snippet from the
model. After running the runtime verification pass, an assertion that
always fails is generated because the SSA value `%5` becomes 0.

```mlir
module {
  memref.global "private" constant @__constant_2xi32 : memref<2xi32> = dense<-1> {alignment = 64 : i64}
  memref.global "private" constant @__constant_1xi32 : memref<1xi32> = dense<0> {alignment = 64 : i64}
  func.func @simpleRepro(%arg0: memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>>) -> memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> {
    %c2 = arith.constant 2 : index
    %c4 = arith.constant 4 : index
    %c1 = arith.constant 1 : index
    %c10 = arith.constant 10 : index
    %c0 = arith.constant 0 : index
    %c-1 = arith.constant -1 : index
    %0 = memref.get_global @__constant_1xi32 : memref<1xi32>
    %1 = memref.get_global @__constant_2xi32 : memref<2xi32>
    %alloca = memref.alloca() {alignment = 64 : i64} : memref<3xi32>
    %subview = memref.subview %alloca[0] [1] [1] : memref<3xi32> to memref<1xi32, strided<[1]>>
    memref.copy %0, %subview : memref<1xi32> to memref<1xi32, strided<[1]>>
    %subview_0 = memref.subview %alloca[1] [2] [1] : memref<3xi32> to memref<2xi32, strided<[1], offset: 1>>
    memref.copy %1, %subview_0 : memref<2xi32> to memref<2xi32, strided<[1], offset: 1>>
    %2 = memref.load %alloca[%c0] : memref<3xi32>
    %3 = index.casts %2 : i32 to index
    %4 = arith.cmpi eq, %3, %c-1 : index
    %5 = arith.select %4, %c10, %3 : index
    %6 = memref.load %alloca[%c1] : memref<3xi32>
    %7 = index.casts %6 : i32 to index
    %8 = arith.cmpi eq, %7, %c-1 : index
    %9 = arith.select %8, %c4, %7 : index
    %10 = memref.load %alloca[%c2] : memref<3xi32>
    %11 = index.casts %10 : i32 to index
    %12 = arith.cmpi eq, %11, %c-1 : index
    %13 = arith.select %12, %c1, %11 : index
    %subview_1 = memref.subview %arg0[0, 0, 0] [%5, %9, %13] [1, 1, 1] : memref<10x4x1xf32, strided<[?, ?, ?], offset: ?>> to memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
    return %subview_1 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
  }
}
```

P.S. This is a similar issue to the one fixed for `tensor.extract_slice`
in llvm#164878

---------

Co-authored-by: Hanumanth Hanumantharayappa <[email protected]>
…LVM IR (llvm#165286)

There's a couple of tests like this.

This patch series renames these to something more descriptive and
adjusts the tests to check IR. Currently the tests check raw assembly
output (not even dwarfdump). Which most likely hid some bugs around
property debug-info.
…164765)

These tests not supported on AIX and z/OS, disable them to get the
clang-ppc64-aix green
…lvm#165021)

The type sizes of backedge taken counts for two loops can be different
and this is to fix the crash in haveSameSD
(llvm#165014).

---------

Co-authored-by: Shimin Cui <[email protected]>
Changes test name to something more meaningful. In preparation to
refactoring the test to check LLVM IR instead of assembly.
Currently ExecutionEngine tries to dump all functions declared in the
module, even those which are "external" (i.e., linked/loaded at
runtime). E.g.

```mlir
func.func private @printF32(f32)
func.func @supported_arg_types(%arg0: i32, %arg1: f32) {
  call @printF32(%arg1) : (f32) -> ()
  return
}
```
fails with
```
Could not compile printF32:
  Symbols not found: [ __mlir_printF32 ]
Program aborted due to an unhandled Error:
Symbols not found: [ __mlir_printF32 ]
```
even though `printF32` can be provided at final build time (i.e., when
the object file is linked to some executable or shlib). E.g, if our own
`libmlir_c_runner_utils` is linked.

So just skip functions which have no bodies during dump (i.e., are decls
without defns).
Adds `arm64-apple-darwin` support to `asm.py` matching and removes now
invalidated `target-triple-mismatch` test (I dont have another triple
supported by llc but not the autogenerator that make this test useful).
vitalybuka and others added 2 commits October 27, 2025 12:30
Don't rely on comparison to singular iterator, it's UB.

Fixes bot crashes after
llvm#164524.
@z1-cciauto
Copy link
Collaborator

@z1-cciauto
Copy link
Collaborator

@ronlieb ronlieb merged commit 8b46380 into amd-staging Oct 28, 2025
11 checks passed
@ronlieb ronlieb deleted the amd/merge/upstream_merge_20251027143124 branch October 28, 2025 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.