Conversation

@ronlieb
Collaborator

@ronlieb ronlieb commented Nov 13, 2025

No description provided.

lplewa and others added 30 commits November 13, 2025 15:56
Tracing requires liboffload to be initialized, so calling
isTracingEnabled() before olInit always returns false. This caused the
first trace log to look like:
```
-> OL_SUCCESS
```
instead of:
```
---> olInit() -> OL_SUCCESS
```
This patch moves the pre-call trace print for olInit so it is emitted
only after initialization.

It would be possible to add extra logic to detect whether liboffload is
already initialized and only postpone the first pre-call print, but this
would add unnecessary complexity, especially since this is tablegen
code. The difference would matter only in the unlikely case of a crash
during a second olInit call.
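
The ordering problem can be illustrated with a small stand-alone sketch. `ToyRuntime`, its members, and both `init...` functions are hypothetical stand-ins, not liboffload code or tablegen output; the only behaviour taken from the description above is that tracing reports as disabled until initialization has run.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a runtime whose tracing state only becomes
// visible after initialization. Not the real liboffload API.
struct ToyRuntime {
  bool Initialized = false;
  bool TraceRequested = true; // e.g. the user asked for tracing
  std::string Log;

  bool isTracingEnabled() const { return Initialized && TraceRequested; }

  // Old order: the pre-call print runs before init, so it is dropped.
  void initOldOrder() {
    if (isTracingEnabled())
      Log += "---> olInit() ";
    Initialized = true;
    if (isTracingEnabled())
      Log += "-> OL_SUCCESS\n";
  }

  // New order: the pre-call print is postponed until init has finished.
  void initNewOrder() {
    Initialized = true;
    if (isTracingEnabled())
      Log += "---> olInit() ";
    if (isTracingEnabled())
      Log += "-> OL_SUCCESS\n";
  }
};

// Usage check: the old order loses the pre-call text, the new order keeps it.
inline void checkOrdering() {
  ToyRuntime Old, New;
  Old.initOldOrder();
  New.initNewOrder();
  assert(Old.Log == "-> OL_SUCCESS\n");
  assert(New.Log == "---> olInit() -> OL_SUCCESS\n");
}
```

The trade-off named above is visible here: in the new order nothing is printed if initialization itself crashes, which is accepted for the sake of simpler generated code.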

---------

Co-authored-by: Joseph Huber <[email protected]>
Only the Fortran source files in flang/test/Intrinsics have been modified. The
other files in flang/test will be cleaned up in subsequent commits.
- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code
into individual helper functions.
Prepare a 'this' for CXXDefaultInitExprs
…e reduction plans (llvm#165913)

The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case.

Fixes llvm#165359.
This commit adds optimized assembly versions of single-precision float
multiplication and division. Both functions are implemented in a style
that can be assembled as either Arm or Thumb2; for multiplication, a
separate implementation is provided for Thumb1. Also, extensive new
tests are added for multiplication and division.

These implementations can be removed from the build by defining the
cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality which are not on the fast path, such
as NaN handling and underflow, are handled in helper functions written
in C. These can be shared between the Arm/Thumb2 and Thumb1
implementations, and also reused by other optimized assembly functions
we hope to add in future.
…ve (llvm#167875)

This prevents the backend from crashing for basic uses of __SVCount_t
type (e.g., as function arguments), without +sve2p1 or +sme2.
    
Fixes llvm#167462
Without this patch, SmallDenseMap::grow has two separate code paths to
grow the bucket array.  The code path to handle the small mode has its
own traversal over the bucket array.  This patch simplifies this logic
as follows:

1. Allocate a temporary instance of SmallDenseMap.
2. Move valid key/value pairs to the temporary instance.
3. Move LargeRep to *this.

Remarks:

- This patch adds moveFromImpl to move key/value pairs.
  moveFromOldBuckets is updated to use the new helper function.

- This patch adds a private constructor to SmallDenseMap that takes an
  exact number of buckets, accompanied by tag ExactBucketCount.

- This patch adds a fast path to deallocateBuckets in case
  getLargeRep()->NumBuckets == 0, just like destroyAll.  This path is
  used to destruct zombie instances after moves.

- In somewhat rare cases, we "grow" from the small mode to the small
  mode when there are many tombstones in the inline storage.  This is
  handled with another call to moveFrom.
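
The three steps above can be sketched on a toy container. `ToySet` below is a hypothetical open-addressing set written for illustration only: it is not `SmallDenseMap`, it has no inline (small) storage, and every name in it except the `ExactBucketCount` tag is an assumption. It only shows the shape of "one growth path that builds a temporary and then takes over its storage", including the case where the table is rebuilt at the same size to shed tombstones.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy open-addressing set of unsigned keys (hypothetical, illustration only).
class ToySet {
  enum class State { Empty, Full, Tombstone };
  struct Bucket {
    State S = State::Empty;
    unsigned Key = 0;
  };

  std::vector<Bucket> Buckets;
  std::size_t NumEntries = 0;
  std::size_t NumTombstones = 0;

  // Tag type for the private "exactly N buckets" constructor.
  struct ExactBucketCount {};
  ToySet(std::size_t N, ExactBucketCount) : Buckets(N) {}

  // Linear probe: returns the bucket holding Key, or the first empty bucket.
  // Terminates because insert() always keeps at least one bucket empty.
  Bucket &probe(unsigned Key) {
    std::size_t I = Key % Buckets.size();
    while (Buckets[I].S != State::Empty &&
           !(Buckets[I].S == State::Full && Buckets[I].Key == Key))
      I = (I + 1) % Buckets.size();
    return Buckets[I];
  }

  // Single growth path, mirroring steps 1 to 3.
  void grow(std::size_t NewSize) {
    assert(NewSize > NumEntries && "need at least one empty bucket");
    ToySet Tmp(NewSize, ExactBucketCount{}); // 1. temporary instance
    for (const Bucket &B : Buckets)          // 2. move live keys only
      if (B.S == State::Full) {
        Tmp.probe(B.Key) = Bucket{State::Full, B.Key};
        ++Tmp.NumEntries;
      }
    *this = std::move(Tmp);                  // 3. take over its storage
  }

public:
  ToySet() : Buckets(4) {}

  void insert(unsigned Key) {
    // Rehash once live entries plus tombstones would pass 3/4 of capacity.
    if ((NumEntries + NumTombstones + 1) * 4 > Buckets.size() * 3) {
      bool MostlyTombstones = NumEntries * 2 < Buckets.size();
      grow(MostlyTombstones ? Buckets.size() : Buckets.size() * 2);
    }
    Bucket &B = probe(Key);
    if (B.S == State::Full)
      return; // already present
    B = Bucket{State::Full, Key};
    ++NumEntries;
  }

  void erase(unsigned Key) {
    Bucket &B = probe(Key);
    if (B.S != State::Full)
      return;
    B.S = State::Tombstone;
    --NumEntries;
    ++NumTombstones;
  }

  bool contains(unsigned Key) { return probe(Key).S == State::Full; }
  std::size_t size() const { return NumEntries; }
  std::size_t capacity() const { return Buckets.size(); }
};
```

Inserting after a run of insert/erase pairs exercises the same-size rebuild, which loosely corresponds to the "small mode to small mode" remark above.
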
…okupOrTrackRegister (llvm#167841)

The LocID for registers is just the register ID. The getLocID function
is supposed to hide this detail, but it wasn't being used consistently.

This avoids a bunch of implicit casts from Register or MCRegister to
unsigned.
… (NFC) (llvm#155262)

CMN already has a function like this; we should do the same for CMP.
This commit adds a new `ValueMatcher` class that can be used in gtest
matching contexts to match against `lldb_private::Value` objects. We
always match against the value's `value_type` and `context_type`. For
HostAddress values we will also match against the expected host buffer
contents. For Scalar, FileAddress, and LoadAddress values we match
against an expected Scalar value.

The matcher is used to improve the quality of the tests in the
`DwarfExpressionTest.cpp` file. Previously, the local `Evaluate`
function would return an `Expected<Scalar>` value which makes it hard to
verify that we actually get a Value of the expected type without adding
custom evaluation code. Now we return an `Expected<Value>` so that we
can match against the full value contents.

The resulting change improves the quality of the existing checks and in
some cases eliminates the need for special code to explicitly check
value types.

I followed the gtest
[guide](https://google.github.io/googletest/gmock_cook_book.html#writing-new-monomorphic-matchers)
for writing a new value matcher.
The optimized version of xsgetn for basic_filebuf added in llvm#165223 has
an issue where if the reads come from both the buffer and the
filesystem it returns the wrong number of characters. This patch should
address the issue.
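
The commit message does not say how the count went wrong, so the following is only a sketch of the invariant a mixed read has to satisfy: the return value is the sum of what came from the buffer and what came from the underlying file. `ToyFileBuf` and its two string members are hypothetical and do not reflect libc++'s `basic_filebuf` internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a buffered file: Buffered holds bytes that were
// already read ahead, Backing holds the bytes still "on disk".
struct ToyFileBuf {
  std::string Buffered;
  std::string Backing;

  std::size_t xsgetn(char *Out, std::size_t N) {
    assert(Out != nullptr && "destination must be valid");
    // Part 1: drain whatever is available in the in-memory buffer.
    std::size_t FromBuf = std::min(N, Buffered.size());
    std::memcpy(Out, Buffered.data(), FromBuf);
    Buffered.erase(0, FromBuf);
    // Part 2: satisfy the remainder directly from the backing store.
    std::size_t FromFile = std::min(N - FromBuf, Backing.size());
    std::memcpy(Out + FromBuf, Backing.data(), FromFile);
    Backing.erase(0, FromFile);
    // The result must count both parts, not only the last one.
    return FromBuf + FromFile;
  }
};
```

With `Buffered = "abc"` and `Backing = "defgh"`, a request for five characters spans both sources and should report five.
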
This proposal adds a `cl::opt` CLI flag,
`-bpf-allow-misaligned-mem-access`, to the BPF target that lets users
allow misaligned memory accesses.

The motivation is that user-space eBPF VMs (interpreters
or JITs running in user space) typically run on real CPUs where
unaligned memory accesses are acceptable (or handled efficiently), so they
can be enabled to simplify lowering and improve performance. In
contrast, kernel eBPF must obey verifier constraints and
platform-specific alignment restrictions.

A new CLI option keeps kernel behavior unchanged while giving userspace
VMs an explicit opt-in to enable more permissive codegen. It supports
both use-cases without diverging codebases.
…7763)

As mentioned in comments for llvm#164913, the `if()` statements here
can't be externally triggered, since these writeback registers are
passed in from the caller. So they should really be `assert()`s, which
makes it obvious that we don't need test cases for them and is also
more efficient.
Reverts llvm#161546

One of the buildbots reported a cmake error I don't understand, and
which I didn't get in my own test builds:
```
CMake Error at /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake:23 (try_compile):
  COMPILE_DEFINITIONS specified on a srcdir type TRY_COMPILE
```

My best guess is that the thing I did in `CheckAssemblerFlag.cmake` only
works on some versions of cmake. But I don't understand the problem well
enough to fix it quickly, so I'm reverting the whole patch and will
reland it later.
…m#167540)

In adopting `[[clang::nonblocking]]` there's been some user confusion.
Changes to address `-Wfunction-effects` warnings are often pure
annotation, with no runtime effect. Changes to avoid
`-Wperf-constraint-implies-noexcept` warnings are risky: adding
`noexcept` creates a new potential for the program to crash. In
retrospect, `-Wperf-constraint-implies-noexcept` shouldn't have been
made part of `-Wall`.

---------

Co-authored-by: Doug Wyatt <[email protected]>
This changes muls by `3 << C` from `(X << C + 2) - (X << C)`
to `(X << C + 1) + (X << C)`.
If Zba is available, the output is not affected as we emit
`(shl (sh1add X, X), C)` instead.

There are two advantages:
- ADD is more compressible
- Often a reduced instruction count, by a heuristic that
  `(X << C + 1)` is more likely to have another use than `(X << C + 2)`
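
Both expansions compute the same product, since `3 << C` equals both `(4 << C) - (1 << C)` and `(2 << C) + (1 << C)`. A minimal check in plain C++ follows; the function names are made up for this sketch and nothing here is the RISC-V backend code.

```cpp
#include <cassert>
#include <cstdint>

// Old expansion of X * (3 << C): one shift by C + 2, one by C, then SUB.
inline std::uint64_t mulBy3ShlOld(std::uint64_t X, unsigned C) {
  assert(C + 2 < 64 && "shift amount out of range");
  return (X << (C + 2)) - (X << C);
}

// New expansion: one shift by C + 1, one by C, then ADD.
inline std::uint64_t mulBy3ShlNew(std::uint64_t X, unsigned C) {
  assert(C + 1 < 64 && "shift amount out of range");
  return (X << (C + 1)) + (X << C);
}

// Reference: the multiplication both expansions replace.
inline std::uint64_t mulBy3ShlRef(std::uint64_t X, unsigned C) {
  return X * (std::uint64_t{3} << C);
}
```

Unsigned arithmetic wraps modulo 2^64, so the three functions agree even when the product overflows.
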
…7898)

So that changing the type of the container (planned in a future patch)
is less intrusive.
Upstream the basic support for the ExtVectorType element expr
…actor in the way of a glue. (llvm#167805)

In the new test, we're trying to fold a load and a X86ISD::CALL. The
call has a CopyToReg glued to it. The load and the call have different
input chains so they need to be merged. This results in a TokenFactor
that gets put between the CopyToReg and the final CALLm instruction. The
DAG scheduler can't handle that.

The load here was created by legalization of the extract_element using a
stack temporary store and load. A normal IR load would be chained into
call sequence by SelectionDAGBuilder. This would usually have the load
chained in before the CopyToReg. The store/load created by legalization
don't get chained into the rest of the DAG.

Fixes llvm#63790
AMDGPU: Start to use AV classes for unknown vector class

Use AGPR+VGPR superclasses for gfx90a+. The type used
for the class should be the broadest possible class, to
be contextually restricted later. InstrEmitter clamps these
to the common subclass of the context use instructions, so we're
best off using the broadest possible class for all types.

Note this does very little because we only use VGPR classes
for FP types (though this doesn't particularly make any sense),
and we legalize normal loads and stores to integer.
The XeGPU and XeVM dialects have assigned maintainers, but the related folders
currently lack code owners.
Add charithaintc and Jianhui-Li as code owners for XeGPU-related folders.
Add silee2 as code owner for XeVM-related folders.
Note:
charithaintc is the current maintainer of the XeGPU dialect.
silee2 is the current maintainer of the XeVM dialect.
This patch updates various LLVM headers to properly add the `LLVM_ABI`
and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows.

This effort is tracked in llvm#109483.
mtrofin and others added 22 commits November 13, 2025 12:46
Biasing towards the native `sqrt` not returning NaN.


Issue llvm#147390
Test case for vectorizing std::find_if with
builtin_assume_dereferenceable. Currently not vectorized.

https://godbolt.org/z/6jbsd4EjT
…lvm#167788)

If a caller has locked memory, then the madvise call will fail. In that
case, zero the memory so that we don't return non-zeroed memory for
calloc calls since we thought the memory had been released.
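
A minimal sketch of that fallback, assuming a hypothetical helper `releaseOrZero` whose `Release` callback stands in for the madvise-style call and returns `true` only when the pages were actually released (and will therefore read back as zero). This is not the allocator's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: try to hand the pages back to the OS; if that fails
// (for example because the caller locked the memory), zero them by hand so
// a later calloc-style request still sees zeroed memory.
template <typename ReleaseFn>
void releaseOrZero(void *Ptr, std::size_t Size, ReleaseFn Release) {
  assert(Ptr != nullptr && "range must be valid");
  if (!Release(Ptr, Size))
    std::memset(Ptr, 0, Size);
}
```

Passing a callback that always returns `false` simulates the locked-memory case and leaves the range zero-filled.
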
Reviewers: arsenm, RKSimon, paperchalice, phoebewang

Reviewed By: arsenm, RKSimon

Pull Request: llvm#167911
…67797)

This fixes a TOCTOU bug in the code that initializes shadow memory in
ASAN:


https://github.com/llvm/llvm-project/blob/4b05581bae0e3432cfa514788418fb2fc2144904/compiler-rt/lib/asan/asan_shadow_setup.cpp#L66-L91

1. During initialization, we call `FindDynamicShadowStart` to search the
memory mapping for enough space to dynamically allocate shadow memory.
2. We call `MemoryRangeIsAvailable(shadow_start, kHighShadowEnd);`,
which goes into `MemoryMappingLayout`.
3. We actually map the shadow with `ReserveShadowMemoryRange`.

In step 2, `MemoryMappingLayout` makes various allocations using the
internal allocator. This can cause the allocator to map more memory! In
some cases, this can actually allocate memory that overlaps with the
shadow region returned by `FindDynamicShadowStart` in step 1. This is
not actually fatal, but it is memory corruption; MAP_FIXED is allowed to
overlap other regions, and the effect is that any overlapping memory is
zeroed.

------

To address this, this PR implements `MemoryRangeIsAvailable` on Darwin
without any heap allocations:

- Move `IntervalsAreSeparate` into sanitizer_common.h
- Guard existing sanitizer_posix implementation of
`MemoryRangeIsAvailable` behind !SANITIZER_APPLE
- `IsAddressInMappedRegion` in sanitizer_mac becomes
`MemoryRangeIsAvailable`, which also checks for overlap with the DYLD
shared cache.
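
The core of such a check can be sketched without any heap allocation. The lower-case names, the `Mapping` struct, and the fixed-size array of mappings below are assumptions made for illustration; they are not the sanitizer_common or sanitizer_mac implementation, which walks the real memory map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using uptr = std::uintptr_t;

// Two closed intervals [Start1, End1] and [Start2, End2] are separate when
// one ends strictly before the other begins.
inline bool intervalsAreSeparate(uptr Start1, uptr End1, uptr Start2,
                                 uptr End2) {
  assert(Start1 <= End1 && Start2 <= End2);
  return End1 < Start2 || End2 < Start1;
}

// Hypothetical description of an already-mapped region.
struct Mapping {
  uptr Start;
  uptr End;
};

// A range is available if it is separate from every known mapping. The
// mappings live in a caller-provided array, so the query allocates nothing
// and cannot change the memory map while it is being inspected.
template <std::size_t N>
bool rangeIsAvailable(uptr Start, uptr End, const Mapping (&Maps)[N]) {
  for (const Mapping &M : Maps)
    if (!intervalsAreSeparate(Start, End, M.Start, M.End))
      return false;
  return true;
}
```

Avoiding allocation inside the query is what closes the window between the check in step 2 and the mapping in step 3.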

After this fix, it should be possible to re-land llvm#166005, which
triggered this issue on the x86 iOS simulators.

rdar://164208439
)

The IR could not be round-tripped through mlir-opt. Update the
assembly format and add round-trip tests.

```
mlir-opt mlir/test/Target/LLVMIR/nvvm/barrier.mlir | mlir-opt
<stdin>:6:5: error: cannot name an operation with no results
    %0 = nvvm.barrier <and> %arg2 -> i32
```
The change in llvm#166148 caused breakage
for some other types.
Specifically, this error was seen in a downstream project:
```
 _ods_ir.OpOperandList[_ods_ir.IntegerType]:

TypeError: type 'iree.compiler._mlir_libs._mlir.ir.OpOperandList' is not subscriptable

```
This PR tries to keep those changes from affecting the other types.

---------

Signed-off-by: Nirvedh Meshram <[email protected]>
There's only a single RUN line in the test, so use the more compact default CHECK.
The minimum supported SWIG version is 4.0 so there's no need for using a
separate file anymore.
This PR upstreams the intrinsics `_mm_prefetch`, `_mm_(l|m)fence`,
`_mm_pause` and `_mm_clflush` from the incubator repository.
Move utility to helper for re-use in follow-up patches.
)

This adds some minimal code to mark locations where handling is needed
for Dtor_VectorDeleting type dtors, which were added in
llvm#165598

This is not a comprehensive mark-up of the missing code, as some code
will be needed in places where the surrounding function has larger
missing pieces in CIR currently.

This fixes a warning for an uncovered switch case that was causing CI
builds to fail.
…164458)

before

```C++
void foo(int   name, // name
         float name, // name
         int   name)   // name
{}
```

after

```C++
void foo(int   name, // name
         float name, // name
         int   name) // name
{}
```

Fixes llvm#85123.

As the bug report explained, the procedure for aligning the function
parameters previously failed to update `StartOfTokenColumn`.
@ronlieb ronlieb requested review from a team and dpalermo November 13, 2025 23:54
@z1-cciauto
Collaborator

…lvm#166483)"

bad virtual register:
clang-313307_simple_arr: Make Failed
clang-313307_simple_spmd: Make Failed
clang-337336: Make Failed
clang-387196: Make Failed
flang-sollve-bug1: Make Failed
MasterBarrierO0: Make Failed

This reverts commit c7019c7.
@z1-cciauto
Collaborator

@ronlieb
Collaborator Author

ronlieb commented Nov 14, 2025

!PSDB

@z1-cciauto
Collaborator
