forked from llvm/llvm-project
merge main into amd-staging #585
Open
ronlieb
wants to merge
82
commits into
amd-staging
from
amd/merge/upstream_merge_20251113172504
Tracing requires liboffload to be initialized, so calling isTracingEnabled() before olInit always returns false. This caused the first trace log to look like:
```
-> OL_SUCCESS
```
instead of:
```
---> olInit() -> OL_SUCCESS
```
This patch moves the pre-call trace print for olInit so it is emitted only after initialization.

It would be possible to add extra logic to detect whether liboffload is already initialized and only postpone the first pre-call print, but this would add unnecessary complexity, especially since this is tablegen code. The difference would matter only in the unlikely case of a crash during a second olInit call.

---------

Co-authored-by: Joseph Huber <[email protected]>
Only the fortran source files in flang/test/Intrinsics have been modified. The other files in flang/test will be cleaned up in subsequent commits
- Adopt ifdef and namespace emitters in SubtargetEmitter.
- To aid that, factor out emission of different sections of the code into individual helper functions.
Prepare a 'this' for CXXDefaultInitExprs
…e reduction plans (llvm#165913) The TypeSwitch for extracting the Opcode now handles the `VPReductionRecipe` case. Fixes llvm#165359.
This commit adds optimized assembly versions of single-precision float multiplication and division. Both functions are implemented in a style that can be assembled as either Arm or Thumb2; for multiplication, a separate implementation is provided for Thumb1. Extensive new tests are also added for multiplication and division.

These implementations can be removed from the build by setting the cmake variable COMPILER_RT_ARM_OPTIMIZED_FP=OFF.

Outlying parts of the functionality that are not on the fast path, such as NaN handling and underflow, are handled in helper functions written in C. These can be shared between the Arm/Thumb2 and Thumb1 implementations, and also reused by other optimized assembly functions we hope to add in the future.
…ve (llvm#167875) This prevents the backend from crashing for basic uses of __SVCount_t type (e.g., as function arguments), without +sve2p1 or +sme2. Fixes llvm#167462
Without this patch, SmallDenseMap::grow has two separate code paths to grow the bucket array. The code path to handle the small mode has its own traversal over the bucket array.

This patch simplifies this logic as follows:

1. Allocate a temporary instance of SmallDenseMap.
2. Move valid key/value pairs to the temporary instance.
3. Move LargeRep to *this.

Remarks:

- This patch adds moveFromImpl to move key/value pairs. moveFromOldBuckets is updated to use the new helper function.
- This patch adds a private constructor to SmallDenseMap that takes an exact number of buckets, accompanied by tag ExactBucketCount.
- This patch adds a fast path to deallocateBuckets in case getLargeRep()->NumBuckets == 0, just like destroyAll. This path is used to destruct zombie instances after moves.
- In somewhat rare cases, we "grow" from the small mode to the small mode when there are many tombstones in the inline storage. This is handled with another call to moveFrom.
…okupOrTrackRegister (llvm#167841) The LocID for registers is just the register ID. The getLocID function is supposed to hide this detail, but it wasn't being used consistently. This avoids a bunch of implicit casts from Register or MCRegister to unsigned.
… (NFC) (llvm#155262) CMN also has a function like this, we should do the same with CMP.
This commit adds a new `ValueMatcher` class that can be used in gtest matching contexts to match against `lldb_private::Value` objects. We always match against the value's `value_type` and `context_type`. For HostAddress values we also match against the expected host buffer contents. For Scalar, FileAddress, and LoadAddress values we match against an expected Scalar value.

The matcher is used to improve the quality of the tests in the `DwarfExpressionTest.cpp` file. Previously, the local `Evaluate` function returned an `Expected<Scalar>` value, which made it hard to verify that we actually get a Value of the expected type without adding custom evaluation code. Now we return an `Expected<Value>` so that we can match against the full value contents.

The resulting change improves the quality of the existing checks and in some cases eliminates the need for special code to explicitly check value types.

I followed the gtest [guide](https://google.github.io/googletest/gmock_cook_book.html#writing-new-monomorphic-matchers) for writing a new value matcher.
The optimized version of xsgetn for basic_filebuf added in llvm#165223 has an issue where if the reads come from both the buffer and the filesystem it returns the wrong number of characters. This patch should address the issue.
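To illustrate the case the commit message describes, here is a minimal toy model of a read that is served partly from an in-memory buffer and partly from the underlying file. It is not libc++'s `basic_filebuf` code; `ToyReader` and its members are hypothetical names invented for this sketch. The point is only that the returned count should be the sum of the characters taken from both sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical toy model, not the libc++ implementation.
struct ToyReader {
  std::string buffered;     // characters already sitting in the buffer
  std::size_t bufPos = 0;   // next unread position in the buffer
  std::string file;         // characters still only available from the "file"
  std::size_t filePos = 0;  // next unread position in the file

  // Reads up to n characters into out and returns how many were read.
  std::size_t read(char *out, std::size_t n) {
    // First drain whatever the buffer can provide.
    std::size_t fromBuffer = std::min(n, buffered.size() - bufPos);
    std::memcpy(out, buffered.data() + bufPos, fromBuffer);
    bufPos += fromBuffer;

    // Then satisfy the rest of the request directly from the file.
    std::size_t fromFile = std::min(n - fromBuffer, file.size() - filePos);
    std::memcpy(out + fromBuffer, file.data() + filePos, fromFile);
    filePos += fromFile;

    // The result must count both parts, not just the last one.
    return fromBuffer + fromFile;
  }
};
```

With `"abc"` buffered and `"defgh"` in the file, a request for six characters takes three from each source and should report six.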
This proposal adds a `cl::opt` CLI flag `-bpf-allow-misaligned-mem-access` to the BPF target that lets users allow misaligned memory accesses.

The motivation is that user-space eBPF VMs (interpreters or JITs running in user space) typically run on real CPUs where unaligned memory accesses are acceptable (or handled efficiently), so allowing them can simplify lowering and improve performance. In contrast, kernel eBPF must obey verifier constraints and platform-specific alignment restrictions.

A new CLI option keeps kernel behavior unchanged while giving user-space VMs an explicit opt-in to more permissive codegen. It supports both use cases without diverging codebases.
…7763) As mentioned in comments for llvm#164913, the `if()` statements here can't be externally triggered, since these writeback registers are passed in from the caller. They should really be `assert()`s, which makes it obvious that we don't need test cases for them and is also more efficient.
Reverts llvm#161546

One of the buildbots reported a cmake error I don't understand, and which I didn't get in my own test builds:
```
CMake Error at /var/lib/buildbot/fuchsia-x86_64-linux/llvm-project/compiler-rt/cmake/Modules/CheckAssemblerFlag.cmake:23 (try_compile):
  COMPILE_DEFINITIONS specified on a srcdir type TRY_COMPILE
```
My best guess is that the thing I did in `CheckAssemblerFlag.cmake` only works on some versions of cmake. But I don't understand the problem well enough to fix it quickly, so I'm reverting the whole patch and will reland it later.
Implement support for GNUNullExpr
…m#167540) In adopting `[[clang::nonblocking]]` there's been some user confusion. Changes to address `-Wfunction-effects` warnings are often pure annotation, with no runtime effect. Changes to avoid `-Wperf-constraint-implies-noexcept` warnings are risky: adding `noexcept` creates a new potential for the program to crash. In retrospect, `-Wperf-constraint-implies-noexcept` shouldn't have been made part of `-Wall`. --------- Co-authored-by: Doug Wyatt <[email protected]>
This changes muls by `3 << C` from `(X << C + 2) - (X << C)` to `(X << C + 1) + (X << C)`. If Zba is available, the output is not affected as we emit `(shl (sh1add X, X), C)` instead. There are two advantages: - ADD is more compressible - Often a reduced instruction count, by a heuristic that `(X << C + 1)` is more likely to have another use than `(X << C + 2)`
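The two expansions compute the same product, which can be checked outside the compiler. The sketch below is not the RISC-V backend code; `mul3ShlOld` and `mul3ShlNew` are hypothetical helper names that spell out the old and new forms on 32-bit unsigned values, where wraparound matches multiplication modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Old expansion of X * (3 << C): (X << (C + 2)) - (X << C).
inline std::uint32_t mul3ShlOld(std::uint32_t x, unsigned c) {
  assert(c + 2 < 32 && "shift amount must stay below the bit width");
  return (x << (c + 2)) - (x << c);
}

// New expansion of X * (3 << C): (X << (C + 1)) + (X << C).
inline std::uint32_t mul3ShlNew(std::uint32_t x, unsigned c) {
  assert(c + 1 < 32 && "shift amount must stay below the bit width");
  return (x << (c + 1)) + (x << c);
}

// Reference result using a plain multiply.
inline std::uint32_t mul3ShlRef(std::uint32_t x, unsigned c) {
  return x * (std::uint32_t{3} << c);
}
```

For example, with `x = 5` and `c = 2` all three helpers yield 60: the old form computes 80 - 20, the new form 40 + 20.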
…7898) So that changing the type of the container (planned in a future patch) is less intrusive.
Upstream the basic support for the ExtVectorType element expr
…actor in the way of a glue. (llvm#167805) In the new test, we're trying to fold a load and a X86ISD::CALL. The call has a CopyToReg glued to it. The load and the call have different input chains so they need to be merged. This results in a TokenFactor that gets put between the CopyToReg and the final CALLm instruction. The DAG scheduler can't handle that. The load here was created by legalization of the extract_element using a stack temporary store and load. A normal IR load would be chained into call sequence by SelectionDAGBuilder. This would usually have the load chained in before the CopyToReg. The store/load created by legalization don't get chained into the rest of the DAG. Fixes llvm#63790
AMDGPU: Start to use AV classes for unknown vector class Use AGPR+VGPR superclasses for gfx90a+. The type used for the class should be the broadest possible class, to be contextually restricted later. InstrEmitter clamps these to the common subclass of the context use instructions, so we're best off using the broadest possible class for all types. Note this does very little because we only use VGPR classes for FP types (though this doesn't particularly make any sense), and we legalize normal loads and stores to integer.
…agation.cpp (NFC)
The XeGPU and XeVM dialects have assigned maintainers, but the related folders currently lack code owners. Add charithaintc and Jianhui-Li as code owners for XeGPU related folders. Add silee2 as code owner for XeVM related folders. Note: charithaintc is the current maintainer of the XeGPU dialect. silee2 is the current maintainer of the XeVM dialect.
This patch updates various LLVM headers to properly add the `LLVM_ABI` and `LLVM_ABI_FOR_TEST` annotations to build LLVM as a DLL on Windows. This effort is tracked in llvm#109483.
Biasing towards the native `sqrt` not returning NaN. Issue llvm#147390
Test case for vectorizing std::find_if with builtin_assume_dereferenceable. Currently not vectorized. https://godbolt.org/z/6jbsd4EjT
…lvm#167788) If a caller has locked memory, then the madvise call will fail. In that case, zero the memory so that we don't return non-zeroed memory for calloc calls since we thought the memory had been released.
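The fallback described above can be sketched in a few lines. This is not the allocator's actual code: `releaseOrZero` and the `ReleaseFn` callback are hypothetical names, and the callback stands in for the real `madvise` call so that the sketch stays portable. It only shows the control flow: when releasing the pages fails, the memory is zeroed by hand so a later calloc-style request still sees zeroes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical callback: returns true if the OS released the pages
// (so they will read back as zero), false if the release failed.
using ReleaseFn = bool (*)(void *ptr, std::size_t size);

// Tries to release the range; if that fails (for example because the
// caller has locked the memory), zeroes it explicitly instead.
// Returns true when the OS release succeeded.
inline bool releaseOrZero(void *ptr, std::size_t size, ReleaseFn release) {
  if (release(ptr, size))
    return true;
  // The release failed, so the old contents are still there. Clear them
  // so the range can be handed out as zeroed memory.
  std::memset(ptr, 0, size);
  return false;
}
```

In real code the callback would wrap the platform call; here a capture-less lambda that always fails is enough to exercise the zeroing path.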
Reviewers: arsenm, RKSimon, paperchalice, phoebewang Reviewed By: arsenm, RKSimon Pull Request: llvm#167911
…67797) This fixes a TOCTOU bug in the code that initializes shadow memory in ASAN: https://github.com/llvm/llvm-project/blob/4b05581bae0e3432cfa514788418fb2fc2144904/compiler-rt/lib/asan/asan_shadow_setup.cpp#L66-L91

1. During initialization, we call `FindDynamicShadowStart` to search the memory mapping for enough space to dynamically allocate shadow memory.
2. We call `MemoryRangeIsAvailable(shadow_start, kHighShadowEnd);`, which goes into `MemoryMappingLayout`.
3. We actually map the shadow with `ReserveShadowMemoryRange`.

In step 2, `MemoryMappingLayout` makes various allocations using the internal allocator. This can cause the allocator to map more memory! In some cases, this can actually allocate memory that overlaps with the shadow region returned by `FindDynamicShadowStart` in step 1. This is not actually fatal, but it is memory corruption; MAP_FIXED is allowed to overlap other regions, and the effect is that any overlapping memory is zeroed.

------

To address this, this PR implements `MemoryRangeIsAvailable` on Darwin without any heap allocations:

- Move `IntervalsAreSeparate` into sanitizer_common.h
- Guard the existing sanitizer_posix implementation of `MemoryRangeIsAvailable` behind !SANITIZER_APPLE
- `IsAddressInMappedRegion` in sanitizer_mac becomes `MemoryRangeIsAvailable`, which also checks for overlap with the DYLD shared cache.

After this fix, it should be possible to re-land llvm#166005, which triggered this issue on the x86 iOS simulators.

rdar://164208439
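As a rough illustration of an allocation-free availability check, the sketch below tests a candidate range against a caller-provided array of mapped regions. It is not the sanitizer runtime code: `MappedRegion`, the lower-case function names, and the use of inclusive bounds are assumptions made for this example, and the real Darwin implementation walks the process memory map rather than an array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using uptr = std::uintptr_t;

// True when the inclusive ranges [start1, end1] and [start2, end2]
// have no address in common.
inline bool intervalsAreSeparate(uptr start1, uptr end1, uptr start2,
                                 uptr end2) {
  assert(start1 <= end1);
  assert(start2 <= end2);
  return end1 < start2 || end2 < start1;
}

// Hypothetical stand-in for one entry of the process memory map.
struct MappedRegion {
  uptr start;
  uptr end;  // inclusive
};

// A range is available if it is separate from every mapped region.
// The loop only reads the caller's array, so nothing is allocated
// between the check and the later mapping step.
inline bool memoryRangeIsAvailable(uptr start, uptr end,
                                   const MappedRegion *regions,
                                   std::size_t count) {
  for (std::size_t i = 0; i < count; ++i)
    if (!intervalsAreSeparate(start, end, regions[i].start, regions[i].end))
      return false;
  return true;
}
```

Keeping the check free of heap allocations is what closes the window described in steps 1 to 3: the check itself can no longer create a mapping inside the range it has just inspected.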
The change in llvm#166148 caused breaks for some other types. Specifically, this error was seen in a downstream project:
```
_ods_ir.OpOperandList[_ods_ir.IntegerType]: TypeError: type 'iree.compiler._mlir_libs._mlir.ir.OpOperandList' is not subscriptable
```
This PR tries to keep those changes from affecting the other types.

---------

Signed-off-by: Nirvedh Meshram <[email protected]>
There's only a single RUN line in the test, use the more compact default CHECK.
…olding. (llvm#149042)" This reverts commit 62d1a08. This appears to be causing some runtime failures on RISCV https://lab.llvm.org/buildbot/#/builders/210/builds/5221
The minimum supported SWIG version is 4.0 so there's no need for using a separate file anymore.
This PR upstreams the intrinsics `_mm_prefetch`, `_mm_(l|m)fence`, `_mm_pause` and `_mm_clflush` from the incubator repository.
Move utility to helper for re-use in follow-up patches.
) This adds some minimal code to mark locations where handling is needed for Dtor_VectorDeleting type dtors, which were added in llvm#165598 This is not a comprehensive mark-up of the missing code, as some code will be needed in places where the surrounding function has larger missing pieces in CIR currently. This fixes a warning for an uncovered switch case that was causing CI builds to fail.
This reverts commit 1a86f0a.
…164458) before ```C++ void foo(int name, // name float name, // name int name) // name {} ``` after ```C++ void foo(int name, // name float name, // name int name) // name {} ``` Fixes llvm#85123. As the bug report explained, the procedure for aligning the function parameters previously failed to update `StartOfTokenColumn`.
Collaborator
dpalermo
approved these changes
Nov 14, 2025
…lvm#166483)"

bad virtual register:

- clang-313307_simple_arr: Make Failed
- clang-313307_simple_spmd: Make Failed
- clang-337336: Make Failed
- clang-387196: Make Failed
- flang-sollve-bug1: Make Failed
- MasterBarrierO0: Make Failed

This reverts commit c7019c7.
Collaborator
Collaborator
Author
|
!PSDB |
Collaborator
No description provided.