
Conversation


@ronlieb ronlieb commented Nov 11, 2025

No description provided.

RKSimon and others added 30 commits November 11, 2025 17:14
…nt matching and inference and create clusters (llvm#165868)

Adding Matching and Inference Functionality to Propeller. For detailed
information, please refer to the following RFC:
https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
This is the fourth PR, which implements matching and inference and
creates the clusters. The associated PRs are:
PR1: llvm#160706
PR2: llvm#162963
PR3: llvm#164223

co-authors: lifengxiang1025
[[email protected]](mailto:[email protected]); zcfh
[[email protected]](mailto:[email protected])

Co-authored-by: lifengxiang1025 <[email protected]>
Co-authored-by: zcfh <[email protected]>
This patch adds a new `FramePointerKind::NonLeafNoReserve` and makes it
the default for `-momit-leaf-frame-pointer`.

It also adds a new command-line option `-m[no-]reserve-frame-pointer-reg`.

This should fix llvm#154379; the main impact of this patch can be found in
`clang/lib/Driver/ToolChains/CommonArgs.cpp`.
llvm#165198)

The ASan test `ThreadedStressStackReuseTest` fails on AIX due to the smaller
default thread stack size there. Set the thread stack size to a minimum of
128KB to ensure reliable test behavior on platforms with smaller default
thread stack sizes.
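
A minimal sketch (not the test's actual code) of pinning a minimum thread
stack size via pthread attributes, assuming a POSIX platform:

```
#include <pthread.h>
#include <algorithm>
#include <cstddef>

static void *Worker(void *) { return nullptr; }

void SpawnWithMinStack() {
  pthread_attr_t Attr;
  pthread_attr_init(&Attr);
  size_t Default = 0;
  pthread_attr_getstacksize(&Attr, &Default);
  // Bump the stack to at least 128KB on platforms with small defaults.
  pthread_attr_setstacksize(&Attr, std::max(Default, size_t{128} * 1024));
  pthread_t Tid;
  pthread_create(&Tid, &Attr, Worker, nullptr);
  pthread_join(Tid, nullptr);
  pthread_attr_destroy(&Attr);
}
```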

---------

Co-authored-by: Riyaz Ahmad <[email protected]>
Simplify `createReadOrMaskedRead` to only require _one_ argument to
specify the vector type to read (passed as `VectorType`) instead of
passing vector-sizes and scalable-flags independently (i.e. _two_
arguments).

A simple overload is provided for users that wouldn't re-use the
corresponding `VectorType` (and hence have no reason to create one).
While there are no upstream users of this overload, it's been helpful
downstream.
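
Purely for illustration (these are not MLIR's actual signatures), a generic
sketch of the same API shape: one argument carries both the sizes and the
scalable flags, with a thin overload for callers that would only construct
that type to make the call:

```
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in "vector type" bundling sizes and scalable flags.
struct VecType {
  std::vector<int64_t> Sizes;
  std::vector<bool> ScalableDims;
};

// Primary entry point: a single argument describes the vector to read.
int createRead(const VecType &Ty) {
  return static_cast<int>(Ty.Sizes.size()); // placeholder body
}

// Convenience overload for callers with no other use for the type object.
int createRead(std::vector<int64_t> Sizes, std::vector<bool> ScalableDims) {
  return createRead(VecType{std::move(Sizes), std::move(ScalableDims)});
}
```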
These tests fail in the profcheck configuration because profinject gets
added to the pipeline and adds metadata that changes the input PGO
information.
…vm#166901)

Now that llvm#166517 has landed and
[Writer](https://github.com/llvm/llvm-project/blob/main/libc/src/stdio/printf_core/writer.h#L130)
has been refactored to track bytes written as size_t, strftime can be
refactored as well to handle size_t return values.

Can't think of a proper way to test this without creating a 2GB+ string,
but existing tests cover most cases.
Closes llvm#161461
- This is my first time contributing to libc's POSIX layer, so for reference
I used the `clock_gettime` implementation for Linux. For convenience, here is
the description of the `clock_settime` function's
[behavior](https://www.man7.org/linux/man-pages/man3/clock_settime.3.html)
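
For reference, a minimal usage sketch of the POSIX interface being
implemented (setting CLOCK_REALTIME normally requires privileges, e.g.
CAP_SYS_TIME on Linux, so expect EPERM when run unprivileged):

```
#include <stdio.h>
#include <time.h>

int main() {
  struct timespec ts;
  if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
    perror("clock_gettime");
    return 1;
  }
  ts.tv_sec += 1; // nudge the wall clock forward by one second
  if (clock_settime(CLOCK_REALTIME, &ts) != 0)
    perror("clock_settime"); // typically EPERM without the right privilege
  return 0;
}
```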
…vm#167405)

The code for v16 of the shared cache objc class layout was copy/pasted
from the previous versions incorrectly. Namely, the wrong class offset
list was used and the class_infos index was never updated.
    
rdar://164430695
…llvm#167379)

According to the
[spec](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_function_pointers.asciidoc),
it is illegal to addrspacecast to the generic AS, so use the function
pointer AS for null constants.

"It is illegal to use Function Pointer as 'Pointer' argument of
OpPtrCastToGeneric."

This was found when compiling the OpenMP Device RTL for SPIR-V.

Signed-off-by: Nick Sarnie <[email protected]>
This patch adds `asin` to the entry points for Arm and AArch64.

Tests have been run using Arm Toolchain for Embedded, a downstream
toolchain.
…th optimizations possible (llvm#165464)

The patch
[Add strictfp attribute to prevent unwanted optimizations of libm
calls](https://reviews.llvm.org/D34163)
added `I.isStrictFP()` to the condition

```
  if (!I.isNoBuiltin() && !I.isStrictFP() && !F->hasLocalLinkage() &&
        F->hasName() && LibInfo->getLibFunc(*F, Func) &&
        LibInfo->hasOptimizedCodeGen(Func))
```

This prevents the backend from optimizing even non-math libcalls such as
`strlen` and `memcmp` when a call carries the strict floating-point
attribute. For example, it prevents converting `strlen` and `memcmp` into
the millicode calls `__strlen` and `__memcmp`.
… Implement matching and inference and create clusters" (llvm#167559)

Reverts llvm#165868 due to buildbot failures

Co-authored-by: spupyrev <[email protected]>
The existing function is LT, but most of the uses are better
expressed as GE.
Fixed stub relocation test. Just need to check 32-bit.

---------

Co-authored-by: anoopkg6 <[email protected]>
…66883)

parameters when defining the scripting interfaces.

We try to count the parameters to make sure the user has defined them
correctly, but this throws the counting off.

I'm not adding a test for this because then it would seem like we
thought this was a good idea. I'd actually rather not support it at all,
but we added the parameter checking pretty recently, so there
are extant implementations that we broke. I only want to support them,
not suggest anyone else do this going forward.
…lvm#167534)

These selects are dependent on values live into the CHRScope that we
cannot infer anything about, so mark the branch weights unknown. These
selects usually also just get folded down into icmps, so the profile
information ends up being kind of redundant.
…nds (llvm#165295)

Reasoning behind the proposed change: this helps us move away from selecting
v_alignbit for fshr with uniform operands.

V_ALIGNBIT is defined in the ISA as:
D0.u32 = 32'U(({ S0.u32, S1.u32 } >> S2.u32[4 : 0]) & 0xffffffffLL)
Note: S0 carries the MSBs and S1 carries the LSBs of the value being
aligned.

I interpret that as concat(S0, S1) >> (S2 & 0x1F), returning the lower
32 bits.

fshr:

fshr i32 %src0, i32 %src1, i32 %src2
Where:
concat(%src0, %src1) represents the 64-bit value formed by %src0 as the
high 32 bits and %src1 as the low 32 bits.
%src2 is the shift amount.
Only the lower 32 bits are returned.
So these two are identical.

So, I can expand the V_ALIGNBIT through bit manipulation as:
Concat: S1 | (S0 << 32)
Shift: ((S1 | (S0 << 32)) >> S2)
Break the shift: (S1 >> S2) | (S0 << (32 - S2))
The proposed pattern does exactly this.

Additionally, src2 in the fshr pattern:

* must be 0-31;
* if the shift is ≥ 32, hardware semantics differ and it must be handled
with extra instructions.

The extra S_ANDs limit the selection to only the low 5 bits of the shift
amount.
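
A small self-contained check, not taken from the patch, that the expanded
form matches the concat-and-shift definition for shift amounts masked to
5 bits:

```
#include <cassert>
#include <cstdint>

// ISA-style definition: D0 = 32'U(({S0, S1} >> S2[4:0]) & 0xffffffff).
static uint32_t alignbitConcat(uint32_t S0, uint32_t S1, uint32_t S2) {
  uint64_t Concat = (static_cast<uint64_t>(S0) << 32) | S1;
  return static_cast<uint32_t>(Concat >> (S2 & 31));
}

// Expanded form from the derivation: (S1 >> S2) | (S0 << (32 - S2)), with
// the shift masked to 5 bits. Shift == 0 is special-cased here only to
// avoid an out-of-range 32-bit shift in C++.
static uint32_t alignbitExpanded(uint32_t S0, uint32_t S1, uint32_t S2) {
  uint32_t Sh = S2 & 31;
  return Sh == 0 ? S1 : (S1 >> Sh) | (S0 << (32 - Sh));
}

int main() {
  for (uint32_t Sh = 0; Sh < 32; ++Sh)
    assert(alignbitConcat(0xDEADBEEF, 0x01234567, Sh) ==
           alignbitExpanded(0xDEADBEEF, 0x01234567, Sh));
  return 0;
}
```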
The call-graph-section-assembly.ll tests in CodeGen/X86 and
CodeGen/AArch64 both fail under LLVM_REVERSE_ITERATION. These sets should
use SetVector to avoid non-determinism in the output.
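
A minimal illustration of why llvm::SetVector helps here (assumes an LLVM
build environment): it deduplicates like a set but iterates in insertion
order, so the emitted output does not depend on hash-based iteration order:

```
#include "llvm/ADT/SetVector.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::SetVector<int> Ids;
  Ids.insert(3);
  Ids.insert(1);
  Ids.insert(3); // duplicate, ignored
  Ids.insert(2);
  for (int Id : Ids)
    llvm::outs() << Id << ' '; // always prints "3 1 2", in insertion order
  llvm::outs() << '\n';
  return 0;
}
```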
XChy and others added 20 commits November 12, 2025 04:33
…tTrunc (llvm#167165)

Fixes llvm#165438
With `simd128` enabled, we may encounter vector type truncation in FastISel.
To respect llvm#138479, this patch merely bails out on non-integer IR types,
though I prefer bailing out for all non-simple types as most targets
(X86, AArch64) do.
We want the premerge advisor to write out comments, and we need the
issue-write workflow to trigger on it in order for this to work. Landing
this before the rest of llvm#166609 to enable testing, given this needs
to be in-repo due to permissions issues.
Add helper to make it easier to retrieve the single user of a VPUser.
…167221)

`asm()` on function declarations is used for specifying the mangling.
But that specific spelling is a GNU extension unlike `__asm()`.

Found by building with `-std=c2y` in Clang's C frontend's config file.
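
A minimal sketch with hypothetical names showing the asm-label extension in
question; the bare `asm("...")` spelling is a GNU extension rejected under
strict modes such as `-std=c2y`, while `__asm__("...")` (or `__asm("...")`)
remains available:

```
#include <cstdio>

// The declaration's asm label makes calls to fancy_strlen lower to the
// symbol fancy_strlen_impl (names here are made up for illustration).
extern "C" int fancy_strlen(const char *S) __asm__("fancy_strlen_impl");

// The definition inherits the asm label from the declaration above.
extern "C" int fancy_strlen(const char *S) {
  int N = 0;
  while (S[N])
    ++N;
  return N;
}

int main() {
  std::printf("%d\n", fancy_strlen("hello")); // prints 5
  return 0;
}
```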
Adjust the frame setup code for Windows ARM64 to attempt to align
pair-wise spills to 16-byte boundaries. This enables us to properly emit
the spills for custom Clang calling conventions such as preserve_most,
which spills r9-r15, which are normally nonvolatile registers. Even when
using the ARM64EC opcodes for the unwinding, we cannot represent the
spill if it is unaligned.
…lvm#166475)

Allow widening up to 128-bit registers or if the new register class
is at least as large as one of the existing register classes.

This was artificially limiting. In particular this was doing the wrong
thing with sequences involving copies between VGPRs and AV registers.
Nearly all test changes are improvements.

The coalescer does not just widen registers out of nowhere. If it's trying
to "widen" a register, it's generally packing a register into an existing
register tuple, or in a situation where the constraints imply the wider
class anyway. 067a110 addressed the allocation failure concern by
rejecting coalescing if there are no available registers. The original
change in a4e63ea didn't include a realistic testcase to judge if
this is harmful for pressure. I would expect any issues from this to
be garden-variety subreg handling issues. We could use more dynamic
state information here if it really is an issue.

I get the best results by removing this override completely. This is
a smaller step for patch splitting purposes.
`<stdbool.h>` is provided by the compiler and both Clang and GCC provide
C++-aware versions of these headers, making our own wrapper header
entirely unnecessary.
Classof for most recipes directly supports VPValue, so there is no need
to call getDefiningRecipe when using isa/cast/dyn_cast.
Merge chasing latest versions of bulk test updates
When there are more than 255 sections, the MachO object writer allows
creation of object files which are potentially malformed. Currently,
there are assertions in the object writer code that prevent this behavior,
but for distributions where assertions are turned off this still results
in creation of malformed object files. Turn the assertions into explicit
errors.
…vm#167025)

Ran into a use case where a MachO object file with a section symbol
that did not have a section associated with it caused a segfault during
linking. This patch aims to handle such cases gracefully and keep the
linker from crashing.

---------

Co-authored-by: Ellis Hoag <[email protected]>
Windows doesn't support `pthread_attr`, which was introduced to
asan_test.cpp in llvm#165198, so this change `#ifdef`s out the changes made
in that PR.

Originally reported by Chrome as https://crbug.com/459880605.
These should always use TargetConstant
Adds test coverage with loops where the same loads get executed under
complementary predicates and can be hoisted, together with a set of
negative test cases.
 (llvm#159884)

This eliminates the pseudo register classes used to hack the
wave register class, which are now replaced with RegClassByHwMode,
so most of the diff is from register class ID renumbering.
@ronlieb ronlieb requested review from a team and dpalermo November 11, 2025 23:59
@z1-cciauto z1-cciauto merged commit c1f09d1 into amd-staging Nov 12, 2025
15 checks passed
@z1-cciauto z1-cciauto deleted the amd/merge/upstream_merge_20251111173022 branch November 12, 2025 05:29