Conversation

@michaelselehov

AMDGPU: reflect lack of complex addressing in LSR; fix regressions after SROA copy-load promotion

Background

Recent SROA changes promote copy-load style IR more aggressively. As a side effect, LoopStrengthReduce (LSR) now triggers on loops where it previously stayed quiet. On AMDGPU (no rich base+index addressing modes), generic LSR heuristics could pick plans that looked cheaper but expanded into extra address formation work in the loop body, causing performance regressions.

Problem

  • A scaled index with a base register is not free on AMDGPU; it becomes an extra per-iteration add.
  • LSR’s Insns metric did not count this address work, so plans could “win” by 1 instruction in theory but actually add a v_add in the loop.
  • On GFX9+ we care more about per-iteration work than about small, one-time preheader setup.

Solution

  • Implement getScalingFactorCost in GCNTTIImpl to model scaled addressing:
    If HasBaseReg && Scale != 0, return cost 1 (one extra per-iteration instruction).
  • Use Effective Insns as the top metric in isLSRCostLess (GFX9+):
    Effective Insns = Insns + ScaleCost
    This makes scaled address formation count as real per-iteration work; a sketch of the hook follows below.
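
A minimal sketch of the TTI hook described above, assuming the current `getScalingFactorCost` signature (parameter types vary across LLVM versions):

```cpp
// Sketch only: models base + scale*index as one extra per-iteration add.
// AMDGPU reports such modes as legal, but they are not free.
InstructionCost
GCNTTIImpl::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                 StackOffset BaseOffset, bool HasBaseReg,
                                 int64_t Scale, unsigned AddrSpace) const {
  if (HasBaseReg && Scale != 0)
    return 1; // one v_add of address formation per iteration
  return 0;   // plain base or pure offset addressing stays free
}
```

Because isLegalAddressingMode() still returns true, LSR keeps considering these modes; they just stop looking free.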

Priority order on GFX9+

  1. Effective Insns (Insns + ScaleCost)
  2. NumIVMuls (penalize mul/mul_hi/addc IV chains)
  3. AddRecCost (prefer fewer IV updates)
  4. NumBaseAdds
  5. SetupCost
  6. ImmCost
  7. NumRegs
  • Pre-GFX9 behavior is unchanged (fall back to the base implementation); the full comparison is sketched below.
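
A sketch of the comparison this ordering implies, using the real `TTI::LSRCost` fields; the `EffInsns*` locals are names for illustration only, and the subtarget gate is an assumption about how the GFX9+ check could look:

```cpp
bool GCNTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                               const TargetTransformInfo::LSRCost &C2) const {
  if (ST->getGeneration() < AMDGPUSubtarget::GFX9)
    return BaseT::isLSRCostLess(C1, C2); // pre-GFX9: unchanged

  // Fold scaled-address formation into the per-iteration instruction count.
  unsigned EffInsns1 = C1.Insns + C1.ScaleCost;
  unsigned EffInsns2 = C2.Insns + C2.ScaleCost;
  return std::tie(EffInsns1, C1.NumIVMuls, C1.AddRecCost, C1.NumBaseAdds,
                  C1.SetupCost, C1.ImmCost, C1.NumRegs) <
         std::tie(EffInsns2, C2.NumIVMuls, C2.AddRecCost, C2.NumBaseAdds,
                  C2.SetupCost, C2.ImmCost, C2.NumRegs);
}
```

With no scale in play, ScaleCost is zero on both sides, so Effective Insns ties and the later keys (for example AddRecCost) decide, which is exactly the GlobalISel case described under Validation.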

Rationale

  • AMDGPU lacks hardware complex addressing. Base + (iv * scale) requires an extra instruction; modeling that as ScaleCost = 1 and folding it into Effective Insns fixes false “wins”.
  • Keeping NumIVMuls high in the order preserves the important penalty for wide IV mul chains (problematic on gfx942).
  • Preheader work remains a lower priority; we accept modest setup if per-iteration work is not worse (often better).

Validation

  • Micro cases that previously regressed now choose a pointer IV or a single IV with honest accounting of address work; per-iteration VALU pressure is reduced or unchanged.
  • A GlobalISel case that relies on fewer IV updates still selects the intended plan (no scale involved, so Effective Insns ties and AddRecCost wins).
  • AMDGPU lit tests with autogenerated checks were reviewed and selectively regenerated where code shape improved or shifted benignly. Intent-based tests (no crashes, correct sinking, correct divergence handling) remain valid.
  • Benchmarks on gfx942 remain stable; no increase in actual loop work.

Notes

  • No LSR algorithms were changed; this is TTI cost modeling and plan ordering for AMDGPU.
  • Scope is limited to GFX9+.
  • Further tuning can be added later if we find unrolled-shape outliers, but current data shows this approach addresses the original regressions without harming other cases.

P.S. The commented-out debug messages will stay until this reaches its final form (pending extensive CQE perf testing); I'll remove them then.

clayborg and others added 30 commits November 4, 2025 09:36
Mach-o has 32 bit file offsets in the MachO::section_64 structs. dSYM
files can contain sections whose start offset exceeds UINT32_MAX, which
means the MachO::section_64.offset will get truncated. We can calculate
when this happens and properly adjust the section offset to be 64 bit
safe. This means tools can get the correct section contents for large
dSYM files and allows tools that parse DWARF, like llvm-gsymutil, to be
able to load and convert these files correctly.
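A hedged sketch of the kind of adjustment described, assuming sections are visited in file order so a running 64-bit offset reveals how many times the stored 32-bit field wrapped (names are illustrative, not the actual patch):

```cpp
#include <cstdint>

// Recover a 64-bit file offset from a truncated 32-bit section offset.
// PrevEnd is the 64-bit end of the previously processed section.
uint64_t fixSectionOffset(uint32_t StoredOffset, uint64_t PrevEnd) {
  // Splice the stored low 32 bits onto the high bits of the running cursor.
  uint64_t Fixed = (PrevEnd & ~uint64_t(UINT32_MAX)) | StoredOffset;
  if (Fixed < PrevEnd) // the low bits wrapped past another 4 GiB boundary
    Fixed += uint64_t(UINT32_MAX) + 1;
  return Fixed;
}
```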
Upstream the CXXDefaultArgExpr support for AggregateExpr
Add faccessat entrypoints for aarch64 and riscv linux. Entrypoints are
removed if faccessat2 syscall is not available.
ConstantRange uses `[-1, -1)` as the canonical form of a full set.
Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't
work for full ranges. This patch fixes the value enumeration in
`ConstantComparesGatherer` to prevent missing values for full sets.

Closes llvm#166369.
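
A minimal illustration of the pitfall described above, assuming LLVM's ADT/IR headers; for a full set `getLower() == getUpper() == -1`, so the loop below never runs:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

void enumerate(const ConstantRange &CR) {
  // Canonical full set is [-1, -1): Lower == Upper, so for a full range
  // this loop exits immediately and every value is missed.
  for (APInt I = CR.getLower(); I != CR.getUpper(); ++I) {
    // ... gather I as a candidate switch-case value ...
  }
}
```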
…no-gpu-bundle-output flag (llvm#163834)

Currently, the command `clang -c -emit-llvm --no-gpu-bundle-output
--offload-arch=gfx900,gfx1030 -O3 -x hip square.hip` will lead to a
bundled output:
```
❯ ../bin/clang -c -emit-llvm --no-gpu-bundle-output --offload-arch=gfx900,gfx1030 -O3 -x hip square.hip
❯ ls
square.hip
square.bc
```

This doesn't match my expectation of the behavior of
`--no-gpu-bundle-output`, so this adds a check for the flag in
OffloadingActionBuilder when replacing the host compile action
with a bundling action.
PR llvm#159163's probability computation for epilogue loops does not handle
the possibility of an original loop probability of one. Runtime loop
unrolling does not make sense for such an infinite loop, and a division
by zero results. This patch works around that case.
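
For intuition: if the original loop's backedge probability is p, a trip-count estimate of the shape 1 / (1 - p) is undefined at p = 1 (this is the shape of the failure, not necessarily the exact expression in the patch).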

Issue llvm#165998.
Fix Python virtual environment paths for Windows in the Lifetime Safety
Analysis benchmark

### What changed?

- Added conditional path setting for the Python executable in the
virtual environment based on the platform
- For Windows, use `Scripts/python` path
- For other platforms, use `bin/python` path
- Updated the commands that use the Python virtual environment to use
the platform-specific path

### How to test?

`ninja benchmark_lifetime_safety_analysis`

Fixes llvm#166143
Omitting `-o /dev/null` may try to write output to the current dir,
which may not have write permissions on some build systems.

This fixes the test added by llvm#165737
…llvm#166312)

This is consistent with some other nodes that require a constant.
Particularly intrinsics with ImmArg.
…66290)

Eventually this should be generated by tablegen for all functions.
For now add a manual implementation for sincos_stret, which I
have an immediate use for. This will allow pulling repeated code
across targets into shared call sequence code.

Also add sqrt just to make sure we can handle adding return attributes
on the declaration.
Avoid a weird lambda returning a function pointer and sink the libcall
logic to where the operation is handled. This allows chaining the
libcall logic to try sincos_stret and fallback to sincos. The resulting
cost seems too low.
This avoids AArch64 legality rules depending on libcall
availability.

ARM, AArch64, and X86 all had custom lowering of fsincos which
all were just to emit calls to sincos_stret / sincosf_stret. This
messes with the cost heuristics around legality, because really
it's an expand/libcall cost and not a favorable custom.

This is a bit ugly, because we're emitting code trying to match the
C ABI lowered IR type for the aggregate return type. This now also
gives an easy way to lift the unhandled x86_32 darwin case, since
ARM already handled the return as sret case.
Previously, `clang -march=znver5 -O3` would emit the following for
`shl`, `lshr` and `ashr <64 x i8>`:
```asm
.LCPI0_2:
	.byte	8
	.byte	4
	.byte	2
	.byte	1
	.byte	0
	.byte	0
	.byte	0
	.byte	0
.LCPI0_3:
	.byte	32
	.byte	16
	.byte	8
	.byte	4
	.byte	2
	.byte	1
	.byte	0
	.byte	0
shl:
	vpsllw	zmm1, zmm1, 5
	vpmovb2m	k1, zmm1
	vpaddb	zmm1, zmm1, zmm1
	vgf2p8affineqb	zmm0 {k1}, zmm0, qword ptr [rip + .LCPI0_2]{1to8}, 0
	vpmovb2m	k1, zmm1
	vpaddb	zmm1, zmm1, zmm1
	vgf2p8affineqb	zmm0 {k1}, zmm0, qword ptr [rip + .LCPI0_3]{1to8}, 0
	vpmovb2m	k1, zmm1
	vpaddb	zmm0 {k1}, zmm0, zmm0
	ret

.LCPI1_3:
	.byte	0
	.byte	0
	.byte	0
	.byte	0
	.byte	128
	.byte	64
	.byte	32
	.byte	16
.LCPI1_4:
	.byte	0
	.byte	0
	.byte	128
	.byte	64
	.byte	32
	.byte	16
	.byte	8
	.byte	4
.LCPI1_5:
	.byte	0
	.byte	128
	.byte	64
	.byte	32
	.byte	16
	.byte	8
	.byte	4
	.byte	2
lshr:
	vpsllw	zmm1, zmm1, 5
	vpmovb2m	k1, zmm1
	vpaddb	zmm1, zmm1, zmm1
	vgf2p8affineqb	zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_3]{1to8}, 0
	vpmovb2m	k1, zmm1
	vpaddb	zmm1, zmm1, zmm1
	vgf2p8affineqb	zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_4]{1to8}, 0
	vpmovb2m	k1, zmm1
	vgf2p8affineqb	zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_5]{1to8}, 0
	ret

ashr:
	vpsllw	zmm1, zmm1, 5
	vpunpckhbw	zmm2, zmm0, zmm0
	vpunpckhbw	zmm4, zmm1, zmm1
	vpsraw	zmm3, zmm2, 4
	vpunpcklbw	zmm0, zmm0, zmm0
	vpmovb2m	k1, zmm4
	vpaddw	zmm4, zmm4, zmm4
	vpunpcklbw	zmm1, zmm1, zmm1
	vmovdqu8	zmm2 {k1}, zmm3
	vpmovb2m	k1, zmm4
	vpsraw	zmm3, zmm2, 2
	vpaddw	zmm4, zmm4, zmm4
	vmovdqu8	zmm2 {k1}, zmm3
	vpsraw	zmm3, zmm2, 1
	vpmovb2m	k1, zmm4
	vmovdqu8	zmm2 {k1}, zmm3
	vpmovb2m	k1, zmm1
	vpsraw	zmm3, zmm0, 4
	vpaddw	zmm1, zmm1, zmm1
	vpsrlw	zmm2, zmm2, 8
	vmovdqu8	zmm0 {k1}, zmm3
	vpmovb2m	k1, zmm1
	vpsraw	zmm3, zmm0, 2
	vpaddw	zmm1, zmm1, zmm1
	vmovdqu8	zmm0 {k1}, zmm3
	vpsraw	zmm3, zmm0, 1
	vpmovb2m	k1, zmm1
	vmovdqu8	zmm0 {k1}, zmm3
	vpsrlw	zmm0, zmm0, 8
	vpackuswb	zmm0, zmm0, zmm2
	ret
```

With this commit, the generated assembly becomes this:

```asm
.LCPI0_2:
	.byte	0
	.byte	255
	.byte	0
	.byte	255
.LCPI0_3:
	.byte	255
	.byte	0
	.byte	255
	.byte	0
shl:
	vpsrlw	zmm2, zmm1, 8
	vpandd	zmm3, zmm0, dword ptr [rip + .LCPI0_2]{1to16}
	vpandd	zmm1, zmm1, dword ptr [rip + .LCPI0_3]{1to16}
	movabs	rax, -6148914691236517206
	kmovq	k1, rax
	vpsllvw	zmm2, zmm3, zmm2
	vpsllvw	zmm0, zmm0, zmm1
	vmovdqu8	zmm0 {k1}, zmm2
	ret

.LCPI1_0:
	.byte	255
	.byte	0
lshr:
	vpbroadcastw	zmm2, word ptr [rip + .LCPI1_0]
	movabs	rax, -6148914691236517206
	kmovq	k1, rax
	vpandq	zmm3, zmm1, zmm2
	vpandq	zmm2, zmm0, zmm2
	vpsrlw	zmm1, zmm1, 8
	vpsrlvw	zmm2, zmm2, zmm3
	vpsrlvw	zmm0, zmm0, zmm1
	vmovdqu8	zmm2 {k1}, zmm0
	vmovdqa64	zmm0, zmm2
	ret

.LCPI2_1:
	.byte	255
	.byte	0
	.byte	255
	.byte	0
ashr:
	vpsrlw	zmm2, zmm1, 8
	vpandd	zmm1, zmm1, dword ptr [rip + .LCPI2_1]{1to16}
	movabs	rax, -6148914691236517206
	vpsravw	zmm2, zmm0, zmm2
	vpsllw	zmm0, zmm0, 8
	kmovq	k1, rax
	vpsraw	zmm0, zmm0, 8
	vpsravw	zmm0, zmm0, zmm1
	vmovdqu8	zmm0 {k1}, zmm2
	ret
```

While I don't have AVX512 hardware, llvm-mca suggests significant
speedups, and I've done some simple correctness tests on random inputs
using the Intel Software Development Emulator.
Update the verifier to first check if the number of incoming values
matches the number of predecessors, before using
incoming_values_and_blocks. We unfortunately need to also check here, as
this may be called before verifyPhiRecipes runs.

Also update the verifier unit tests to actually fail for the expected
recipes.
…irst` (llvm#166384)

This paves the way to make `--stack-first` the default.

See: llvm#151015
When running UTC on SLU/guards.ll (without LLVM changes), there are a number of changes in the UTC-generated checks. Submitting those first to simplify the diff of PR llvm#164271, as most of the changes in the latter were actually these.
…66260)

MachineFunctionSplitter was missing a skipFunction() check, causing it
to incorrectly split functions that should be skipped (e.g., functions
with optnone attribute).

This patch adds an early skipFunction() check in runOnMachineFunction()
to ensure these functions are never split, regardless of profile data
availability or other splitting conditions.
…ction (llvm#165774)

Updated the conditions for generating the cvt.ftz.f32.bf16 instruction
to include sm90 and isa7.8, so that ftz is only generated when it is
supported.

---------

Co-authored-by: Justin Fargnoli <[email protected]>
Broken by 831e79a, though
presubmit was somehow green.
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of
a device on a given system. (Not necessarily a UUID.) Since it is not
guaranteed that the (U)UIDs defined by the device vendor libraries, such
as HSA, do not overlap with those of other vendors, the device UIDs in
offload are always combined with the offload plugin name. In case the
vendor library does not specify any device UID for a given device, we
fall back to the offload-internal device ID.
The device UID can be retrieved using the `llvm-offload-device-info`
tool.
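A hypothetical sketch of the composition rule described above (names and types are illustrative, not the actual offload plugin API):

```cpp
#include <optional>
#include <string>

// Vendor (U)UIDs from different libraries may collide, so always prefix
// with the plugin name; fall back to the offload-internal device ID.
std::string makeDeviceUID(const std::string &PluginName,
                          const std::optional<std::string> &VendorUID,
                          int DeviceId) {
  if (VendorUID)
    return PluginName + "-" + *VendorUID;
  return PluginName + "-" + std::to_string(DeviceId);
}
```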
From
llvm#148554 (comment)
- this adds an option for API tests to be run with both PDB readers
on Windows. As there are a lot of failures with PDB, this is an opt-in
per test.

To get PDB, `-g -gcodeview` has to be used on Clang. `-gcodeview` alone
isn't enough, because it won't cause clang to pass `-debug` to the
linker.

llvm#149498 tracks the (currently) failing tests.
Ensure that the nested native build uses the same python interpreter as
the main build, in case the python that CMake detects first is not the
python that the user has specified explicitly.

For example, if the person building LLVM wants to use a different python
interpreter to build (eg, testing the build with `python3.14` when
`python3` is a link to `python3.8`, or the default python doesn't have
development headers available) then they could add
`-DPYTHON_EXECUTABLE=python3.14` when invoking CMake. This should be
forwarded to the native CMake build to ensure that the same python is
used.

Original fix by Anuj Mittal <[email protected]>.
Reviewers: paperchalice, phoebewang, arsenm

Reviewed By: arsenm

Pull Request: llvm#165113
There are no tests that specifically stop/start at
x86-partial-reduction, so no test cases have been updated for this
patch.

Reviewers: phoebewang, paperchalice, RKSimon, arsenm

Reviewed By: arsenm, RKSimon

Pull Request: llvm#166048
Currently, the jump table entry contains labels only. For example:

  BPF.JT.0.0:
        .quad   LBB0_1
        .quad   LBB0_2
        .size   BPF.JT.0.0, 16

Since the jump table entry contains a label, a relocation is necessary
so the linker can resolve the label value. The relocation looks like this:

  Relocation section '.rel.jumptables' at offset 0x160 contains 2 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name
  0000000000000000  0000000200000002 R_BPF_64_ABS64         0000000000000000 .text
  0000000000000008  0000000200000002 R_BPF_64_ABS64         0000000000000000 .text

You can see that the symbol value is 0, which makes .rel.jumptables not
very useful.

Instead of having the label itself in the jump table entry, use the
difference between the label and the section begin symbol. This avoids
the relocation, and the eventual jump table entries in the object file
remain the same as before.

  Hex dump of section '.jumptables':
  0x00000000 68000000 00000000 78000000 00000000 h.......x.......
mgudim and others added 27 commits November 5, 2025 15:33
The register values between `2 << 30` (inclusive) and `2 << 31`
(exclusive) correspond to frame indices. To obtain the frame index from
the given register value we interpret the first 30 bits as an unsigned
integer. Thus, currently only non-negative frame indices can be
represented.

However, we should be able to represent negative frame indices as
register values as well. This is used by reaching definitions analysis
for example.

In order to do that, we interpret the first 30 bits of the register
value as a signed integer.
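
A small sketch of the decode this implies, using LLVM's `SignExtend32` (the surrounding helper is illustrative, not the actual patch):

```cpp
#include "llvm/Support/MathExtras.h"

// After this change the low 30 bits are a *signed* frame index, so
// negative indices (e.g. fixed stack objects) round-trip correctly.
int decodeFrameIndex(uint32_t RegValue) {
  return llvm::SignExtend32<30>(RegValue & ((1u << 30) - 1));
}
```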

---------

Co-authored-by: Mikhail Gudim <[email protected]>
Co-authored-by: Petr Penzin <[email protected]>
This patch introduces `SBFrameList`, a new SBAPI class that allows
iterating over stack frames lazily without calling
`SBThread::GetFrameAtIndex` in a loop.

The new `SBThread::GetFrames()` method returns an `SBFrameList` that
supports Python iteration (`for frame in frame_list:`), indexing
(`frame_list[0]`, `frame_list[-1]`), and length queries (`len()`).

The implementation uses `StackFrameListSP` as the opaque pointer,
sharing the thread's underlying frame list to ensure frames are
materialized on-demand.

This is particularly useful for ScriptedFrameProviders, where user
scripts will be to iterate, filter, and replace frames lazily without
materializing the entire stack upfront.

Signed-off-by: Med Ismail Bennani <[email protected]>
This new test case breaks the buildbot starting with
https://lab.llvm.org/buildbot/#/builders/201/builds/6934. The
corresponding clang test case sets the target triple to avoid failures.
…6670)

I was looking at the calls to `usleep` in debugserver and noticed that
these default arguments are never overwritten. I converted them to
constants in the function, which makes it easier to reason about.
This is step 1 of exposing WASM `ref.func` to LLVM.

This PR only handles creating the instruction and a test for assembling
it.
From the OpenACC 3.4 Specification:
```
1924 2.7.9 copyout clause
1925 The copyout clause may appear on structured data and compute constructs, on declare
1926 directives, and on exit data directives. The clause may optionally have a zero modifier if the
1927 copyout clause appears on a structured data or compute construct.
1928 Only the following modifiers may appear in the optional modifier-list: always, alwaysin or zero.
1929 Additionally, on structured data and compute constructs capture modifier may appear
```
`readonly` is not a legal modifier for the `copyout` clause. The call to
`genDataOperandOperationsWithModifier` should be checking the parsed
modifier for the `copyout` clause against the `Zero` modifier.
Revealed in PR llvm#166102, which itself doesn't _cause_ the profile to be
dropped. Referencing it in case it makes debugging easier.
…ead exits (llvm#164714)

As noted in the doc comment of `handleDeadExits`, the dummy switch is just an artifact of operating in a loop pass. Adding weights here is unnecessary, but the complexity is low, and it helps keep things easy for profile propagation verification (in a sense, the overall complexity would be higher if we special-cased this somehow).

Issue llvm#147390
This was very hackily patched in
16ef893 to not use env -u. The internal
shell does not support unset, but does support env -u. Disable the test
for now so we can enable the internal shell, with a TODO to re-enable
the test after the internal shell landing has stuck.

Reviewers: fmayer, vitalybuka, w2yehia, daltenty, mingmingl-llvm, madanial0

Reviewed By: mingmingl-llvm

Pull Request: llvm#166637
This fixes a regression exposed after
4454152.
This introduces a few small regressions for true16. There are more cases
where the value can propagate through subregister extracts, which need
new handling. They're also small enough that perhaps there's a way to
avoid needing to deal with this case in the first place.
Removing skip after confirming builds pass locally. Upgraded workers to
clang 20

Fixes llvm#162516
LSR on AMDGPU was selecting plans that reduce Setup/AddRec cost while
increasing per-iteration base additions (`NumBaseAdds`) in the loop
body. With minimal addressing modes, each extra `add i32` becomes a
separate VALU op and hurts throughput (observed on gfx942).

Teach GCNTTIImpl to compare LSR plans with `NumBaseAdds` and then
`Insns` as dominant criteria, ahead of preheader costs (`SetupCost`,
`AddRecCost`). Also return false from isNumRegsMajorCostOfLSR()
and true from shouldDropLSRSolutionIfLessProfitable(). This steers
LSR away from “-setup +more loop adds” trades; the reduced repro
now keeps baseline and a large kernel regains performance.
- Fixed a few lit-tests by limiting to GFX9 and above
- Regenerated no-crash tests
- Reorder GCNTTIImpl::isLSRCostLess() for GFX9+ to prioritize per-iteration work:
  1) Insns (lower is better)
  2) NumBaseAdds (tie-breaker to Insns)
  3) NumIVMuls (penalize mul/mul_hi/addc IV chains)
  4) AddRecCost, then SetupCost (preheader costs only if per-iter ties)
  5) Minor keys: ScaleCost, then NumRegs
- Keep pre-GFX9 behavior unchanged (fall back to BaseT::isLSRCostLess).
- Rationale: AMDGPU lacks rich addressing modes; we must avoid LSR “wins” that
  reduce setup but add per-iteration base-adds / wide IV mul chains. This ordering
  consistently favors plans with less per-iteration work and reduces VALU pressure
  on gfx9+ (e.g., gfx942). In practice LSR now often drops its “improvement” in
  favor of the baseline plan (as seen in -debug-only=lsr logs).

Tests:
- Regenerate `llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll` checks
  (kernel `introduced_copy_to_sgpr`). Assembly changed in a beneficial way:
  fewer VALU ops in the hot loops and lower register usage:
    num_vgpr: 28 → 25
    numbered_sgpr: 26 → 20
  No functional change to the test’s intent (no RA asserts; copy/RA behavior intact).
  Checks updated via `utils/update_llc_test_checks.py`.
… base-add count

Adjust the LSR cost comparison order for GFX9+ to prioritize per-iteration
IV efficiency over minimizing base-adds.

Previous order had NumBaseAdds checked early, causing the cost model to
prefer baseline solutions with multiple IVs (NumBaseAdds=0) over optimized
LSR solutions with fewer IVs but one base-add (NumBaseAdds=1). This resulted
in extra register moves between IVs in the loop body, increasing VALU pressure.

Changes to priority order for GFX9+:
1. Insns (total per-iteration instructions)
2. NumIVMuls (IV multiplication chains) <- moved above NumBaseAdds
3. AddRecCost (per-iteration IV updates) <- moved above NumBaseAdds
4. NumBaseAdds (base address adds) <- moved down
5. SetupCost (preheader costs)
6. ScaleCost, NumRegs (tie-breakers)

Rationale: AddRecCost directly measures per-iteration IV update cost
(fewer IVs = lower cost), which better reflects actual loop body work
than NumBaseAdds alone. This fixes cases where baseline with 2 IVs
generated more VALU instructions than LSR solution with 1 IV + base-add.
This patch refines the AMDGPU LSR cost model to better reflect the lack
of rich addressing modes on the architecture.

Key changes:

1. Implement getScalingFactorCost() to return 1 for base+scale*index
   addressing modes. While AMDGPU's isLegalAddressingMode() reports such
   modes as "legal", they require a separate ADD instruction unlike
   architectures with hardware-supported complex addressing.

2. Update isLSRCostLess() to use EffInsns = Insns + ScaleCost for the
   primary comparison. This ensures that LSR solutions using scaled
   addressing are properly penalized for the actual per-iteration work
   on AMDGPU.

New priority order for GFX9+:
  1. EffInsns (Insns + ScaleCost) - true per-iteration cost
  2. NumIVMuls - IV multiplication chains
  3. AddRecCost - IV update cost
  4. NumBaseAdds - base address additions
  5. SetupCost - preheader costs
  6. ImmCost, NumRegs - tie-breakers

Lit tests were carefully analyzed before regeneration to ensure:
- No functional regressions (all test objectives preserved)
- No incorrect machine sinking or divergence handling issues
- Acceptable trade-offs (preheader overhead vs per-iteration efficiency)
- No register spills or unnecessary barriers introduced
@michaelselehov michaelselehov deleted the amd/dev/mselehov/amdgpu-lsr-cost-model branch November 6, 2025 15:26
Comment on lines +14 to +106
    if: github.event.pull_request.draft == false
    runs-on:
      group: compiler-generic-runners
    env:
      svc_acc_org_secret: ${{secrets.CI_GITHUB_TOKEN}}
      input_sha: ${{ github.event.pull_request.head.sha != '' && github.event.pull_request.head.sha || github.sha }}
      input_pr_num: ${{ github.event.pull_request.number != '' && github.event.pull_request.number || 0 }}
      input_pr_url: ${{ github.event.pull_request.html_url != '' && github.event.pull_request.html_url || '' }}
      input_pr_title: ${{ github.event.pull_request.title != '' && github.event.pull_request.title || '' }}
      # set the pipeline name here based on branch name
      pipeline_name: ${{secrets.CI_JENKINS_JOB_NAME}}
      JENKINS_URL: ${{secrets.CI_JENKINS_URL}}
      CONTAINER_IMAGE: ${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          #docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "git clone ${{secrets.CI_UTILS_REPO}} ."
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Escape pull request title
        run: |
          import json
          import os
          import shlex
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = event['pull_request']['title']
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}

      - name: Run Jenkins Cancel Script
        env:
          JENKINS_URL: ${{secrets.CI_JENKINS_URL}}
          JENKINS_USER: ${{secrets.CI_JENKINS_USER}}
          JENKINS_API_TOKEN: ${{secrets.CI_JENKINS_TOKEN}}
          JENKINS_JOB_NAME: ${{secrets.CI_JENKINS_JOB_NAME}}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          COMMIT_HASH: ${{ github.event.after }}
        run: |
          docker exec -e JENKINS_JOB_NAME=${{secrets.CI_JENKINS_JOB_NAME}} -e PR_NUMBER=${{ github.event.pull_request.number }} -e COMMIT_HASH=${{ github.event.after }} -e JENKINS_URL=${{secrets.CI_JENKINS_URL}} -e JENKINS_USER=${{secrets.CI_JENKINS_USER}} -e JENKINS_API_TOKEN=${{secrets.CI_JENKINS_TOKEN}} "${{env.CONTAINER_NAME}}" /bin/bash -c "PYTHONHTTPSVERIFY=0 python3 cancel_previous_build.py"

      # Runs a set of commands using the runners shell
      - name: Getting Event Details
        run: |
          echo $(pwd)
          echo $GITHUB_ENV
          echo $GITHUB_REPOSITORY
          echo $GITHUB_SERVER_URL
          echo "GITHUB_SHA is: $GITHUB_SHA"
          echo "GITHUB_WORKFLOW_SHA is: $GITHUB_WORKFLOW_SHA"
          echo "GITHUB_BASE_REF is: $GITHUB_BASE_REF"
          echo "GITHUB_REF_NAME is: $GITHUB_REF_NAME"
          echo "github.event.pull_request.id is: ${{github.event.pull_request.id}}"
          echo "github.event.pull_request.html_url is: ${{github.event.pull_request.html_url}}"
          echo "github.event.pull_request.number is: ${{github.event.pull_request.number}}"
          echo "github.event.pull_request.url is: ${{github.event.pull_request.url}}"
          echo "github.event.pull_request.issue_url is: ${{github.event.pull_request.issue_url}}"
          echo "github.event.pull_request.head.sha is: ${{github.event.pull_request.head.sha}}"
          echo "github.event.pull_request.base.ref is: ${{github.event.pull_request.base.ref}}"
          echo "github.event.pull_request.merge_commit_sha is: ${{github.event.pull_request.merge_commit_sha}}"
          echo "github.event.pull_request is: ${{github.event.pull_request}}"

      - name: Trigger Jenkins Pipeline
        if: steps.check_changes.outcome != 'failure'
        run: |
          echo "--Running jenkins_api.py with input sha - $input_sha for pull request - $input_pr_url"
          docker exec -e GITHUB_REPOSITORY="$GITHUB_REPOSITORY" -e svc_acc_org_secret="$svc_acc_org_secret" -e input_sha="$input_sha" -e input_pr_url="$input_pr_url" -e pipeline_name="$pipeline_name" \
            -e input_pr_num="$input_pr_num" -e PR_TITLE="$PR_TITLE" -e JENKINS_URL="$JENKINS_URL" -e GITHUB_PAT="$svc_acc_org_secret" "${{env.CONTAINER_NAME}}" \
            /bin/bash -c 'echo \"PR NUM: "$input_pr_num"\" && PYTHONHTTPSVERIFY=0 python3 jenkins_api.py -s \"${JENKINS_URL}\" -jn "$pipeline_name" -ghr "$GITHUB_REPOSITORY" -ghsha "$input_sha" -ghprn "$input_pr_num" -ghpru "$input_pr_url" -ghprt "$PR_TITLE" -ghpat="$svc_acc_org_secret"'

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix

AI 9 days ago

To fix the problem, explicitly declare a permissions: block at the root of the workflow YAML file (line 2, immediately after the name: and before on:) to limit the GITHUB_TOKEN supplied to jobs. As a minimal baseline, set contents: read, which allows only read access to repository contents. If future steps need further access (such as posting checks, updating issues, etc.), they can be added to the permissions block, but for now, limiting to contents: read is the secure default. No methods, imports, or code logic change is required — only the YAML workflow file must be updated.


Suggested changeset 1
.github/workflows/PSDB-amd-staging.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/PSDB-amd-staging.yml b/.github/workflows/PSDB-amd-staging.yml
--- a/.github/workflows/PSDB-amd-staging.yml
+++ b/.github/workflows/PSDB-amd-staging.yml
@@ -1,4 +1,6 @@
 name: Compiler CI PSDB trigger on amd-staging branch
+permissions:
+  contents: read
 
 # Controls when the workflow will run
 on:
EOF
Comment on lines +11 to +135
    if: github.event.pull_request.draft == false
    runs-on:
      group: compiler-generic-runners
    env:
      PR_SHA: ${{ github.event.pull_request.head.sha != '' && github.event.pull_request.head.sha || github.sha }}
      PR_NUMBER: ${{ github.event.pull_request.number != '' && github.event.pull_request.number || 0 }}
      PR_URL: ${{ github.event.pull_request.html_url != '' && github.event.pull_request.html_url || '' }}
      PR_TITLE: ${{ github.event.pull_request.title != '' && github.event.pull_request.title || '' }}
      BASE_BRANCH: ${{ github.event.pull_request.base.ref != '' && github.event.pull_request.base.ref || '' }}
      GITHUB_TOKEN: ${{secrets.CI_GITHUB_TOKEN}}

    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.BUILDBOT_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Escape pull request title
        run: |
          import json
          import os
          import shlex
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = event['pull_request']['title']
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}

      - name: Trigger Buildbot Build
        run: |
          echo "${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_WORKER_PORT }}"
          docker exec -e PR_TITLE="$PR_TITLE" "${{env.CONTAINER_NAME}}" /bin/bash -c 'buildbot sendchange -W ${{ secrets.BUILDBOT_USER }} -a ${{secrets.BUILDBOT_USER}}:${{secrets.BUILDBOT_PWD}} --master="${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_WORKER_PORT }}" --branch=${{ env.BASE_BRANCH }} --revision=${{ env.PR_SHA }} -p PR_NUMBER:${{ env.PR_NUMBER }} -p PR_TITLE:"$PR_TITLE" -p PR_URL:${{ env.PR_URL }} -p SHA:${{ env.PR_SHA }}'

      - name: Set Initial Status to Pending
        run: |
          docker exec -e PR_SHA=$PR_SHA -e GITHUB_TOKEN=$GITHUB_TOKEN "${{env.CONTAINER_NAME}}" /bin/bash -c "python3 -c \"
          import os
          import requests
          GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
          TARGET_SHA = os.getenv('PR_SHA')
          print('debug', TARGET_SHA)
          api_url = f'https://api.github.com/repos/AMD-Lightning-Internal/llvm-project/statuses/{TARGET_SHA}'
          headers = {
              'Authorization': f'token {GITHUB_TOKEN}',
              'Content-Type': 'application/json'
          }
          payload = {
              'state': 'pending',
              'context': 'buildbot',
              'description': 'Build is in queue'
          }
          response = requests.post(api_url, json=payload, headers=headers)
          if response.status_code == 201:
              print('Status set to pending successfully.')
          else:
              print(f'Failed to set status: {response.status_code} {response.text}')
          \""

      - name: Poll Buildbot build status
        run: |
          python3 -c "
          import os
          import time
          import requests
          GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
          BUILD_URL = 'http://${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_MASTER_PORT }}/api/v2/builds'
          TARGET_SHA = os.getenv('PR_SHA')
          print('debug', TARGET_SHA)
          MAX_RETRIES = 10
          RETRY_INTERVAL = 30  # seconds

          def get_build_properties(build_id):
              build_properties_url = f'http://${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_MASTER_PORT }}/api/v2/builds/{build_id}/properties'
              response = requests.get(build_properties_url, headers={'Accept': 'application/json', 'Authorization': f'token {GITHUB_TOKEN}'})
              return response.json()

          for i in range(MAX_RETRIES):
              response = requests.get(BUILD_URL, headers={'Accept': 'application/json'})
              response_json = response.json()
              print(f'Attempt {i + 1}: Buildbot response:', response_json)

              # Check if any build has the target SHA
              builds = response_json.get('builds', [])
              print(builds)
              build_with_sha = None
              for build in builds:
                  build_id = build['buildid']
                  properties = get_build_properties(build_id)
                  #print(properties)
                  #prop = properties.get('revision', [])

                  if 'properties' in properties:
                      print(properties['properties'])
                      if 'revision' in properties['properties'][0]:
                          print(properties['properties'][0])
                      if 'revision' in properties['properties'][0] and properties['properties'][0]['revision'][0] == TARGET_SHA:
                          build_with_sha = build
                          break

              if build_with_sha:
                  print('Build started successfully for SHA:', TARGET_SHA)
                  break
              else:
                  print('Build for SHA not started yet, retrying in', RETRY_INTERVAL, 'seconds')
                  time.sleep(RETRY_INTERVAL)
          else:
              print('Build did not start for SHA:', TARGET_SHA, 'after maximum retries')
              exit(1)
          "

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix

AI 9 days ago

To fix this issue, an explicit permissions block should be added to the workflow YAML file. This block should be placed at the root of the workflow (i.e., before jobs:), so it applies to all jobs unless further overridden per-job. The permissions should be as minimal as required by the workflow. From the workflow code provided, the statuses: write permission is needed (the workflow sets commit status via API), and possibly contents: read if it needs to fetch the repository contents. Unless there's evidence the workflow needs other write-level permissions, the block should specify:

permissions:
  contents: read
  statuses: write

This must be added on a new line after (or before) the name: and before the on: block. No additional imports or complex changes are needed; only a change to the YAML file.


Suggested changeset 1
.github/workflows/buildbot-psdb-trigger.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/buildbot-psdb-trigger.yml b/.github/workflows/buildbot-psdb-trigger.yml
--- a/.github/workflows/buildbot-psdb-trigger.yml
+++ b/.github/workflows/buildbot-psdb-trigger.yml
@@ -1,4 +1,7 @@
 name: Trigger amd-debug Buildbot Build
+permissions:
+  contents: read
+  statuses: write
 on:
   workflow_dispatch:
   pull_request:
EOF
Comment on lines +11 to +79
    runs-on:
      group: compiler-generic-runners

    steps:
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Trigger compute-rocm-dkms-afar job
        run: |
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "python -c \"
          import requests
          import time
          from requests.auth import HTTPBasicAuth

          jenkins_user = '${{ secrets.CI_JENKINS_USER }}'
          jenkins_token = '${{ secrets.ROCM_JENKINS_CI_TOKEN }}'
          jenkins_host = '${{ secrets.ROCM_JENKINS_HOST }}'
          jenkins_job = '${{ secrets.ROCM_JENKINS_OSDB_JOB }}'

          jenkins_url = f'{jenkins_host}/job/{jenkins_job}/buildWithParameters'

          response = requests.post(jenkins_url, auth=HTTPBasicAuth(jenkins_user, jenkins_token))

          if response.status_code == 201:
              print('Jenkins job triggered successfully!')
              queue_url = response.headers.get('Location')
              if queue_url:
                  print(f'Queue URL: {queue_url}')
                  print(f'Getting build URL (max 5 attempts with 10 seconds interval)...')
                  # Poll the queue item to get the build number, limited to 5 attempts
                  max_attempts = 5
                  attempts = 0
                  while attempts < max_attempts:
                      queue_response = requests.get(queue_url + 'api/json', auth=HTTPBasicAuth(jenkins_user, jenkins_token))
                      queue_data = queue_response.json()
                      if 'executable' in queue_data:
                          build_number = queue_data['executable']['number']
                          build_url = f'{jenkins_host}/job/{jenkins_job}/{build_number}/'
                          print(f'Build URL: {build_url}')
                          break
                      attempts += 1
                      time.sleep(10)  # Wait for 10 seconds before polling again
                  else:
                      print('Exceeded maximum attempts to get the build URL. The trigger happened, so not failing the workflow')
              else:
                  print('Build URL not found in the response headers.')

          elif response.status_code == 200:
              print('Request was successful, but check the response content for details.')
              print(response.text)
          else:
              print(f'Failed to trigger Jenkins job. Status code: {response.status_code}')
          \""

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {}

Copilot Autofix

AI 9 days ago

To fix this, add a permissions block at the top-level of the workflow YAML (recommended), so it applies to all jobs unless overridden. The workflow does not perform any actions that require write privileges (e.g., modifying contents, issues, metadata, etc.). Therefore, the most restrictive and safe default is to set contents: read, which prevents the token from being used to make changes to the repository. This should be inserted directly after the name: declaration, before the on: block. No additional methods, imports, or code changes are needed—just an update to the YAML structure in .github/workflows/compute-rocm-dkmd-afar-trigger.yml.

Suggested changeset 1
.github/workflows/compute-rocm-dkmd-afar-trigger.yml

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/compute-rocm-dkmd-afar-trigger.yml b/.github/workflows/compute-rocm-dkmd-afar-trigger.yml
--- a/.github/workflows/compute-rocm-dkmd-afar-trigger.yml
+++ b/.github/workflows/compute-rocm-dkmd-afar-trigger.yml
@@ -1,4 +1,6 @@
 name: Trigger compute-rocm-dkms-afar job on push
+permissions:
+  contents: read
 
 on:
   push: # This triggers the workflow on push events
EOF