[AMDGPU] TTI: Prioritize per-iter base adds in LSR plan comparison #509
Conversation
Mach-O has 32-bit file offsets in the MachO::section_64 structs. dSYM files can contain sections whose start offset exceeds UINT32_MAX, which means MachO::section_64.offset gets truncated. We can detect when this happens and adjust the section offset to be 64-bit safe. This lets tools get the correct section contents for large dSYM files, and allows tools that parse DWARF, like llvm-gsymutil, to load and convert these files correctly.
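As a rough illustration of the recovery, here is a hypothetical helper (not the actual patch; it assumes sections appear in ascending file-offset order, so a running 64-bit cursor can restore the truncated high bits):
```cpp
#include <cstdint>

// Hypothetical sketch: rebuild a 64-bit section offset from the truncated
// 32-bit MachO::section_64::offset field, assuming sections are visited in
// ascending file-offset order and RunningOffset tracks a 64-bit cursor.
uint64_t adjustSectionOffset(uint32_t TruncatedOffset, uint64_t RunningOffset) {
  // Splice the truncated low 32 bits onto the cursor's high 32 bits.
  uint64_t Candidate =
      (RunningOffset & ~uint64_t(0xFFFFFFFF)) | uint64_t(TruncatedOffset);
  // If that lands behind the cursor, the 32-bit field wrapped once more.
  if (Candidate < RunningOffset)
    Candidate += uint64_t(1) << 32;
  return Candidate;
}
```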
This reverts commit df1d786.
Upstream the CXXDefaultArgExpr support for AggregateExpr
Add faccessat entrypoints for aarch64 and riscv linux. Entrypoints are removed if faccessat2 syscall is not available.
ConstantRange uses `[-1, -1)` as the canonical form of a full set. Therefore, the `for (APInt I = Lower; I != Upper; ++I)` idiom doesn't work for full ranges. This patch fixes the value enumeration in `ConstantComparesGatherer` to prevent missing values for full sets. Closes llvm#166369.
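A minimal sketch of the pitfall, assuming the usual ConstantRange API (the real ConstantComparesGatherer code differs and also bounds how many values it will enumerate):
```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// For a full set, Lower == Upper, so `for (APInt I = Lower; I != Upper; ++I)`
// exits immediately and enumerates nothing. A do/while visits Lower first and
// therefore handles the full-set encoding correctly.
void enumerateRange(const ConstantRange &CR, SmallVectorImpl<APInt> &Out) {
  if (CR.isEmptySet())
    return;
  APInt I = CR.getLower();
  do {
    Out.push_back(I);
    ++I;
  } while (I != CR.getUpper());
}
```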
…no-gpu-bundle-output flag (llvm#163834) Currently, the command `clang -c -emit-llvm --no-gpu-bundle-output --offload-arch=gfx900,gfx1030 -O3 -x hip square.hip` leads to a bundled output:
```
❯ ../bin/clang -c -emit-llvm --no-gpu-bundle-output --offload-arch=gfx900,gfx1030 -O3 -x hip square.hip
❯ ls
square.hip square.bc
```
This doesn't match my expectation of the behavior of `--no-gpu-bundle-output`, so this adds a check in OffloadingActionBuilder for the flag when replacing the host compile action with a bundling action.
PR llvm#159163's probability computation for epilogue loops does not handle the possibility of an original loop probability of one. Runtime loop unrolling does not make sense for such an infinite loop, and the computation ends up dividing by zero. This patch works around that case. Issue llvm#165998.
Fix Python virtual environment paths for Windows in the Lifetime Safety Analysis benchmark
### What changed?
- Added conditional path setting for the Python executable in the virtual environment based on the platform
- For Windows, use the `Scripts/python` path
- For other platforms, use the `bin/python` path
- Updated the commands that use the Python virtual environment to use the platform-specific path
### How to test?
`ninja benchmark_lifetime_safety_analysis`
Fixes llvm#166143
Omitting `-o /dev/null` may try to write output to the current dir, which may not have write permissions on some build systems. This fixes the test added by llvm#165737
…llvm#166312) This is consistent with some other nodes that require a constant. Particularly intrinsics with ImmArg.
…66290) Eventually this should be generated by tablegen for all functions. For now, add a manual implementation for sincos_stret, which I have an immediate use for. This will allow pulling repeated code across targets into shared call-sequence code. Also add sqrt, just to make sure we can handle adding return attributes on the declaration.
Upstream the Builtin ExpOp
Avoid a weird lambda returning a function pointer, and sink the libcall logic to where the operation is handled. This allows chaining the libcall logic to try sincos_stret and fall back to sincos. The resulting cost seems too low.
This avoids AArch64 legality rules depending on libcall availability. ARM, AArch64, and X86 all had custom lowering of fsincos, all of which just emitted calls to sincos_stret / sincosf_stret. This messes with the cost heuristics around legality, because it is really an expand/libcall cost, not a favorable custom lowering. This is a bit ugly, because we're emitting code that tries to match the C-ABI-lowered IR type for the aggregate return type. This also gives an easy way to lift the unhandled x86_32 Darwin case, since ARM already handled the return-as-sret case.
Previously, `clang -march=znver5 -O3` would emit the following for
`shl`, `lshr` and `ashr <64 x i8>`:
```asm
.LCPI0_2:
.byte 8
.byte 4
.byte 2
.byte 1
.byte 0
.byte 0
.byte 0
.byte 0
.LCPI0_3:
.byte 32
.byte 16
.byte 8
.byte 4
.byte 2
.byte 1
.byte 0
.byte 0
shl:
vpsllw zmm1, zmm1, 5
vpmovb2m k1, zmm1
vpaddb zmm1, zmm1, zmm1
vgf2p8affineqb zmm0 {k1}, zmm0, qword ptr [rip + .LCPI0_2]{1to8}, 0
vpmovb2m k1, zmm1
vpaddb zmm1, zmm1, zmm1
vgf2p8affineqb zmm0 {k1}, zmm0, qword ptr [rip + .LCPI0_3]{1to8}, 0
vpmovb2m k1, zmm1
vpaddb zmm0 {k1}, zmm0, zmm0
ret
.LCPI1_3:
.byte 0
.byte 0
.byte 0
.byte 0
.byte 128
.byte 64
.byte 32
.byte 16
.LCPI1_4:
.byte 0
.byte 0
.byte 128
.byte 64
.byte 32
.byte 16
.byte 8
.byte 4
.LCPI1_5:
.byte 0
.byte 128
.byte 64
.byte 32
.byte 16
.byte 8
.byte 4
.byte 2
lshr:
vpsllw zmm1, zmm1, 5
vpmovb2m k1, zmm1
vpaddb zmm1, zmm1, zmm1
vgf2p8affineqb zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_3]{1to8}, 0
vpmovb2m k1, zmm1
vpaddb zmm1, zmm1, zmm1
vgf2p8affineqb zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_4]{1to8}, 0
vpmovb2m k1, zmm1
vgf2p8affineqb zmm0 {k1}, zmm0, qword ptr [rip + .LCPI1_5]{1to8}, 0
ret
ashr:
vpsllw zmm1, zmm1, 5
vpunpckhbw zmm2, zmm0, zmm0
vpunpckhbw zmm4, zmm1, zmm1
vpsraw zmm3, zmm2, 4
vpunpcklbw zmm0, zmm0, zmm0
vpmovb2m k1, zmm4
vpaddw zmm4, zmm4, zmm4
vpunpcklbw zmm1, zmm1, zmm1
vmovdqu8 zmm2 {k1}, zmm3
vpmovb2m k1, zmm4
vpsraw zmm3, zmm2, 2
vpaddw zmm4, zmm4, zmm4
vmovdqu8 zmm2 {k1}, zmm3
vpsraw zmm3, zmm2, 1
vpmovb2m k1, zmm4
vmovdqu8 zmm2 {k1}, zmm3
vpmovb2m k1, zmm1
vpsraw zmm3, zmm0, 4
vpaddw zmm1, zmm1, zmm1
vpsrlw zmm2, zmm2, 8
vmovdqu8 zmm0 {k1}, zmm3
vpmovb2m k1, zmm1
vpsraw zmm3, zmm0, 2
vpaddw zmm1, zmm1, zmm1
vmovdqu8 zmm0 {k1}, zmm3
vpsraw zmm3, zmm0, 1
vpmovb2m k1, zmm1
vmovdqu8 zmm0 {k1}, zmm3
vpsrlw zmm0, zmm0, 8
vpackuswb zmm0, zmm0, zmm2
ret
```
With this commit, the generated assembly becomes this:
```asm
.LCPI0_2:
.byte 0
.byte 255
.byte 0
.byte 255
.LCPI0_3:
.byte 255
.byte 0
.byte 255
.byte 0
shl:
vpsrlw zmm2, zmm1, 8
vpandd zmm3, zmm0, dword ptr [rip + .LCPI0_2]{1to16}
vpandd zmm1, zmm1, dword ptr [rip + .LCPI0_3]{1to16}
movabs rax, -6148914691236517206
kmovq k1, rax
vpsllvw zmm2, zmm3, zmm2
vpsllvw zmm0, zmm0, zmm1
vmovdqu8 zmm0 {k1}, zmm2
ret
.LCPI1_0:
.byte 255
.byte 0
lshr:
vpbroadcastw zmm2, word ptr [rip + .LCPI1_0]
movabs rax, -6148914691236517206
kmovq k1, rax
vpandq zmm3, zmm1, zmm2
vpandq zmm2, zmm0, zmm2
vpsrlw zmm1, zmm1, 8
vpsrlvw zmm2, zmm2, zmm3
vpsrlvw zmm0, zmm0, zmm1
vmovdqu8 zmm2 {k1}, zmm0
vmovdqa64 zmm0, zmm2
ret
.LCPI2_1:
.byte 255
.byte 0
.byte 255
.byte 0
ashr:
vpsrlw zmm2, zmm1, 8
vpandd zmm1, zmm1, dword ptr [rip + .LCPI2_1]{1to16}
movabs rax, -6148914691236517206
vpsravw zmm2, zmm0, zmm2
vpsllw zmm0, zmm0, 8
kmovq k1, rax
vpsraw zmm0, zmm0, 8
vpsravw zmm0, zmm0, zmm1
vmovdqu8 zmm0 {k1}, zmm2
ret
```
While I don't have AVX512 hardware, llvm-mca suggests significant
speedups, and I've done some simple correctness tests on random inputs
using the Intel Software Development Emulator.
Update the verifier to first check whether the number of incoming values matches the number of predecessors, before using incoming_values_and_blocks. We unfortunately need to check here as well, as this may be called before verifyPhiRecipes runs. Also update the verifier unit tests to actually fail for the expected recipes.
…irst` (llvm#166384) This paves the way to make `--stack-first` the default. See: llvm#151015
When running UTC on SLU/guards.ll (without LLVM changes), there are a number of changes in the UTC-generated checks. Submitting those first to simplify the diff of PR llvm#164271, as most of the changes in the latter were actually these.
…66260) MachineFunctionSplitter was missing a skipFunction() check, causing it to incorrectly split functions that should be skipped (e.g., functions with optnone attribute). This patch adds an early skipFunction() check in runOnMachineFunction() to ensure these functions are never split, regardless of profile data availability or other splitting conditions.
…ction (llvm#165774) Updated the conditions for generating the cvt.ftz.f32.bf16 instruction to include sm90 and isa7.8, so that ftz is only generated when it is supported. --------- Co-authored-by: Justin Fargnoli <[email protected]>
Broken by 831e79a, though presubmit was somehow green.
Introduced in OpenMP 6.0, the device UID shall be a unique identifier of a device on a given system. (Not necessarily a UUID.) Since it is not guaranteed that the (U)UIDs defined by the device vendor libraries, such as HSA, do not overlap with those of other vendors, the device UIDs in offload are always combined with the offload plugin name. In case the vendor library does not specify any device UID for a given device, we fall back to the offload-internal device ID. The device UID can be retrieved using the `llvm-offload-device-info` tool.
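A sketch of the composition rule just described, with illustrative names (not the actual liboffload code):
```cpp
#include <optional>
#include <string>

// The plugin name is always prefixed so vendor-provided (U)UIDs cannot
// collide across vendors; if the vendor library reports no UID, fall back
// to the offload-internal device ID.
std::string makeDeviceUID(const std::string &PluginName,
                          const std::optional<std::string> &VendorUID,
                          int InternalDeviceID) {
  if (VendorUID)
    return PluginName + "-" + *VendorUID;
  return PluginName + "-" + std::to_string(InternalDeviceID);
}
```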
From llvm#148554 (comment) - this adds an option for API tests to be run with both PDB readers on Windows. As there are a lot of failures with PDB, this is opt-in per test. To get PDB, `-g -gcodeview` has to be used with Clang. `-gcodeview` alone isn't enough, because it won't cause clang to pass `-debug` to the linker. llvm#149498 tracks the (currently) failing tests.
Ensure that the nested native build uses the same python interpreter as the main build, in case the python that CMake detects first is not the python that the user has specified explicitly. For example, if the person building LLVM wants to use a different python interpreter to build (e.g., testing the build with `python3.14` when `python3` is a link to `python3.8`, or the default python doesn't have development headers available), then they could add `-DPYTHON_EXECUTABLE=python3.14` when invoking CMake. This should be forwarded to the native CMake build to ensure that the same python is used. Original fix by Anuj Mittal <[email protected]>.
Reviewers: paperchalice, phoebewang, arsenm Reviewed By: arsenm Pull Request: llvm#165113
There are no tests that specifically stop/start at x86-partial-reduction, so no test cases have been updated for this patch. Reviewers: phoebewang, paperchalice, RKSimon, arsenm Reviewed By: arsenm, RKSimon Pull Request: llvm#166048
Currently, a jump table entry contains labels only. For example:
```
BPF.JT.0.0:
        .quad LBB0_1
        .quad LBB0_2
        .size BPF.JT.0.0, 16
```
Since the jump table entry contains a label, a relocation is necessary so the linker can resolve the label value. The relocation looks like this:
```
Relocation section '.rel.jumptables' at offset 0x160 contains 2 entries:
    Offset             Info             Type            Symbol's Value  Symbol's Name
0000000000000000  0000000200000002 R_BPF_64_ABS64   0000000000000000  .text
0000000000000008  0000000200000002 R_BPF_64_ABS64   0000000000000000  .text
```
You can see that the symbol value is 0, which makes .rel.jumptables not very useful. Instead of placing the label itself in the jump table entry, use the difference between the label and the section begin symbol. This avoids the relocation, and the eventual jump table entries in the object file remain the same as before:
```
Hex dump of section '.jumptables':
  0x00000000 68000000 00000000 78000000 00000000 h.......x.......
```
The register values between `2 << 30` (inclusive) and `2 << 31` (exclusive) correspond to frame indices. To obtain the frame index from a given register value, we interpret the first 30 bits as an unsigned integer. Thus, currently only non-negative frame indices can be represented. However, we should be able to represent negative frame indices as register values as well; this is used by reaching-definitions analysis, for example. To do that, we now interpret the first 30 bits of the register value as a signed integer. --------- Co-authored-by: Mikhail Gudim <[email protected]> Co-authored-by: Petr Penzin <[email protected]>
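A sketch of the signed encoding described above, with illustrative constants (not the exact LLVM `Register` implementation):
```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FIBase = 2u << 30; // start of the frame-index range

// Two's-complement truncation to 30 bits keeps the sign information, so
// negative (e.g., fixed) frame indices can round-trip as well.
uint32_t frameIndexToReg(int FI) {
  return FIBase | (static_cast<uint32_t>(FI) & ((1u << 30) - 1));
}

int regToFrameIndex(uint32_t Reg) {
  assert(Reg >= FIBase && "not a frame-index register value");
  uint32_t Low = Reg & ((1u << 30) - 1);
  // Sign-extend the 30-bit field back to a full int.
  return static_cast<int>(Low << 2) >> 2;
}
```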
This patch introduces `SBFrameList`, a new SB API class that allows iterating over stack frames lazily without calling `SBThread::GetFrameAtIndex` in a loop. The new `SBThread::GetFrames()` method returns an `SBFrameList` that supports Python iteration (`for frame in frame_list:`), indexing (`frame_list[0]`, `frame_list[-1]`), and length queries (`len()`). The implementation uses `StackFrameListSP` as the opaque pointer, sharing the thread's underlying frame list to ensure frames are materialized on demand. This is particularly useful for ScriptedFrameProviders, where user scripts will be able to iterate, filter, and replace frames lazily without materializing the entire stack upfront. Signed-off-by: Med Ismail Bennani <[email protected]>
This new test case breaks the buildbot starting https://lab.llvm.org/buildbot/#/builders/201/builds/6934. The corresponding clang test case sets the target triple to avoid failures.
…6670) I was looking at the calls to `usleep` in debugserver and noticed that these default arguments are never overwritten. I converted them to constants in the function, which makes it easier to reason about.
This is step 1 of exposing WASM `ref.func` to LLVM. This PR only handles creating the instruction and a test for assembling it.
From the OpenACC 3.4 Specification:
```
1924 2.7.9 copyout clause
1925 The copyout clause may appear on structured data and compute constructs, on declare
1926 directives, and on exit data directives. The clause may optionally have a zero modifier if the
1927 copyout clause appears on a structured data or compute construct.
1928 Only the following modifiers may appear in the optional modifier-list: always, alwaysin or zero.
1929 Additionally, on structured data and compute constructs capture modifier may appear
```
`readonly` is not a legal modifier for the `copyout` clause. The call to `genDataOperandOperationsWithModifier` should be checking the parsed modifier for the `copyout` clause against the `Zero` modifier.
Revealed in PR llvm#166102, which itself doesn't _cause_ the profile to be dropped. Referencing it in case that makes debugging easier.
…ead exits (llvm#164714) As noted in the doc comment of `handleDeadExits`, the dummy switch is just an artifact of the constraints placed by the fact that we operate in a loop pass. Adding weights here is unnecessary, but the complexity is low, and it helps keep things easy for profile propagation verification (in a sense, the overall complexity would be higher if we special-cased this somehow). Issue llvm#147390
This was very hackily patched in 16ef893 to not use env -u. The internal shell does not support unset, but does support env -u. Disable the test for now so we can enable the internal shell, with a TODO to re-enable it after the internal shell landing has stuck. Reviewers: fmayer, vitalybuka, w2yehia, daltenty, mingmingl-llvm, madanial0 Reviewed By: mingmingl-llvm Pull Request: llvm#166637
This fixes a regression exposed after 4454152. This introduces a few small regressions for true16. There are more cases where the value can propagate through subregister extracts which need new handling. They're also small enough that perhaps there's a way to avoid needing to deal with this case in the first place.
Removing skip after confirming builds pass locally. Upgraded workers to clang 20 Fixes llvm#162516
LSR on AMDGPU was selecting plans that reduce Setup/AddRec cost while increasing per-iteration base additions (`NumBaseAdds`) in the loop body. With minimal addressing modes, each extra `add i32` becomes a separate VALU op and hurts throughput (observed on gfx942). Teach GCNTTIImpl to compare LSR plans with `NumBaseAdds` and then `Insns` as the dominant criteria, ahead of preheader costs (`SetupCost`, `AddRecCost`). Also return false from isNumRegsMajorCostOfLSR() and true from shouldDropLSRSolutionIfLessProfitable(). This steers LSR away from “-setup, +more loop adds” trades; the reduced repro now keeps the baseline, and a large kernel regains performance.
- Fixed a few lit-tests by limiting to GFX9 and above - Regenerated no-crash tests
- Reorder GCNTTIImpl::isLSRCostLess() for GFX9+ to prioritize per-iteration work:
1) Insns (lower is better)
2) NumBaseAdds (tie-breaker to Insns)
3) NumIVMuls (penalize mul/mul_hi/addc IV chains)
4) AddRecCost, then SetupCost (preheader costs only if per-iter ties)
5) Minor keys: ScaleCost, then NumRegs
- Keep pre-GFX9 behavior unchanged (fall back to BaseT::isLSRCostLess).
- Rationale: AMDGPU lacks rich addressing modes; we must avoid LSR “wins” that
reduce setup but add per-iteration base-adds / wide IV mul chains. This ordering
consistently favors plans with less per-iteration work and reduces VALU pressure
on gfx9+ (e.g., gfx942). In practice LSR now often drops its “improvement” in
favor of the baseline plan (as seen in -debug-only=lsr logs).
Tests:
- Regenerate `llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll` checks
(kernel `introduced_copy_to_sgpr`). Assembly changed in a beneficial way:
fewer VALU ops in the hot loops and lower register usage:
num_vgpr: 28 → 25
numbered_sgpr: 26 → 20
No functional change to the test’s intent (no RA asserts; copy/RA behavior intact).
Checks updated via `utils/update_llc_test_checks.py`.
… base-add count
Adjust the LSR cost comparison order for GFX9+ to prioritize per-iteration IV efficiency over minimizing base-adds. The previous order checked NumBaseAdds early, causing the cost model to prefer baseline solutions with multiple IVs (NumBaseAdds=0) over optimized LSR solutions with fewer IVs but one base-add (NumBaseAdds=1). This resulted in extra register moves between IVs in the loop body, increasing VALU pressure.
Changes to priority order for GFX9+:
1. Insns (total per-iteration instructions)
2. NumIVMuls (IV multiplication chains) <- moved above NumBaseAdds
3. AddRecCost (per-iteration IV updates) <- moved above NumBaseAdds
4. NumBaseAdds (base address adds) <- moved down
5. SetupCost (preheader costs)
6. ScaleCost, NumRegs (tie-breakers)
Rationale: AddRecCost directly measures per-iteration IV update cost (fewer IVs = lower cost), which better reflects actual loop-body work than NumBaseAdds alone. This fixes cases where the baseline with 2 IVs generated more VALU instructions than the LSR solution with 1 IV + base-add.
This patch refines the AMDGPU LSR cost model to better reflect the lack of rich addressing modes on the architecture. Key changes:
1. Implement getScalingFactorCost() to return 1 for base+scale*index addressing modes. While AMDGPU's isLegalAddressingMode() reports such modes as "legal", they require a separate ADD instruction, unlike architectures with hardware-supported complex addressing.
2. Update isLSRCostLess() to use EffInsns = Insns + ScaleCost for the primary comparison. This ensures that LSR solutions using scaled addressing are properly penalized for the actual per-iteration work on AMDGPU.
New priority order for GFX9+:
1. EffInsns (Insns + ScaleCost) - true per-iteration cost
2. NumIVMuls - IV multiplication chains
3. AddRecCost - IV update cost
4. NumBaseAdds - base address additions
5. SetupCost - preheader costs
6. ImmCost, NumRegs - tie-breakers
Lit tests were carefully analyzed before regeneration to ensure:
- No functional regressions (all test objectives preserved)
- No incorrect machine sinking or divergence-handling issues
- Acceptable trade-offs (preheader overhead vs. per-iteration efficiency)
- No register spills or unnecessary barriers introduced
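For reference, a minimal sketch of the comparison order this commit describes, assuming the standard `TargetTransformInfo::LSRCost` fields (illustrative only, not the verbatim GCNTTIImpl change):
```cpp
#include <tuple>

struct LSRCost { // mirrors llvm::TargetTransformInfo::LSRCost fields
  unsigned Insns, NumRegs, AddRecCost, NumIVMuls, NumBaseAdds,
           ImmCost, SetupCost, ScaleCost;
};

bool isLSRCostLessGFX9(const LSRCost &C1, const LSRCost &C2) {
  // Scaled addressing is not free on AMDGPU: fold its cost into the
  // per-iteration instruction count before anything else is compared.
  unsigned EffInsns1 = C1.Insns + C1.ScaleCost;
  unsigned EffInsns2 = C2.Insns + C2.ScaleCost;
  // Lexicographic comparison in the priority order listed above.
  return std::tie(EffInsns1, C1.NumIVMuls, C1.AddRecCost, C1.NumBaseAdds,
                  C1.SetupCost, C1.ImmCost, C1.NumRegs) <
         std::tie(EffInsns2, C2.NumIVMuls, C2.AddRecCost, C2.NumBaseAdds,
                  C2.SetupCost, C2.ImmCost, C2.NumRegs);
}
```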
```yaml
    if: github.event.pull_request.draft == false
    runs-on:
      group: compiler-generic-runners
    env:
      svc_acc_org_secret: ${{secrets.CI_GITHUB_TOKEN}}
      input_sha: ${{ github.event.pull_request.head.sha != '' && github.event.pull_request.head.sha || github.sha }}
      input_pr_num: ${{ github.event.pull_request.number != '' && github.event.pull_request.number || 0 }}
      input_pr_url: ${{ github.event.pull_request.html_url != '' && github.event.pull_request.html_url || '' }}
      input_pr_title: ${{ github.event.pull_request.title != '' && github.event.pull_request.title || '' }}
      # set the pipeline name here based on branch name
      pipeline_name: ${{secrets.CI_JENKINS_JOB_NAME}}
      JENKINS_URL: ${{secrets.CI_JENKINS_URL}}
      CONTAINER_IMAGE: ${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          #docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "git clone ${{secrets.CI_UTILS_REPO}} ."
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Escape pull request title
        run: |
          import json
          import os
          import shlex
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = event['pull_request']['title']
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}

      - name: Run Jenkins Cancel Script
        env:
          JENKINS_URL: ${{secrets.CI_JENKINS_URL}}
          JENKINS_USER: ${{secrets.CI_JENKINS_USER}}
          JENKINS_API_TOKEN: ${{secrets.CI_JENKINS_TOKEN}}
          JENKINS_JOB_NAME: ${{secrets.CI_JENKINS_JOB_NAME}}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          COMMIT_HASH: ${{ github.event.after }}
        run: |
          docker exec -e JENKINS_JOB_NAME=${{secrets.CI_JENKINS_JOB_NAME}} -e PR_NUMBER=${{ github.event.pull_request.number }} -e COMMIT_HASH=${{ github.event.after }} -e JENKINS_URL=${{secrets.CI_JENKINS_URL}} -e JENKINS_USER=${{secrets.CI_JENKINS_USER}} -e JENKINS_API_TOKEN=${{secrets.CI_JENKINS_TOKEN}} "${{env.CONTAINER_NAME}}" /bin/bash -c "PYTHONHTTPSVERIFY=0 python3 cancel_previous_build.py"

      # Runs a set of commands using the runners shell
      - name: Getting Event Details
        run: |
          echo $(pwd)
          echo $GITHUB_ENV
          echo $GITHUB_REPOSITORY
          echo $GITHUB_SERVER_URL
          echo "GITHUB_SHA is: $GITHUB_SHA"
          echo "GITHUB_WORKFLOW_SHA is: $GITHUB_WORKFLOW_SHA"
          echo "GITHUB_BASE_REF is: $GITHUB_BASE_REF"
          echo "GITHUB_REF_NAME is: $GITHUB_REF_NAME"
          echo "github.event.pull_request.id is: ${{github.event.pull_request.id}}"
          echo "github.event.pull_request.html_url is: ${{github.event.pull_request.html_url}}"
          echo "github.event.pull_request.number is: ${{github.event.pull_request.number}}"
          echo "github.event.pull_request.url is: ${{github.event.pull_request.url}}"
          echo "github.event.pull_request.issue_url is: ${{github.event.pull_request.issue_url}}"
          echo "github.event.pull_request.head.sha is: ${{github.event.pull_request.head.sha}}"
          echo "github.event.pull_request.base.ref is: ${{github.event.pull_request.base.ref}}"
          echo "github.event.pull_request.merge_commit_sha is: ${{github.event.pull_request.merge_commit_sha}}"
          echo "github.event.pull_request is: ${{github.event.pull_request}}"

      - name: Trigger Jenkins Pipeline
        if: steps.check_changes.outcome != 'failure'
        run: |
          echo "--Running jenkins_api.py with input sha - $input_sha for pull request - $input_pr_url"
          docker exec -e GITHUB_REPOSITORY="$GITHUB_REPOSITORY" -e svc_acc_org_secret="$svc_acc_org_secret" -e input_sha="$input_sha" -e input_pr_url="$input_pr_url" -e pipeline_name="$pipeline_name" \
            -e input_pr_num="$input_pr_num" -e PR_TITLE="$PR_TITLE" -e JENKINS_URL="$JENKINS_URL" -e GITHUB_PAT="$svc_acc_org_secret" "${{env.CONTAINER_NAME}}" \
            /bin/bash -c 'echo \"PR NUM: "$input_pr_num"\" && PYTHONHTTPSVERIFY=0 python3 jenkins_api.py -s \"${JENKINS_URL}\" -jn "$pipeline_name" -ghr "$GITHUB_REPOSITORY" -ghsha "$input_sha" -ghprn "$input_pr_num" -ghpru "$input_pr_url" -ghprt "$PR_TITLE" -ghpat="$svc_acc_org_secret"'

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"
```
CodeQL check warning: Workflow does not contain permissions (Medium)
Copilot autofix suggestion: To fix the problem, explicitly declare a `permissions:` block at the root of the workflow YAML file (line 2, immediately after `name:` and before `on:`) to limit the GITHUB_TOKEN supplied to jobs. As a minimal baseline, set `contents: read`, which allows only read access to repository contents. If future steps need further access (such as posting checks or updating issues), they can be added to the permissions block, but for now, limiting to `contents: read` is the secure default. No methods, imports, or code logic changes are required; only the YAML workflow file must be updated.
Modified lines R2-R3:
```diff
@@ -1,4 +1,6 @@
 name: Compiler CI PSDB trigger on amd-staging branch
+permissions:
+  contents: read

 # Controls when the workflow will run
 on:
```
```yaml
    if: github.event.pull_request.draft == false
    runs-on:
      group: compiler-generic-runners
    env:
      PR_SHA: ${{ github.event.pull_request.head.sha != '' && github.event.pull_request.head.sha || github.sha }}
      PR_NUMBER: ${{ github.event.pull_request.number != '' && github.event.pull_request.number || 0 }}
      PR_URL: ${{ github.event.pull_request.html_url != '' && github.event.pull_request.html_url || '' }}
      PR_TITLE: ${{ github.event.pull_request.title != '' && github.event.pull_request.title || '' }}
      BASE_BRANCH: ${{ github.event.pull_request.base.ref != '' && github.event.pull_request.base.ref || '' }}
      GITHUB_TOKEN: ${{secrets.CI_GITHUB_TOKEN}}

    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.BUILDBOT_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Escape pull request title
        run: |
          import json
          import os
          import shlex
          with open('${{ github.event_path }}') as fh:
              event = json.load(fh)
          escaped = event['pull_request']['title']
          with open(os.environ['GITHUB_ENV'], 'a') as fh:
              print(f'PR_TITLE={escaped}', file=fh)
        shell: python3 {0}

      - name: Trigger Buildbot Build
        run: |
          echo "${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_WORKER_PORT }}"
          docker exec -e PR_TITLE="$PR_TITLE" "${{env.CONTAINER_NAME}}" /bin/bash -c 'buildbot sendchange -W ${{ secrets.BUILDBOT_USER }} -a ${{secrets.BUILDBOT_USER}}:${{secrets.BUILDBOT_PWD}} --master="${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_WORKER_PORT }}" --branch=${{ env.BASE_BRANCH }} --revision=${{ env.PR_SHA }} -p PR_NUMBER:${{ env.PR_NUMBER }} -p PR_TITLE:"$PR_TITLE" -p PR_URL:${{ env.PR_URL }} -p SHA:${{ env.PR_SHA }}'

      - name: Set Initial Status to Pending
        run: |
          docker exec -e PR_SHA=$PR_SHA -e GITHUB_TOKEN=$GITHUB_TOKEN "${{env.CONTAINER_NAME}}" /bin/bash -c "python3 -c \"
          import os
          import requests
          GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
          TARGET_SHA = os.getenv('PR_SHA')
          print('debug', TARGET_SHA)
          api_url = f'https://api.github.com/repos/AMD-Lightning-Internal/llvm-project/statuses/{TARGET_SHA}'
          headers = {
              'Authorization': f'token {GITHUB_TOKEN}',
              'Content-Type': 'application/json'
          }
          payload = {
              'state': 'pending',
              'context': 'buildbot',
              'description': 'Build is in queue'
          }
          response = requests.post(api_url, json=payload, headers=headers)
          if response.status_code == 201:
              print('Status set to pending successfully.')
          else:
              print(f'Failed to set status: {response.status_code} {response.text}')
          \""

      - name: Poll Buildbot build status
        run: |
          python3 -c "
          import os
          import time
          import requests
          GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
          BUILD_URL = 'http://${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_MASTER_PORT }}/api/v2/builds'
          TARGET_SHA = os.getenv('PR_SHA')
          print('debug', TARGET_SHA)
          MAX_RETRIES = 10
          RETRY_INTERVAL = 30  # seconds

          def get_build_properties(build_id):
              build_properties_url = f'http://${{ secrets.BUILDBOT_HOST }}:${{ secrets.BUILDBOT_MASTER_PORT }}/api/v2/builds/{build_id}/properties'
              response = requests.get(build_properties_url, headers={'Accept': 'application/json', 'Authorization': f'token {GITHUB_TOKEN}'})
              return response.json()

          for i in range(MAX_RETRIES):
              response = requests.get(BUILD_URL, headers={'Accept': 'application/json'})
              response_json = response.json()
              print(f'Attempt {i + 1}: Buildbot response:', response_json)

              # Check if any build has the target SHA
              builds = response_json.get('builds', [])
              print(builds)
              build_with_sha = None
              for build in builds:
                  build_id = build['buildid']
                  properties = get_build_properties(build_id)
                  #print(properties)
                  #prop = properties.get('revision', [])

                  if 'properties' in properties:
                      print(properties['properties'])
                      if 'revision' in properties['properties'][0]:
                          print(properties['properties'][0])
                      if 'revision' in properties['properties'][0] and properties['properties'][0]['revision'][0] == TARGET_SHA:
                          build_with_sha = build
                          break

              if build_with_sha:
                  print('Build started successfully for SHA:', TARGET_SHA)
                  break
              else:
                  print('Build for SHA not started yet, retrying in', RETRY_INTERVAL, 'seconds')
                  time.sleep(RETRY_INTERVAL)
          else:
              print('Build did not start for SHA:', TARGET_SHA, 'after maximum retries')
              exit(1)
          "

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"
```
CodeQL check warning: Workflow does not contain permissions (Medium)
Copilot autofix suggestion: To fix this issue, an explicit permissions block should be added to the workflow YAML file. This block should be placed at the root of the workflow (i.e., before `jobs:`), so it applies to all jobs unless further overridden per job. The permissions should be as minimal as required by the workflow. From the workflow code provided, the `statuses: write` permission is needed (the workflow sets commit status via the API), and possibly `contents: read` if it needs to fetch the repository contents. Unless there's evidence the workflow needs other write-level permissions, the block should specify:
    permissions:
      contents: read
      statuses: write
This must be added on a new line after the `name:` and before the `on:` block. No additional imports or complex changes are needed; only a change to the YAML file.
Modified lines R2-R4:
```diff
@@ -1,4 +1,7 @@
 name: Trigger amd-debug Buildbot Build
+permissions:
+  contents: read
+  statuses: write
 on:
   workflow_dispatch:
   pull_request:
```
```yaml
    runs-on:
      group: compiler-generic-runners

    steps:
      - name: Set environment variable for container image
        run: |
          echo "CONTAINER_IMAGE=${{ secrets.JENKINS_TRIGGER_DOCKER_IMAGE }}" >> $GITHUB_ENV
          echo "CONTAINER_NAME=my_container_${{ github.run_id }}" >> $GITHUB_ENV

      - name: Pull container image
        run: docker pull "${{env.CONTAINER_IMAGE}}"

      - name: Run container
        run: |
          docker run -d --name "${{env.CONTAINER_NAME}}" $CONTAINER_IMAGE sleep infinity
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "echo 'Running commands inside the container'"

      - name: Trigger compute-rocm-dkms-afar job
        run: |
          docker exec "${{env.CONTAINER_NAME}}" /bin/bash -c "python -c \"
          import requests
          import time
          from requests.auth import HTTPBasicAuth

          jenkins_user = '${{ secrets.CI_JENKINS_USER }}'
          jenkins_token = '${{ secrets.ROCM_JENKINS_CI_TOKEN }}'
          jenkins_host = '${{ secrets.ROCM_JENKINS_HOST }}'
          jenkins_job = '${{ secrets.ROCM_JENKINS_OSDB_JOB }}'

          jenkins_url = f'{jenkins_host}/job/{jenkins_job}/buildWithParameters'

          response = requests.post(jenkins_url, auth=HTTPBasicAuth(jenkins_user, jenkins_token))

          if response.status_code == 201:
              print('Jenkins job triggered successfully!')
              queue_url = response.headers.get('Location')
              if queue_url:
                  print(f'Queue URL: {queue_url}')
                  print(f'Getting build URL (max 5 attempts with 10 second interval)...')
                  # Poll the queue item to get the build number, limited to 5 attempts
                  max_attempts = 5
                  attempts = 0
                  while attempts < max_attempts:
                      queue_response = requests.get(queue_url + 'api/json', auth=HTTPBasicAuth(jenkins_user, jenkins_token))
                      queue_data = queue_response.json()
                      if 'executable' in queue_data:
                          build_number = queue_data['executable']['number']
                          build_url = f'{jenkins_host}/job/{jenkins_job}/{build_number}/'
                          print(f'Build URL: {build_url}')
                          break
                      attempts += 1
                      time.sleep(10)  # Wait for 10 seconds before polling again
                  else:
                      print('Exceeded maximum attempts to get the build URL. The trigger happened, so not failing the workflow')
              else:
                  print('Build URL not found in the response headers.')

          elif response.status_code == 200:
              print('Request was successful, but check the response content for details.')
              print(response.text)
          else:
              print(f'Failed to trigger Jenkins job. Status code: {response.status_code}')
          \""

      - name: Stop and remove container
        if: always()
        run: |
          docker stop "${{env.CONTAINER_NAME}}"
          docker rm "${{env.CONTAINER_NAME}}"
```
CodeQL check warning: Workflow does not contain permissions (Medium)
Copilot autofix suggestion: To fix this, add a permissions block at the top level of the workflow YAML (recommended), so it applies to all jobs unless overridden. The workflow does not perform any actions that require write privileges (e.g., modifying contents, issues, metadata, etc.). Therefore, the most restrictive and safe default is to set `contents: read`, which prevents the token from being used to make changes to the repository. This should be inserted directly after the `name:` declaration, before the `on:` block. No additional methods, imports, or code changes are needed; just an update to the YAML structure in .github/workflows/compute-rocm-dkmd-afar-trigger.yml.
Modified lines R2-R3:
```diff
@@ -1,4 +1,6 @@
 name: Trigger compute-rocm-dkms-afar job on push
+permissions:
+  contents: read

 on:
   push: # This triggers the workflow on push events
```
AMDGPU: reflect lack of complex addressing in LSR; fix regressions after SROA copy-load promotion
Background
Recent SROA changes promote copy-load style IR more aggressively. As a side effect, LoopStrengthReduce (LSR) now triggers on loops where it previously stayed quiet. On AMDGPU (no rich base+index addressing modes), generic LSR heuristics could pick plans that looked cheaper but expanded into extra address formation work in the loop body, causing performance regressions.
Problem
Generic `isLSRCostLess` treats a legal base+scale*index mode as free, so LSR could trade preheader setup cost for extra per-iteration base adds and IV multiply chains. With AMDGPU's minimal addressing modes, each of those becomes a separate VALU instruction in the loop body (observed on gfx942).
Solution
- Implement `getScalingFactorCost` in `GCNTTIImpl` to model scaled addressing: if `HasBaseReg && Scale != 0`, return cost 1 (one extra per-iteration instruction).
- Update `isLSRCostLess` (GFX9+) to compare on `Effective Insns = Insns + ScaleCost`. This makes scaled address formation count as real per-iteration work.
Priority order on GFX9+:
1. EffInsns (Insns + ScaleCost)
2. NumIVMuls
3. AddRecCost
4. NumBaseAdds
5. SetupCost
6. ImmCost, NumRegs (tie-breakers)
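A sketch of the scaling-factor hook under these assumptions (the real `GCNTTIImpl::getScalingFactorCost` override takes more parameters; this only shows the rule stated above):
```cpp
#include <cstdint>

// Sketch: a "legal" base+scale*index mode still costs one ADD per loop
// iteration on AMDGPU, so report cost 1 instead of 0 for scaled addressing.
int getScalingFactorCostSketch(bool HasBaseReg, int64_t Scale) {
  if (HasBaseReg && Scale != 0)
    return 1; // one extra per-iteration address add
  return 0;   // base-only (or offset-only) addressing is free
}
```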
Rationale
AMDGPU reports base+scale*index addressing as "legal", but there is no hardware support for complex addressing: the scale materializes as a separate ADD every iteration. Counting that cost up front steers LSR away from "-setup, +more loop adds" trades.
Validation
Lit tests were analyzed before regeneration to confirm no functional regressions, no incorrect machine sinking or divergence handling, and no new spills or barriers. The regenerated `llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll` checks (kernel `introduced_copy_to_sgpr`) show fewer VALU ops in the hot loops and lower register usage (num_vgpr: 28 → 25, numbered_sgpr: 26 → 20).
Notes
P.S. The commented-out debug messages will stay until the final form (pending extensive CQE performance testing); I'll remove them then.