
[TLE][feat] Add tle dsa extension #715

Open
huanghaoXcore wants to merge 16 commits into flagos-ai:triton_v3.5.x from huanghaoXcore:add_tle_extension


@huanghaoXcore

Collaborator

No description provided.

small-cat and others added 14 commits June 24, 2026 14:37
…lagos-ai#399)

* [FEAT](tle): WIP - add tle features

* [FEAT]: WIP - refactor tle
* move tle.ascend to tle.dsa.ascend
* move tle_ir to third_party/tle/dsa
* reimplement alloc/to_tensor/to_buffer by referencing buffer_ir in third_party/ascend
* reimplement tle.dsa.ascend scope with address_space in ascend

* [FEAT](tle): support add, sub, mul, div, max, min in tle.dsa

* [FIX](tle): fix to_tensor in test_add_vec_mix.py
* [FIX] remove memory_space_cast in dsa_to_tensor, because the op removes the memory space attribute and results in compilation errors
* [TESTING] add collect_single method in ascend/testing.py to preserve the original benchmark statistics

* [FEAT](tle): add hint, subview, extract_slice, extract_element in tle.dsa

* [REFACT](tle): decouple tle from TritonOps.td
* decouple TleOps from TritonOps and move them to third_party/tle/dsa/dialect
* implement the TleOp conversion in third_party/tle/dsa rather than directly in flir; flir just calls the conversion in its pass

* [CHORE]: update doc in tle

* [FEAT]: decouple tle.dsa in backend/ascend/spec
* backend/ascend/spec/triton/compiler/code_generator.py still uses tle.dsa in its visitor to visit the Python AST

* [FIX]: fix copyright declaration in tle

* [FIX](tle): fix extracting and applying tle.hints when visiting the AST
* fix tle.dsa.hint for nested usage, see python/test/tle/test_tle_with_hints.py
* implement extract_tle in experimental/tle

* [FIX](tle): fix tle module imports in ascend/backend/spec/triton/compiler/code_generator.py and add sparse_flash_attn_tle.py

* [FIX]: fix copyright declaration in tle

* [FIX](tle): remove redundant code and fix code format

(cherry picked from commit 837fe3e)
Refactor the TLE DSA build layout so DSA is managed as part of the TLE subtree instead of a separate plugin-style entry point.

Keep DSA operations in the main `tle` dialect and avoid defining a second `TleDialect::initialize()`. DSA-specific op and attr registration is now routed through `TleDialect::dsaInitialize()`, with the implementation kept under the DSA IR directory to preserve module separation.

Rename the DSA TableGen files to the `TleDSA*` naming scheme and update generated include references accordingly. Adjust CMake dependencies and ordering so DSA TableGen/IR targets are created before the main TLE IR target, `TleIR` links `TleDSAIR`, and DSA conversion is configured after the main TLE IR target is available.

Also package DSA Python bindings into the main `TritonTLE` plugin path to keep a single TLE plugin and a single dialect registration path.
* [feat]: enable cvpipeline for sfa

* Apply code-format changes

---------

Co-authored-by: flagtree-bot <flagtree_ai@163.com>
Co-authored-by: zhzhcookie <zhengyang_pku@163.com>
(cherry picked from commit 35cc929)
…verhead (flagos-ai#593)

Co-authored-by: 谢昱 <xieyu@xcoresigma.com>
(cherry picked from commit b238fdb)
Expose PIPE, block synchronization, and sub-vector helpers through the TLE language namespace to support pipeline-enabled kernels.

(cherry picked from commit db7e7ff)
* [TLE] Add tle_swiglu kernel on Ascend NPU

Signed-off-by: wangziyi <wangziyi@xcoresigma.com>

* [TLE] Add tle_swiglu kernel on Ascend NPU

Signed-off-by: wangziyi <wangziyi@xcoresigma.com>

---------

Signed-off-by: wangziyi <wangziyi@xcoresigma.com>
(cherry picked from commit b522050)
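When reviewing the tle_swiglu kernel, a plain-Python reference of the SwiGLU activation can help with numerical spot checks. This is only a sketch under an assumed layout: the input's last dimension is split into a gate half and an up half, and the output is SiLU(gate) * up. The actual tensor layout and signature of tle_swiglu are not described in this PR, and `swiglu_reference` is a hypothetical helper name.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def swiglu_reference(row: list[float]) -> list[float]:
    """Hypothetical reference: split `row` into [gate | up] halves (assumed layout)
    and return SiLU(gate) * up element-wise."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even to split into gate/up halves")
    half = len(row) // 2
    gate, up = row[:half], row[half:]
    return [silu(g) * u for g, u in zip(gate, up)]
```

For example, `swiglu_reference([0.0, 1.0, 2.0, 3.0])` treats `[0.0, 1.0]` as the gate and `[2.0, 3.0]` as the up projection, giving roughly `[0.0, 2.193]`. If the kernel uses a different split (e.g. interleaved gate/up), the slicing would need to change accordingly.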
* [TLE]: add add_rmsnorm_bias kernel in triton/tutorials/tle

* [TLE]: rename 06-add-rms-norm-bias.py to 07-add-rms-norm-bias.py

* [chore]: fix code format

* [ci][fix]: fix conflict in ascend3.2-build-and-test.yml

* [fix]: rename add_rms_norm_bias kernel

* [fix]: fix retrieving the number of vector cores, which requires torch_npu greater than 2.9.0

* [tle][tutorials]: fix forbidden bare except

(cherry picked from commit fefa349)
…-ai#675)

Signed-off-by: Waynefeiran <wangziyi@xcoresigma.com>
(cherry picked from commit 951af63)
@CLAassistant

CLAassistant commented Jun 24, 2026


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ small-cat
✅ huanghaoXcore
✅ wangziy1
✅ yanminghui123
❌ Quamaly
You have signed the CLA already but the status is still pending? Let us recheck it.

huanghaoXcore changed the title from [feat] Add tle extension to [TLE][feat] Add tle dsa extension on Jun 25, 2026
huanghaoXcore force-pushed the add_tle_extension branch 2 times, most recently from 9e7af70 to 57b9f02 on June 26, 2026 02:26
Enable TLE struct support for the Ascend build and add the missing include paths needed by the Ascend and TLE Python bindings.

Register the TLE dialect in the Ascend IR loader and adjust DSA codegen/semantics for hint scopes, integer constants, max operands, buffer shapes, and static slice metadata.

Update the AddRmsNormBias tutorial to avoid calling the DSA parallel marker at runtime and include pre-commit formatting fixes.

6 participants