Update tilelang #2
* [Bugfix] Fix the jit_kernel issue
* Update README.md

---------

Co-authored-by: Lei Wang <[email protected]>
…ethod (#1359) This commit refines the Fragment creation process in the InferLayout method of ParallelOpNode. It removes the unnecessary forward_index array and utilizes default fragment indexing for consistency with other operations. Additionally, it binds the thread range to enhance comparability across different operations.
* [Analysis] Enhance NestedLoopChecker with tile op cases
* fix tileop issue
* [misc] add a cpp side wrapper for gemm_sp_py
* [misc] typing
* [IR] bind GemmSPWarpPolicy
* [chore] add wrapper code
* [IR] fix GemmSPWarpPolicy
* [codegen] apply ptxas instructions
* [intrinsic] add typical (unused) mma layout
* [template] add uint16 debug func
* [intrinsic] add b matrix layout
* [gemm_sp] enable fp16/bf16 on sm8x
* [layout] refactor fp16/bf16 layout
* [gemm_sp] enable int8
* [chore] update test case dtype
* [gemm_sp] enable fp32
* [layout] refactor layouts
* [intrinsic] enable ldmatrix for mat A
* [layout] enable ldsm for matrix b
* [layout] add ldmatrix for fp32 and fp8
* [chore] refine
* [chore] refactor
* [chore] add fp8 efactor
* [chore] refactor
* [chore] add remove negative zero util
* [example] add a custom compress kernel
* [chore] minor update
* [test] refactor gemm_sp test
* [refactor] make metadata layout func
* [example] add option for using cutlass layout
* [doc] add a gemm_sp doc
* [doc] minor polish
* [chore] remove unused
* [bugfix] fix non replicate b case
* [test] refactor
* [chore] add a check
* [bugfix] fix util bug
* [wip] init a new test case for v2
* [chore] minor refactor
* [chore] minor update
* [bugfix] enable 16bit rs
* [language] enable rs
* [language] enable gemm_sp_sr
* [language] enable gemm_sp_rr
* [test] enable more tests
* [tvm] update ffi binding
* [chore] remove print
* [chore] fix benchmark script
* [lint] precommit lint
* [chore] apply feedback
* [test] use arch 8.0
* [chore] rollback ::ordered_metadata for backward compatibility
* [bugfix] fix captialized
* [example] keep gemm_sp on hopper
* [test] fix no fp8 normal kernel
* [test] reduce matmul size to satisfy accum error
* [test] use cal_diff for assertion
* [bugfix] expand float8 type
* [lib] add make_int4 for short type
* [language] add transpose E
* [bugfix] fix wrong var
* [format] format
* [chore] refactor binding
* [chore] fix wrong passing var
…#1360)
* [Enhancement] Implement dynamic unroll factor in CUDA code generation

  This commit introduces support for specifying a dynamic unroll factor in the CUDA code generation. The `unroll_factor` map is added to store unroll factors for loop variables, allowing for more flexible and optimized loop unrolling. Additionally, the `unroll` function is integrated into the loop language, enabling users to define unroll factors directly in their code. This enhancement improves performance by allowing tailored unrolling strategies based on specific loop characteristics.
* lint fix
* [Bugfix] Correct initialization of non-zero counters in custom compress kernel and update TIR registration for gemm_sp_py to use the correct tile operation
updates:
- [github.com/pre-commit/mirrors-clang-format: v21.1.2 → v21.1.6](pre-commit/mirrors-clang-format@v21.1.2...v21.1.6)
- [github.com/astral-sh/ruff-pre-commit: v0.14.3 → v0.14.7](astral-sh/ruff-pre-commit@v0.14.3...v0.14.7)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…implify cached library path handling in sparse.py (#1365)
* [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py
* [Enhancement] Extend support for float8 data types in GEMM operations
  - Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
  - Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
  - Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.
* lint fix
Pull request overview
This PR updates the tilelang library with significant enhancements to sparse tensor operations, loop control features, and infrastructure improvements. Key changes include adding support for sparse GEMM operations (gemm_sp_v2), introducing unroll loop functionality, renaming tile operation namespaces for consistency, and improving cache control mechanisms.
- Added comprehensive sparse tensor core support with gemm_sp_v2 API and custom compression utilities
- Introduced T.unroll loop functionality with step and unroll_factor parameters
- Renamed tile operation namespace from "tl." to "tl.tileop." for better organization
- Enhanced cache control with separate environment variables for global and autotuning cache management
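For anyone carrying op names across this change, the namespace rename in the list above can be sketched as a small string-migration helper. This is only an illustration: the function name `migrate_op_name` and the sample op names are hypothetical, and it assumes every op under the old `tl.` prefix moved to `tl.tileop.`, which the PR summary does not confirm for all ops.

```python
OLD_PREFIX = "tl."
NEW_PREFIX = "tl.tileop."


def migrate_op_name(name: str) -> str:
    """Map an old-style op name to the new namespace (hypothetical helper).

    Names that already use the new prefix, or that never used the old
    prefix, are returned unchanged.
    """
    if name.startswith(NEW_PREFIX):
        return name  # already migrated
    if name.startswith(OLD_PREFIX):
        return NEW_PREFIX + name[len(OLD_PREFIX):]
    return name  # not a tile-language op name


if __name__ == "__main__":
    # "tl.gemm_sp" is used here purely as an illustrative op name.
    print(migrate_op_name("tl.gemm_sp"))
```

Checking for the new prefix first keeps the helper idempotent, so it is safe to run over a mix of migrated and unmigrated names.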
Reviewed changes
Copilot reviewed 70 out of 71 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tilelang/utils/tensor.py | Added float8 dtype detection and negative zero removal utilities |
| tilelang/utils/sparse.py | Enhanced compression support and added randint_semi_sparse generator |
| tilelang/tileop/gemm_sp/*.py | New sparse GEMM implementation with MMA support |
| tilelang/language/loop.py | Added unroll loop with step and factor support |
| tilelang/language/experimental/gemm_sp.py | Added gemm_sp_v2 API for sparse matrix multiplication |
| tilelang/layout/gemm_sp.py | Refactored metadata layout functions with clearer naming |
| tilelang/env.py | Enhanced cache control with separate disable flags |
| src/op/*.cc | Updated tile operation registration to use "tl.tileop" namespace |
| src/target/codegen_cuda.cc | Added pragma unroll factor code generation support |
| testing/python/**/*.py | Added comprehensive test coverage for new features |
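The "pragma unroll factor" entry for `src/target/codegen_cuda.cc` can be pictured with a toy emitter. The sketch below is not the project's code generator: `emit_unroll_pragma` and `emit_for_loop` are hypothetical names, and the only thing taken as given is that CUDA accepts both `#pragma unroll` and `#pragma unroll N` ahead of a loop.

```python
from typing import Optional


def emit_unroll_pragma(unroll_factor: Optional[int] = None) -> str:
    """Return the CUDA pragma line for a loop (hypothetical helper).

    Without a factor, the compiler is asked to unroll as it sees fit;
    with a factor, the requested count is appended to the pragma.
    """
    if unroll_factor is None:
        return "#pragma unroll"
    if unroll_factor < 1:
        raise ValueError("unroll_factor must be a positive integer")
    return f"#pragma unroll {unroll_factor}"


def emit_for_loop(var: str, extent: int, body: str,
                  unroll_factor: Optional[int] = None) -> str:
    """Emit a simple counted CUDA for-loop preceded by its unroll pragma."""
    lines = [
        emit_unroll_pragma(unroll_factor),
        f"for (int {var} = 0; {var} < {extent}; ++{var}) {{",
        f"  {body}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_for_loop("i", 16, "acc += a[i] * b[i];", unroll_factor=4))
```

Keeping the factor optional mirrors the distinction the PR draws between a plain unrolled loop and one with an explicit factor.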
Comments suppressed due to low confidence (3)
tilelang/language/loop.py:1 - The error message contradicts the check condition. If `unroll_factor` is specified, `pragma_unroll_explicit` should be False (implicit unroll with factor), but the check raises an error when it's True. The message states "must be True" but should state "must be False".
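A minimal sketch of the check as this comment says it should read, with the message matching the condition. The function name `check_unroll_args` is hypothetical and the exception type is an assumption; only the two parameter names and the intended rule come from the comment.

```python
from typing import Optional


def check_unroll_args(unroll_factor: Optional[int],
                      pragma_unroll_explicit: bool) -> None:
    """Reject an explicit-unroll request combined with an unroll factor.

    Hypothetical stand-in for the check described in the review comment:
    the error fires when the flag is True, so the message must say the
    flag has to be False.
    """
    if unroll_factor is not None and pragma_unroll_explicit:
        raise ValueError(
            "pragma_unroll_explicit must be False when unroll_factor is specified"
        )


if __name__ == "__main__":
    check_unroll_args(4, False)    # accepted: factor with implicit unroll
    check_unroll_args(None, True)  # accepted: no factor given
    try:
        check_unroll_args(4, True)
    except ValueError as err:
        print(err)
```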
tilelang/layout/gemm_sp.py:1 - Variable name inconsistency: the code uses lowercase `block_k` but the function parameter is uppercase `BlockK`. This will cause a NameError at runtime.
tilelang/layout/gemm_sp.py:1 - The function signature indicates it should return `int`, but line 132 returns a tuple `(offset // k, offset % k)`. The return type should be `tuple[int, int]` to match the actual implementation.
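The annotation this comment asks for can be illustrated on a standalone function that returns the same expression. `split_offset` is a hypothetical name, not the function in `tilelang/layout/gemm_sp.py`, and the built-in generic `tuple[int, int]` assumes Python 3.9 or later.

```python
def split_offset(offset: int, k: int) -> tuple[int, int]:
    """Split a flat offset into (row, column) for a row width of k.

    The return annotation matches the two-element tuple actually
    returned, which is the change the review comment suggests.
    """
    return offset // k, offset % k


if __name__ == "__main__":
    row, col = split_offset(10, 4)
    print(row, col)
```

For integer inputs this pair is the same as Python's built-in `divmod(offset, k)`.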
No description provided.