feat: scope stable abi to backend#682
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Coverage report —
|
| Name | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| src/kernels/__init__.py | 10 | 0 | 100% | |
| src/kernels/_system.py | 6 | 1 | 83% | 10 |
| src/kernels/_versions.py | 63 | 7 | 89% | 46, 49, 52-53, 56-57, 100 |
| src/kernels/backends.py | 194 | 55 | 72% | 40, 44, 48-51, 68, 90, 108, 117, 121, 125-127, 148, 170, 181, 188-191, 201, 205-225, 233, 256-276 |
| src/kernels/compat.py | 8 | 1 | 88% | 5 |
| src/kernels/deps.py | 54 | 4 | 93% | 58-59, 95, 98 |
| src/kernels/layer/__init__.py | 6 | 0 | 100% | |
| src/kernels/layer/_interval_tree.py | 103 | 4 | 96% | 23, 52, 147, 150 |
| src/kernels/layer/device.py | 48 | 14 | 71% | 42, 47-49, 91, 96-98, 101, 149, 152, 155-157 |
| src/kernels/layer/func.py | 81 | 7 | 91% | 81, 111, 183, 301, 307, 320, 338 |
| src/kernels/layer/globals.py | 5 | 0 | 100% | |
| src/kernels/layer/kernelize.py | 74 | 8 | 89% | 255, 281, 289-290, 296, 300, 316-318 |
| src/kernels/layer/layer.py | 210 | 16 | 92% | 167, 210, 216, 229, 337, 417-418, 430, 439, 447, 458, 487, 491, 504, 557, 587 |
| src/kernels/layer/mode.py | 14 | 0 | 100% | |
| src/kernels/layer/repos.py | 130 | 34 | 74% | 27, 33, 36-41, 61-62, 68, 71-74, 88, 92, 101-102, 108, 111-114, 121-122, 128, 131-134, 141-142, 148, 151-154, 235 |
| src/kernels/lockfile.py | 71 | 46 | 35% | 37-104, 108-131 |
| src/kernels/status.py | 49 | 2 | 96% | 23, 81 |
| src/kernels/utils.py | 301 | 55 | 82% | 65, 77-81, 87-88, 218, 222, 225, 287, 295, 334-335, 373, 404, 409, 444, 673, 676, 678, 684, 697-698, 719-731, 735-742, 750, 754-764, 768-775, 813, 817, 836, 838 |
| src/kernels/variants.py | 262 | 19 | 93% | 56, 87, 108, 138, 247-248, 289, 291, 371-378, 384-390, 421-427, 439-445, 534-536 |
| src/kernels/verify.py | 88 | 1 | 99% | 32 |
| TOTAL | 1777 | 274 | 85% |
Updated by the Test kernels workflow on commit c4a0e0d0688190f0e515b34e01e8936cecd7f861.
| /// Validate that a per-backend `[torch.stable-abi]` mapping only references | ||
| /// backends that are declared in `[general].backends`. | ||
| fn validate_stable_abi(build: &CurrentConfig) -> Result<()> { | ||
| let v4::Framework::Torch(torch) = &build.framework else { |
There was a problem hiding this comment.
I know this is still in draft, but vN should not spread across the code base. This validation should probably be in the v4 file itself in the deserialization of Build (since that holds both General and Framework).
Though I'm not 100% sure if this is worth validating. It's nice to be able to remove a backend from the list for quick testing without removing every other occurrence of that backend in build.toml.
Co-authored-by: Daniël de Kok <me@danieldk.eu>
0633fd1 to
defa006
Compare
| pub enum TorchAbi { | ||
| All(Version), | ||
| Mapping(BTreeMap<Backend, Version>), | ||
| } |
There was a problem hiding this comment.
Pasted from Slack to have it in the review:
I thought a bit more about the extended Stable ABI mapping. Let's do this:
• Make v5 of the Build format. In v4 it stays String, in v5 we make it the mapping.
• Conversion from v4 to v5 is simple: when stable-abi is set, enumerate over the backends and set the stable ABI for the available backends..
• If we make v5, we should add edition to general and make it mandatory to set edition = 5. In this way we can start to unambiguously parse different versions in the future..
• If we don't detect edition, ask the user to run kernel-builder update-build (or maybe auto-update from v4 to v5, not on disk, but just in the builder, so that it works without running update-build).
By doing this, we avoid a lot of conditionals that check whether stable-abi is a string or mapping/attrset.
What do you think?
There was a problem hiding this comment.
these are great recommendations
- added v5 in latest changes with edition field and expects a mapping for stable abi
- added logic to upgrade v4 to v5 in memory so existing stable-abi="string" should work without issues
- updated
update-builderto target updating to v5
these changes also include config changes to all of the examples so there are many more files changed in latest but most are one line additions to configs
|
this kernel community PR uses this branch to validate the new version/scoped stable abi builds |
danieldk
left a comment
There was a problem hiding this comment.
Looks great!
One question about a CMake change, plus a suggestion to hardcode the version in the v5 General.
| add_compile_definitions(CUDA_KERNEL) | ||
| elseif(GPU_LANG STREQUAL "HIP") | ||
| if(NOT HIP_FOUND) | ||
| if(NOT HIP_FOUND AND NOT PYTORCH_FOUND_HIP) |
There was a problem hiding this comment.
Why is this additional condition needed? Seems to make it less strict?
There was a problem hiding this comment.
I ran into a build issue with rocm which seems related to torch 2.13. It seems like they no longer expose HIP_FOUND and now expose PYTORCH_FOUND_HIP only
build issue:
https://github.com/huggingface/kernels-community/actions/runs/28620863071/job/84876068475
pr with changes to HIP_FOUND
pytorch/pytorch@d921fd0
HIP_FOUND is not longer found in the codebase when searching on main
I opt'ed to adding PYTORCH_FOUND_HIP instead of replacing in order to avoid any regressions with older torch versions, this way we will only throw if both HIP_FOUND and PYTORCH_FOUND_HIP are false
| let edition = EditionProbe::deserialize(value.clone()) | ||
| .map_err(|err| de::Error::custom(format!("invalid `general.edition`: {err}")))? | ||
| .general | ||
| .and_then(|general| general.edition); | ||
|
|
||
| match edition { | ||
| // Configs with editions | ||
| Some(5) => v5::Build::deserialize(value) | ||
| .map(BuildCompat::V5) | ||
| .map_err(de::Error::custom), | ||
|
|
| pub version: usize, | ||
|
|
||
| /// Build format edition. Must be `5` for this schema. | ||
| pub edition: usize, |
There was a problem hiding this comment.
Would be nice to hard-set this to 5. The serde author has a crate for this:
There was a problem hiding this comment.
ah nice improvement, I've added in the latest commits.
this is a work in progress PR that scopes torches stable abi to individual back ends rather than having it be a global attribute.