
Refine the packaging infrastructure.#207

Open
eirrgang wants to merge 7 commits into llnl:notex from eirrgang:eirrgang-packaging

Conversation

Collaborator

@eirrgang eirrgang commented Mar 5, 2026

Continues the CMake reorganization and migrates from setup.py to pyproject.toml to drive the build and packaging.

Includes some source code patches for more consistent behavior and some shim headers for better compatibility across CUDA, HIP, and CPU runtimes.

Use pyproject.toml and scikit-build-core to drive the CMake build.
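
As a sketch, a minimal scikit-build-core configuration looks roughly like the following; the project name, version, and CMake floor here are illustrative placeholders, not values taken from this PR:

```toml
[build-system]
requires = ["scikit-build-core"]
build-backend = "scikit_build_core.build"

[project]
name = "leapct"      # illustrative name, not necessarily the one used here
version = "0.0.0"    # placeholder

[tool.scikit-build]
cmake.version = ">=3.21"  # minimum CMake version; the actual floor is a guess
```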

Minor CMake modernization.

Add some CMake infrastructure to try to handle the three target build types (cuda, hip, and cpu-only), but the project infrastructure may not be set up for non-cuda builds at this point.
- Fix default accelerator framework.
- Only define `__USE_NOTEX` for AMD
Normalize some math functions to improve compatibility and consistency.

Try to use the hipify wrappers in `torch.utils`, if available, else try
to call `hipify-clang` directly.

Prefer torch-based hipify into `/hipified_src`. Keep `hipify-clang` as
an explicitly experimental fallback, and stage fallback inputs under a
separate build-local `hipify_stage` tree.
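
The backend selection described above can be sketched as follows; the torch module path and the CLI name reflect my understanding of the tooling and should be treated as assumptions:

```python
import shutil


def select_hipify_backend():
    """Pick a CUDA->HIP translation backend: prefer the torch-based
    hipify wrappers, fall back to a hipify-clang executable on PATH."""
    try:
        # torch ships hipify wrappers under torch.utils.hipify
        from torch.utils.hipify import hipify_python  # noqa: F401
        return "torch"
    except ImportError:
        pass
    if shutil.which("hipify-clang"):
        return "hipify-clang"  # explicitly experimental fallback
    return None  # no translation backend available
```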

Add `gpu_runtime.h` and `gpu_fft.h` compatibility shims with comments
explaining why LEAP still needs them even after source translation.
Update the translated source set and include ordering so generated
sources consistently see translated headers and copied support headers
before the original src tree.

Harden tools/run_hipify_clang.py by recording retry-aware manifest
files, removing stale outputs, retrying known -p/-o conflicts without
compile_commands.json context, accepting stdout-only output as a
compatibility fallback, and surfacing clearer diagnostics for likely
CUDA-arch propagation failures inside hipify-clang.
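
The retry planning can be sketched as a pure helper like the one below; the function name and structure are my guesses at the shape of tools/run_hipify_clang.py, not its actual contents, though the `-p`/`-o` conflict it works around is the one described above:

```python
def plan_hipify_attempts(src, out, compile_db=None):
    """Return the hipify-clang command lines to try, in order.

    The first attempt passes the compilation database directory with -p.
    If that attempt hits the known -p/-o conflict, the caller retries the
    second command, which drops the compile_commands.json context.
    """
    base = ["hipify-clang", str(src), "-o", str(out)]
    if compile_db is None:
        return [base]
    return [base + ["-p", str(compile_db)], base]
```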

Document the supported build and wheel paths in the README, including
CPU, CUDA, and AMD usage, when visible GPUs are or are not required on
the build host, and the caveats around isolated builds and the
experimental hipify-clang fallback.
@eirrgang
Collaborator Author

Commit 6de2200 is a pretty substantial change to restore the automatic hipification that setup.py performed via torch.utils.cpp_extension.

Background

torch.utils has some fairly elaborate tooling to automatically detect AMD GPUs and hipify the CUDA sources, but it does not rely on very stable interfaces: it is sensitive to the torch release, the ROCm version, and the CUDA version, and it relies on the distutils-style setuptools Command conventions, which aren't fully compatible with other modern Python packaging tools.

A complete migration off of the torch.utils hipify wrappers is not easy because the behavior of hipify-clang is inconsistent across ROCm versions.

CMake-driven hipify

We're trying to call hipify-clang, but, at least in ROCm 7.2, we're having a hard time managing its output files correctly.

In the meantime, we can use the hipify in torch.utils, but

  • this requires a torch installation,
  • we need to avoid build isolation in order to use the torch installation, and
  • we still need to do some tweaking to account for different ROCm and CUDA versions.

For best results, use the most recent ROCm available and the oldest supported CUDA available.

With ROCm 7.2 and CUDA 12.9, the following seems to work:

```shell
python -m build . -Ccmake.define.LEAP_GPU=AMD -Ccmake.define.CMAKE_CXX_COMPILER=`which amdclang++` --no-isolation
```
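
To check which torch/ROCm/CUDA combination is actually visible in the (non-isolated) build environment before running the command above, something like the following works; `torch.version.hip` is None on CUDA builds of torch, and `torch.version.cuda` is None on ROCm builds:

```python
def report_accelerator_versions():
    """Return the torch/HIP/CUDA versions visible to this interpreter,
    or None if torch is not installed (e.g. inside an isolated build)."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "hip": torch.version.hip,    # ROCm/HIP version, None on CUDA builds
        "cuda": torch.version.cuda,  # CUDA version, None on ROCm builds
    }
```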

