
Conversation

@ax3l (Member) commented Nov 13, 2025

Finalize a proper, pre-built WarpX container for Perlmutter GPUs with GPU-aware/GPUdirect MPI.

To Do

  • Builds
  • Finalize Entrypoint

Follow-Up

  • Finalize MPI tuning based on INC0245154 response/guidance
  • Ensure it runs on Perlmutter
  • Ensure GPU-aware MPI/GPUdirect works (see the runtime sketch after this list)
    • Ensure Slingshot is used optimally, i.e., that Cray MPICH is actually picked up
  • Build all WarpX dims
  • Docs
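
For the GPU-aware MPI item above, the runtime side on Perlmutter presumably comes down to Cray MPICH's MPICH_GPU_SUPPORT_ENABLED switch; a minimal sketch follows (node/task counts and the executable name are placeholders, not taken from this PR):

# Sketch only: turn on GPUdirect transfers in Cray MPICH, then launch one rank per GPU.
export MPICH_GPU_SUPPORT_ENABLED=1
srun -N 1 --ntasks-per-node=4 --gpus-per-node=4 warpx.3d inputs_3d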

@ax3l added labels backend: cuda, component: documentation, and machine / system on Nov 13, 2025
@ax3l force-pushed the doc-proper-pm-container branch 6 times, most recently from 5f90655 to b7f0d77 on November 14, 2025 22:55
@ax3l force-pushed the doc-proper-pm-container branch from b7f0d77 to c7f3cd5 on November 14, 2025 22:55
./configure \
--disable-fortran \
--prefix=/opt/warpx \
--with-ch4-shmmods=posix,gpudirect \
@ax3l (Member, Author) commented

From our NERSC Ticket:

Rahulkumar Gayatri (rgayatri)

Hey Axel, Adam:
Just FYI - If the plan is to build mpich inside the container and then replace it at runtime with cray-mpich, make sure that the mpich inside the container is built WITHOUT cuda, since that interferes with cuda-aware-mpi of cray-mpich at runtime for some reason.

Regards,
Rahul.
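
Following that advice, the in-container MPICH configure would presumably look like the sketch below: the same flags as in the hunk above, but with CUDA left out. The --without-cuda flag is an assumption about the MPICH configure interface; the essential point from NERSC is simply not to enable CUDA in this build.

# Sketch only, not the PR's final configure line.
./configure \
  --disable-fortran \
  --prefix=/opt/warpx \
  --without-cuda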

@ax3l (Member, Author) commented Nov 18, 2025

As written above, slightly counterintuitively, if we do not compile with gpudirect, it is possible for NERSC to automatically swap out our MPI libs as we start up the container (they copy in the Cray MPI libs and resquash the image on startup):

Suggested change (remove this line):
--with-ch4-shmmods=posix,gpudirect \

# WarpX Python bindings are installed in /opt/venv
#
# On Perlmutter, run WarpX like this:
# podman run --rm --gpu --mpi --nccl -it warpx-perlmutter warpx.2d inputs_2d
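
For reference, a multi-node launch of this image would presumably be wrapped in srun, roughly as sketched below; node and task counts are placeholders, and --gpu/--mpi are the podman-hpc plugin flags from the NERSC docs linked further down.

# Sketch only: one container per rank, MPI wired in by the podman-hpc plugins.
srun -N 2 --ntasks-per-node=4 --gpus-per-node=4 \
  podman-hpc run --rm --gpu --mpi warpx-perlmutter \
  warpx.2d inputs_2d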
@ax3l (Member, Author) commented Nov 18, 2025

Urgh, the podman-hpc --mpi plugin is broken on Perlmutter; it fails at startup.

The moment I put --mpi or --cuda-mpi in, it starts to wildly connect to external registries and fails.
[Screenshot from 2025-11-18 10-28-51: failing container startup]
https://docs.nersc.gov/development/containers/podman-hpc/overview/#using-podman-hpc-as-a-container-runtime

@ax3l (Member, Author) commented Nov 21, 2025

Everyone is at SC25 this week, so no progress on my NERSC bug report in INC0245154 yet.

I posted a reproducer therein: https://github.com/ax3l/warpx/tree/doc-proper-pm-container-mpihellogpu/Tools/machines/perlmutter-nersc

Prevents swapping for Cray Libs with MPI Plugin,
per NERSC support
ARG mpich_prefix=mpich-$mpich

RUN \
curl -Lo $mpich_prefix.tar.gz https://www.mpich.org/static/downloads/$mpich/$mpich_prefix.tar.gz && \
@ax3l (Member, Author) commented

If we want to run with GPU-unaware MPI to work around the Podman-HPC MPI plugin issue, we would need to patch in pmodels/mpich#5720 to avoid an assert at AMReX startup, which uses the function added in that PR:
https://github.com/AMReX-Codes/amrex/blob/25.11/Src/Base/AMReX_ParallelDescriptor.cpp#L1547-L1549
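
A hypothetical way to fold that patch into the container build is sketched below; the GitHub .patch URL, the -p level, and the configure flags are assumptions, not what this PR ships.

# Sketch only: apply pmodels/mpich#5720 to the unpacked MPICH sources before building.
RUN tar xf $mpich_prefix.tar.gz && \
    cd $mpich_prefix && \
    curl -L https://github.com/pmodels/mpich/pull/5720.patch | patch -p1 && \
    ./configure --disable-fortran --prefix=/opt/warpx --without-cuda && \
    make -j $(nproc) install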


@ax3l changed the title from "[WIP] Perlmutter: GPU Docker Container" to "Perlmutter: GPU Docker Container" on Nov 25, 2025
@ax3l ax3l marked this pull request as ready for review November 25, 2025 19:28
@ax3l ax3l requested review from EZoni and RemiLehe November 25, 2025 19:28
@ax3l (Member, Author) commented Nov 25, 2025

@RemiLehe I would merge this update, as it is a good basis for sharing with other power-developers.
I intentionally add no docs outside of inline comments.

Follow-ups will (A) generalize this once the NERSC MPI plugin issue is fixed and (B) make a dead-end PR that patches this to build a one-off no-MPI variant for our LDRD.

@ax3l ax3l merged commit 57e114e into BLAST-WarpX:development Nov 25, 2025
82 checks passed
@ax3l ax3l deleted the doc-proper-pm-container branch November 25, 2025 21:47
