Add support for --split-compile and --jobserver in build.sh #626
Conversation
📝 Walkthrough

Adds PARALLEL_LEVEL initialization and propagation through the build system, enables parallel jobserver flags for nvcc, introduces a cpp/Makefile that passes parallel jobs to `cmake --build`, and adds make to the libcuopt conda recipe build requirements.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
Makefile (1)
10-10: Consider adding `clean` and `test` phony targets. Static analysis (checkmake) flags these as missing. While they may not be strictly necessary given that build.sh drives the overall orchestration, adding them as stubs (e.g., `.PHONY: clean test` with no-op rules or forwarding to build.sh targets) would align with Make conventions and suppress the linter warnings.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- Makefile (1 hunks)
- build.sh (6 hunks)
- conda/recipes/libcuopt/recipe.yaml (1 hunks)
- cpp/CMakeLists.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/CMakeLists.txt
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
hasArg(77-79)
🪛 checkmake (0.2.2)
Makefile
[warning] 10-10: Missing required phony target "clean"
(minphony)
[warning] 10-10: Missing required phony target "test"
(minphony)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (9)
conda/recipes/libcuopt/recipe.yaml (1)
58-58: LGTM! Adding make as an explicit build dependency aligns with the new Makefile introduced in this PR that drives the parallel build flow.

cpp/CMakeLists.txt (1)

127-131: Clean integration of nvcc jobserver support. The conditional check on `PARALLEL_LEVEL` is correct, and appending the nvcc flags (`--threads=0 --split-compile=0 --jobserver`) to enable parallel compilation is well-placed in the global CUDA flags propagation. This correctly complements the build.sh and Makefile changes that introduce the `PARALLEL_LEVEL` plumbing.

Makefile (1)

1-15: Makefile structure is solid. The use of a conditional `-j` via `$(if $(PARALLEL_LEVEL),-j$(PARALLEL_LEVEL),)` is idiomatic and correct. Variable defaults allow flexibility while integrating cleanly with the build.sh orchestration layer.

build.sh (6)

18-18: LGTM! The `-j` flag is correctly added to `VALIDARGS` and clearly documented in the help text, with an explanation that it supports unlimited jobs when no value is provided, or an explicit thread count with `-j<N>`.
Also applies to: 37-37

177-195: Solid argument parser for the `-j` flag. The regex `\-j[0-9]*` correctly matches both `-j` (unlimited) and `-j8` (explicit count), the `head -1` ensures only the first match is processed, and the fallback to `nproc` when no number is provided is sensible. Removal of the matched argument from `ARGS` before validation prevents spurious "invalid option" errors. Well done.

383-387: LGTM! The non-`-n` build path correctly handles the conditional logic: when `PARALLEL_LEVEL` is set, `cmake --build` is invoked with `-j${PARALLEL_LEVEL}`; otherwise, it runs without a parallel flag. This avoids the issue seen in the `-n` path and matches the intended behavior.

375-375: LGTM! Passing `PARALLEL_LEVEL` to CMake via `-DPARALLEL_LEVEL="${PARALLEL_LEVEL}"` correctly propagates the user's parallelism preference to the CMakeLists.txt conditional that gates the nvcc jobserver flags (lines 127-131 of cpp/CMakeLists.txt).

89-89: LGTM! Initializing `PARALLEL_LEVEL=""` as an empty string ensures safe default behavior when `-j` is not provided; the `parallelArgs` function only populates it when the flag is explicitly passed.

208-208: LGTM! Calling `parallelArgs` before the validation loop is correct; it removes the `-j` argument from `ARGS` so that subsequent validation does not reject it as an unrecognized option.
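The `-j` parsing behavior praised above can be sketched in isolation. The variable names below mirror build.sh, but this is an illustrative reconstruction under stated assumptions, not the exact script:

```shell
#!/usr/bin/env bash
# Illustrative sketch of parsing an optional -j[N] flag out of an argument string.
ARGS="clean libcuopt -j8 -v"

# Grab the first -j occurrence, with or without a trailing count.
jarg=$(echo "${ARGS}" | { grep -oE "\-j[0-9]*" || true; } | head -1)

if [ -n "${jarg}" ]; then
  # Strip the leading -j; fall back to nproc when no count was given.
  PARALLEL_LEVEL="${jarg#-j}"
  [ -z "${PARALLEL_LEVEL}" ] && PARALLEL_LEVEL=$(nproc)
  # Remove the matched flag so later argument validation does not reject it.
  ARGS=${ARGS//${jarg}/}
fi

echo "PARALLEL_LEVEL=${PARALLEL_LEVEL} ARGS=${ARGS}"
```

With the sample `ARGS` above, this extracts `8` and leaves the remaining arguments for normal validation.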
Actionable comments posted: 0
🧹 Nitpick comments (1)
cpp/Makefile (1)
1-15: Makefile structure is sound, but consider adding `clean` and `test` phony targets.

The conditional `-j$(PARALLEL_LEVEL)` logic on line 15 correctly expands only when `PARALLEL_LEVEL` is non-empty, avoiding the unlimited-parallelism issue. The variable initialization using `?=` allows proper override from the calling environment (build.sh).

Static analysis tools flag the missing `clean` and `test` phony targets as a Makefile convention. While this thin wrapper doesn't strictly need them, adding delegating targets to `cmake --build` would improve consistency and discoverability. Consider delegating `clean` and `test` to cmake:

```diff
+clean:
+	cmake --build $(LIBCUOPT_BUILD_DIR) --target clean
+
+test:
+	cmake --build $(LIBCUOPT_BUILD_DIR) --target test
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- build.sh (6 hunks)
- cpp/Makefile (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/Makefile
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
hasArg(77-79)
🪛 checkmake (0.2.2)
cpp/Makefile
[warning] 10-10: Missing required phony target "clean"
(minphony)
[warning] 10-10: Missing required phony target "test"
(minphony)
🔇 Additional comments (4)
build.sh (4)
177-208: `parallelArgs` function logic and placement are correct.

The function properly extracts `-j` with an optional numeric argument, defaults to `nproc`, and removes the flag from `ARGS` before the validation loop. Placement at line 208 (after cache/cmake/logging processing) ensures it runs before the `validArgs` check, so the removed flag doesn't cause validation to fail.
375-375: JFLAG parameter expansion correctly addresses the prior critical issue.

Line 379 uses `"${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}"`, which expands to `-jN` only when `PARALLEL_LEVEL` is non-empty, and to an empty string otherwise. This prevents the issue flagged in the prior review, where an empty `PARALLEL_LEVEL` would produce a bare `-j` (unlimited parallelism). Both the `-n` branch (line 382, make invocation) and the default branch (line 384, `cmake --build`) correctly use `${JFLAG}`, ensuring consistent behavior.

Also applies to: 379-385
18-18: VALIDARGS and help text properly document the new `-j` flag.

The flag is correctly added to the valid arguments list and documented as enabling unlimited parallelism if no value is provided, or a specific thread count if `-j<N>` is given.

Also applies to: 37-37
359-378: No action required. The libmps_parser CMake configuration is correct.

Verification confirms that libmps_parser is a pure C++ library (its CMakeLists.txt declares `LANGUAGES CXX` only) with no CUDA source files or compilation. PARALLEL_LEVEL controls NVCC parallelization and is appropriately applied only to libcuopt, which contains CUDA code. The separation is intentional and correct.

Likely an incorrect or invalid review comment.
jameslamb left a comment
I think it should be possible to do this without needing to directly invoke make or introduce a new Makefile in source control. I left some comments for your consideration, sorry in advance if I've misunderstood the goal here.
build.sh (outdated)

```shell
function parallelArgs {
    # Check for -j option
    if [[ -n $(echo "$ARGS" | { grep -E "\-j" || true; } ) ]]; then
```
There is already a `hasArg` function doing this (lines 97 to 99 in 7543358):

```shell
function hasArg {
    (( NUMARGS != 0 )) && (echo " ${ARGS} " | grep -q " $1 ")
}
```
Using that might make this a bit easier to read, would you consider it?
And actually, instead of doing all this command-line parsing and needing to care about spellings like `-j2`, `-j 2`, `-j=2`, etc., I think you might find it easier to just allow the shell variable PARALLEL_LEVEL to be overridden by an environment variable, like cudf does:

```shell
PARALLEL_LEVEL=${PARALLEL_LEVEL:=$(nproc)}
```

Would you consider that?
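As a quick standalone illustration of the suggested pattern (the value 8 below is a hypothetical user override, not anything from the PR):

```shell
# Simulate a user override coming in from the environment.
export PARALLEL_LEVEL=8

# The := expansion assigns the default (nproc) only when the variable
# is unset or empty; an existing value wins.
PARALLEL_LEVEL=${PARALLEL_LEVEL:=$(nproc)}
echo "PARALLEL_LEVEL=${PARALLEL_LEVEL}"
```

Run without the export, the same line would instead pick up the machine's processor count from `nproc`.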
And in CI, if you explicitly want to use GNU Make, you could override it here (line 13 in 7543358):

```shell
export CMAKE_GENERATOR=Ninja
```

with:

```shell
CMAKE_GENERATOR="Unix Makefiles"
```

docs: https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#makefile-generators
This shell variable override approach sounds good, thanks for the suggestion! Will implement it.
```shell
if hasArg -n; then
    cmake --build "${LIBCUOPT_BUILD_DIR}" ${VERBOSE_FLAG}
    # Manual make invocation to start its jobserver
    make ${JFLAG} -C "${REPODIR}/cpp" LIBCUOPT_BUILD_DIR="${LIBCUOPT_BUILD_DIR}" VERBOSE_FLAG="${VERBOSE_FLAG}" PARALLEL_LEVEL="${PARALLEL_LEVEL}" ninja-build
```
Instead of needing to directly invoke make, you could explicitly select that in the cmake configure step above if it's important, like this:

```shell
cmake -G 'Unix Makefiles' \
    -DDEFINE_BENCHMARK="${DEFINE_BENCHMARK}" \
    # ... etc., etc.
```

That'd allow you to continue to just use `cmake --build` instead of invoking the preferred build tool yourself.
Would you consider that?
We use Ninja as the default generator; this step is only necessary in order to create a make jobserver that ninja can hook into (as a jobserver client). This workaround is (to the best of my knowledge) necessary to get nvcc's jobserver support to play nice with Ninja, which is a jobserver client itself.
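The jobserver handshake described above can be seen in a minimal demo, assuming GNU Make is available (the makefile path and target names below are illustrative): with `-jN`, make starts a jobserver and advertises it via `MAKEFLAGS` to recipe lines marked with `+` (or invoking `$(MAKE)`), which is how jobserver-aware children like ninja or `nvcc --jobserver` can draw job tokens from it.

```shell
# Write a one-target makefile whose recipe prints the MAKEFLAGS it inherits.
# The "+" prefix tells make to treat the line as a recursive invocation, so
# the jobserver details are passed down to it.
printf 'all:\n\t+@echo "child MAKEFLAGS: $$MAKEFLAGS"\n' > /tmp/jobserver_demo.mk

# On make 4.x the printed MAKEFLAGS includes a --jobserver-auth entry.
make -j4 -f /tmp/jobserver_demo.mk
```

Dropping the `+` prefix (or running without `-jN`) makes the jobserver advertisement disappear from the child's environment, which is why the build.sh wrapper invokes make explicitly here.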
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
Actionable comments posted: 0
🧹 Nitpick comments (1)
build.sh (1)

88-88: Good use of default parameter expansion.

The initialization correctly allows environment override while providing a sensible default. This follows the pattern suggested in previous review feedback.

Consider adding validation to ensure PARALLEL_LEVEL is a positive integer, though in practice build tools will fail clearly if given invalid input:

```diff
 PARALLEL_LEVEL=${PARALLEL_LEVEL:=$(nproc)}
+if ! [[ "${PARALLEL_LEVEL}" =~ ^[0-9]+$ ]] || [ "${PARALLEL_LEVEL}" -eq 0 ]; then
+  echo "Error: PARALLEL_LEVEL must be a positive integer, got: ${PARALLEL_LEVEL}"
+  exit 1
+fi
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- build.sh (2 hunks)
- conda/recipes/libcuopt/recipe.yaml (1 hunks)
- cpp/CMakeLists.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- cpp/CMakeLists.txt
- conda/recipes/libcuopt/recipe.yaml
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
hasArg(78-80)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: checks / check-style
🔇 Additional comments (4)
build.sh (4)
353-353: LGTM! Correctly propagates PARALLEL_LEVEL to the CMake configuration for use in enabling nvcc parallel compilation flags.

358-358: Correctly addresses the critical issue with JFLAG computation. The parameter expansion `${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}` ensures JFLAG is either empty or `-jN`, preventing the unsafe `make -j` (unlimited parallelism) that was flagged in previous reviews. This implementation matches the fix confirmed in commit a111bf6.

360-361: LGTM! The manual make invocation correctly uses `${JFLAG}` to control parallelism. The comment clearly explains the architectural rationale for using make to start a jobserver that ninja can hook into, which is necessary for nvcc's jobserver support. The additional `PARALLEL_LEVEL` variable passed to the Makefile is intentional, for propagation to the ninja build.

363-363: LGTM! Correct use of `${JFLAG}` with `cmake --build`. The flag will be appropriately forwarded to the Ninja build system when PARALLEL_LEVEL is set.
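The `:+` "alternate value" expansion behind JFLAG, discussed in the comments above, can be demonstrated standalone (the values below are illustrative):

```shell
# When the variable is empty, :+ yields nothing - no stray bare -j.
PARALLEL_LEVEL=""
JFLAG="${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}"
echo "empty -> [${JFLAG}]"

# When the variable is set, :+ substitutes the whole alternate string.
PARALLEL_LEVEL=16
JFLAG="${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}"
echo "set   -> [${JFLAG}]"
```

This prints `empty -> []` and then `set   -> [-j16]`, which is exactly the either-empty-or-`-jN` guarantee the review relies on.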
Description
NVCC has support for intra-TU and target arch parallelization opportunities, which are enabled with --split-compile and --threads. This PR adds a build flag to enable them.
Additionally, support for Make >=4.4 jobserver has been added, in order to prevent oversubscription. NVCC can act as a jobserver client with the --jobserver flag, support for which has been added in this PR.
On a machine with 256 logical cores, this reduced build times by about 1.5x.
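For reference, the flags in question can be collected as below. This is an illustration of the flag spelling only, built from the PR's own description; it echoes a hypothetical compile command rather than invoking a real nvcc:

```shell
# The three nvcc flags this PR wires up:
#   --threads=0        parallelize compilation across target archs (0 = auto)
#   --split-compile=0  parallelize work within a single TU (0 = auto)
#   --jobserver        act as a jobserver client (Make >= 4.4) to avoid oversubscription
NVCC_PARALLEL_FLAGS="--threads=0 --split-compile=0 --jobserver"

# kernel.cu / kernel.o are placeholder names for illustration.
echo "nvcc ${NVCC_PARALLEL_FLAGS} -c kernel.cu -o kernel.o"
```

Under a Make jobserver, the `--jobserver` flag lets nvcc's internal threads count against the global `-jN` budget instead of multiplying it.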
Issue
Checklist
Summary by CodeRabbit
Chores