
Conversation

Contributor

@aliceb-nv aliceb-nv commented Nov 25, 2025

Description

NVCC supports intra-TU and per-target-architecture parallel compilation, enabled with --split-compile and --threads. This PR adds a build flag to enable them.
Additionally, this PR adds support for the Make >= 4.4 jobserver, in order to prevent oversubscription: NVCC can act as a jobserver client via its --jobserver flag.

On a machine with 256 logical cores, this cut build times by about 1.5x.
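The knobs described above can be sketched as follows. This is a hypothetical illustration, not the actual build.sh: the variable names are assumptions, while the nvcc flags are the ones named in the description.

```shell
# Hypothetical sketch: cap build parallelism at the core count and collect
# the nvcc parallelization flags from the description. The 0 values let
# nvcc pick the degree of parallelism itself.
PARALLEL_LEVEL="${PARALLEL_LEVEL:-$(nproc 2>/dev/null || echo 1)}"
NVCC_PAR_FLAGS="--threads=0 --split-compile=0 --jobserver"
echo "nvcc flags: ${NVCC_PAR_FLAGS}; build parallelism: -j${PARALLEL_LEVEL}"
```

With --jobserver, nvcc draws its worker threads from the surrounding make jobserver pool instead of spawning freely, which is what keeps the total job count bounded.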

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Summary by CodeRabbit

  • Chores

    • Added automatic parallel build support with a default that uses available CPU cores, enabling faster compilation.
    • Build flow now respects a no-install option while still supporting parallel job execution.
    • Enhanced parallel handling for CUDA compilations to better utilize jobserver-style parallelism.
  • Chores

    • Ensured build tooling requires make during recipe builds to support the new flow.


@aliceb-nv aliceb-nv requested review from a team as code owners November 25, 2025 10:24
@aliceb-nv aliceb-nv changed the title Build script splitcompile Add support for --split-compile and --jobserver in build.sh Nov 25, 2025
@aliceb-nv aliceb-nv changed the base branch from main to release/25.12 November 25, 2025 10:24
@aliceb-nv aliceb-nv added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Nov 25, 2025
@coderabbitai coderabbitai bot commented Nov 25, 2025

📝 Walkthrough

Adds PARALLEL_LEVEL initialization and propagation through the build system, enables parallel jobserver flags for nvcc, introduces a cpp/Makefile that passes parallel jobs to cmake --build, and adds make to the libcuopt conda recipe build requirements.

Changes

  • Build script updates — build.sh: Adds PARALLEL_LEVEL (defaulting to nproc) and a JFLAG derived from it; propagates PARALLEL_LEVEL into CMake invocations as -DPARALLEL_LEVEL; adjusts the build flow to use parallel flags for manual make and cmake --build depending on PARALLEL_LEVEL and the -n/no-install paths.
  • CMake CUDA flags — cpp/CMakeLists.txt: When PARALLEL_LEVEL is set and non-empty, appends the nvcc jobserver flags (--threads=0 --split-compile=0 --jobserver) to CUOPT_CUDA_FLAGS to enable parallel nvcc compilation.
  • Makefile for cpp — cpp/Makefile: New Makefile exposing LIBCUOPT_BUILD_DIR, VERBOSE_FLAG, and PARALLEL_LEVEL; defines phony targets (all, ninja-build) and invokes cmake --build $(LIBCUOPT_BUILD_DIR) with a conditional -j$(PARALLEL_LEVEL) when set.
  • Conda build requirements — conda/recipes/libcuopt/recipe.yaml: Adds make to recipe.cache.build.requirements for the libcuopt recipe.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review parsing and initialization of PARALLEL_LEVEL in build.sh, and ensure numeric/non-numeric handling and interactions with existing flag parsing.
  • Verify correct propagation to CMake (-DPARALLEL_LEVEL) and that cmake --build / manual make invocations receive appropriate -j semantics.
  • Check cpp/CMakeLists.txt conditional that adds nvcc jobserver flags is scoped correctly and safe for targets without CUDA.
  • Validate cpp/Makefile variable usage and conditional -j insertion syntax.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The title partially captures the main change but misses the primary improvement of parallel build support via PARALLEL_LEVEL and jobserver integration across multiple build systems.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
Makefile (1)

10-10: Consider adding clean and test phony targets. Static analysis (checkmake) flags these as missing. While they may not be strictly necessary given that build.sh drives the overall orchestration, adding them as stubs (e.g., .PHONY: clean test with no-op rules or forwarding to build.sh targets) would align with Make conventions and suppress the linter warnings.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0802827 and 970ccc4.

📒 Files selected for processing (4)
  • Makefile (1 hunks)
  • build.sh (6 hunks)
  • conda/recipes/libcuopt/recipe.yaml (1 hunks)
  • cpp/CMakeLists.txt (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/CMakeLists.txt
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
  • hasArg (77-79)
🪛 checkmake (0.2.2)
Makefile

[warning] 10-10: Missing required phony target "clean"

(minphony)


[warning] 10-10: Missing required phony target "test"

(minphony)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
🔇 Additional comments (9)
conda/recipes/libcuopt/recipe.yaml (1)

58-58: LGTM! Adding make as an explicit build dependency aligns with the new Makefile introduced in this PR that drives the parallel build flow.

cpp/CMakeLists.txt (1)

127-131: Clean integration of nvcc jobserver support. The conditional check on PARALLEL_LEVEL is correct, and appending the nvcc flags (--threads=0 --split-compile=0 --jobserver) to enable parallel compilation is well-placed in the global CUDA flags propagation. This correctly complements the build.sh and Makefile changes that introduce PARALLEL_LEVEL plumbing.

Makefile (1)

1-15: Makefile structure is solid. The use of conditional -j via $(if $(PARALLEL_LEVEL),-j$(PARALLEL_LEVEL),) is idiomatic and correct. Variable defaults allow flexibility while integrating cleanly with the build.sh orchestration layer.

build.sh (6)

18-18: LGTM! The -j flag is correctly added to VALIDARGS and clearly documented in the help text with an explanation that it supports unlimited jobs when no value is provided, or explicit thread count with -j<N>.

Also applies to: 37-37


177-195: Solid argument parser for -j flag. The regex \-j[0-9]* correctly matches both -j (unlimited) and -j8 (explicit count), the head -1 ensures only the first match is processed, and fallback to nproc when no number is provided is sensible. Removal of the matched argument from ARGS before validation prevents spurious "invalid option" errors. Well done.
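The parsing this comment describes can be sketched like so; the ARGS value here is made up for illustration, and the fallback mirrors the nproc default the review mentions.

```shell
# Sketch of the -j extraction: match "-j" or "-jN", keep only the first
# match, and fall back to the core count when no number was given.
ARGS="clean -j8 libcuopt"
match=$(echo "$ARGS" | grep -oE '\-j[0-9]*' | head -1)
num="${match#-j}"
PARALLEL_LEVEL="${num:-$(nproc 2>/dev/null || echo 1)}"
echo "PARALLEL_LEVEL=${PARALLEL_LEVEL}"
```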


383-387: LGTM! The build path without -n correctly handles the conditional logic: when PARALLEL_LEVEL is set, cmake --build is invoked with -j${PARALLEL_LEVEL}; otherwise, it runs without a parallel flag. This avoids the issue seen in the -n path and matches the intended behavior.


375-375: LGTM! Passing PARALLEL_LEVEL to CMake via -DPARALLEL_LEVEL="${PARALLEL_LEVEL}" correctly propagates the user's parallelism preference to the CMakeLists.txt conditional that gates the nvcc jobserver flags (lines 127–131 of cpp/CMakeLists.txt).


89-89: LGTM! Initializing PARALLEL_LEVEL="" as an empty string ensures safe default behavior when -j is not provided; the parallelArgs function only populates this when the flag is explicitly passed.


208-208: LGTM! Calling parallelArgs before the validation loop is correct; it removes the -j argument from ARGS so that subsequent validation does not reject it as an unrecognized option.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
cpp/Makefile (1)

1-15: Makefile structure is sound, but consider adding clean and test phony targets.

The conditional -j$(PARALLEL_LEVEL) logic on line 15 correctly expands only when PARALLEL_LEVEL is non-empty, avoiding the unlimited parallelism issue. The variable initialization using ?= allows proper override from the calling environment (build.sh).

Static analysis tools flag missing clean and test phony targets as Makefile conventions. While this thin wrapper doesn't strictly need them, adding delegating targets to cmake --build would improve consistency and discoverability.

Consider delegating clean and test targets to cmake:

+clean:
+	cmake --build $(LIBCUOPT_BUILD_DIR) --target clean
+
+test:
+	cmake --build $(LIBCUOPT_BUILD_DIR) --target test
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a111bf6 and a7ed7a7.

📒 Files selected for processing (2)
  • build.sh (6 hunks)
  • cpp/Makefile (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/Makefile
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
  • hasArg (77-79)
🪛 checkmake (0.2.2)
cpp/Makefile

[warning] 10-10: Missing required phony target "clean"

(minphony)


[warning] 10-10: Missing required phony target "test"

(minphony)

🔇 Additional comments (4)
build.sh (4)

177-208: parallelArgs function logic and placement are correct.

The function properly extracts -j with an optional numeric argument, defaults to nproc, and removes the flag from ARGS before the validation loop. Placement at line 208 (after cache/cmake/logging processing) ensures it runs before the validArgs check so the removed flag doesn't cause validation to fail.


375-375: JFLAG parameter expansion correctly addresses the prior critical issue.

Line 379 uses "${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}" which expands to -jN only when PARALLEL_LEVEL is non-empty, and to an empty string otherwise. This prevents the issue flagged in the prior review where an empty PARALLEL_LEVEL would produce bare -j (unlimited parallelism). Both the -n branch (line 382, make invocation) and the default branch (line 384, cmake --build) correctly use ${JFLAG}, ensuring consistent behavior.

Also applies to: 379-385
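A minimal standalone demonstration of the :+ expansion this comment relies on (not the actual build.sh):

```shell
# ${VAR:+word} expands to word only when VAR is set and non-empty,
# so JFLAG is either "-jN" or empty — never a bare "-j".
PARALLEL_LEVEL=8
JFLAG="${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}"        # -> "-j8"
PARALLEL_LEVEL=""
EMPTY_JFLAG="${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}}"  # -> "" (empty)
echo "JFLAG='${JFLAG}' EMPTY_JFLAG='${EMPTY_JFLAG}'"
```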


18-18: VALIDARGS and help text properly document the new -j flag.

The flag is correctly added to the valid arguments list and documented as enabling unlimited parallelism if no value is provided, or a specific thread count if -j<N> is given.

Also applies to: 37-37


359-378: No action required. The libmps_parser CMake configuration is correct.

Verification confirms that libmps_parser is a pure C++ library (CMakeLists.txt declares LANGUAGES CXX only) with no CUDA source files or compilation. PARALLEL_LEVEL controls NVCC parallelization and is appropriately applied only to libcuopt, which contains CUDA code. The separation is intentional and correct.

Likely an incorrect or invalid review comment.

@anandhkb anandhkb added this to the 26.02 milestone Nov 25, 2025
Member

@jameslamb jameslamb left a comment

I think it should be possible to do this without needing to directly invoke make or introduce a new Makefile in source control. I left some comments for your consideration, sorry in advance if I've misunderstood the goal here.

build.sh Outdated

function parallelArgs {
    # Check for -j option
    if [[ -n $(echo "$ARGS" | { grep -E "\-j" || true; } ) ]]; then
Member

There is already a hasArg function doing this:

cuopt/build.sh

Lines 97 to 99 in 7543358

function hasArg {
    (( NUMARGS != 0 )) && (echo " ${ARGS} " | grep -q " $1 ")
}

Using that might make this a bit easier to read, would you consider it?
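For reference, a POSIX-sh sketch of the hasArg helper quoted above, with made-up ARGS/NUMARGS values to show the behavior:

```shell
# hasArg checks for an exact, space-delimited token in $ARGS.
ARGS="-n -j8 libcuopt"
NUMARGS=3
hasArg() {
    [ "$NUMARGS" -ne 0 ] && echo " ${ARGS} " | grep -q " $1 "
}
if hasArg -n; then FOUND_N=yes; else FOUND_N=no; fi
if hasArg clean; then FOUND_CLEAN=yes; else FOUND_CLEAN=no; fi
echo "-n: ${FOUND_N}, clean: ${FOUND_CLEAN}"
```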

Member

And actually instead of doing all this command-line parsing and needing to care about spellings like -j2, -j 2, -j=2 etc. I think you might find it easier to just allow the shell variable PARALLEL_LEVEL to be overridden by an environment variable, like cudf does:

PARALLEL_LEVEL=${PARALLEL_LEVEL:=$(nproc)}

(cudf build.sh)

Would you consider that?

And in CI if you explicitly want to use GNU Make, you could override it here:

export CMAKE_GENERATOR=Ninja

With

CMAKE_GENERATOR="Unix Makefiles"

docs: https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#makefile-generators
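The default-assignment expansion suggested here behaves as follows; a literal 4 stands in for $(nproc) so the example is deterministic.

```shell
# ${VAR:=default} assigns default only when VAR is unset or empty;
# an exported PARALLEL_LEVEL therefore wins over the fallback.
unset PARALLEL_LEVEL
PARALLEL_LEVEL=${PARALLEL_LEVEL:=4}
DEFAULTED="$PARALLEL_LEVEL"    # -> "4": the default was taken
PARALLEL_LEVEL=2
PARALLEL_LEVEL=${PARALLEL_LEVEL:=4}
OVERRIDDEN="$PARALLEL_LEVEL"   # -> "2": the preset value wins
echo "defaulted=${DEFAULTED} overridden=${OVERRIDDEN}"
```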

Contributor Author

This shell variable override approach sounds good, thanks for the suggestion! Will implement it.

if hasArg -n; then
cmake --build "${LIBCUOPT_BUILD_DIR}" ${VERBOSE_FLAG}
# Manual make invocation to start its jobserver
make ${JFLAG} -C "${REPODIR}/cpp" LIBCUOPT_BUILD_DIR="${LIBCUOPT_BUILD_DIR}" VERBOSE_FLAG="${VERBOSE_FLAG}" PARALLEL_LEVEL="${PARALLEL_LEVEL}" ninja-build
Member

Instead of needing to directly invoke make, you could explicitly select that in the cmake configure step above if it's important, like this:

cmake -G 'Unix Makefiles' \
    -DDEFINE_BENCHMARK="${DEFINE_BENCHMARK}" \
    # ... etc., etc.

That'd allow you to continue to just use cmake --build instead of invoking the preferred build tool yourself.

Would you consider that?

Contributor Author

@aliceb-nv aliceb-nv Dec 17, 2025

We use Ninja as the default generator; this step is only needed to create a make jobserver that Ninja can hook into (as a jobserver client). To the best of my knowledge, this workaround is necessary to get nvcc's jobserver support to play nicely with Ninja, which is itself a jobserver client.

@github-actions

github-actions bot commented Dec 9, 2025

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

1 similar comment

@aliceb-nv aliceb-nv changed the base branch from release/25.12 to main December 17, 2025 16:23
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
build.sh (1)

88-88: Good use of default parameter expansion.

The initialization correctly allows environment override while providing a sensible default. This follows the pattern suggested in previous review feedback.

Consider adding validation to ensure PARALLEL_LEVEL is a positive integer, though in practice build tools will fail clearly if given invalid input:

 PARALLEL_LEVEL=${PARALLEL_LEVEL:=$(nproc)}
+if ! [[ "${PARALLEL_LEVEL}" =~ ^[0-9]+$ ]] || [ "${PARALLEL_LEVEL}" -eq 0 ]; then
+    echo "Error: PARALLEL_LEVEL must be a positive integer, got: ${PARALLEL_LEVEL}"
+    exit 1
+fi
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7ed7a7 and bce1494.

📒 Files selected for processing (3)
  • build.sh (2 hunks)
  • conda/recipes/libcuopt/recipe.yaml (1 hunks)
  • cpp/CMakeLists.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cpp/CMakeLists.txt
  • conda/recipes/libcuopt/recipe.yaml
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
🧬 Code graph analysis (1)
build.sh (1)
datasets/get_test_data.sh (1)
  • hasArg (78-80)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: checks / check-style
🔇 Additional comments (4)
build.sh (4)

353-353: LGTM!

Correctly propagates PARALLEL_LEVEL to the CMake configuration for use in enabling nvcc parallel compilation flags.


358-358: Correctly addresses the critical issue with JFLAG computation.

The parameter expansion ${PARALLEL_LEVEL:+-j${PARALLEL_LEVEL}} ensures JFLAG is either empty or "-jN", preventing the unsafe make -j (unlimited parallelism) that was flagged in previous reviews. This implementation matches the fix confirmed in commit a111bf6.


360-361: LGTM!

The manual make invocation correctly uses ${JFLAG} to control parallelism. The comment clearly explains the architectural rationale for using make to start a jobserver that ninja can hook into, which is necessary for nvcc's jobserver support. The additional PARALLEL_LEVEL variable passed to the Makefile is intentional for propagation to the ninja build.


363-363: LGTM!

Correct use of ${JFLAG} with cmake --build. The flag will be appropriately forwarded to the Ninja build system when PARALLEL_LEVEL is set.

@github-actions

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

1 similar comment
github-actions bot commented Jan 1, 2026
