Add clang host build and ThreadSanitizer support, race condition fixes #603
base: main
Conversation
@nguidotti I've added mutex locks a bit haphazardly for now. You might have better guidance as to the best way to ensure race-free access :)
nguidotti
left a comment
Looks good to me! Thanks for the hard work, Alice!
rgsl888prabhu
left a comment
Minor suggestion, rest looks good.
@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
Let's move this utils folder
cpp/CMakeLists.txt
Outdated
# To use ThreadSanitizer:
# 1. Build with clang and the -tsan flag
# 2. Run the binary with env var set: OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1' TSAN_OPTIONS='suppresions=ci/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
If we are planning to use this file in CI, maybe we can keep it in the cpp folder and use it from the local space?
hlinsen
left a comment
Thanks Alice! lgtm
@aliceb-nv can you confirm that this should go into the 25.12 release? If not, can we move the milestone to 26.02? Thank you.
@chris-maes I'd like to get it in for 25.12 as it contains race condition fixes. It has already been engine-reviewed.
@chris-maes Ah, never mind, it seems my changes had already been integrated by Nicolas Lee in one of his merged PRs :) I will retarget this for 26.02 then. The remaining changes are non-critical.
20601c0 to d52bcb0 (Compare)
📝 Walkthrough
Adds ThreadSanitizer and MemorySanitizer build options and CMake handling; suppresses TSAN races for third-party libs; plus multiple internal code fixes: destructor and constructor adjustments, unused-variable annotations, lambda-capture reductions, concurrency fixes, helper wrappers for device transforms, and several assertion and test-initialization updates.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
222-224: Add explicit stream cleanup before function return.
The user-created stream at line 223 is passed to raft::handle_t, which does not take ownership of it. Per RAFT documentation, it is the caller's responsibility to destroy user-created streams. The function returns at line 237 without calling cudaStreamDestroy(stream), creating a resource leak. In batch mode, this function is invoked multiple times in parallel, amplifying the leak.
Add the following before the return statement at line 237:
RAFT_CUDA_TRY(cudaStreamSynchronize(stream));
RAFT_CUDA_TRY(cudaStreamDestroy(stream));
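For functions that can return early, an RAII guard is a common alternative that keeps the cleanup on every path. The following is a minimal illustrative sketch only (not code from this PR; the struct name is made up), assuming the CUDA runtime header and the RAFT_CUDA_TRY macro are already available in cython_solve.cu:

struct scoped_cuda_stream {
  cudaStream_t stream{};
  scoped_cuda_stream() { RAFT_CUDA_TRY(cudaStreamCreate(&stream)); }
  ~scoped_cuda_stream()
  {
    // Destructors must not throw, so errors are deliberately ignored here.
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
  }
  scoped_cuda_stream(const scoped_cuda_stream&)            = delete;
  scoped_cuda_stream& operator=(const scoped_cuda_stream&) = delete;
};

Handing the guard's stream member to the raft::handle_t would keep the existing call unchanged while removing the leak on every return path.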
cpp/src/mip/local_search/local_search.cu (1)
56-93: Lambda capture missing this will cause compilation failure; add synchronization for static variable race
Two issues require immediate fixes:
1. Compile-time error: missing this in the lambda capture (line 82)
The lambda captures only [&population] but references the non-static member context inside its body. Unlike the corrected version in start_cpufj_lptopt_scratch_threads at line 120, which uses [this, &population], this lambda is ill-formed and will not compile.
Fix: change line 82 from [&population] to [this, &population] to match the pattern already used in the second method.
2. Data race on local_search_best_obj (lines 56, 85, 91)
The static file-scope variable is written without synchronization from multiple solver threads spawned at line 99 (cpu_fj.start_cpu_solver()). This is undefined behavior. Either guard writes with std::mutex or use std::atomic<double> with atomic compare-and-swap for minimum updates.
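If the atomic route is taken, the usual shape is a compare-and-swap loop that only ever lowers the shared value. A minimal sketch (illustrative only; the helper name is invented and not part of the codebase):

#include <atomic>
#include <limits>

static std::atomic<double> local_search_best_obj{std::numeric_limits<double>::max()};

// Lower the shared minimum from any solver thread without a mutex.
inline void lower_best_obj(double obj)
{
  double current = local_search_best_obj.load(std::memory_order_relaxed);
  // compare_exchange_weak reloads `current` on failure, so the improvement
  // test is re-evaluated against the latest value each time around the loop.
  while (obj < current &&
         !local_search_best_obj.compare_exchange_weak(current, obj, std::memory_order_relaxed)) {
  }
}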
cpp/tests/routing/level0/l0_objective_function_test.cu (1)
65-93: Objective test helper currently always returns 0 and doesn't validate behavior
get_objective is defined as:
auto get_objective = [](const auto& routing_solution, const double weight) {
  return (routing_solution.get_total_objective() - routing_solution.get_total_objective()) / weight;
};
This expression is identically 0 for all inputs, so objective_values[i] is always 0 and the monotonicity checks in the loop never exercise the real objective behavior. This effectively disables the intended validation of objective scaling with weight.
Consider computing the delta against a baseline solution or decomposing the total objective so that the contribution of this->objective_type is isolated, for example:
- auto get_objective = [](const auto& routing_solution, const double weight) {
-   return (routing_solution.get_total_objective() - routing_solution.get_total_objective()) / weight;
- };
+ auto get_objective = [](double total_with_obj, double total_without_obj, double weight) {
+   return (total_with_obj - total_without_obj) / weight;
+ };
and capture both totals appropriately in the test loop so the check reflects the actual objective term.
Based on learnings, tests here should validate numerical correctness of objectives, not just "runs without error".
🧹 Nitpick comments (6)
cpp/tests/linear_programming/c_api_tests/c_api_test.c (1)
56-56: Fallback return for unrecognized termination status looks good
Adding a final return "Unknown"; removes the undefined behavior of falling off the end of a non-void function and makes logging robust to new/invalid status codes. Optionally, you could express this as a default: branch in the switch for slightly clearer intent, but the current form is correct and safe.
cpp/tests/routing/level0/l0_ges_test.cu (1)
58-58: Empty TearDown overrides are typically unnecessary.
The override specifier is good practice, but these empty TearDown() methods serve no functional purpose. In Google Test, you only need to override TearDown() if cleanup is required. If these were added to satisfy compiler warnings or as placeholders for future cleanup logic, consider adding a comment to clarify the intent; otherwise, they can be safely removed.
Also applies to: 166-166
cpp/tests/routing/level0/l0_routing_test.cu (1)
323-323: Empty TearDown override is unnecessary.
This empty TearDown() method serves no functional purpose. Note that other test fixtures in this file (e.g., l0_routing_test_t at line 25 and lut_test_t at line 53) don't have explicit TearDown() methods, which is the typical pattern when no cleanup is needed. Consider removing this for consistency, or add cleanup logic if GPU state or device vectors need explicit deallocation.
47-47: Empty TearDown overrides are unnecessary.The
overridespecifiers are appropriate, but these emptyTearDown()methods provide no functionality. The device vectors (rmm::device_uvector) andwaypoint_matrixobjects use RAII and clean up automatically when the test fixture is destroyed. Unless these were added to address specific compiler warnings or as placeholders for future cleanup, they can be removed for clarity.Also applies to: 134-134, 195-195
cpp/src/linear_programming/utils.cuh (1)
65-79: Remove unused iteration counter in conditional_major
iteration is never read; even with [[maybe_unused]], it and its increment are pure dead code in a small utility that may be called frequently. You can simplify:
- uint64_t step = 10;
- uint64_t threshold = 1000;
- [[maybe_unused]] uint64_t iteration = 0;
+ uint64_t step = 10;
+ uint64_t threshold = 1000;
@@
- while (total_pdlp_iterations >= threshold) {
-   ++iteration;
+ while (total_pdlp_iterations >= threshold) {
This keeps the overflow checks and behavior identical while avoiding unnecessary work.
build.sh (1)
347-359: CMake propagation of BUILD_TSAN is correct; consider whether Python builds also need it
Passing -DBUILD_TSAN=${BUILD_TSAN} into the libcuopt CMake invocation aligns with the new BUILD_TSAN logic in cpp/CMakeLists.txt, so C++ builds will be TSAN-instrumented when requested.
If you also want TSAN coverage for any CMake-driven components built via SKBUILD_CMAKE_ARGS (Python wheels), you may want to append -DBUILD_TSAN=${BUILD_TSAN} into those arguments as well; currently only the core C++ build sees this option.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
build.sh (6 hunks)
cpp/CMakeLists.txt (2 hunks)
cpp/include/cuopt/error.hpp (0 hunks)
cpp/include/cuopt/linear_programming/utilities/internals.hpp (1 hunks)
cpp/src/linear_programming/pdlp.cu (1 hunks)
cpp/src/linear_programming/utilities/cython_solve.cu (1 hunks)
cpp/src/linear_programming/utils.cuh (1 hunks)
cpp/src/mip/diversity/lns/rins.cu (1 hunks)
cpp/src/mip/local_search/local_search.cu (1 hunks)
cpp/src/mip/presolve/gf2_presolve.hpp (1 hunks)
cpp/src/mip/presolve/third_party_presolve.cpp (1 hunks)
cpp/src/mip/utilities/cpu_worker_thread.cuh (2 hunks)
cpp/src/routing/crossovers/ox_recombiner.cuh (2 hunks)
cpp/tests/distance_engine/waypoint_matrix_test.cpp (3 hunks)
cpp/tests/linear_programming/c_api_tests/c_api_test.c (1 hunks)
cpp/tests/routing/level0/l0_ges_test.cu (2 hunks)
cpp/tests/routing/level0/l0_objective_function_test.cu (1 hunks)
cpp/tests/routing/level0/l0_routing_test.cu (1 hunks)
cpp/tests/routing/level0/l0_vehicle_order_match.cu (1 hunks)
cpp/tests/routing/level0/l0_vehicle_types_test.cu (1 hunks)
cpp/tsan_suppressions.txt (1 hunks)
💤 Files with no reviewable changes (1)
- cpp/include/cuopt/error.hpp
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/tests/routing/level0/l0_ges_test.cucpp/src/mip/utilities/cpu_worker_thread.cuhcpp/src/linear_programming/utilities/cython_solve.cucpp/tests/routing/level0/l0_routing_test.cucpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/linear_programming/utils.cuhcpp/src/mip/local_search/local_search.cucpp/tests/routing/level0/l0_vehicle_types_test.cucpp/src/routing/crossovers/ox_recombiner.cuhcpp/src/mip/diversity/lns/rins.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
cpp/tests/routing/level0/l0_ges_test.cucpp/src/linear_programming/utilities/cython_solve.cucpp/tests/routing/level0/l0_routing_test.cucpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/mip/local_search/local_search.cucpp/tests/routing/level0/l0_vehicle_types_test.cucpp/src/mip/diversity/lns/rins.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/tests/routing/level0/l0_ges_test.cucpp/src/mip/presolve/gf2_presolve.hppcpp/src/mip/utilities/cpu_worker_thread.cuhcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/src/linear_programming/utilities/cython_solve.cucpp/tests/routing/level0/l0_routing_test.cucpp/tests/distance_engine/waypoint_matrix_test.cppcpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/linear_programming/utils.cuhcpp/src/mip/local_search/local_search.cucpp/tests/routing/level0/l0_vehicle_types_test.cucpp/src/mip/presolve/third_party_presolve.cppcpp/src/routing/crossovers/ox_recombiner.cuhcpp/src/mip/diversity/lns/rins.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*test*.{cpp,cu,py}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment
Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Add tests for problem transformations: verify correctness of original→transformed→postsolve mappings and index consistency across problem representations
Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling
Files:
cpp/tests/routing/level0/l0_ges_test.cucpp/tests/routing/level0/l0_routing_test.cucpp/tests/distance_engine/waypoint_matrix_test.cppcpp/tests/routing/level0/l0_objective_function_test.cucpp/tests/routing/level0/l0_vehicle_types_test.cu
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/tests/routing/level0/l0_ges_test.cucpp/src/mip/presolve/gf2_presolve.hppcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/src/linear_programming/utilities/cython_solve.cucpp/tests/routing/level0/l0_routing_test.cucpp/tests/distance_engine/waypoint_matrix_test.cppcpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/mip/local_search/local_search.cucpp/tests/routing/level0/l0_vehicle_types_test.cucpp/src/mip/presolve/third_party_presolve.cppcpp/src/mip/diversity/lns/rins.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.{h,hpp,py}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings
Files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/include/cuopt/linear_programming/utilities/internals.hpp
**/*.{cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/tests/distance_engine/waypoint_matrix_test.cppcpp/src/mip/presolve/third_party_presolve.cpp
cpp/include/cuopt/**/*
⚙️ CodeRabbit configuration file
cpp/include/cuopt/**/*: For public header files (C++ API):
- Check if new public functions/classes have documentation comments (Doxygen format)
- Flag API changes that may need corresponding docs/ updates
- Verify parameter descriptions match actual types/behavior
- Suggest documenting thread-safety, GPU requirements, and numerical behavior
- For breaking changes, recommend updating docs and migration guides
Files:
cpp/include/cuopt/linear_programming/utilities/internals.hpp
🧠 Learnings (25)
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/tsan_suppressions.txtcpp/src/mip/utilities/cpu_worker_thread.cuh
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Applied to files:
cpp/tsan_suppressions.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment
Applied to files:
cpp/tsan_suppressions.txtcpp/tests/routing/level0/l0_ges_test.cucpp/tests/routing/level0/l0_routing_test.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Applied to files:
cpp/tsan_suppressions.txtcpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Applied to files:
cpp/tsan_suppressions.txtcpp/src/mip/utilities/cpu_worker_thread.cuh
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/tsan_suppressions.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Applied to files:
cpp/tsan_suppressions.txtcpp/src/mip/presolve/gf2_presolve.hppcpp/src/linear_programming/pdlp.cucpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/tsan_suppressions.txtcpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Applied to files:
cpp/tsan_suppressions.txtcpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Applied to files:
cpp/tsan_suppressions.txtcpp/src/mip/presolve/gf2_presolve.hppcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/linear_programming/utils.cuhcpp/src/mip/presolve/third_party_presolve.cppcpp/src/routing/crossovers/ox_recombiner.cuhcpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Applied to files:
cpp/tests/routing/level0/l0_ges_test.cucpp/tests/routing/level0/l0_routing_test.cucpp/tests/distance_engine/waypoint_matrix_test.cppcpp/tests/routing/level0/l0_objective_function_test.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/src/linear_programming/utilities/cython_solve.cucpp/src/linear_programming/pdlp.cucpp/src/linear_programming/utils.cuhcpp/src/mip/local_search/local_search.cucpp/src/mip/presolve/third_party_presolve.cppcpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppbuild.shcpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/linear_programming/utils.cuhcpp/src/mip/presolve/third_party_presolve.cppcpp/src/mip/diversity/lns/rins.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/src/mip/utilities/cpu_worker_thread.cuhcpp/include/cuopt/linear_programming/utilities/internals.hppcpp/src/linear_programming/utilities/cython_solve.cucpp/src/linear_programming/pdlp.cucpp/src/linear_programming/utils.cuhcpp/src/mip/local_search/local_search.cucpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Applied to files:
cpp/src/mip/presolve/gf2_presolve.hppcpp/src/linear_programming/utils.cuh
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling
Applied to files:
cpp/tests/distance_engine/waypoint_matrix_test.cppcpp/tests/routing/level0/l0_objective_function_test.cucpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Applied to files:
cpp/src/linear_programming/pdlp.cucpp/tests/routing/level0/l0_objective_function_test.cucpp/src/linear_programming/utils.cuhcpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Applied to files:
cpp/src/linear_programming/pdlp.cucpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
Applied to files:
cpp/tests/routing/level0/l0_objective_function_test.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes
Applied to files:
cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh} : Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Applied to files:
cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Applied to files:
cpp/CMakeLists.txt
🧬 Code graph analysis (4)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)
call_solve(95-98)
build.sh (1)
datasets/get_test_data.sh (1)
hasArg(77-79)
cpp/src/linear_programming/pdlp.cu (1)
cpp/src/mip/diversity/recombiners/recombiner_stats.hpp (2)
maybe_unused (53-64), maybe_unused (53-53)
cpp/src/mip/diversity/lns/rins.cu (1)
cpp/src/mip/solver.cu (6)
solution (64-68), solution (64-64), solution (70-75), solution (70-72), solution (77-80), solution (77-77)
🪛 Clang (14.0.6)
cpp/src/mip/presolve/gf2_presolve.hpp
[error] 14-14: 'papilo/Config.hpp' file not found
(clang-diagnostic-error)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
🔇 Additional comments (18)
cpp/src/mip/presolve/gf2_presolve.hpp (2)
10-13: LGTM! Proper Clang compatibility guard.Correctly excludes GCC-specific diagnostic pragmas when compiling with Clang, enabling the Clang build path for ThreadSanitizer support.
18-20: LGTM! Properly paired diagnostic guard.The closing guard correctly matches the opening block, maintaining diagnostic state consistency across compilers.
cpp/src/mip/presolve/third_party_presolve.cpp (2)
17-20: LGTM! Consistent Clang compatibility.Applies the same GCC diagnostic guard pattern as in the header file, maintaining consistency across the codebase for Clang support.
23-25: LGTM! Properly closed diagnostic scope.The closing guard is correctly paired with the opening block, completing the Clang-aware diagnostic handling.
cpp/src/routing/crossovers/ox_recombiner.cuh (2)
339-391: [[maybe_unused]] on cost_n/cost_p/total_delta is a reasonable way to silence clang warnings for debug-only validation
These locals only participate in the cost-coherence validation (via total_delta and cuopt_assert), so marking them [[maybe_unused]] is a harmless, targeted fix for release builds where asserts may be compiled out; the main recombination logic and costs are unaffected. Please just confirm all supported host compilers are built with a C++ standard that recognizes [[maybe_unused]] so this remains portable across your toolchains. Based on learnings, algorithm correctness is unchanged.
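Schematically, the pattern under review reduces to something like this (a stand-alone sketch, not the actual ox_recombiner code; the cuopt_assert definition below is only a stand-in for the real macro):

#include <cassert>
#include <cmath>

#ifndef NDEBUG
#define cuopt_assert(cond, msg) assert((cond) && (msg))
#else
#define cuopt_assert(cond, msg) ((void)0)
#endif

void validate_costs(double cost_p, double cost_n)
{
  // Consumed only by the assert; [[maybe_unused]] (C++17) silences the
  // unused-variable warning in release builds where the assert vanishes.
  [[maybe_unused]] double total_delta = cost_n - cost_p;
  cuopt_assert(std::abs(total_delta) < 1e-6, "cost bookkeeping mismatch");
}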
533-544: [[maybe_unused]] on found is appropriate for assert-only usage
found is only consumed by cuopt_assert, so in builds where that macro compiles away, this variable can trigger unused-variable warnings. The [[maybe_unused]] annotation cleanly addresses that without touching the transpose-graph correctness checks. As with the other attribute use, please confirm it is supported across your full compiler matrix.
cpp/src/linear_programming/utilities/cython_solve.cu (1)
298-298: LGTM: Correctly removes redundant std::move.
The explicit std::move on the prvalue returned by call_solve is unnecessary. The move constructor is invoked automatically when assigning the return value to list[i]. Removing it improves code clarity and may enable better compiler optimizations.
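As a generic illustration of why the wrapper was redundant (not the cuOpt code; make_result stands in for call_solve):

#include <cstddef>
#include <utility>
#include <vector>

std::vector<int> make_result() { return std::vector<int>(1000, 42); }

void fill(std::vector<std::vector<int>>& list, std::size_t i)
{
  list[i] = make_result();  // the temporary is already an rvalue, so the move
                            // assignment operator is selected without std::move
  // list[i] = std::move(make_result());  // compiles, but adds nothing
}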
1513-1518: Markingsing_itersas[[maybe_unused]]is appropriateThe variable is only used under debug builds; the attribute cleanly suppresses release-build warnings without changing behavior.
cpp/tests/routing/level0/l0_vehicle_order_match.cu (1)
25-32: Constructor default now sets zeronot_matching_constraints_fractionInitializing
base_test_twith(512, 0, 0)is consistent with later overridingnot_matching_constraints_fractionfromGetParam(), and should make the parameter sweep the sole source of this value. Please just confirm thatbase_test_t’s constructor does not rely on a non‑zero second argument for any other setup logic.cpp/tsan_suppressions.txt (1)
1-6: TSAN suppressions look reasonable; keep an eye on scopeThe suppressions for
tbbandPapilomatch the stated goal of ignoring races in external header‑only libraries. Just be aware thatrace:tbb/race:Papiloare broad and will also hide any genuine races in your own code that TSAN attributes to those symbols. If that becomes a concern, you can later narrow them (e.g., by function or file patterns).cpp/src/mip/utilities/cpu_worker_thread.cuh (1)
80-87: Condition-variable synchronization for worker completion is correct
Switching wait_for_cpu_solver() to a condition_variable::wait on cpu_thread_done || cpu_thread_terminate, and notifying after setting cpu_thread_done = true, gives proper blocking behavior without introducing races. Reads of these flags remain atomic for callers that still poll them directly. This is a solid improvement over any previous busy-wait.
Also applies to: 135-137
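The shape of that pattern, reduced to a stand-alone sketch (the member names mirror the review text, but this is not the actual cpu_worker_thread.cuh code):

#include <atomic>
#include <condition_variable>
#include <mutex>

struct cpu_worker_sync {
  std::mutex mtx;
  std::condition_variable cv;
  std::atomic<bool> cpu_thread_done{false};
  std::atomic<bool> cpu_thread_terminate{false};

  // Worker thread: set the flag under the lock, then notify.
  void mark_done()
  {
    {
      std::lock_guard<std::mutex> lock(mtx);
      cpu_thread_done = true;
    }
    cv.notify_all();
  }

  // Waiting thread: blocks until done or terminate is set, with no busy-wait.
  void wait_for_cpu_solver()
  {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [this] { return cpu_thread_done.load() || cpu_thread_terminate.load(); });
  }
};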
cpp/include/cuopt/linear_programming/utilities/internals.hpp (1)
63-67: Adding a virtual destructor to base_solution_t is the right fix
base_solution_t is polymorphic (virtual bool is_mip() const), so providing a virtual destructor is required for safe deletion through base pointers. The new virtual ~base_solution_t() = default; resolves that without changing semantics.
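Reduced to a schematic example (the derived type below is hypothetical, and the real base_solution_t has more members), the fix guarantees that deletion through a base pointer runs the derived destructor:

#include <memory>

class base_solution_t {
 public:
  virtual ~base_solution_t() = default;  // without this, deleting via a base pointer is UB
  virtual bool is_mip() const = 0;
};

class lp_solution_t final : public base_solution_t {
 public:
  bool is_mip() const override { return false; }
};

int main()
{
  std::unique_ptr<base_solution_t> s = std::make_unique<lp_solution_t>();
  return s->is_mip() ? 1 : 0;
}  // unique_ptr's delete now reaches ~lp_solution_t through the virtual destructor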
207-252: Review comment is based on incorrect assumption about callback threadingThe claim that "solution_callback is invoked from multiple B&B threads" concurrently is unsupported by the code. In
branch_and_bound.cpp, both invocation sites (lines 433 and 553) are protected bymutex_upper_, which serializes all callback executions—only one callback can run at a time. Additionally, the main thread inrins.cuonly accessesrins_solution_queueafterbranch_and_bound.solve()returns (which waits for all worker threads to complete), eliminating concurrent access to the queue.While user callbacks executing in worker threads warrant careful design consideration, the specific data race concern on
rins_solution_queue.push_back()does not apply here due to the serialization provided by the branch-and-bound solver's internal locking.Likely an incorrect or invalid review comment.
cpp/tests/routing/level0/l0_vehicle_types_test.cu (1)
24-28: Confirm impact of changing base_test_t parameters on routing test stabilityThe constructor now initializes
base_test_t<i_t, f_t>(512, 0, 0). Given that the second parameter previously was non‑zero, please confirm this does not inadvertently relax or over‑tighten numerical tolerances or other thresholds inbase_test_tthat these L0 routing tests rely on (especially across GPUs/architectures).Based on learnings, this may affect numerical tolerance behavior in tests.
cpp/tests/routing/level0/l0_objective_function_test.cu (1)
24-29: Verify changed base_test_t parameters don’t break objective-function test assumptionsThe constructor now calls
base_test_t<i_t, f_t>(512, 0, 0). Since this test exercises objective scaling vs.min_weight, please confirm this change doesn’t invalidate assumptions baked into theINSTANTIATE_TEST_SUITE_Pparameters (e.g., tolerances, default weights) or make the test numerically fragile on different hardware.Based on learnings, constructor-level tolerance changes can easily over‑constrain numerical tests.
cpp/CMakeLists.txt (2)
82-88: Sanitizer CXX flags look reasonable; confirm Clang/GCC coverageConditionally appending
-Wno-error=maybe-uninitializedonly whenCMAKE_CXX_COMPILER_IDis not Clang is a good way to avoid GCC‑specific warnings on Clang. The combined-fsanitize=address,undefined -fno-omit-frame-pointer -gCXX flags also look standard for ASan/UBSan builds.Please verify in CI that both GCC and Clang host builds pick up these flags as expected (and that Clang builds no longer warn/error on
-Wno-error=maybe-uninitialized).
132-137: CUDA host-compiler conditional flags nicely avoid GCC-specific warnings on ClangConditioning the CUDA host flags on
${CMAKE_CUDA_HOST_COMPILER}/${CMAKE_CXX_COMPILER_ID}so that only non‑Clang builds receive-Wno-error=non-template-friendis a good way to unblock Clang host builds while preserving stricter warnings for GCC. The separation into-Xcompiler=-Wallplus a bare-Wno-error=non-template-friendis also consistent with how nvcc forwards unknown-W*flags to the host compiler.Please sanity‑check a Clang + nvcc configuration to ensure no unexpected “unknown warning option” diagnostics remain.
build.sh (1)
18-19: TSAN CLI flag wiring and mutual exclusion with -fsanitize look solidThe
-tsanflag is correctly:
- Added to
VALIDARGSand documented in the help text as mutually exclusive with-fsanitize.- Backed by a
BUILD_TSANvariable with sensible default (0).- Parsed via
hasArg -tsanto setBUILD_TSAN=1.- Guarded by an explicit check that errors out if both
BUILD_SANITIZERandBUILD_TSANare enabled.This gives a clear and safe UX for selecting sanitizer mode at build time.
Also applies to: 35-37, 80-81, 232-237, 306-310
# To use ThreadSanitizer:
# 1. Build with clang and the -tsan flag
# 2. Run the binary with env var set: OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1' TSAN_OPTIONS='suppresions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
# Replace with local llvm install path. libarcher.so must be presetn
if(BUILD_TSAN)
  message(STATUS "Building with ThreadSanitizer enabled")
  list(APPEND CUOPT_CXX_FLAGS -fsanitize=thread -fno-omit-frame-pointer -g)
  add_link_options(-fsanitize=thread)
endif(BUILD_TSAN)
ThreadSanitizer integration is fine, but TSAN_OPTIONS comment has typos
The BUILD_TSAN block cleanly wires in -fsanitize=thread -fno-omit-frame-pointer -g and matching link flags behind a simple CMake option, which is what build.sh expects. That’s a solid pattern.
In the usage comment, though, the TSAN_OPTIONS example contains typos:
TSAN_OPTIONS='suppresions=cpp/tsan_suppressions.txt:...'
# and: libarcher.so must be presetn
For discoverability, consider correcting this to:
# 2. Run with:
# TSAN_OPTIONS='suppressions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
# OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1'This avoids users copy‑pasting an invalid TSAN_OPTIONS key.
🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 90 to 99, the usage comment for running with
ThreadSanitizer contains typos and a confusing ordering: change the TSAN_OPTIONS
key from "suppresions" to "suppressions" and correct "presetn" to "present";
also split/clarify the environment variables so the example shows
TSAN_OPTIONS='suppressions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
and OMP_TOOL_LIBRARIES and ARCHER_OPTIONS as separate env vars (e.g.
OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1')
to make the example copy-pasteable and accurate.
I only included the changes to B&B and OpenMP. Your PR has additional fixes, no? And enabling TSan support is quite helpful.
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cpp/src/mip/local_search/local_search.cu (2)
84-94: Critical: Lambda accesses member variable without capturing this.
The lambda captures only [&population] but the body accesses context.problem_ptr (lines 88-91), which is a member variable of the local_search_t class. Without capturing this, accessing member variables results in undefined behavior or a compilation error.
Apply this diff to fix the capture:
- cpu_fj.fj_cpu->improvement_callback = [&population](f_t obj, const std::vector<f_t>& h_vec) {
+ cpu_fj.fj_cpu->improvement_callback = [this, &population](f_t obj, const std::vector<f_t>& h_vec) {
    population.add_external_solution(h_vec, obj, solution_origin_t::CPUFJ);
    if (obj < local_search_best_obj) {
      CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",
86-92: Critical: Data race on static variable local_search_best_obj.
The static variable local_search_best_obj is accessed from multiple CPU solver threads without synchronization, creating a classic TOCTOU race condition:
- Thread A reads local_search_best_obj (line 86)
- Thread B reads local_search_best_obj (line 86)
- Thread A writes local_search_best_obj (line 92)
- Thread B writes local_search_best_obj (line 92)
This can cause lost updates and non-deterministic "best" values. ThreadSanitizer will flag this race.
Apply this diff to add mutex protection:
+#include <mutex>
+
 static double local_search_best_obj = std::numeric_limits<double>::max();
 static population_t<int, double>* pop_ptr = nullptr;
+static std::mutex local_search_best_obj_mutex;
Then protect the critical section:
   population.add_external_solution(h_vec, obj, solution_origin_t::CPUFJ);
-  if (obj < local_search_best_obj) {
+  {
+    std::lock_guard<std::mutex> lock(local_search_best_obj_mutex);
+    if (obj < local_search_best_obj) {
-    CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",
-                    context.problem_ptr->get_user_obj_from_solver_obj(obj),
-                    context.problem_ptr->get_user_obj_from_solver_obj(
-                      population.is_feasible() ? population.best_feasible().get_objective()
-                                               : std::numeric_limits<f_t>::max()));
-    local_search_best_obj = obj;
+      CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",
+                      context.problem_ptr->get_user_obj_from_solver_obj(obj),
+                      context.problem_ptr->get_user_obj_from_solver_obj(
+                        population.is_feasible() ? population.best_feasible().get_objective()
+                                                 : std::numeric_limits<f_t>::max()));
+      local_search_best_obj = obj;
+    }
 }
Note: The same issue exists at lines 120-127.
♻️ Duplicate comments (2)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
307-309: Critical data race remains unfixed from previous review.
The data race flagged in the previous review has not been addressed. All threads in the OpenMP parallel loop share the same solver_settings pointer without synchronization. Inside call_solve(), multiple threads concurrently access mutable state via get_pdlp_settings() and get_mip_settings(), which return non-const references (verified in previous review at cpp/include/cuopt/linear_programming/solver_settings.hpp:89-90).
This is undefined behavior and particularly concerning given this PR's focus on ThreadSanitizer and race condition fixes.
Required fix: Ensure thread-local solver_settings by creating a copy inside the parallel region before calling call_solve(), or protect all accesses with proper synchronization. The previous review comment provides detailed fix options.
Based on coding guidelines: "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state."
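One way to realize the thread-local-copy option, sketched in isolation (the types, loop, and signatures below are hypothetical simplifications, not the real cython_solve.cu interfaces):

#include <vector>

struct solver_settings_t { double time_limit = 0.0; /* ... mutable fields ... */ };
struct solution_t {};

solution_t call_solve(solver_settings_t& settings) { (void)settings; return {}; }

void solve_batch(std::vector<solution_t>& list, const solver_settings_t& shared_settings)
{
#pragma omp parallel for
  for (int i = 0; i < static_cast<int>(list.size()); ++i) {
    // Each iteration gets a private copy, so whatever call_solve mutates through
    // its settings reference never touches state shared with other threads.
    solver_settings_t local_settings = shared_settings;
    list[i] = call_solve(local_settings);
  }
}

This trades a small per-call copy for the guarantee that get_pdlp_settings()/get_mip_settings() mutations stay thread-private.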
cpp/src/mip/local_search/local_search.cu (1)
120-127: Critical: Data race on static variable local_search_best_obj (duplicate).
This callback has the same unsynchronized access to the static local_search_best_obj variable as the previous lambda. The same mutex protection should be applied here.
Apply the same mutex protection pattern as suggested for lines 86-92.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
build.sh (6 hunks)
cpp/CMakeLists.txt (2 hunks)
cpp/src/dual_simplex/barrier.cu (3 hunks)
cpp/src/linear_programming/pdlp.cu (1 hunks)
cpp/src/linear_programming/utilities/cython_solve.cu (1 hunks)
cpp/src/mip/diversity/lns/rins.cu (1 hunks)
cpp/src/mip/local_search/local_search.cu (1 hunks)
cpp/src/mip/presolve/third_party_presolve.cpp (1 hunks)
cpp/src/routing/local_search/local_search.cu (1 hunks)
cpp/src/routing/local_search/sliding_tsp.cu (1 hunks)
cpp/src/routing/local_search/sliding_window.cu (1 hunks)
cpp/src/routing/local_search/two_opt.cu (1 hunks)
cpp/src/routing/local_search/vrp/vrp_execute.cu (1 hunks)
cpp/tests/linear_programming/c_api_tests/c_api_test.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- cpp/CMakeLists.txt
- build.sh
- cpp/src/mip/diversity/lns/rins.cu
- cpp/tests/linear_programming/c_api_tests/c_api_test.c
- cpp/src/linear_programming/pdlp.cu
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{cu,cuh}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Files:
cpp/src/routing/local_search/sliding_window.cucpp/src/dual_simplex/barrier.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cucpp/src/mip/local_search/local_search.cu
**/*.cu
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Files:
cpp/src/routing/local_search/sliding_window.cucpp/src/dual_simplex/barrier.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cucpp/src/mip/local_search/local_search.cu
**/*.{cu,cuh,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...
Files:
cpp/src/routing/local_search/sliding_window.cucpp/src/mip/presolve/third_party_presolve.cppcpp/src/dual_simplex/barrier.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cucpp/src/mip/local_search/local_search.cu
**/*.{cu,cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code
Files:
cpp/src/routing/local_search/sliding_window.cucpp/src/mip/presolve/third_party_presolve.cppcpp/src/dual_simplex/barrier.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cucpp/src/mip/local_search/local_search.cu
**/*.{cpp,hpp,h}
📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)
**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Files:
cpp/src/mip/presolve/third_party_presolve.cpp
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Applied to files:
cpp/src/routing/local_search/sliding_window.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Applied to files:
cpp/src/routing/local_search/sliding_window.cucpp/src/mip/presolve/third_party_presolve.cppcpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cppcpp/src/dual_simplex/barrier.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cucpp/src/mip/local_search/local_search.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cppcpp/src/dual_simplex/barrier.cucpp/src/routing/local_search/vrp/vrp_execute.cucpp/src/linear_programming/utilities/cython_solve.cucpp/src/routing/local_search/local_search.cucpp/src/routing/local_search/two_opt.cucpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cpp
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cpp
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/utilities/cython_solve.cu
cpp/src/mip/local_search/local_search.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Applied to files:
cpp/src/mip/presolve/third_party_presolve.cpp
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Applied to files:
cpp/src/dual_simplex/barrier.cu
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*benchmark*.{cpp,cu,py} : Include performance benchmarks and regression detection for GPU operations; verify near real-time performance on million-variable problems
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-12-06T00:22:48.638Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/tests/linear_programming/c_api_tests/c_api_test.c:1033-1048
Timestamp: 2025-12-06T00:22:48.638Z
Learning: In cuOPT's quadratic programming API, when a user provides a quadratic objective matrix Q via set_quadratic_objective_matrix or the C API functions cuOptCreateQuadraticProblem/cuOptCreateQuadraticRangedProblem, the API internally computes Q_symmetric = Q + Q^T and the barrier solver uses 0.5 * x^T * Q_symmetric * x. From the user's perspective, the convention is x^T Q x. For a diagonal Q with values [q1, q2, ...], the resulting quadratic terms are q1*x1^2 + q2*x2^2 + ...
Applied to files:
cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Applied to files:
cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Applied to files:
cpp/src/routing/local_search/local_search.cu
🧬 Code graph analysis (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)
call_solve(95-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
- GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
- GitHub Check: checks / check-style
🔇 Additional comments (8)
cpp/src/routing/local_search/vrp/vrp_execute.cu (1)
467-470: LGTM! Assertion parentheses corrected for proper numerical validation.
The assertion now correctly applies abs to the entire delta expression (cost_before - cost_after) + debug_delta before comparing against the relative tolerance. This ensures accurate detection of cost mismatches in both positive and negative directions.

cpp/src/routing/local_search/sliding_window.cu (1)
1118-1120: LGTM! Assertion corrected with proper parentheses and accurate error message.
Two improvements here:
- The abs now correctly wraps the entire delta expression for proper numerical validation
- The assertion message now accurately reflects "sliding_window costs" instead of the incorrect "cross costs"
cpp/src/routing/local_search/sliding_tsp.cu (1)
567-570: LGTM! Assertion parentheses corrected for sliding TSP cost validation.
The abs now properly encloses the full delta expression (cost_before - cost_after) + debug_delta, ensuring accurate cost mismatch detection with relative tolerance.

cpp/src/routing/local_search/two_opt.cu (1)
460-463: LGTM! Assertion corrected for two-opt cost validation.
The parentheses now ensure abs wraps the complete delta (cost_before - cost_after) + debug_delta for accurate relative tolerance checking.

cpp/src/routing/local_search/local_search.cu (1)
128-131: LGTM! Assertion corrected for cross-search cost validation.
The abs now properly wraps the entire delta expression, ensuring accurate cost mismatch detection with relative tolerance. This completes the consistent assertion fix pattern across all local_search components.
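For reference, the corrected shape shared by these call sites looks roughly like the sketch below; the helper name, tolerance value, and scaling are illustrative, not the exact cuOpt code.

#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative debug check: the whole delta, including debug_delta, sits inside abs()
// before it is compared against a relative tolerance.
inline void check_cost_delta(double cost_before, double cost_after, double debug_delta)
{
  constexpr double rel_tol = 1e-6;  // assumed tolerance, not the value used in cuOpt
  assert(std::abs((cost_before - cost_after) + debug_delta) <=
         rel_tol * std::max(1.0, std::abs(cost_before)));
}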
cpp/src/linear_programming/utilities/cython_solve.cu (1)

309-309: LGTM: Removing redundant std::move.
The removal of the explicit std::move is correct. Since call_solve() returns a std::unique_ptr by value (a prvalue), the return value is automatically moved. The explicit std::move was redundant and could potentially inhibit copy elision.
However, note that this change does not address the critical data race issue flagged in the previous review (see next comment).
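As a minimal illustration of why the explicit std::move is unnecessary on the return path (hypothetical types, not the actual call_solve signature):

#include <memory>

struct solution_t {
  int termination_status = 0;
};

std::unique_ptr<solution_t> make_solution()
{
  auto ret = std::make_unique<solution_t>();
  // 'ret' is implicitly treated as an rvalue in the return statement, so the
  // unique_ptr is moved automatically; writing 'return std::move(ret);' is
  // redundant and can prevent copy elision (NRVO).
  return ret;
}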
cpp/src/dual_simplex/barrier.cu (1)
1449-1449: LGTM! Wrapper function usage is correct.
The calls to pairwise_multiply and axpy properly replace the previous thrust::transform operations. The arguments are correctly ordered, types match the overload resolution, and stream handling is appropriate.
Also applies to: 1459-1459
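A rough sketch of what a call site using the new wrappers looks like; the buffer names and sizes are illustrative and assume the pairwise_multiply and axpy overloads shown in the diff below:

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

// Hypothetical call site: element-wise product followed by a scaled sum on the same stream.
void scale_and_combine(rmm::cuda_stream_view stream)
{
  int const n = 1024;
  rmm::device_uvector<double> x(n, stream), y(n, stream), out(n, stream);
  // ... x and y filled elsewhere ...
  pairwise_multiply(x.data(), y.data(), out.data(), n, stream);  // out[i] = x[i] * y[i]
  axpy(2.0, x.data(), -1.0, y.data(), out.data(), n, stream);    // out[i] = 2*x[i] - y[i]
}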
cpp/src/mip/presolve/third_party_presolve.cpp (1)
17-25: Conditional compilation guards correctly enable Clang compatibility.
The #if !defined(__clang__) guards properly isolate GCC-specific diagnostic pragmas. The __clang__ macro correctly identifies the Clang compiler, and the push/pop pragma pair is consistently guarded on both sides (lines 17-20 and 23-25), ensuring proper diagnostic state management. The pattern is identically used in gf2_presolve.hpp, confirming consistency across the codebase. Suppressing -Wstringop-overflow for Papilo includes is appropriate since this GCC warning can produce false positives with third-party libraries.
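The guard pattern under review is roughly the following; the specific header shown is an assumed example, not necessarily the exact include list:

// Suppress a GCC-only warning around third-party headers without breaking the clang build.
#if !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstringop-overflow"
#endif

#include "papilo/core/Presolve.hpp"  // assumed Papilo include, for illustration only

#if !defined(__clang__)
#pragma GCC diagnostic pop
#endif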
// non-template wrappers to work around clang compiler bug
[[maybe_unused]] static void pairwise_multiply(
  float* a, float* b, float* out, int size, rmm::cuda_stream_view stream)
{
  cub::DeviceTransform::Transform(
    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
}

[[maybe_unused]] static void pairwise_multiply(
  double* a, double* b, double* out, int size, rmm::cuda_stream_view stream)
{
  cub::DeviceTransform::Transform(
    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
}

[[maybe_unused]] static void axpy(
  float alpha, float* x, float beta, float* y, float* out, int size, rmm::cuda_stream_view stream)
{
  cub::DeviceTransform::Transform(
    cuda::std::make_tuple(x, y),
    out,
    size,
    [alpha, beta] __host__ __device__(float a, float b) { return alpha * a + beta * b; },
    stream);
}

[[maybe_unused]] static void axpy(double alpha,
                                  double* x,
                                  double beta,
                                  double* y,
                                  double* out,
                                  int size,
                                  rmm::cuda_stream_view stream)
{
  cub::DeviceTransform::Transform(
    cuda::std::make_tuple(x, y),
    out,
    size,
    [alpha, beta] __host__ __device__(double a, double b) { return alpha * a + beta * b; },
    stream);
}
Add CUDA error checking and improve bug documentation.
The wrapper functions lack CUDA error checking, which violates the coding guideline requiring verification for all CUDA operations. Additionally, the comment about "clang compiler bug" is vague and provides no context for future maintainers.
As per coding guidelines, every CUDA operation must have error checking.
Apply this diff to add error checking:
 [[maybe_unused]] static void pairwise_multiply(
   float* a, float* b, float* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
-    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
+    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream));
+  RAFT_CHECK_CUDA(stream);
 }

 [[maybe_unused]] static void pairwise_multiply(
   double* a, double* b, double* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
-    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
+    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream));
+  RAFT_CHECK_CUDA(stream);
 }

 [[maybe_unused]] static void axpy(
   float alpha, float* x, float beta, float* y, float* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(x, y),
     out,
     size,
     [alpha, beta] __host__ __device__(float a, float b) { return alpha * a + beta * b; },
-    stream);
+    stream));
+  RAFT_CHECK_CUDA(stream);
 }

 [[maybe_unused]] static void axpy(double alpha,
                                   double* x,
                                   double beta,
                                   double* y,
                                   double* out,
                                   int size,
                                   rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(x, y),
     out,
     size,
     [alpha, beta] __host__ __device__(double a, double b) { return alpha * a + beta * b; },
-    stream);
+    stream));
+  RAFT_CHECK_CUDA(stream);
 }

Also improve the comment to document the specific issue:
-// non-template wrappers to work around clang compiler bug
+// Non-template wrappers to work around clang compiler issue with thrust::transform template
+// instantiation. These use cub::DeviceTransform directly with explicit float/double overloads
+// instead of templates to ensure proper compilation with clang when ThreadSanitizer is enabled.

Consider whether a templated version with explicit instantiation could work once the clang/TSan build is stable:
template<typename T>
[[maybe_unused]] static void pairwise_multiply(
T* a, T* b, T* out, int size, rmm::cuda_stream_view stream)
{
RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream));
RAFT_CHECK_CUDA(stream);
}
// Explicit instantiations
template void pairwise_multiply<float>(float*, float*, float*, int, rmm::cuda_stream_view);
template void pairwise_multiply<double>(double*, double*, double*, int, rmm::cuda_stream_view);
🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you! If this is an "epic" issue, then please add the "epic" label to this issue.
This PR fixes compilation issues to allow for building with clang, in order to use ThreadSanitizer.
ThreadSanitizer is enabled by passing the -tsan flag to build.sh.
Also included are some minor data race fixes in the CPU code.
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests