Skip to content

Conversation

@aliceb-nv
Copy link
Contributor

@aliceb-nv aliceb-nv commented Nov 18, 2025

This PR fixes compilation issues to allow for building with clang, in order to use ThreadSanitizer.
ThreadSanitizer is enabled by passing the -tsan flag to build.sh.

Also included are some minor data race fixes in the CPU code.

Summary by CodeRabbit

  • New Features

    • Added ThreadSanitizer and MemorySanitizer build options and corresponding build flags; build system now enforces sanitizer mutual exclusivity and propagates flags to the build.
  • Bug Fixes

    • Replaced busy-waiting with condition-variable synchronization for CPU worker threads.
    • Fixed several cost-validation/assertion parentheses to correct comparisons.
  • Refactor

    • Improved compiler-aware flag handling for host/CUDA compilation and reduced warning noise.
  • Tests

    • Added test annotations/overrides and introduced sanitizer suppression entries for external libraries.

✏️ Tip: You can customize this high-level summary in your review settings.

@aliceb-nv aliceb-nv added this to the 25.12 milestone Nov 18, 2025
@aliceb-nv aliceb-nv requested review from a team as code owners November 18, 2025 12:58
@aliceb-nv aliceb-nv added the bug Something isn't working label Nov 18, 2025
@aliceb-nv aliceb-nv requested a review from vyasr November 18, 2025 12:58
@aliceb-nv aliceb-nv added the non-breaking Introduces a non-breaking change label Nov 18, 2025
@aliceb-nv
Copy link
Contributor Author

@nguidotti I've added mutex locks a bit haphazardly for now. You might have better guidance as to the best way to ensure race-free access :)

@aliceb-nv aliceb-nv requested review from nguidotti and removed request for kaatish November 18, 2025 12:59
Copy link
Contributor

@nguidotti nguidotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look goods to me! Thanks for the hard work, Alice!

@aliceb-nv aliceb-nv changed the base branch from main to release/25.12 November 18, 2025 14:05
Copy link
Collaborator

@rgsl888prabhu rgsl888prabhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion, rest looks good.

@@ -0,0 +1,6 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move this utils folder


# To use ThreadSanitizer:
# 1. Build with clang and the -tsan flag
# 2. Run the binary with env var set: OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1' TSAN_OPTIONS='suppresions=ci/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are planning to use this file in CI, may be we can keep it in cpp folder and use it from local space ?

Copy link
Contributor

@hlinsen hlinsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alice! lgtm

@chris-maes
Copy link
Contributor

@aliceb-nv can you confirm that should go into the 25.12 release? If not, can we move the milestone to 26.02? Thank you.

@aliceb-nv
Copy link
Contributor Author

@chris-maes I'd like to get it in for 25.12 as it contains race condition fixes. It has already been engine-reviewed

@aliceb-nv
Copy link
Contributor Author

@chris-maes Ah nevermind, it seems my changes had already been integrated by Nicolas Lee in one of his merged PRs :) I will retarget this for 26.02 then. The remaining changes are non critical.

@aliceb-nv aliceb-nv modified the milestones: 25.12, 26.02 Nov 25, 2025
@coderabbitai
Copy link

coderabbitai bot commented Nov 25, 2025

📝 Walkthrough

Walkthrough

Adds ThreadSanitizer and MemorySanitizer build options and CMake handling; suppresses TSAN races for third-party libs; plus multiple internal code fixes: destructor and constructor adjustments, unused-variable annotations, lambda-capture reductions, concurrency fixes, helper wrappers for device transforms, and several assertion and test-initialization updates.

Changes

Cohort / File(s) Summary
Build Infrastructure — Sanitizers
build.sh, cpp/CMakeLists.txt, cpp/tsan_suppressions.txt
Added -tsan and -msan build options and BUILD_TSAN/BUILD_MSAN flags; enforced mutual exclusivity among sanitizer modes; propagated -DBUILD_TSAN/-DBUILD_MSAN to CMake; adjusted sanitizer CXX/CUDA flags and added cpp/tsan_suppressions.txt with TSAN suppressions.
Public API Headers
cpp/include/cuopt/error.hpp, cpp/include/cuopt/linear_programming/utilities/internals.hpp
Removed explicit default constructor from cuopt::logic_error; added virtual ~base_solution_t() = default; to base_solution_t.
Device Transform Helpers
cpp/src/dual_simplex/barrier.cu
Introduced non-template pairwise_multiply and axpy wrappers (float/double) replacing direct thrust transforms to work around a Clang compiler issue.
Compiler Diagnostics — Clang Guards
cpp/src/mip/presolve/gf2_presolve.hpp, cpp/src/mip/presolve/third_party_presolve.cpp
Wrapped GCC diagnostic pragmas with #if !defined(__clang__)/#endif to avoid applying them under Clang.
Concurrency Improvements
cpp/src/mip/utilities/cpu_worker_thread.cuh
Replaced busy-wait with condition_variable::wait protected by a mutex and added cpu_cv.notify_all() to signal completion.
Lambda Capture Reductions
cpp/src/mip/diversity/lns/rins.cu, cpp/src/mip/local_search/local_search.cu
Reduced captures in lambdas to remove unnecessary this (capture only needed variables).
Unused Variable Annotations
cpp/src/linear_programming/pdlp.cu, cpp/src/linear_programming/utils.cuh, cpp/src/routing/crossovers/ox_recombiner.cuh
Marked several local variables with [[maybe_unused]] to suppress unused-variable warnings; minor formatting adjustments.
Move Semantics Simplification
cpp/src/linear_programming/utilities/cython_solve.cu
Removed explicit std::move in batch solve loop, using implicit move from returned prvalue.
Assertion Parentheses Fixes
cpp/src/routing/local_search/local_search.cu, cpp/src/routing/local_search/sliding_tsp.cu, cpp/src/routing/local_search/sliding_window.cu, cpp/src/routing/local_search/two_opt.cu, cpp/src/routing/local_search/vrp/vrp_execute.cu
Corrected parentheses around absolute/delta expressions in several assertions and adjusted an assertion message in sliding_window.cu.
LNS / Local Search Behavior
cpp/src/mip/diversity/lns/rins.cu, cpp/src/mip/local_search/local_search.cu
(Also listed above) Lambdas updated to narrower captures—verify access to required state.
Tests — Override Specifiers
cpp/tests/distance_engine/waypoint_matrix_test.cpp, cpp/tests/routing/level0/l0_ges_test.cu, cpp/tests/routing/level0/l0_routing_test.cu
Added override to TearDown() methods in multiple test fixtures.
Tests — Base Class Parameter Changes
cpp/tests/routing/level0/l0_objective_function_test.cu, cpp/tests/routing/level0/l0_vehicle_order_match.cu, cpp/tests/routing/level0/l0_vehicle_types_test.cu
Changed base-test initializer arguments from (512, 5E-2, 0) to (512, 0, 0), altering test initialization parameters.
Tests — Misc
cpp/tests/linear_programming/c_api_tests/c_api_test.c, cpp/tests/routing/level0/l0_vehicle_types_test.cu
Added a default "Unknown" return in termination_status_to_string; other test constructor declaration adjusted (see above).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review TSAN/MSAN build logic and mutual-exclusion checks in build.sh and cpp/CMakeLists.txt.
  • Inspect cpu_worker_thread.cuh for correct condition-variable usage and absence of race/deadlock.
  • Validate the non-template GPU helper wrappers in barrier.cu for correctness and performance implications.
  • Confirm lambda capture reductions do not remove needed state access, especially in local_search.cu.
  • Run affected tests where base initializer parameters changed to ensure expectations still hold.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the primary objectives of the PR: adding Clang host build support, ThreadSanitizer support, and fixing race conditions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cpp/src/linear_programming/utilities/cython_solve.cu (1)

222-224: Add explicit stream cleanup before function return.

The user-created stream at line 223 is passed to raft::handle_t, which does not take ownership of it. Per RAFT documentation, caller responsibility is to destroy user-created streams. The function returns at line 237 without calling cudaStreamDestroy(stream), creating a resource leak. In batch mode, this function is invoked multiple times in parallel, amplifying the leak.

Add the following before the return statement at line 237:

RAFT_CUDA_TRY(cudaStreamSynchronize(stream));
RAFT_CUDA_TRY(cudaStreamDestroy(stream));
cpp/src/mip/local_search/local_search.cu (1)

56-93: Lambda capture missing this will cause compilation failure; add synchronization for static variable race

Two issues require immediate fixes:

  1. Compile-time error: Missing this in lambda capture (line 82)
    The lambda captures only [&population] but references the non-static member context inside its body. Unlike the corrected version in start_cpufj_lptopt_scratch_threads at line 120 which uses [this, &population], this lambda is ill-formed and will not compile.

    Fix: Change line 82 from [&population] to [this, &population] to match the pattern already used in the second method.

  2. Data race on local_search_best_obj (lines 56, 85, 91)
    The static file-scope variable is written without synchronization from multiple solver threads spawned at line 99 (cpu_fj.start_cpu_solver()). This is undefined behavior. Either guard writes with std::mutex or use std::atomic<double> with atomic compare-and-swap for minimum updates.

cpp/tests/routing/level0/l0_objective_function_test.cu (1)

65-93: Objective test helper currently always returns 0 and doesn’t validate behavior

get_objective is defined as:

auto get_objective = [](const auto& routing_solution, const double weight) {
  return (routing_solution.get_total_objective() - routing_solution.get_total_objective()) / weight;
};

This expression is identically 0 for all inputs, so objective_values[i] is always 0 and the monotonicity checks in the loop never exercise the real objective behavior. This effectively disables the intended validation of objective scaling with weight.

Consider computing the delta against a baseline solution or decomposing the total objective so that the contribution of this->objective_type is isolated, for example:

- auto get_objective = [](const auto& routing_solution, const double weight) {
-   return (routing_solution.get_total_objective() - routing_solution.get_total_objective()) / weight;
- };
+ auto get_objective = [](double total_with_obj, double total_without_obj, double weight) {
+   return (total_with_obj - total_without_obj) / weight;
+ };

and capture both totals appropriately in the test loop so the check reflects the actual objective term.
Based on learnings, tests here should validate numerical correctness of objectives, not just “runs without error”.

🧹 Nitpick comments (6)
cpp/tests/linear_programming/c_api_tests/c_api_test.c (1)

56-56: Fallback return for unrecognized termination status looks good

Adding a final return "Unknown"; removes the undefined behavior of falling off the end of a non-void function and makes logging robust to new/invalid status codes. Optionally, you could express this as a default: branch in the switch for slightly clearer intent, but the current form is correct and safe.

cpp/tests/routing/level0/l0_ges_test.cu (1)

58-58: Empty TearDown overrides are typically unnecessary.

The override specifier is good practice, but these empty TearDown() methods serve no functional purpose. In Google Test, you only need to override TearDown() if cleanup is required. If these were added to satisfy compiler warnings or as placeholders for future cleanup logic, consider adding a comment to clarify the intent; otherwise, they can be safely removed.

Also applies to: 166-166

cpp/tests/routing/level0/l0_routing_test.cu (1)

323-323: Empty TearDown override is unnecessary.

This empty TearDown() method serves no functional purpose. Note that other test fixtures in this file (e.g., l0_routing_test_t at line 25 and lut_test_t at line 53) don't have explicit TearDown() methods, which is the typical pattern when no cleanup is needed. Consider removing this for consistency, or add cleanup logic if GPU state or device vectors need explicit deallocation.

cpp/tests/distance_engine/waypoint_matrix_test.cpp (1)

47-47: Empty TearDown overrides are unnecessary.

The override specifiers are appropriate, but these empty TearDown() methods provide no functionality. The device vectors (rmm::device_uvector) and waypoint_matrix objects use RAII and clean up automatically when the test fixture is destroyed. Unless these were added to address specific compiler warnings or as placeholders for future cleanup, they can be removed for clarity.

Also applies to: 134-134, 195-195

cpp/src/linear_programming/utils.cuh (1)

65-79: Remove unused iteration counter in conditional_major

iteration is never read; even with [[maybe_unused]], it and its increment are pure dead code in a small utility that may be called frequently. You can simplify:

-  uint64_t step                       = 10;
-  uint64_t threshold                  = 1000;
-  [[maybe_unused]] uint64_t iteration = 0;
+  uint64_t step      = 10;
+  uint64_t threshold = 1000;
@@
-  while (total_pdlp_iterations >= threshold) {
-    ++iteration;
+  while (total_pdlp_iterations >= threshold) {

This keeps the overflow checks and behavior identical while avoiding unnecessary work.

build.sh (1)

347-359: CMake propagation of BUILD_TSAN is correct; consider whether Python builds also need it

Passing -DBUILD_TSAN=${BUILD_TSAN} into the libcuopt CMake invocation aligns with the new BUILD_TSAN logic in cpp/CMakeLists.txt, so C++ builds will be TSAN‑instrumented when requested.

If you also want TSAN coverage for any CMake‑driven components built via SKBUILD_CMAKE_ARGS (Python wheels), you may want to append -DBUILD_TSAN=${BUILD_TSAN} into those arguments as well; currently only the core C++ build sees this option.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0802827 and d52bcb0.

📒 Files selected for processing (21)
  • build.sh (6 hunks)
  • cpp/CMakeLists.txt (2 hunks)
  • cpp/include/cuopt/error.hpp (0 hunks)
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp (1 hunks)
  • cpp/src/linear_programming/pdlp.cu (1 hunks)
  • cpp/src/linear_programming/utilities/cython_solve.cu (1 hunks)
  • cpp/src/linear_programming/utils.cuh (1 hunks)
  • cpp/src/mip/diversity/lns/rins.cu (1 hunks)
  • cpp/src/mip/local_search/local_search.cu (1 hunks)
  • cpp/src/mip/presolve/gf2_presolve.hpp (1 hunks)
  • cpp/src/mip/presolve/third_party_presolve.cpp (1 hunks)
  • cpp/src/mip/utilities/cpu_worker_thread.cuh (2 hunks)
  • cpp/src/routing/crossovers/ox_recombiner.cuh (2 hunks)
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp (3 hunks)
  • cpp/tests/linear_programming/c_api_tests/c_api_test.c (1 hunks)
  • cpp/tests/routing/level0/l0_ges_test.cu (2 hunks)
  • cpp/tests/routing/level0/l0_objective_function_test.cu (1 hunks)
  • cpp/tests/routing/level0/l0_routing_test.cu (1 hunks)
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu (1 hunks)
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu (1 hunks)
  • cpp/tsan_suppressions.txt (1 hunks)
💤 Files with no reviewable changes (1)
  • cpp/include/cuopt/error.hpp
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/src/mip/utilities/cpu_worker_thread.cuh
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu
  • cpp/src/routing/crossovers/ox_recombiner.cuh
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/mip/utilities/cpu_worker_thread.cuh
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/routing/crossovers/ox_recombiner.cuh
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*test*.{cpp,cu,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*test*.{cpp,cu,py}: Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)
Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment
Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover
Add tests for problem transformations: verify correctness of original→transformed→postsolve mappings and index consistency across problem representations
Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling

Files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/mip/local_search/local_search.cu
  • cpp/tests/routing/level0/l0_vehicle_types_test.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
**/*.{h,hpp,py}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Verify C API does not break ABI stability (no struct layout changes, field reordering); maintain backward compatibility in Python and server APIs with deprecation warnings

Files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/src/mip/presolve/third_party_presolve.cpp
cpp/include/cuopt/**/*

⚙️ CodeRabbit configuration file

cpp/include/cuopt/**/*: For public header files (C++ API):

  • Check if new public functions/classes have documentation comments (Doxygen format)
  • Flag API changes that may need corresponding docs/ updates
  • Verify parameter descriptions match actual types/behavior
  • Suggest documenting thread-safety, GPU requirements, and numerical behavior
  • For breaking changes, recommend updating docs and migration guides

Files:

  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
🧠 Learnings (25)
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/src/mip/utilities/cpu_worker_thread.cuh
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations

Applied to files:

  • cpp/tsan_suppressions.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/src/mip/utilities/cpu_worker_thread.cuh
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/tsan_suppressions.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cpp,hpp,h} : Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/tsan_suppressions.txt
  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/routing/crossovers/ox_recombiner.cuh
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/tests/routing/level0/l0_ges_test.cu
  • cpp/tests/routing/level0/l0_routing_test.cu
  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • build.sh
  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/mip/utilities/cpu_worker_thread.cuh
  • cpp/include/cuopt/linear_programming/utilities/internals.hpp
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/linear_programming/pdlp.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/src/mip/local_search/local_search.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse

Applied to files:

  • cpp/src/mip/presolve/gf2_presolve.hpp
  • cpp/src/linear_programming/utils.cuh
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling

Applied to files:

  • cpp/tests/distance_engine/waypoint_matrix_test.cpp
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)

Applied to files:

  • cpp/src/linear_programming/pdlp.cu
  • cpp/tests/routing/level0/l0_objective_function_test.cu
  • cpp/src/linear_programming/utils.cuh
  • cpp/tests/routing/level0/l0_vehicle_order_match.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/src/linear_programming/pdlp.cu
  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*test*.{cpp,cu,py} : Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)

Applied to files:

  • cpp/tests/routing/level0/l0_objective_function_test.cu
📚 Learning: 2025-11-25T10:20:49.811Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.811Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify error propagation from CUDA to user-facing APIs is complete; ensure CUDA errors are caught and mapped to meaningful user error codes

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh} : Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification

Applied to files:

  • cpp/CMakeLists.txt
📚 Learning: 2025-11-25T10:20:49.810Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.810Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/CMakeLists.txt
🧬 Code graph analysis (4)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)
  • call_solve (95-98)
build.sh (1)
datasets/get_test_data.sh (1)
  • hasArg (77-79)
cpp/src/linear_programming/pdlp.cu (1)
cpp/src/mip/diversity/recombiners/recombiner_stats.hpp (2)
  • maybe_unused (53-64)
  • maybe_unused (53-53)
cpp/src/mip/diversity/lns/rins.cu (1)
cpp/src/mip/solver.cu (6)
  • solution (64-68)
  • solution (64-64)
  • solution (70-75)
  • solution (70-72)
  • solution (77-80)
  • solution (77-77)
🪛 Clang (14.0.6)
cpp/src/mip/presolve/gf2_presolve.hpp

[error] 14-14: 'papilo/Config.hpp' file not found

(clang-diagnostic-error)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
🔇 Additional comments (18)
cpp/src/mip/presolve/gf2_presolve.hpp (2)

10-13: LGTM! Proper Clang compatibility guard.

Correctly excludes GCC-specific diagnostic pragmas when compiling with Clang, enabling the Clang build path for ThreadSanitizer support.


18-20: LGTM! Properly paired diagnostic guard.

The closing guard correctly matches the opening block, maintaining diagnostic state consistency across compilers.

cpp/src/mip/presolve/third_party_presolve.cpp (2)

17-20: LGTM! Consistent Clang compatibility.

Applies the same GCC diagnostic guard pattern as in the header file, maintaining consistency across the codebase for Clang support.


23-25: LGTM! Properly closed diagnostic scope.

The closing guard is correctly paired with the opening block, completing the Clang-aware diagnostic handling.

cpp/src/routing/crossovers/ox_recombiner.cuh (2)

339-391: [[maybe_unused]] on cost_n/cost_p/total_delta is a reasonable way to silence clang warnings for debug-only validation

These locals only participate in the cost-coherence validation (via total_delta and cuopt_assert), so marking them [[maybe_unused]] is a harmless, targeted fix for release builds where asserts may be compiled out; the main recombination logic and costs are unaffected. Please just confirm all supported host compilers are built with a C++ standard that recognizes [[maybe_unused]] so this remains portable across your toolchains. Based on learnings, algorithm correctness is unchanged.


533-544: [[maybe_unused]] on found is appropriate for assert-only usage

found is only consumed by cuopt_assert, so in builds where that macro compiles away, this variable can trigger unused-variable warnings. The [[maybe_unused]] annotation cleanly addresses that without touching the transpose-graph correctness checks. As with the other attribute use, please confirm it is supported across your full compiler matrix.

cpp/src/linear_programming/utilities/cython_solve.cu (1)

298-298: LGTM: Correctly removes redundant std::move.

The explicit std::move on the prvalue returned by call_solve is unnecessary. The move constructor is invoked automatically when assigning the return value to list[i]. Removing it improves code clarity and may enable better compiler optimizations.

cpp/src/linear_programming/pdlp.cu (1)

1513-1518: Marking sing_iters as [[maybe_unused]] is appropriate

The variable is only used under debug builds; the attribute cleanly suppresses release-build warnings without changing behavior.

cpp/tests/routing/level0/l0_vehicle_order_match.cu (1)

25-32: Constructor default now sets zero not_matching_constraints_fraction

Initializing base_test_t with (512, 0, 0) is consistent with later overriding not_matching_constraints_fraction from GetParam(), and should make the parameter sweep the sole source of this value. Please just confirm that base_test_t’s constructor does not rely on a non‑zero second argument for any other setup logic.

cpp/tsan_suppressions.txt (1)

1-6: TSAN suppressions look reasonable; keep an eye on scope

The suppressions for tbb and Papilo match the stated goal of ignoring races in external header‑only libraries. Just be aware that race:tbb / race:Papilo are broad and will also hide any genuine races in your own code that TSAN attributes to those symbols. If that becomes a concern, you can later narrow them (e.g., by function or file patterns).

cpp/src/mip/utilities/cpu_worker_thread.cuh (1)

80-87: Condition‑variable synchronization for worker completion is correct

Switching wait_for_cpu_solver() to a condition_variable::wait on cpu_thread_done || cpu_thread_terminate, and notifying after setting cpu_thread_done = true, gives proper blocking behavior without introducing races. Reads of these flags remain atomic for callers that still poll them directly. This is a solid improvement over any previous busy‑wait.

Also applies to: 135-137

cpp/include/cuopt/linear_programming/utilities/internals.hpp (1)

63-67: Adding a virtual destructor to base_solution_t is the right fix

base_solution_t is polymorphic (virtual bool is_mip() const), so providing a virtual destructor is required for safe deletion through base pointers. The new virtual ~base_solution_t() = default; resolves that without changing semantics.

cpp/src/mip/diversity/lns/rins.cu (1)

207-252: Review comment is based on incorrect assumption about callback threading

The claim that "solution_callback is invoked from multiple B&B threads" concurrently is unsupported by the code. In branch_and_bound.cpp, both invocation sites (lines 433 and 553) are protected by mutex_upper_, which serializes all callback executions—only one callback can run at a time. Additionally, the main thread in rins.cu only accesses rins_solution_queue after branch_and_bound.solve() returns (which waits for all worker threads to complete), eliminating concurrent access to the queue.

While user callbacks executing in worker threads warrant careful design consideration, the specific data race concern on rins_solution_queue.push_back() does not apply here due to the serialization provided by the branch-and-bound solver's internal locking.

Likely an incorrect or invalid review comment.

cpp/tests/routing/level0/l0_vehicle_types_test.cu (1)

24-28: Confirm impact of changing base_test_t parameters on routing test stability

The constructor now initializes base_test_t<i_t, f_t>(512, 0, 0). Given that the second parameter previously was non‑zero, please confirm this does not inadvertently relax or over‑tighten numerical tolerances or other thresholds in base_test_t that these L0 routing tests rely on (especially across GPUs/architectures).

Based on learnings, this may affect numerical tolerance behavior in tests.

cpp/tests/routing/level0/l0_objective_function_test.cu (1)

24-29: Verify changed base_test_t parameters don’t break objective-function test assumptions

The constructor now calls base_test_t<i_t, f_t>(512, 0, 0). Since this test exercises objective scaling vs. min_weight, please confirm this change doesn’t invalidate assumptions baked into the INSTANTIATE_TEST_SUITE_P parameters (e.g., tolerances, default weights) or make the test numerically fragile on different hardware.

Based on learnings, constructor-level tolerance changes can easily over‑constrain numerical tests.

cpp/CMakeLists.txt (2)

82-88: Sanitizer CXX flags look reasonable; confirm Clang/GCC coverage

Conditionally appending -Wno-error=maybe-uninitialized only when CMAKE_CXX_COMPILER_ID is not Clang is a good way to avoid GCC‑specific warnings on Clang. The combined -fsanitize=address,undefined -fno-omit-frame-pointer -g CXX flags also look standard for ASan/UBSan builds.

Please verify in CI that both GCC and Clang host builds pick up these flags as expected (and that Clang builds no longer warn/error on -Wno-error=maybe-uninitialized).


132-137: CUDA host-compiler conditional flags nicely avoid GCC-specific warnings on Clang

Conditioning the CUDA host flags on ${CMAKE_CUDA_HOST_COMPILER} / ${CMAKE_CXX_COMPILER_ID} so that only non‑Clang builds receive -Wno-error=non-template-friend is a good way to unblock Clang host builds while preserving stricter warnings for GCC. The separation into -Xcompiler=-Wall plus a bare -Wno-error=non-template-friend is also consistent with how nvcc forwards unknown -W* flags to the host compiler.

Please sanity‑check a Clang + nvcc configuration to ensure no unexpected “unknown warning option” diagnostics remain.

build.sh (1)

18-19: TSAN CLI flag wiring and mutual exclusion with -fsanitize look solid

The -tsan flag is correctly:

  • Added to VALIDARGS and documented in the help text as mutually exclusive with -fsanitize.
  • Backed by a BUILD_TSAN variable with sensible default (0).
  • Parsed via hasArg -tsan to set BUILD_TSAN=1.
  • Guarded by an explicit check that errors out if both BUILD_SANITIZER and BUILD_TSAN are enabled.

This gives a clear and safe UX for selecting sanitizer mode at build time.

Also applies to: 35-37, 80-81, 232-237, 306-310

Comment on lines +90 to +99
# To use ThreadSanitizer:
# 1. Build with clang and the -tsan flag
# 2. Run the binary with env var set: OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1' TSAN_OPTIONS='suppresions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
# Replace with local llvm install path. libarcher.so must be presetn
if(BUILD_TSAN)
message(STATUS "Building with ThreadSanitizer enabled")
list(APPEND CUOPT_CXX_FLAGS -fsanitize=thread -fno-omit-frame-pointer -g)
add_link_options(-fsanitize=thread)
endif(BUILD_TSAN)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

ThreadSanitizer integration is fine, but TSAN_OPTIONS comment has typos

The BUILD_TSAN block cleanly wires in -fsanitize=thread -fno-omit-frame-pointer -g and matching link flags behind a simple CMake option, which is what build.sh expects. That’s a solid pattern.

In the usage comment, though, the TSAN_OPTIONS example contains typos:

TSAN_OPTIONS='suppresions=cpp/tsan_suppressions.txt:...'
# and: libarcher.so must be presetn

For discoverability, consider correcting this to:

# 2. Run with:
#    TSAN_OPTIONS='suppressions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
#    OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1'

This avoids users copy‑pasting an invalid TSAN_OPTIONS key.

🤖 Prompt for AI Agents
In cpp/CMakeLists.txt around lines 90 to 99, the usage comment for running with
ThreadSanitizer contains typos and a confusing ordering: change the TSAN_OPTIONS
key from "suppresions" to "suppressions" and correct "presetn" to "present";
also split/clarify the environment variables so the example shows
TSAN_OPTIONS='suppressions=cpp/tsan_suppressions.txt:ignore_noninstrumented_modules=1:halt_on_error=1'
and OMP_TOOL_LIBRARIES and ARCHER_OPTIONS as separate env vars (e.g.
OMP_TOOL_LIBRARIES=/usr/lib/llvm-17/lib/libarcher.so ARCHER_OPTIONS='verbose=1')
to make the example copy-pasteable and accurate.

@nguidotti
Copy link
Contributor

nguidotti commented Nov 25, 2025

@chris-maes Ah nevermind, it seems my changes had already been integrated by Nicolas Lee in one of his merged PRs :) I will retarget this for 26.02 then. The remaining changes are non critical.

I only included the changes to B&B and OpenMP. Your PR has additional fixes, no? And enabling TSan support is quite helpful.

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@aliceb-nv aliceb-nv changed the base branch from release/25.12 to main December 17, 2025 16:36
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/mip/local_search/local_search.cu (2)

84-94: Critical: Lambda accesses member variable without capturing this.

The lambda captures only [&population] but the body accesses context.problem_ptr (lines 88-91), which is a member variable of the local_search_t class. Without capturing this, accessing member variables results in undefined behavior or a compilation error.

Apply this diff to fix the capture:

-    cpu_fj.fj_cpu->improvement_callback = [&population](f_t obj, const std::vector<f_t>& h_vec) {
+    cpu_fj.fj_cpu->improvement_callback = [this, &population](f_t obj, const std::vector<f_t>& h_vec) {
       population.add_external_solution(h_vec, obj, solution_origin_t::CPUFJ);
       if (obj < local_search_best_obj) {
         CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",

86-92: Critical: Data race on static variable local_search_best_obj.

The static variable local_search_best_obj is accessed from multiple CPU solver threads without synchronization, creating a classic TOCTOU race condition:

  1. Thread A reads local_search_best_obj (line 86)
  2. Thread B reads local_search_best_obj (line 86)
  3. Thread A writes local_search_best_obj (line 92)
  4. Thread B writes local_search_best_obj (line 92)

This can cause lost updates and non-deterministic "best" values. ThreadSanitizer will flag this race.

Apply this diff to add mutex protection:

+#include <mutex>
+
 static double local_search_best_obj       = std::numeric_limits<double>::max();
 static population_t<int, double>* pop_ptr = nullptr;
+static std::mutex local_search_best_obj_mutex;

Then protect the critical section:

       population.add_external_solution(h_vec, obj, solution_origin_t::CPUFJ);
-      if (obj < local_search_best_obj) {
+      {
+        std::lock_guard<std::mutex> lock(local_search_best_obj_mutex);
+        if (obj < local_search_best_obj) {
-        CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",
-                        context.problem_ptr->get_user_obj_from_solver_obj(obj),
-                        context.problem_ptr->get_user_obj_from_solver_obj(
-                          population.is_feasible() ? population.best_feasible().get_objective()
-                                                   : std::numeric_limits<f_t>::max()));
-        local_search_best_obj = obj;
+          CUOPT_LOG_TRACE("******* New local search best obj %g, best overall %g",
+                          context.problem_ptr->get_user_obj_from_solver_obj(obj),
+                          context.problem_ptr->get_user_obj_from_solver_obj(
+                            population.is_feasible() ? population.best_feasible().get_objective()
+                                                     : std::numeric_limits<f_t>::max()));
+          local_search_best_obj = obj;
+        }
       }

Note: The same issue exists at lines 120-127.

♻️ Duplicate comments (2)
cpp/src/linear_programming/utilities/cython_solve.cu (1)

307-309: Critical data race remains unfixed from previous review.

The data race flagged in the previous review has not been addressed. All threads in the OpenMP parallel loop share the same solver_settings pointer without synchronization. Inside call_solve(), multiple threads concurrently access mutable state via get_pdlp_settings() and get_mip_settings(), which return non-const references (verified in previous review at cpp/include/cuopt/linear_programming/solver_settings.hpp:89-90).

This is undefined behavior and particularly concerning given this PR's focus on ThreadSanitizer and race condition fixes.

Required fix: Ensure thread-local solver_settings by creating a copy inside the parallel region before calling call_solve(), or protect all accesses with proper synchronization. The previous review comment provides detailed fix options.

Based on coding guidelines: "Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state."

cpp/src/mip/local_search/local_search.cu (1)

120-127: Critical: Data race on static variable local_search_best_obj (duplicate).

This callback has the same unsynchronized access to the static local_search_best_obj variable as the previous lambda. The same mutex protection should be applied here.

Apply the same mutex protection pattern as suggested for lines 86-92.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d52bcb0 and 63c917a.

📒 Files selected for processing (14)
  • build.sh (6 hunks)
  • cpp/CMakeLists.txt (2 hunks)
  • cpp/src/dual_simplex/barrier.cu (3 hunks)
  • cpp/src/linear_programming/pdlp.cu (1 hunks)
  • cpp/src/linear_programming/utilities/cython_solve.cu (1 hunks)
  • cpp/src/mip/diversity/lns/rins.cu (1 hunks)
  • cpp/src/mip/local_search/local_search.cu (1 hunks)
  • cpp/src/mip/presolve/third_party_presolve.cpp (1 hunks)
  • cpp/src/routing/local_search/local_search.cu (1 hunks)
  • cpp/src/routing/local_search/sliding_tsp.cu (1 hunks)
  • cpp/src/routing/local_search/sliding_window.cu (1 hunks)
  • cpp/src/routing/local_search/two_opt.cu (1 hunks)
  • cpp/src/routing/local_search/vrp/vrp_execute.cu (1 hunks)
  • cpp/tests/linear_programming/c_api_tests/c_api_test.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/CMakeLists.txt
  • build.sh
  • cpp/src/mip/diversity/lns/rins.cu
  • cpp/tests/linear_programming/c_api_tests/c_api_test.c
  • cpp/src/linear_programming/pdlp.cu
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{cu,cuh}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh}: Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification
Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
  • cpp/src/mip/local_search/local_search.cu
**/*.cu

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.cu: Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations
Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
  • cpp/src/mip/local_search/local_search.cu
**/*.{cu,cuh,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cu,cuh,cpp,hpp,h}: Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results
Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks
Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)
Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations
For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle
Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution
Assess algorithmic complexity for large-scale problems (millions of variables/constraints); ensure O(n log n) or better complexity, not O(n²) or worse
Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems
Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)
Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state
Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication
Check that hard-coded GPU de...

Files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
  • cpp/src/mip/local_search/local_search.cu
**/*.{cu,cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

Avoid inappropriate use of exceptions in performance-critical GPU operation paths; prefer error codes or CUDA error checking for latency-sensitive code

Files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
  • cpp/src/mip/local_search/local_search.cu
**/*.{cpp,hpp,h}

📄 CodeRabbit inference engine (.github/.coderabbit_review_guide.md)

**/*.{cpp,hpp,h}: Check for unclosed file handles when reading MPS/QPS problem files; ensure RAII patterns or proper cleanup in exception paths
Validate input sanitization to prevent buffer overflows and resource exhaustion attacks; avoid unsafe deserialization of problem files
Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Identify assertions with overly strict numerical tolerances that fail on legitimate degenerate/edge cases (near-zero pivots, singular matrices, empty problems)

Applied to files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate algorithm correctness in optimization logic: simplex pivots, branch-and-bound decisions, routing heuristics, and constraint/objective handling must produce correct results

Applied to files:

  • cpp/src/routing/local_search/sliding_window.cu
  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure variables and constraints are accessed from the correct problem context (original vs presolve vs folded vs postsolve); verify index mapping consistency across problem transformations

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
  • cpp/src/mip/local_search/local_search.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/routing/local_search/vrp/vrp_execute.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/routing/local_search/local_search.cu
  • cpp/src/routing/local_search/two_opt.cu
  • cpp/src/routing/local_search/sliding_tsp.cu
📚 Learning: 2025-12-04T20:09:09.264Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 602
File: cpp/src/linear_programming/solve.cu:732-742
Timestamp: 2025-12-04T20:09:09.264Z
Learning: In cpp/src/linear_programming/solve.cu, the barrier solver does not currently return INFEASIBLE or UNBOUNDED status. It only returns OPTIMAL, TIME_LIMIT, NUMERICAL_ISSUES, or CONCURRENT_LIMIT.

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-10-22T14:25:22.899Z
Learnt from: aliceb-nv
Repo: NVIDIA/cuopt PR: 527
File: cpp/src/mip/diversity/lns/rins.cu:167-175
Timestamp: 2025-10-22T14:25:22.899Z
Learning: In MIP (Mixed Integer Programming) problems in the cuOPT codebase, `n_integer_vars == 0` is impossible by definition—MIP problems must have at least one integer variable. If there are no integer variables, it would be a pure Linear Programming (LP) problem, not a MIP problem.

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Refactor code duplication in solver components (3+ occurrences) into shared utilities; for GPU kernels, use templated device functions to avoid duplication

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
  • cpp/src/mip/local_search/local_search.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems

Applied to files:

  • cpp/src/mip/presolve/third_party_presolve.cpp
  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh} : Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries; prefer standard library utilities over custom implementations

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Detect inefficient GPU kernel launches with low occupancy or poor memory access patterns; optimize for coalesced memory access and minimize warp divergence in hot paths

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Check that hard-coded GPU device IDs and resource limits are made configurable; abstract multi-backend support for different CUDA versions

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : For concurrent CUDA operations (barriers, async operations), explicitly create and manage dedicated streams instead of reusing the default stream; document stream lifecycle

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Ensure race conditions are absent in multi-GPU code and multi-threaded server implementations; verify proper synchronization of shared state

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cu,cuh,cpp,hpp,h} : Track GPU device memory allocations and deallocations to prevent memory leaks; ensure cudaMalloc/cudaFree balance and cleanup of streams/events

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*benchmark*.{cpp,cu,py} : Include performance benchmarks and regression detection for GPU operations; verify near real-time performance on million-variable problems

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-12-06T00:22:48.638Z
Learnt from: chris-maes
Repo: NVIDIA/cuopt PR: 500
File: cpp/tests/linear_programming/c_api_tests/c_api_test.c:1033-1048
Timestamp: 2025-12-06T00:22:48.638Z
Learning: In cuOPT's quadratic programming API, when a user provides a quadratic objective matrix Q via set_quadratic_objective_matrix or the C API functions cuOptCreateQuadraticProblem/cuOptCreateQuadraticRangedProblem, the API internally computes Q_symmetric = Q + Q^T and the barrier solver uses 0.5 * x^T * Q_symmetric * x. From the user's perspective, the convention is x^T Q x. For a diagonal Q with values [q1, q2, ...], the resulting quadratic terms are q1*x1^2 + q2*x2^2 + ...

Applied to files:

  • cpp/src/dual_simplex/barrier.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.{cpp,hpp,h} : Prevent thread-unsafe use of global and static variables; use proper mutex/synchronization in server code accessing shared solver state

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*.cu : Verify race conditions and correctness of GPU kernel shared memory, atomics, and warp-level operations

Applied to files:

  • cpp/src/linear_programming/utilities/cython_solve.cu
📚 Learning: 2025-11-25T10:20:49.822Z
Learnt from: CR
Repo: NVIDIA/cuopt PR: 0
File: .github/.coderabbit_review_guide.md:0-0
Timestamp: 2025-11-25T10:20:49.822Z
Learning: Applies to **/*test*.{cpp,cu,py} : Add tests for algorithm phase transitions: verify correct initialization of bounds and state when transitioning from presolve to simplex to diving to crossover

Applied to files:

  • cpp/src/routing/local_search/local_search.cu
🧬 Code graph analysis (1)
cpp/src/linear_programming/utilities/cython_solve.cu (1)
cpp/include/cuopt/linear_programming/utilities/cython_solve.hpp (1)
  • call_solve (95-98)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.12, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, arm64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.11, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-mps-parser / 13.0.2, 3.13, amd64, rockylinux8
  • GitHub Check: wheel-build-cuopt-sh-client / 13.0.2, 3.10, amd64, rockylinux8
  • GitHub Check: checks / check-style
🔇 Additional comments (8)
cpp/src/routing/local_search/vrp/vrp_execute.cu (1)

467-470: LGTM! Assertion parentheses corrected for proper numerical validation.

The assertion now correctly applies abs to the entire delta expression (cost_before - cost_after) + debug_delta before comparing against the relative tolerance. This ensures accurate detection of cost mismatches in both positive and negative directions.

cpp/src/routing/local_search/sliding_window.cu (1)

1118-1120: LGTM! Assertion corrected with proper parentheses and accurate error message.

Two improvements here:

  1. The abs now correctly wraps the entire delta expression for proper numerical validation
  2. The assertion message now accurately reflects "sliding_window costs" instead of the incorrect "cross costs"
cpp/src/routing/local_search/sliding_tsp.cu (1)

567-570: LGTM! Assertion parentheses corrected for sliding TSP cost validation.

The abs now properly encloses the full delta expression (cost_before - cost_after) + debug_delta, ensuring accurate cost mismatch detection with relative tolerance.

cpp/src/routing/local_search/two_opt.cu (1)

460-463: LGTM! Assertion corrected for two-opt cost validation.

The parentheses now ensure abs wraps the complete delta (cost_before - cost_after) + debug_delta for accurate relative tolerance checking.

cpp/src/routing/local_search/local_search.cu (1)

128-131: LGTM! Assertion corrected for cross-search cost validation.

The abs now properly wraps the entire delta expression, ensuring accurate cost mismatch detection with relative tolerance. This completes the consistent assertion fix pattern across all local_search components.

cpp/src/linear_programming/utilities/cython_solve.cu (1)

309-309: LGTM: Removing redundant std::move.

The removal of explicit std::move is correct. Since call_solve() returns a std::unique_ptr by value (a prvalue), the return value is automatically moved. The explicit std::move was redundant and could potentially inhibit copy elision.

However, note that this change does not address the critical data race issue flagged in the previous review (see next comment).

cpp/src/dual_simplex/barrier.cu (1)

1449-1449: LGTM! Wrapper function usage is correct.

The calls to pairwise_multiply and axpy properly replace the previous thrust::transform operations. The arguments are correctly ordered, types match the overload resolution, and stream handling is appropriate.

Also applies to: 1459-1459

cpp/src/mip/presolve/third_party_presolve.cpp (1)

17-25: Conditional compilation guards correctly enable Clang compatibility.

The #if !defined(__clang__) guards properly isolate GCC-specific diagnostic pragmas. The __clang__ macro correctly identifies the Clang compiler, and the push/pop pragma pair is consistently guarded on both sides (lines 17-20 and 23-25), ensuring proper diagnostic state management. The pattern is identically used in gf2_presolve.hpp, confirming consistency across the codebase. Suppressing -Wstringop-overflow for Papilo includes is appropriate since this GCC warning can produce false positives with third-party libraries.

Comment on lines +47 to +87
// non-template wrappers to work around clang compiler bug
[[maybe_unused]] static void pairwise_multiply(
float* a, float* b, float* out, int size, rmm::cuda_stream_view stream)
{
cub::DeviceTransform::Transform(
cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
}

[[maybe_unused]] static void pairwise_multiply(
double* a, double* b, double* out, int size, rmm::cuda_stream_view stream)
{
cub::DeviceTransform::Transform(
cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
}

[[maybe_unused]] static void axpy(
float alpha, float* x, float beta, float* y, float* out, int size, rmm::cuda_stream_view stream)
{
cub::DeviceTransform::Transform(
cuda::std::make_tuple(x, y),
out,
size,
[alpha, beta] __host__ __device__(float a, float b) { return alpha * a + beta * b; },
stream);
}

[[maybe_unused]] static void axpy(double alpha,
double* x,
double beta,
double* y,
double* out,
int size,
rmm::cuda_stream_view stream)
{
cub::DeviceTransform::Transform(
cuda::std::make_tuple(x, y),
out,
size,
[alpha, beta] __host__ __device__(double a, double b) { return alpha * a + beta * b; },
stream);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add CUDA error checking and improve bug documentation.

The wrapper functions lack CUDA error checking, which violates the coding guideline requiring verification for all CUDA operations. Additionally, the comment about "clang compiler bug" is vague and provides no context for future maintainers.

As per coding guidelines, every CUDA operation must have error checking.

Apply this diff to add error checking:

 [[maybe_unused]] static void pairwise_multiply(
   float* a, float* b, float* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
+    stream.synchronize());
 }
 
 [[maybe_unused]] static void pairwise_multiply(
   double* a, double* b, double* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream);
+    stream.synchronize());
 }
 
 [[maybe_unused]] static void axpy(
   float alpha, float* x, float beta, float* y, float* out, int size, rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(x, y),
     out,
     size,
     [alpha, beta] __host__ __device__(float a, float b) { return alpha * a + beta * b; },
-    stream);
+    stream));
+  RAFT_CHECK_CUDA(stream);
 }
 
 [[maybe_unused]] static void axpy(double alpha,
                                   double* x,
                                   double beta,
                                   double* y,
                                   double* out,
                                   int size,
                                   rmm::cuda_stream_view stream)
 {
-  cub::DeviceTransform::Transform(
+  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
     cuda::std::make_tuple(x, y),
     out,
     size,
     [alpha, beta] __host__ __device__(double a, double b) { return alpha * a + beta * b; },
-    stream);
+    stream));
+  RAFT_CHECK_CUDA(stream);
 }

Also improve the comment to document the specific issue:

-// non-template wrappers to work around clang compiler bug
+// Non-template wrappers to work around clang compiler issue with thrust::transform template
+// instantiation. These use cub::DeviceTransform directly with explicit float/double overloads
+// instead of templates to ensure proper compilation with clang when ThreadSanitizer is enabled.

Consider whether a templated version with explicit instantiation could work once the clang/TSan build is stable:

template<typename T>
[[maybe_unused]] static void pairwise_multiply(
  T* a, T* b, T* out, int size, rmm::cuda_stream_view stream)
{
  RAFT_CUDA_TRY(cub::DeviceTransform::Transform(
    cuda::std::make_tuple(a, b), out, size, cuda::std::multiplies<>{}, stream));
  RAFT_CHECK_CUDA(stream);
}

// Explicit instantiations
template void pairwise_multiply<float>(float*, float*, float*, int, rmm::cuda_stream_view);
template void pairwise_multiply<double>(double*, double*, double*, int, rmm::cuda_stream_view);

@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants