Skip to content

Conversation

@hlinsen
Copy link
Contributor

@hlinsen hlinsen commented Dec 17, 2025

This PR implements batch solve with a focus on small TSP problems <100 nodes sizes. The performance
for other variants is not tested.

  • The cuopt::host_copy wrapper always needs a stream. Hence the changes in many other files.
  • The routing cuda_graph_t class now uses cudaStreamCaptureModeThreadLocal
  • The pool allocator pool stream is deprecated in favor of the data_model_t stream given to the raft::handle_t

TODO:

  • Tune with green contexts based on input size and HW.

Summary by CodeRabbit

  • New Features

    • Added batch solving capability for routing problems, enabling efficient parallel execution of multiple problem instances.
  • Refactor

    • Improved CUDA stream management throughout the solver for better GPU operation synchronization and potential execution overlap.
    • Updated pool allocator to use single-stream architecture instead of stream pool.
  • Tests

    • Added batch TSP solver unit tests and Python integration tests to validate batch processing functionality.

✏️ Tip: You can customize this high-level summary in your review settings.

@hlinsen hlinsen requested review from a team as code owners December 17, 2025 20:33
@hlinsen hlinsen added the feature request New feature or request label Dec 17, 2025
@hlinsen hlinsen requested a review from a team as a code owner December 17, 2025 20:33
@hlinsen hlinsen added the non-breaking Introduces a non-breaking change label Dec 17, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

📝 Walkthrough

Walkthrough

This pull request systematically propagates CUDA stream handling throughout the codebase by updating host_copy calls to accept explicit stream parameters, refactors the pool allocator to use a single stream, and introduces a new batch solving capability for routing problems via a parallel call_batch_solve function with Python bindings.

Changes

Cohort / File(s) Change Summary
Stream propagation in linear programming
cpp/src/linear_programming/optimization_problem.cu, cpp/src/linear_programming/translate.hpp
Updated host_copy calls to accept explicit CUDA streams obtained from handle_ptr->get_stream(), replacing default stream usage across methods like get_n_integers, write_to_mps, and constraint/objective data transfers.
Stream propagation in MIP presolve
cpp/src/mip/presolve/conditional_bound_strengthening.cu, cpp/src/mip/presolve/lb_probing_cache.cu, cpp/src/mip/presolve/probing_cache.cu, cpp/src/mip/presolve/trivial_presolve.cuh, cpp/src/mip/presolve/load_balanced_partition_helpers.cuh
Introduced explicit stream variables and propagated them through host_copy calls for variable bounds, offsets, indices, and other data transfers within presolve operations.
Stream propagation in MIP core
cpp/src/mip/diversity/lns/rins.cu, cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh, cpp/src/mip/local_search/local_search.cu, cpp/src/mip/problem/problem.cu, cpp/src/mip/solution/solution.cu, cpp/src/mip/solve.cu, cpp/src/mip/solver.cu
Updated host_copy calls across diverse operations (RINS, bound propagation, solution copies, problem data transfers) to use explicit streams instead of defaults.
Stream propagation in routing adapters and utilities
cpp/src/routing/adapters/adapted_sol.cuh, cpp/src/routing/adapters/assignment_adapter.cuh, cpp/src/routing/adapters/solution_adapter.cuh, cpp/src/routing/assignment.cu, cpp/src/routing/solution/solution.cu
Propagated explicit stream handling to host_copy calls for route data, node information, truck IDs, and node type transfers in routing solution pathways.
Stream integration in routing core structures
cpp/src/routing/order_info.hpp, cpp/src/routing/fleet_info.hpp, cpp/src/routing/fleet_order_constraints.hpp, cpp/src/routing/local_search/cycle_finder/cycle.hpp, cpp/src/routing/local_search/cycle_finder/cycle_graph.hpp
Updated to_host() method signatures across multiple classes to accept rmm::cuda_stream_view stream parameter and propagated stream through internal host_copy calls.
Stream updates in routing crossovers and cycle finding
cpp/src/routing/crossovers/ox_graph.hpp, cpp/src/routing/crossovers/ox_recombiner.cuh, cpp/src/routing/local_search/cycle_finder/cycle_finder.cu, cpp/src/routing/problem/problem.cu
Updated to_host() and helper method signatures to accept streams; propagated streams through graph conversions, adjacency transfers, and cycle data retrievals.
Pool allocator refactoring
cpp/src/routing/solution/pool_allocator.cuh
Refactored constructor to accept explicit rmm::cuda_stream_view stream_ parameter; replaced rmm::cuda_stream_pool stream_pool member with single rmm::cuda_stream_view stream member; updated solution handle construction and sync_all_streams to use single stream.
CUDA graph capture mode update
cpp/src/routing/cuda_graph.cuh
Changed cudaStreamBeginCapture mode from cudaStreamCaptureModeGlobal to cudaStreamCaptureModeThreadLocal with comment indicating support for multi-threaded batch execution.
GES solver stream initialization
cpp/src/routing/ges_solver.cu
Updated pool_allocator construction to pass additional stream handle from data_model.get_handle_ptr()->get_stream().
Batch solving implementation (C++)
cpp/include/cuopt/routing/cython/cython.hpp, cpp/src/routing/utilities/cython.cu
Added new function call_batch_solve accepting vector of data_model_view_t pointers and solver_settings_t pointer; implemented OpenMP-parallelized batch solving with collection of results into vector of unique_ptr<vehicle_routing_ret_t>.
Copy helpers update
cpp/src/utilities/copy_helpers.hpp
Removed convenience overloads for host_copy accepting device_uvector<T> references; updated extract_host_bounds to use explicit stream from handle_ptr->get_stream().
Test suite stream updates
cpp/tests/linear_programming/unit_tests/solver_settings_test.cu, cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh, cpp/tests/mip/elim_var_remap_test.cu, cpp/tests/mip/load_balancing_test.cu, cpp/tests/mip/multi_probe_test.cu, cpp/tests/mip/presolve_test.cu, cpp/tests/mip/problem_test.cu, cpp/tests/mip/unit_test.cu, cpp/tests/mip/mip_utils.cuh, cpp/tests/qp/unit_tests/two_variable_test.cu, cpp/tests/routing/routing_test.cuh, cpp/tests/routing/unit_tests/vehicle_order_match.cu
Updated test host_copy calls and method invocations to pass explicit CUDA streams obtained from handles or passed parameters.
New C++ batch test
cpp/tests/routing/unit_tests/batch_tsp.cu
Added new CUDA C++ test for batch TSP solving with multiple cost matrices; validates parallel batch processing with correct statuses and vehicle counts.
Batch solving Python/Cython bindings
python/cuopt/cuopt/routing/vehicle_routing.pxd, python/cuopt/cuopt/routing/vehicle_routing.py, python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx
Added Cython extern declaration for call_batch_solve; implemented BatchSolve Python function with decorator; added create_assignment_from_vr_ret helper for result conversion; extended imports for batch vector operations.
Routing module exports
python/cuopt/cuopt/routing/__init__.py
Added BatchSolve to public imports alongside existing DataModel, Solve, and SolverSettings.
New Python batch test
python/cuopt/cuopt/tests/routing/test_batch_solve.py
Added Python unit test validating batch TSP solving across varying problem sizes; includes helper for symmetric cost matrix construction.
CMake test configuration
cpp/tests/routing/CMakeLists.txt
Added batch_tsp.cu to ROUTING_UNIT_TEST set.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Breadth of changes: 40+ files modified across multiple subsystems (LP, MIP, Routing, Python bindings, tests)
  • Mixed complexity: While most individual changes are homogeneous stream parameter additions, significant heterogeneous changes include pool allocator refactoring, new batch solving functionality with OpenMP parallelization, and Cython-Python integration
  • Cross-system coordination: Changes require understanding interactions between C++/CUDA kernel operations, stream management semantics, pool allocator lifetime, and Python binding layers
  • New functionality: Batch solving introduces new logic patterns that don't simply replicate existing code, particularly around parallel execution and result aggregation

Areas requiring extra attention:

  • cpp/src/routing/solution/pool_allocator.cuh: Architectural change from stream pool to single stream; verify synchronization semantics and no resource leaks
  • cpp/src/routing/utilities/cython.cu: OpenMP parallelization logic; verify thread safety and stream ordering
  • cpp/src/routing/ges_solver.cu: Stream initialization with new pool allocator signature; ensure compatibility with multi-threaded batch execution
  • Python/Cython binding integration (vehicle_routing_wrapper.pyx): Verify C++/Python result conversion and vector lifetime management
  • CUDA capture mode change (cuda_graph.cuh): Verify ThreadLocal capture mode doesn't break existing single-threaded paths
  • Stream propagation consistency: Verify all host_copy call sites receive appropriate stream objects and no implicit default stream assumptions remain

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.19% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Implement batch solve for routing models' clearly and concisely summarizes the main feature addition described throughout the changeset: batch solving capability for routing problems.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/linear_programming/translate.hpp (1)

174-174: Use the local stream variable for consistency.

Line 174 should use the local stream variable (captured at line 119) instead of calling problem.handle_ptr->get_stream() again:

-  problem.handle_ptr->get_stream().synchronize();
+  stream.synchronize();

This maintains consistency with the rest of the function and avoids an unnecessary repeated getter call.

🧹 Nitpick comments (17)
cpp/src/mip/presolve/load_balanced_partition_helpers.cuh (1)

155-164: Add CUDA error checking for kernel launches.

The three kernel launches lack error checking, which violates the coding guidelines for CUDA files. Kernel failures could go undetected, making debugging difficult.

As per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification"

Consider adding error checking after each kernel launch:

 count_bin_sizes<i_t><<<blocks, BLOCK_SIZE, 0, stream>>>(
   bin_count.data(), offsets, vertex_begin, vertex_end, active_bitmap);
+CUDA_CHECK(cudaGetLastError());

 exclusive_scan<<<1, 1, 0, stream>>>(bin_count.data(), bin_count_offsets.data());
+CUDA_CHECK(cudaGetLastError());

 i_t vertex_count = bin_count.back_element(stream);
 reorg_vertices.resize(vertex_count, stream);

 create_vertex_bins<i_t><<<blocks, BLOCK_SIZE, 0, stream>>>(
   reorg_vertices.data(), bin_count.data(), offsets, vertex_begin, vertex_end, active_bitmap);
+CUDA_CHECK(cudaGetLastError());
cpp/src/linear_programming/translate.hpp (1)

20-109: Consider capturing the stream once for consistency and clarity.

For consistency with translate_to_crossover_problem and to avoid repeated calls to handle_ptr->get_stream(), consider capturing the stream at the beginning of this function:

 template <typename i_t, typename f_t>
 static dual_simplex::user_problem_t<i_t, f_t> cuopt_problem_to_simplex_problem(
   raft::handle_t const* handle_ptr, detail::problem_t<i_t, f_t>& model)
 {
   dual_simplex::user_problem_t<i_t, f_t> user_problem(handle_ptr);
+  auto stream = handle_ptr->get_stream();
 
   int m                  = model.n_constraints;
   int n                  = model.n_variables;
   int nz                 = model.nnz;
   user_problem.num_rows  = m;
   user_problem.num_cols  = n;
-  user_problem.objective = cuopt::host_copy(model.objective_coefficients, handle_ptr->get_stream());
+  user_problem.objective = cuopt::host_copy(model.objective_coefficients, stream);

And apply the same pattern to the remaining host_copy calls in this function.

cpp/include/cuopt/routing/cython/cython.hpp (1)

86-88: Consider adding documentation for the new public API.

As per coding guidelines for public headers under cpp/include/cuopt/**/*, new public functions should have Doxygen-style documentation describing parameters, return values, thread-safety guarantees, and GPU requirements. This is especially important since batch solving involves parallel execution.

+/**
+ * @brief Wrapper for batch vehicle_routing to expose the API to cython.
+ *
+ * @param data_models Vector of pointers to data model views (ownership not transferred)
+ * @param settings Pointer to solver settings (shared across all solves)
+ * @return Vector of unique_ptr to vehicle_routing_ret_t, one per input data model
+ *
+ * @note Thread-safe: uses OpenMP for parallel execution
+ * @note GPU requirements: each data model must have a valid raft::handle_t with an associated CUDA stream
+ */
 std::vector<std::unique_ptr<vehicle_routing_ret_t>> call_batch_solve(
   std::vector<routing::data_model_view_t<int, float>*>, routing::solver_settings_t<int, float>*);
cpp/src/routing/utilities/cython.cu (3)

15-16: Remove unused includes.

<chrono> is included but not used in this file. Consider removing it to keep the codebase clean.

 #include <omp.h>
-#include <chrono>

107-115: Clean up commented-out code and document stream expectations.

The commented-out stream pool code is confusing. Either:

  1. Remove it if each data_model is expected to have its own dedicated stream/handle, or
  2. Implement proper stream management if there's a concern about stream contention

Per coding guidelines, concurrent CUDA operations should explicitly manage dedicated streams. Please clarify whether each data model's handle provides a unique stream for parallel execution.

   // Use OpenMP for parallel execution
   const int max_thread = std::min(static_cast<int>(size), omp_get_max_threads());
-  // rmm::cuda_stream_pool stream_pool(data_models.size());

 #pragma omp parallel for num_threads(max_thread)
   for (std::size_t i = 0; i < size; ++i) {
-    // data_models[i]->get_handle_ptr()->sync_stream();
-    // raft::resource::set_cuda_stream(*(data_models[i]->get_handle_ptr()),
-    // stream_pool.get_stream(i));
     auto routing_solution = cuopt::routing::solve(*data_models[i], *settings);

93-103: Consider adding NVTX range for profiling.

Since raft/common/nvtx.hpp is included, consider adding an NVTX range to the batch solve function to aid GPU profiling and observability.

 std::vector<std::unique_ptr<vehicle_routing_ret_t>> call_batch_solve(
   std::vector<routing::data_model_view_t<int, float>*> data_models,
   routing::solver_settings_t<int, float>* settings)
 {
+  raft::common::nvtx::range fun_scope("call_batch_solve");
   const std::size_t size = data_models.size();
   std::vector<std::unique_ptr<vehicle_routing_ret_t>> list(size);
cpp/src/mip/local_search/local_search.cu (1)

133-133: Consider replacing cudaDeviceSynchronize() with stream-specific synchronization.

The cudaDeviceSynchronize() at line 133 blocks all CUDA streams on the device, which can harm GPU pipeline throughput. Since you're already using explicit streams, consider replacing it with context.problem_ptr->handle_ptr->sync_stream() to synchronize only the relevant stream.

Based on learnings, eliminating unnecessary host-device synchronization in hot paths is important for GPU performance.

-  cudaDeviceSynchronize();
+  context.problem_ptr->handle_ptr->sync_stream();
cpp/tests/routing/unit_tests/batch_tsp.cu (1)

46-85: Consider adding edge case tests for batch solving.

To ensure robustness, add tests for edge cases such as:

  • Single-node TSP (trivial solution)
  • Two-node TSP (simple round trip)
  • Varying the number of problems in the batch (e.g., 1 problem, max parallelism)

As per coding guidelines, "Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling."

python/cuopt/cuopt/routing/vehicle_routing.py (1)

1545-1583: Consider documenting parallel execution behavior.

The docstring mentions "parallel execution" but doesn't specify the parallelization mechanism (e.g., OpenMP, number of threads, CPU-bound vs GPU-bound).

Consider enhancing the docstring to clarify:

     """
     Solves multiple routing problems in batch mode using parallel execution.
+    
+    Note: The batch solver uses CPU parallelization to solve multiple problems
+    concurrently. Each problem instance runs on its own GPU stream to maximize
+    GPU utilization.
 
     Parameters
python/cuopt/cuopt/routing/vehicle_routing.pxd (1)

12-13: Confirm Cython handling of vector[unique_ptr[...]] ownership

The call_batch_solve extern returning vector[unique_ptr[vehicle_routing_ret_t]] is fine at the pxd level, but the .pyx wrapper must move each unique_ptr out before the temporary vector is destroyed to avoid dangling pointers or double frees. Please confirm the wrapper iterates the vector by value/move and never stores raw vehicle_routing_ret_t* beyond the lifetime of the returned container.

Also applies to: 132-141

cpp/src/routing/problem/problem.cu (1)

58-66: Routing problem host copies correctly wired to the handle stream

All new cuopt::host_copy and to_host calls now use handle_ptr->get_stream() (or a local alias), which aligns host-side structures (vehicle_types_h, fleet_info_h, order/depot/break metadata) with the same stream the device data is updated on. Behavior stays the same but with explicit stream bookkeeping, matching the updated host_copy API.

Also applies to: 103-108, 375-394, 399-404, 523-553, 578-579, 663-665

cpp/src/mip/problem/problem.cu (1)

1348-1354: Consistent use of the problem stream for MIP host-side views

All new cuopt::host_copy sites (variable_map, CSR triplets, bounds, variable types, objective coefficients) now use pb.handle_ptr->get_stream() / handle_ptr->get_stream(), matching the updated host_copy signature and ensuring host views are synchronized with the same stream that fills the device buffers. The surrounding logic (problem fixing, preprocessing, dual-simplex export) remains unchanged.

Also applies to: 1525-1529, 1546-1551, 1692-1709, 1787-1798

python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx (4)

842-916: Consider refactoring Solve to use create_assignment_from_vr_ret.

The new helper function duplicates ~70 lines of code from the existing Solve function (lines 765-839). The Solve function should be refactored to use this helper to eliminate duplication:

 def Solve(DataModel data_model, SolverSettings solver_settings):
     cdef data_model_view_t[int, float]* c_data_model_view = (
         data_model.c_data_model_view.get()
     )
     cdef solver_settings_t[int, float]* c_solver_settings = (
         solver_settings.c_solver_settings.get()
     )

     vr_ret_ptr = move(call_solve(
         c_data_model_view,
         c_solver_settings
     ))

-    vr_ret = move(vr_ret_ptr.get()[0])
-    vehicle_count = vr_ret.vehicle_count_
-    # ... ~70 lines of extraction code ...
-    return Assignment(...)
+    return create_assignment_from_vr_ret(vr_ret_ptr.get()[0])

This would also ensure consistency between single and batch solve result handling.


888-896: Extract get_type_from_int to module level.

This function is duplicated in both Solve (lines 811-819) and create_assignment_from_vr_ret. Consider defining it once at module level:

def _get_type_from_int(type_in_int):
    if type_in_int == int(NodeType.DEPOT):
        return "Depot"
    elif type_in_int == int(NodeType.PICKUP):
        return "Pickup"
    elif type_in_int == int(NodeType.DELIVERY):
        return "Delivery"
    elif type_in_int == int(NodeType.BREAK):
        return "Break"

948-954: Simplify: remove redundant variable.

The intermediate move from batch_solve_result to c_solutions is unnecessary:

     cdef vector[unique_ptr[vehicle_routing_ret_t]] batch_solve_result = (
         move(call_batch_solve(data_model_views, c_solver_settings))
     )

-    cdef vector[unique_ptr[vehicle_routing_ret_t]] c_solutions = (
-        move(batch_solve_result)
-    )
-
     solutions = []
-    for i in range(c_solutions.size()):
+    for i in range(batch_solve_result.size()):
         solutions.append(
-            create_assignment_from_vr_ret(c_solutions[i].get()[0])
+            create_assignment_from_vr_ret(batch_solve_result[i].get()[0])
         )

943-946: Consider adding input validation.

The function could benefit from basic validation to provide clearer error messages:

if not py_data_model_list:
    return []

This is optional since an empty input will simply return an empty list, but explicit handling may be clearer.

cpp/src/routing/solution/pool_allocator.cuh (1)

85-91: Consider updating the comment about destruction order.

The comment on lines 85-86 about keeping a member "above other members, so that it is destructed the last" was relevant for stream_pool (which owned streams). Since rmm::cuda_stream_view is non-owning, the destruction order concern no longer applies. Consider removing or updating this comment.

-  // rmm::cuda_stream_pool is non-movable as it contains an atomic variables
-  // KEEP THIS MEMBER ABOVE OTHER MEMBERS, so that it is destructed the last
-  // rmm::cuda_stream_pool stream_pool;
-
-  // problem description
   rmm::cuda_stream_view stream;
+  // problem description
   const Problem& problem;

Comment on lines +216 to 217
probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add synchronization before apply_round in infeasible branch.

In the infeasible branch (lines 212–217), host_copy is asynchronous on the offspring's stream (line 216), but apply_round is called immediately on line 217 without a stream sync. If apply_round reads the host-side probing_config.probing_values buffer before the copy completes, this creates a race condition.

Add offspring.handle_ptr->sync_stream() after line 216 to ensure the host copy completes before apply_round consumes the buffer:

      probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
+     offspring.handle_ptr->sync_stream();
      constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);
probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
offspring.handle_ptr->sync_stream();
constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);
🤖 Prompt for AI Agents
In cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh around lines
216–217, the host_copy to probing_config.probing_values is asynchronous on the
offspring stream and apply_round is called immediately, creating a race where
apply_round may read the buffer before the copy completes; add a synchronization
call offspring.handle_ptr->sync_stream() immediately after the host_copy (after
line 216) to wait for the copy to finish before calling
constraint_prop.apply_round.

Comment on lines +46 to +85
TEST(batch_tsp, varying_sizes)
{
std::vector<i_t> tsp_sizes = {5, 8, 10, 6, 7, 9};
const i_t n_problems = static_cast<i_t>(tsp_sizes.size());

// Create handles and cost matrices for each problem
std::vector<std::unique_ptr<raft::handle_t>> handles;
std::vector<rmm::device_uvector<f_t>> cost_matrices_d;
std::vector<std::unique_ptr<cuopt::routing::data_model_view_t<i_t, f_t>>> data_models;
std::vector<cuopt::routing::data_model_view_t<i_t, f_t>*> data_model_ptrs;

for (i_t i = 0; i < n_problems; ++i) {
handles.push_back(std::make_unique<raft::handle_t>());
auto& handle = *handles.back();

auto cost_matrix_h = create_small_tsp_cost_matrix(tsp_sizes[i]);
cost_matrices_d.push_back(cuopt::device_copy(cost_matrix_h, handle.get_stream()));

data_models.push_back(std::make_unique<cuopt::routing::data_model_view_t<i_t, f_t>>(
&handle, tsp_sizes[i], 1, tsp_sizes[i]));
data_models.back()->add_cost_matrix(cost_matrices_d.back().data());
data_model_ptrs.push_back(data_models.back().get());
}

// Configure solver settings
cuopt::routing::solver_settings_t<i_t, f_t> settings;
settings.set_time_limit(5);

// Call batch solve
auto solutions = cuopt::cython::call_batch_solve(data_model_ptrs, &settings);

// Verify all solutions
ASSERT_EQ(solutions.size(), n_problems);
for (i_t i = 0; i < n_problems; ++i) {
EXPECT_EQ(solutions[i]->status_, cuopt::routing::solution_status_t::SUCCESS)
<< "TSP " << i << " (size " << tsp_sizes[i] << ") failed";
EXPECT_EQ(solutions[i]->vehicle_count_, 1)
<< "TSP " << i << " (size " << tsp_sizes[i] << ") used multiple vehicles";
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add numerical correctness validation to the batch TSP test.

The test currently only verifies that solutions have SUCCESS status and use one vehicle. According to coding guidelines, tests should validate numerical correctness of optimization results, not just that they run without error.

Recommendations:

  1. Verify objective values are reasonable (e.g., for a line topology, the optimal tour length is predictable: 2*(n-1) for the coordinate system used)
  2. Check that routes actually visit all locations
  3. Add edge cases: single-node TSP, two-node TSP

As per coding guidelines, "Write tests validating numerical correctness of optimization results (not just 'runs without error')."

Add validation after line 84:

// Verify numerical correctness
for (i_t i = 0; i < n_problems; ++i) {
  // For TSP on a line with coordinates 0,1,2,...,n-1
  // Optimal tour: 0 -> 1 -> 2 -> ... -> n-1 -> 0
  // Total distance: 2 * (n-1)
  f_t expected_objective = 2.0f * static_cast<f_t>(tsp_sizes[i] - 1);
  f_t tolerance = 0.01f;
  
  EXPECT_NEAR(solutions[i]->total_objective_, expected_objective, tolerance)
    << "TSP " << i << " (size " << tsp_sizes[i] 
    << ") has unexpected objective value";
}

Comment on lines +919 to +936
def BatchSolve(py_data_model_list, SolverSettings solver_settings):
"""
Solve multiple routing problems in batch mode using parallel execution.
Parameters
----------
py_data_model_list : list of DataModel
List of data model objects representing routing problems to solve.
solver_settings : SolverSettings
Solver settings to use for all problems.
Returns
-------
tuple
A tuple containing:
- list of Assignment: Solutions for each routing problem
- float: Total solve time in seconds
"""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docstring inconsistency: return type mismatch.

The docstring states the function returns a tuple containing (list of Assignment, float: Total solve time in seconds), but the implementation only returns the list of solutions (line 962). Either update the docstring or add the solve time to the return value.

     Returns
     -------
-    tuple
-        A tuple containing:
-        - list of Assignment: Solutions for each routing problem
-        - float: Total solve time in seconds
+    list of Assignment
+        Solutions for each routing problem.
     """
🤖 Prompt for AI Agents
In python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx around lines 919 to
936, the docstring claims BatchSolve returns a tuple (list of Assignment, float
total solve time) but the implementation only returns the list of solutions;
update the implementation to match the docstring by capturing the total solve
time (e.g., sum or measured elapsed time across the batch) and return a tuple
(solutions_list, total_solve_time). Ensure the function's return value and any
type annotations reflect the tuple so callers receive both the solutions and the
total time.

Comment on lines +1545 to +1583
@catch_cuopt_exception
def BatchSolve(data_model_list, solver_settings=None):
"""
Solves multiple routing problems in batch mode using parallel execution.
Parameters
----------
data_model_list: list of DataModel
List of data model objects representing routing problems to solve.
solver_settings: SolverSettings
Settings to configure solver configurations.
By default, it uses default solver settings to solve.
Returns
-------
tuple
A tuple containing:
- list of Assignment: Solutions for each routing problem
- float: Total solve time in seconds
Examples
--------
>>> from cuopt import routing
>>> import cudf
>>> # Create multiple data models
>>> data_models = []
>>> for i in range(5):
... cost_matrix = cudf.DataFrame([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
... dm = routing.DataModel(3, 1)
... dm.add_cost_matrix(cost_matrix)
... data_models.append(dm)
>>> settings = routing.SolverSettings()
>>> settings.set_time_limit(1.0)
>>> solutions, solve_time = routing.BatchSolve(data_models, settings)
"""
if solver_settings is None:
solver_settings = SolverSettings()

return vehicle_routing_wrapper.BatchSolve(data_model_list, solver_settings)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add input validation for data_model_list parameter.

The BatchSolve function should validate the input data_model_list to prevent runtime errors and provide clear user feedback.

Add validation at the beginning of the function:

 @catch_cuopt_exception
 def BatchSolve(data_model_list, solver_settings=None):
     """
     Solves multiple routing problems in batch mode using parallel execution.
 
     Parameters
     ----------
     data_model_list: list of DataModel
         List of data model objects representing routing problems to solve.
     solver_settings: SolverSettings
         Settings to configure solver configurations.
         By default, it uses default solver settings to solve.
 
     Returns
     -------
     tuple
         A tuple containing:
         - list of Assignment: Solutions for each routing problem
         - float: Total solve time in seconds
 
     Examples
     --------
     >>> from cuopt import routing
     >>> import cudf
     >>> # Create multiple data models
     >>> data_models = []
     >>> for i in range(5):
     ...     cost_matrix = cudf.DataFrame([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
     ...     dm = routing.DataModel(3, 1)
     ...     dm.add_cost_matrix(cost_matrix)
     ...     data_models.append(dm)
     >>> settings = routing.SolverSettings()
     >>> settings.set_time_limit(1.0)
     >>> solutions, solve_time = routing.BatchSolve(data_models, settings)
     """
+    if not isinstance(data_model_list, list):
+        raise ValueError("data_model_list must be a list of DataModel objects")
+    if len(data_model_list) == 0:
+        raise ValueError("data_model_list cannot be empty")
+    if not all(isinstance(dm, DataModel) for dm in data_model_list):
+        raise ValueError("All elements in data_model_list must be DataModel instances")
+    
     if solver_settings is None:
         solver_settings = SolverSettings()
 
     return vehicle_routing_wrapper.BatchSolve(data_model_list, solver_settings)
🤖 Prompt for AI Agents
In python/cuopt/cuopt/routing/vehicle_routing.py around lines 1545 to 1583, add
input validation at the top of BatchSolve: ensure data_model_list is not None,
is a list (or other iterable you support), is non-empty, and that every item is
an instance of the expected DataModel class (import or reference the existing
DataModel), raising a TypeError if the type is wrong and a ValueError if the
list is empty; also validate solver_settings when provided is a SolverSettings
instance and raise TypeError if not. Perform these checks before creating
default solver_settings and before calling vehicle_routing_wrapper.BatchSolve so
callers receive immediate, clear exceptions for bad inputs.

Comment on lines +19 to +42
def test_batch_solve_varying_sizes():
"""Test batch solving TSPs of varying sizes."""
tsp_sizes = [
5,
8,
10,
6,
7,
9,
12,
15,
11,
4,
13,
14,
8,
6,
10,
9,
7,
11,
5,
12,
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, find the test file and vehicle_routing.py
find . -name "test_batch_solve.py" -o -name "vehicle_routing.py" | head -20

Repository: NVIDIA/cuopt

Length of output: 158


🏁 Script executed:

#!/bin/bash

# Search for files related to BatchSolve and routing
fd -t f "test_batch_solve" 2>/dev/null || echo "test_batch_solve not found via fd"
fd -t f "vehicle_routing" 2>/dev/null | head -10 || echo "vehicle_routing not found via fd"

# Also try git ls-files
git ls-files | grep -E "(test_batch_solve|vehicle_routing)" | head -20

Repository: NVIDIA/cuopt

Length of output: 571


🏁 Script executed:

#!/bin/bash

# List Python files in routing test directory
git ls-files "python/**/routing/**/*.py" 2>/dev/null | head -30

Repository: NVIDIA/cuopt

Length of output: 104


🏁 Script executed:

#!/bin/bash

# Search for BatchSolve in Python files
rg "BatchSolve" --type py -l 2>/dev/null | head -20

Repository: NVIDIA/cuopt

Length of output: 485


🏁 Script executed:

#!/bin/bash

# Search for references to BatchSolve and its return type
rg "class BatchSolve|def BatchSolve" --type py -A 10 2>/dev/null | head -50

Repository: NVIDIA/cuopt

Length of output: 1955


🏁 Script executed:

#!/bin/bash

# Read the complete BatchSolve function in vehicle_routing.py
sed -n '/^def BatchSolve/,/^def /p' python/cuopt/cuopt/routing/vehicle_routing.py | head -100

Repository: NVIDIA/cuopt

Length of output: 1350


🏁 Script executed:

#!/bin/bash

# Read test_batch_solve.py lines 19-67 to see both test sections mentioned in the review
sed -n '1,70p' python/cuopt/cuopt/tests/routing/test_batch_solve.py

Repository: NVIDIA/cuopt

Length of output: 1728


🏁 Script executed:

#!/bin/bash

# Get more context by reading a larger section of vehicle_routing.py around BatchSolve
cat -n python/cuopt/cuopt/routing/vehicle_routing.py | grep -A 50 "def BatchSolve"

Repository: NVIDIA/cuopt

Length of output: 1616


Fix BatchSolve return type handling in test

The test treats routing.BatchSolve as returning a plain list of assignments, but the docstring and example both document it as returning a (solutions, solve_time) tuple. This causes the length assertion and solution iteration to fail at runtime.

Update line 54 to unpack both values:

solutions, solve_time = routing.BatchSolve(data_models, settings)

Apply the same fix to the second test section (lines 52–67).

🤖 Prompt for AI Agents
In python/cuopt/cuopt/tests/routing/test_batch_solve.py around lines 52–67, the
test treats routing.BatchSolve as returning a plain list but BatchSolve actually
returns a (solutions, solve_time) tuple; update both calls to unpack the tuple
(e.g., solutions, solve_time = routing.BatchSolve(data_models, settings)), then
use the solutions variable for the length assertion and iteration over returned
solutions in both test sections.

@anandhkb anandhkb added this to the 26.02 milestone Dec 22, 2025
@github-actions
Copy link

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants