Implement batch solve for routing models #728

hlinsen · 2025-12-17T20:33:23Z

This PR implements batch solve with a focus on small TSP problems <100 nodes sizes. The performance
for other variants is not tested.

The cuopt::host_copy wrapper always needs a stream. Hence the changes in many other files.
The routing cuda_graph_t class now uses cudaStreamCaptureModeThreadLocal
The pool allocator pool stream is deprecated in favor of the data_model_t stream given to the raft::handle_t

TODO:

Tune with green contexts based on input size and HW.

Summary by CodeRabbit

New Features
- Added batch solving capability for routing problems, enabling efficient parallel execution of multiple problem instances.
Refactor
- Improved CUDA stream management throughout the solver for better GPU operation synchronization and potential execution overlap.
- Updated pool allocator to use single-stream architecture instead of stream pool.
Tests
- Added batch TSP solver unit tests and Python integration tests to validate batch processing functionality.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

… batch mode is on

coderabbitai · 2025-12-17T20:40:10Z

📝 Walkthrough

Walkthrough

This pull request systematically propagates CUDA stream handling throughout the codebase by updating host_copy calls to accept explicit stream parameters, refactors the pool allocator to use a single stream, and introduces a new batch solving capability for routing problems via a parallel call_batch_solve function with Python bindings.

Changes

Cohort / File(s)	Change Summary
Stream propagation in linear programming `cpp/src/linear_programming/optimization_problem.cu`, `cpp/src/linear_programming/translate.hpp`	Updated `host_copy` calls to accept explicit CUDA streams obtained from `handle_ptr->get_stream()`, replacing default stream usage across methods like `get_n_integers`, `write_to_mps`, and constraint/objective data transfers.
Stream propagation in MIP presolve `cpp/src/mip/presolve/conditional_bound_strengthening.cu`, `cpp/src/mip/presolve/lb_probing_cache.cu`, `cpp/src/mip/presolve/probing_cache.cu`, `cpp/src/mip/presolve/trivial_presolve.cuh`, `cpp/src/mip/presolve/load_balanced_partition_helpers.cuh`	Introduced explicit stream variables and propagated them through `host_copy` calls for variable bounds, offsets, indices, and other data transfers within presolve operations.
Stream propagation in MIP core `cpp/src/mip/diversity/lns/rins.cu`, `cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh`, `cpp/src/mip/local_search/local_search.cu`, `cpp/src/mip/problem/problem.cu`, `cpp/src/mip/solution/solution.cu`, `cpp/src/mip/solve.cu`, `cpp/src/mip/solver.cu`	Updated `host_copy` calls across diverse operations (RINS, bound propagation, solution copies, problem data transfers) to use explicit streams instead of defaults.
Stream propagation in routing adapters and utilities `cpp/src/routing/adapters/adapted_sol.cuh`, `cpp/src/routing/adapters/assignment_adapter.cuh`, `cpp/src/routing/adapters/solution_adapter.cuh`, `cpp/src/routing/assignment.cu`, `cpp/src/routing/solution/solution.cu`	Propagated explicit stream handling to `host_copy` calls for route data, node information, truck IDs, and node type transfers in routing solution pathways.
Stream integration in routing core structures `cpp/src/routing/order_info.hpp`, `cpp/src/routing/fleet_info.hpp`, `cpp/src/routing/fleet_order_constraints.hpp`, `cpp/src/routing/local_search/cycle_finder/cycle.hpp`, `cpp/src/routing/local_search/cycle_finder/cycle_graph.hpp`	Updated `to_host()` method signatures across multiple classes to accept `rmm::cuda_stream_view stream` parameter and propagated stream through internal `host_copy` calls.
Stream updates in routing crossovers and cycle finding `cpp/src/routing/crossovers/ox_graph.hpp`, `cpp/src/routing/crossovers/ox_recombiner.cuh`, `cpp/src/routing/local_search/cycle_finder/cycle_finder.cu`, `cpp/src/routing/problem/problem.cu`	Updated `to_host()` and helper method signatures to accept streams; propagated streams through graph conversions, adjacency transfers, and cycle data retrievals.
Pool allocator refactoring `cpp/src/routing/solution/pool_allocator.cuh`	Refactored constructor to accept explicit `rmm::cuda_stream_view stream_` parameter; replaced `rmm::cuda_stream_pool stream_pool` member with single `rmm::cuda_stream_view stream` member; updated solution handle construction and `sync_all_streams` to use single stream.
CUDA graph capture mode update `cpp/src/routing/cuda_graph.cuh`	Changed `cudaStreamBeginCapture` mode from `cudaStreamCaptureModeGlobal` to `cudaStreamCaptureModeThreadLocal` with comment indicating support for multi-threaded batch execution.
GES solver stream initialization `cpp/src/routing/ges_solver.cu`	Updated `pool_allocator` construction to pass additional stream handle from `data_model.get_handle_ptr()->get_stream()`.
Batch solving implementation (C++) `cpp/include/cuopt/routing/cython/cython.hpp`, `cpp/src/routing/utilities/cython.cu`	Added new function `call_batch_solve` accepting vector of `data_model_view_t` pointers and `solver_settings_t` pointer; implemented OpenMP-parallelized batch solving with collection of results into vector of `unique_ptr<vehicle_routing_ret_t>`.
Copy helpers update `cpp/src/utilities/copy_helpers.hpp`	Removed convenience overloads for `host_copy` accepting `device_uvector<T>` references; updated `extract_host_bounds` to use explicit stream from `handle_ptr->get_stream()`.
Test suite stream updates `cpp/tests/linear_programming/unit_tests/solver_settings_test.cu`, `cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh`, `cpp/tests/mip/elim_var_remap_test.cu`, `cpp/tests/mip/load_balancing_test.cu`, `cpp/tests/mip/multi_probe_test.cu`, `cpp/tests/mip/presolve_test.cu`, `cpp/tests/mip/problem_test.cu`, `cpp/tests/mip/unit_test.cu`, `cpp/tests/mip/mip_utils.cuh`, `cpp/tests/qp/unit_tests/two_variable_test.cu`, `cpp/tests/routing/routing_test.cuh`, `cpp/tests/routing/unit_tests/vehicle_order_match.cu`	Updated test `host_copy` calls and method invocations to pass explicit CUDA streams obtained from handles or passed parameters.
New C++ batch test `cpp/tests/routing/unit_tests/batch_tsp.cu`	Added new CUDA C++ test for batch TSP solving with multiple cost matrices; validates parallel batch processing with correct statuses and vehicle counts.
Batch solving Python/Cython bindings `python/cuopt/cuopt/routing/vehicle_routing.pxd`, `python/cuopt/cuopt/routing/vehicle_routing.py`, `python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx`	Added Cython extern declaration for `call_batch_solve`; implemented `BatchSolve` Python function with decorator; added `create_assignment_from_vr_ret` helper for result conversion; extended imports for batch vector operations.
Routing module exports `python/cuopt/cuopt/routing/__init__.py`	Added `BatchSolve` to public imports alongside existing `DataModel`, `Solve`, and `SolverSettings`.
New Python batch test `python/cuopt/cuopt/tests/routing/test_batch_solve.py`	Added Python unit test validating batch TSP solving across varying problem sizes; includes helper for symmetric cost matrix construction.
CMake test configuration `cpp/tests/routing/CMakeLists.txt`	Added `batch_tsp.cu` to `ROUTING_UNIT_TEST` set.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Breadth of changes: 40+ files modified across multiple subsystems (LP, MIP, Routing, Python bindings, tests)
Mixed complexity: While most individual changes are homogeneous stream parameter additions, significant heterogeneous changes include pool allocator refactoring, new batch solving functionality with OpenMP parallelization, and Cython-Python integration
Cross-system coordination: Changes require understanding interactions between C++/CUDA kernel operations, stream management semantics, pool allocator lifetime, and Python binding layers
New functionality: Batch solving introduces new logic patterns that don't simply replicate existing code, particularly around parallel execution and result aggregation

Areas requiring extra attention:

cpp/src/routing/solution/pool_allocator.cuh: Architectural change from stream pool to single stream; verify synchronization semantics and no resource leaks
cpp/src/routing/utilities/cython.cu: OpenMP parallelization logic; verify thread safety and stream ordering
cpp/src/routing/ges_solver.cu: Stream initialization with new pool allocator signature; ensure compatibility with multi-threaded batch execution
Python/Cython binding integration (vehicle_routing_wrapper.pyx): Verify C++/Python result conversion and vector lifetime management
CUDA capture mode change (cuda_graph.cuh): Verify ThreadLocal capture mode doesn't break existing single-threaded paths
Stream propagation consistency: Verify all host_copy call sites receive appropriate stream objects and no implicit default stream assumptions remain

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.19% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Implement batch solve for routing models' clearly and concisely summarizes the main feature addition described throughout the changeset: batch solving capability for routing problems.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/linear_programming/translate.hpp (1)
174-174: Use the local stream variable for consistency.

Line 174 should use the local stream variable (captured at line 119) instead of calling problem.handle_ptr->get_stream() again:
-  problem.handle_ptr->get_stream().synchronize();
+  stream.synchronize();
This maintains consistency with the rest of the function and avoids an unnecessary repeated getter call.

🧹 Nitpick comments (17)

cpp/src/mip/presolve/load_balanced_partition_helpers.cuh (1)
155-164: Add CUDA error checking for kernel launches.

The three kernel launches lack error checking, which violates the coding guidelines for CUDA files. Kernel failures could go undetected, making debugging difficult.

As per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification"

Consider adding error checking after each kernel launch:
 count_bin_sizes<i_t><<<blocks, BLOCK_SIZE, 0, stream>>>(
   bin_count.data(), offsets, vertex_begin, vertex_end, active_bitmap);
+CUDA_CHECK(cudaGetLastError());

 exclusive_scan<<<1, 1, 0, stream>>>(bin_count.data(), bin_count_offsets.data());
+CUDA_CHECK(cudaGetLastError());

 i_t vertex_count = bin_count.back_element(stream);
 reorg_vertices.resize(vertex_count, stream);

 create_vertex_bins<i_t><<<blocks, BLOCK_SIZE, 0, stream>>>(
   reorg_vertices.data(), bin_count.data(), offsets, vertex_begin, vertex_end, active_bitmap);
+CUDA_CHECK(cudaGetLastError());
cpp/src/linear_programming/translate.hpp (1)
20-109: Consider capturing the stream once for consistency and clarity.

For consistency with translate_to_crossover_problem and to avoid repeated calls to handle_ptr->get_stream(), consider capturing the stream at the beginning of this function:
 template <typename i_t, typename f_t>
 static dual_simplex::user_problem_t<i_t, f_t> cuopt_problem_to_simplex_problem(
   raft::handle_t const* handle_ptr, detail::problem_t<i_t, f_t>& model)
 {
   dual_simplex::user_problem_t<i_t, f_t> user_problem(handle_ptr);
+  auto stream = handle_ptr->get_stream();
 
   int m                  = model.n_constraints;
   int n                  = model.n_variables;
   int nz                 = model.nnz;
   user_problem.num_rows  = m;
   user_problem.num_cols  = n;
-  user_problem.objective = cuopt::host_copy(model.objective_coefficients, handle_ptr->get_stream());
+  user_problem.objective = cuopt::host_copy(model.objective_coefficients, stream);
And apply the same pattern to the remaining host_copy calls in this function.
cpp/include/cuopt/routing/cython/cython.hpp (1)
86-88: Consider adding documentation for the new public API.

As per coding guidelines for public headers under cpp/include/cuopt/**/*, new public functions should have Doxygen-style documentation describing parameters, return values, thread-safety guarantees, and GPU requirements. This is especially important since batch solving involves parallel execution.
+/**
+ * @brief Wrapper for batch vehicle_routing to expose the API to cython.
+ *
+ * @param data_models Vector of pointers to data model views (ownership not transferred)
+ * @param settings Pointer to solver settings (shared across all solves)
+ * @return Vector of unique_ptr to vehicle_routing_ret_t, one per input data model
+ *
+ * @note Thread-safe: uses OpenMP for parallel execution
+ * @note GPU requirements: each data model must have a valid raft::handle_t with an associated CUDA stream
+ */
 std::vector<std::unique_ptr<vehicle_routing_ret_t>> call_batch_solve(
   std::vector<routing::data_model_view_t<int, float>*>, routing::solver_settings_t<int, float>*);
cpp/src/routing/utilities/cython.cu (3)
15-16: Remove unused includes.

<chrono> is included but not used in this file. Consider removing it to keep the codebase clean.
 #include <omp.h>
-#include <chrono>
107-115: Clean up commented-out code and document stream expectations.

The commented-out stream pool code is confusing. Either:

Remove it if each data_model is expected to have its own dedicated stream/handle, or

Implement proper stream management if there's a concern about stream contention

Per coding guidelines, concurrent CUDA operations should explicitly manage dedicated streams. Please clarify whether each data model's handle provides a unique stream for parallel execution.
   // Use OpenMP for parallel execution
   const int max_thread = std::min(static_cast<int>(size), omp_get_max_threads());
-  // rmm::cuda_stream_pool stream_pool(data_models.size());

 #pragma omp parallel for num_threads(max_thread)
   for (std::size_t i = 0; i < size; ++i) {
-    // data_models[i]->get_handle_ptr()->sync_stream();
-    // raft::resource::set_cuda_stream(*(data_models[i]->get_handle_ptr()),
-    // stream_pool.get_stream(i));
     auto routing_solution = cuopt::routing::solve(*data_models[i], *settings);
93-103: Consider adding NVTX range for profiling.

Since raft/common/nvtx.hpp is included, consider adding an NVTX range to the batch solve function to aid GPU profiling and observability.
 std::vector<std::unique_ptr<vehicle_routing_ret_t>> call_batch_solve(
   std::vector<routing::data_model_view_t<int, float>*> data_models,
   routing::solver_settings_t<int, float>* settings)
 {
+  raft::common::nvtx::range fun_scope("call_batch_solve");
   const std::size_t size = data_models.size();
   std::vector<std::unique_ptr<vehicle_routing_ret_t>> list(size);
cpp/src/mip/local_search/local_search.cu (1)
133-133: Consider replacing cudaDeviceSynchronize() with stream-specific synchronization.

The cudaDeviceSynchronize() at line 133 blocks all CUDA streams on the device, which can harm GPU pipeline throughput. Since you're already using explicit streams, consider replacing it with context.problem_ptr->handle_ptr->sync_stream() to synchronize only the relevant stream.

Based on learnings, eliminating unnecessary host-device synchronization in hot paths is important for GPU performance.
-  cudaDeviceSynchronize();
+  context.problem_ptr->handle_ptr->sync_stream();
cpp/tests/routing/unit_tests/batch_tsp.cu (1)

46-85: Consider adding edge case tests for batch solving.

To ensure robustness, add tests for edge cases such as:

Single-node TSP (trivial solution)

Two-node TSP (simple round trip)

Varying the number of problems in the batch (e.g., 1 problem, max parallelism)

As per coding guidelines, "Test with free variables, singleton problems, and extreme problem dimensions near resource limits to validate edge case handling."
python/cuopt/cuopt/routing/vehicle_routing.py (1)
1545-1583: Consider documenting parallel execution behavior.

The docstring mentions "parallel execution" but doesn't specify the parallelization mechanism (e.g., OpenMP, number of threads, CPU-bound vs GPU-bound).

Consider enhancing the docstring to clarify:
     """
     Solves multiple routing problems in batch mode using parallel execution.
+    
+    Note: The batch solver uses CPU parallelization to solve multiple problems
+    concurrently. Each problem instance runs on its own GPU stream to maximize
+    GPU utilization.
 
     Parameters
python/cuopt/cuopt/routing/vehicle_routing.pxd (1)

12-13: Confirm Cython handling of vector[unique_ptr[...]] ownership

The call_batch_solve extern returning vector[unique_ptr[vehicle_routing_ret_t]] is fine at the pxd level, but the .pyx wrapper must move each unique_ptr out before the temporary vector is destroyed to avoid dangling pointers or double frees. Please confirm the wrapper iterates the vector by value/move and never stores raw vehicle_routing_ret_t* beyond the lifetime of the returned container.

Also applies to: 132-141

cpp/src/routing/problem/problem.cu (1)

58-66: Routing problem host copies correctly wired to the handle stream

All new cuopt::host_copy and to_host calls now use handle_ptr->get_stream() (or a local alias), which aligns host-side structures (vehicle_types_h, fleet_info_h, order/depot/break metadata) with the same stream the device data is updated on. Behavior stays the same but with explicit stream bookkeeping, matching the updated host_copy API.

Also applies to: 103-108, 375-394, 399-404, 523-553, 578-579, 663-665

cpp/src/mip/problem/problem.cu (1)

1348-1354: Consistent use of the problem stream for MIP host-side views

All new cuopt::host_copy sites (variable_map, CSR triplets, bounds, variable types, objective coefficients) now use pb.handle_ptr->get_stream() / handle_ptr->get_stream(), matching the updated host_copy signature and ensuring host views are synchronized with the same stream that fills the device buffers. The surrounding logic (problem fixing, preprocessing, dual-simplex export) remains unchanged.

Also applies to: 1525-1529, 1546-1551, 1692-1709, 1787-1798
python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx (4)
842-916: Consider refactoring Solve to use create_assignment_from_vr_ret.

The new helper function duplicates ~70 lines of code from the existing Solve function (lines 765-839). The Solve function should be refactored to use this helper to eliminate duplication:
 def Solve(DataModel data_model, SolverSettings solver_settings):
     cdef data_model_view_t[int, float]* c_data_model_view = (
         data_model.c_data_model_view.get()
     )
     cdef solver_settings_t[int, float]* c_solver_settings = (
         solver_settings.c_solver_settings.get()
     )

     vr_ret_ptr = move(call_solve(
         c_data_model_view,
         c_solver_settings
     ))

-    vr_ret = move(vr_ret_ptr.get()[0])
-    vehicle_count = vr_ret.vehicle_count_
-    # ... ~70 lines of extraction code ...
-    return Assignment(...)
+    return create_assignment_from_vr_ret(vr_ret_ptr.get()[0])
This would also ensure consistency between single and batch solve result handling.

888-896: Extract get_type_from_int to module level.

This function is duplicated in both Solve (lines 811-819) and create_assignment_from_vr_ret. Consider defining it once at module level:
def _get_type_from_int(type_in_int):
    if type_in_int == int(NodeType.DEPOT):
        return "Depot"
    elif type_in_int == int(NodeType.PICKUP):
        return "Pickup"
    elif type_in_int == int(NodeType.DELIVERY):
        return "Delivery"
    elif type_in_int == int(NodeType.BREAK):
        return "Break"
948-954: Simplify: remove redundant variable.

The intermediate move from batch_solve_result to c_solutions is unnecessary:
     cdef vector[unique_ptr[vehicle_routing_ret_t]] batch_solve_result = (
         move(call_batch_solve(data_model_views, c_solver_settings))
     )

-    cdef vector[unique_ptr[vehicle_routing_ret_t]] c_solutions = (
-        move(batch_solve_result)
-    )
-
     solutions = []
-    for i in range(c_solutions.size()):
+    for i in range(batch_solve_result.size()):
         solutions.append(
-            create_assignment_from_vr_ret(c_solutions[i].get()[0])
+            create_assignment_from_vr_ret(batch_solve_result[i].get()[0])
         )
943-946: Consider adding input validation.

The function could benefit from basic validation to provide clearer error messages:
if not py_data_model_list:
    return []
This is optional since an empty input will simply return an empty list, but explicit handling may be clearer.
cpp/src/routing/solution/pool_allocator.cuh (1)
85-91: Consider updating the comment about destruction order.

The comment on lines 85-86 about keeping a member "above other members, so that it is destructed the last" was relevant for stream_pool (which owned streams). Since rmm::cuda_stream_view is non-owning, the destruction order concern no longer applies. Consider removing or updating this comment.
-  // rmm::cuda_stream_pool is non-movable as it contains an atomic variables
-  // KEEP THIS MEMBER ABOVE OTHER MEMBERS, so that it is destructed the last
-  // rmm::cuda_stream_pool stream_pool;
-
-  // problem description
   rmm::cuda_stream_view stream;
+  // problem description
   const Problem& problem;

coderabbitai · 2025-12-17T20:40:13Z

cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh

+      probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
      constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);


⚠️ Potential issue | 🔴 Critical

Add synchronization before apply_round in infeasible branch.

In the infeasible branch (lines 212–217), host_copy is asynchronous on the offspring's stream (line 216), but apply_round is called immediately on line 217 without a stream sync. If apply_round reads the host-side probing_config.probing_values buffer before the copy completes, this creates a race condition.

Add offspring.handle_ptr->sync_stream() after line 216 to ensure the host copy completes before apply_round consumes the buffer:

probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream()); + offspring.handle_ptr->sync_stream(); constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());

constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);

probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());

offspring.handle_ptr->sync_stream();

constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);

🤖 Prompt for AI Agents

In cpp/src/mip/diversity/recombiners/bound_prop_recombiner.cuh around lines 216–217, the host_copy to probing_config.probing_values is asynchronous on the offspring stream and apply_round is called immediately, creating a race where apply_round may read the buffer before the copy completes; add a synchronization call offspring.handle_ptr->sync_stream() immediately after the host_copy (after line 216) to wait for the copy to finish before calling constraint_prop.apply_round.

coderabbitai · 2025-12-17T20:40:14Z

cpp/tests/routing/unit_tests/batch_tsp.cu

+TEST(batch_tsp, varying_sizes)
+{
+  std::vector<i_t> tsp_sizes = {5, 8, 10, 6, 7, 9};
+  const i_t n_problems       = static_cast<i_t>(tsp_sizes.size());
+
+  // Create handles and cost matrices for each problem
+  std::vector<std::unique_ptr<raft::handle_t>> handles;
+  std::vector<rmm::device_uvector<f_t>> cost_matrices_d;
+  std::vector<std::unique_ptr<cuopt::routing::data_model_view_t<i_t, f_t>>> data_models;
+  std::vector<cuopt::routing::data_model_view_t<i_t, f_t>*> data_model_ptrs;
+
+  for (i_t i = 0; i < n_problems; ++i) {
+    handles.push_back(std::make_unique<raft::handle_t>());
+    auto& handle = *handles.back();
+
+    auto cost_matrix_h = create_small_tsp_cost_matrix(tsp_sizes[i]);
+    cost_matrices_d.push_back(cuopt::device_copy(cost_matrix_h, handle.get_stream()));
+
+    data_models.push_back(std::make_unique<cuopt::routing::data_model_view_t<i_t, f_t>>(
+      &handle, tsp_sizes[i], 1, tsp_sizes[i]));
+    data_models.back()->add_cost_matrix(cost_matrices_d.back().data());
+    data_model_ptrs.push_back(data_models.back().get());
+  }
+
+  // Configure solver settings
+  cuopt::routing::solver_settings_t<i_t, f_t> settings;
+  settings.set_time_limit(5);
+
+  // Call batch solve
+  auto solutions = cuopt::cython::call_batch_solve(data_model_ptrs, &settings);
+
+  // Verify all solutions
+  ASSERT_EQ(solutions.size(), n_problems);
+  for (i_t i = 0; i < n_problems; ++i) {
+    EXPECT_EQ(solutions[i]->status_, cuopt::routing::solution_status_t::SUCCESS)
+      << "TSP " << i << " (size " << tsp_sizes[i] << ") failed";
+    EXPECT_EQ(solutions[i]->vehicle_count_, 1)
+      << "TSP " << i << " (size " << tsp_sizes[i] << ") used multiple vehicles";
+  }
+}


🛠️ Refactor suggestion | 🟠 Major

Add numerical correctness validation to the batch TSP test.

The test currently only verifies that solutions have SUCCESS status and use one vehicle. According to coding guidelines, tests should validate numerical correctness of optimization results, not just that they run without error.

Recommendations:

Verify objective values are reasonable (e.g., for a line topology, the optimal tour length is predictable: 2*(n-1) for the coordinate system used)

Check that routes actually visit all locations

Add edge cases: single-node TSP, two-node TSP

As per coding guidelines, "Write tests validating numerical correctness of optimization results (not just 'runs without error')."

Add validation after line 84:

// Verify numerical correctness for (i_t i = 0; i < n_problems; ++i) { // For TSP on a line with coordinates 0,1,2,...,n-1 // Optimal tour: 0 -> 1 -> 2 -> ... -> n-1 -> 0 // Total distance: 2 * (n-1) f_t expected_objective = 2.0f * static_cast<f_t>(tsp_sizes[i] - 1); f_t tolerance = 0.01f; EXPECT_NEAR(solutions[i]->total_objective_, expected_objective, tolerance) << "TSP " << i << " (size " << tsp_sizes[i] << ") has unexpected objective value"; }

coderabbitai · 2025-12-17T20:40:14Z

python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx

+def BatchSolve(py_data_model_list, SolverSettings solver_settings):
+    """
+    Solve multiple routing problems in batch mode using parallel execution.
+
+    Parameters
+    ----------
+    py_data_model_list : list of DataModel
+        List of data model objects representing routing problems to solve.
+    solver_settings : SolverSettings
+        Solver settings to use for all problems.
+
+    Returns
+    -------
+    tuple
+        A tuple containing:
+        - list of Assignment: Solutions for each routing problem
+        - float: Total solve time in seconds
+    """


⚠️ Potential issue | 🟡 Minor

Docstring inconsistency: return type mismatch.

The docstring states the function returns a tuple containing (list of Assignment, float: Total solve time in seconds), but the implementation only returns the list of solutions (line 962). Either update the docstring or add the solve time to the return value.

Returns ------- - tuple - A tuple containing: - - list of Assignment: Solutions for each routing problem - - float: Total solve time in seconds + list of Assignment + Solutions for each routing problem. """

🤖 Prompt for AI Agents

In python/cuopt/cuopt/routing/vehicle_routing_wrapper.pyx around lines 919 to 936, the docstring claims BatchSolve returns a tuple (list of Assignment, float total solve time) but the implementation only returns the list of solutions; update the implementation to match the docstring by capturing the total solve time (e.g., sum or measured elapsed time across the batch) and return a tuple (solutions_list, total_solve_time). Ensure the function's return value and any type annotations reflect the tuple so callers receive both the solutions and the total time.

coderabbitai · 2025-12-17T20:40:14Z

python/cuopt/cuopt/routing/vehicle_routing.py

+@catch_cuopt_exception
+def BatchSolve(data_model_list, solver_settings=None):
+    """
+    Solves multiple routing problems in batch mode using parallel execution.
+
+    Parameters
+    ----------
+    data_model_list: list of DataModel
+        List of data model objects representing routing problems to solve.
+    solver_settings: SolverSettings
+        Settings to configure solver configurations.
+        By default, it uses default solver settings to solve.
+
+    Returns
+    -------
+    tuple
+        A tuple containing:
+        - list of Assignment: Solutions for each routing problem
+        - float: Total solve time in seconds
+
+    Examples
+    --------
+    >>> from cuopt import routing
+    >>> import cudf
+    >>> # Create multiple data models
+    >>> data_models = []
+    >>> for i in range(5):
+    ...     cost_matrix = cudf.DataFrame([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
+    ...     dm = routing.DataModel(3, 1)
+    ...     dm.add_cost_matrix(cost_matrix)
+    ...     data_models.append(dm)
+    >>> settings = routing.SolverSettings()
+    >>> settings.set_time_limit(1.0)
+    >>> solutions, solve_time = routing.BatchSolve(data_models, settings)
+    """
+    if solver_settings is None:
+        solver_settings = SolverSettings()
+
+    return vehicle_routing_wrapper.BatchSolve(data_model_list, solver_settings)


🛠️ Refactor suggestion | 🟠 Major

Add input validation for data_model_list parameter.

The BatchSolve function should validate the input data_model_list to prevent runtime errors and provide clear user feedback.

Add validation at the beginning of the function:

@catch_cuopt_exception def BatchSolve(data_model_list, solver_settings=None): """ Solves multiple routing problems in batch mode using parallel execution. Parameters ---------- data_model_list: list of DataModel List of data model objects representing routing problems to solve. solver_settings: SolverSettings Settings to configure solver configurations. By default, it uses default solver settings to solve. Returns ------- tuple A tuple containing: - list of Assignment: Solutions for each routing problem - float: Total solve time in seconds Examples -------- >>> from cuopt import routing >>> import cudf >>> # Create multiple data models >>> data_models = [] >>> for i in range(5): ... cost_matrix = cudf.DataFrame([[0, 1, 2], [1, 0, 3], [2, 3, 0]]) ... dm = routing.DataModel(3, 1) ... dm.add_cost_matrix(cost_matrix) ... data_models.append(dm) >>> settings = routing.SolverSettings() >>> settings.set_time_limit(1.0) >>> solutions, solve_time = routing.BatchSolve(data_models, settings) """ + if not isinstance(data_model_list, list): + raise ValueError("data_model_list must be a list of DataModel objects") + if len(data_model_list) == 0: + raise ValueError("data_model_list cannot be empty") + if not all(isinstance(dm, DataModel) for dm in data_model_list): + raise ValueError("All elements in data_model_list must be DataModel instances") + if solver_settings is None: solver_settings = SolverSettings() return vehicle_routing_wrapper.BatchSolve(data_model_list, solver_settings)

🤖 Prompt for AI Agents

In python/cuopt/cuopt/routing/vehicle_routing.py around lines 1545 to 1583, add input validation at the top of BatchSolve: ensure data_model_list is not None, is a list (or other iterable you support), is non-empty, and that every item is an instance of the expected DataModel class (import or reference the existing DataModel), raising a TypeError if the type is wrong and a ValueError if the list is empty; also validate solver_settings when provided is a SolverSettings instance and raise TypeError if not. Perform these checks before creating default solver_settings and before calling vehicle_routing_wrapper.BatchSolve so callers receive immediate, clear exceptions for bad inputs.

coderabbitai · 2025-12-17T20:40:14Z

python/cuopt/cuopt/tests/routing/test_batch_solve.py

+def test_batch_solve_varying_sizes():
+    """Test batch solving TSPs of varying sizes."""
+    tsp_sizes = [
+        5,
+        8,
+        10,
+        6,
+        7,
+        9,
+        12,
+        15,
+        11,
+        4,
+        13,
+        14,
+        8,
+        6,
+        10,
+        9,
+        7,
+        11,
+        5,
+        12,
+    ]


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, find the test file and vehicle_routing.py find . -name "test_batch_solve.py" -o -name "vehicle_routing.py" | head -20

Repository: NVIDIA/cuopt

Length of output: 158

🏁 Script executed:

#!/bin/bash # Search for files related to BatchSolve and routing fd -t f "test_batch_solve" 2>/dev/null || echo "test_batch_solve not found via fd" fd -t f "vehicle_routing" 2>/dev/null | head -10 || echo "vehicle_routing not found via fd" # Also try git ls-files git ls-files | grep -E "(test_batch_solve|vehicle_routing)" | head -20

Repository: NVIDIA/cuopt

Length of output: 571

🏁 Script executed:

#!/bin/bash # List Python files in routing test directory git ls-files "python/**/routing/**/*.py" 2>/dev/null | head -30

Repository: NVIDIA/cuopt

Length of output: 104

🏁 Script executed:

#!/bin/bash # Search for BatchSolve in Python files rg "BatchSolve" --type py -l 2>/dev/null | head -20

Repository: NVIDIA/cuopt

Length of output: 485

🏁 Script executed:

#!/bin/bash # Search for references to BatchSolve and its return type rg "class BatchSolve|def BatchSolve" --type py -A 10 2>/dev/null | head -50

Repository: NVIDIA/cuopt

Length of output: 1955

🏁 Script executed:

#!/bin/bash # Read the complete BatchSolve function in vehicle_routing.py sed -n '/^def BatchSolve/,/^def /p' python/cuopt/cuopt/routing/vehicle_routing.py | head -100

Repository: NVIDIA/cuopt

Length of output: 1350

🏁 Script executed:

#!/bin/bash # Read test_batch_solve.py lines 19-67 to see both test sections mentioned in the review sed -n '1,70p' python/cuopt/cuopt/tests/routing/test_batch_solve.py

Repository: NVIDIA/cuopt

Length of output: 1728

🏁 Script executed:

#!/bin/bash # Get more context by reading a larger section of vehicle_routing.py around BatchSolve cat -n python/cuopt/cuopt/routing/vehicle_routing.py | grep -A 50 "def BatchSolve"

Repository: NVIDIA/cuopt

Length of output: 1616

Fix BatchSolve return type handling in test

The test treats routing.BatchSolve as returning a plain list of assignments, but the docstring and example both document it as returning a (solutions, solve_time) tuple. This causes the length assertion and solution iteration to fail at runtime.

Update line 54 to unpack both values:

solutions, solve_time = routing.BatchSolve(data_models, settings)

Apply the same fix to the second test section (lines 52–67).

🤖 Prompt for AI Agents

In python/cuopt/cuopt/tests/routing/test_batch_solve.py around lines 52–67, the test treats routing.BatchSolve as returning a plain list but BatchSolve actually returns a (solutions, solve_time) tuple; update both calls to unpack the tuple (e.g., solutions, solve_time = routing.BatchSolve(data_models, settings)), then use the solutions variable for the length assertion and iteration over returned solutions in both test sections.

github-actions · 2025-12-30T09:09:57Z

🔔 Hi @anandhkb, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

hlinsen added 7 commits December 16, 2025 04:44

Pass stream everywhere

052e454

Implement batch solve api

95136b3

Add test that uses call batch solve api

6e8282c

Add python api

dbeb83b

Deprecate stream pool in pool allocator and create multiple stramf if…

bca194d

… batch mode is on

Use thread local mode in routing cuda graph class

e0c9d3d

Remove solve time from return

78efc63

hlinsen requested review from a team as code owners December 17, 2025 20:33

hlinsen added the feature request New feature or request label Dec 17, 2025

hlinsen requested a review from a team as a code owner December 17, 2025 20:33

hlinsen added the non-breaking Introduces a non-breaking change label Dec 17, 2025

hlinsen requested review from Kh4ster, rg20 and rgsl888prabhu December 17, 2025 20:33

coderabbitai bot reviewed Dec 17, 2025

View reviewed changes

Merge branch 'main' of github.com:NVIDIA/cuopt into batch-tsp

f28e6f8

anandhkb added this to the 26.02 milestone Dec 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement batch solve for routing models #728

Implement batch solve for routing models #728

Uh oh!

hlinsen commented Dec 17, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 17, 2025

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 17, 2025

Uh oh!

coderabbitai bot Dec 17, 2025

Uh oh!

coderabbitai bot Dec 17, 2025

Uh oh!

coderabbitai bot Dec 17, 2025

Uh oh!

coderabbitai bot Dec 17, 2025

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		probing_config.probing_values = host_copy(probing_values, offspring.handle_ptr->get_stream());
		constraint_prop.apply_round(offspring, lp_run_time_after_feasible, timer, probing_config);

Implement batch solve for routing models #728

Are you sure you want to change the base?

Implement batch solve for routing models #728

Uh oh!

Conversation

hlinsen commented Dec 17, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 17, 2025

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hlinsen commented Dec 17, 2025 •

edited by coderabbitai bot

Loading