enh: add cpu distributed support #3420

Alexandr-Solovev · 2025-10-27T20:32:35Z

Description

This PR is a copy of #3255 with extensions.

Enhancement: Add CPU distributed support

This PR introduces several improvements for CPU-based distributed computing in oneDAL:

Added support for distributed CPU policies.
Added distributed linear regression on CPU.
Added Multi-CPU samples.
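
For context, a normal-equations approach (the training kernel touched in this PR is `train_kernel_norm_eq.cpp`) lends itself to distributed training because each rank can accumulate `X^T X` and `X^T y` over its own rows, and those partial sums simply add up across ranks. The sketch below illustrates only that general idea in plain C++. It is not the oneDAL implementation: the ranks are simulated with a vector instead of a real communicator, and every name in it (`partial_sums`, `compute_partial`, `reduce_sum`, `solve_normal_equations`) is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-rank partial sums for the normal equations.
struct partial_sums {
    std::vector<double> xtx; // (p + 1) x (p + 1), row-major; last row/column is the intercept
    std::vector<double> xty; // (p + 1)
};

// Accumulates X^T X and X^T y over one rank's local rows.
// A column of ones is appended to each row to model the intercept.
partial_sums compute_partial(const std::vector<std::vector<double>>& x,
                             const std::vector<double>& y,
                             std::size_t p) {
    assert(x.size() == y.size());
    const std::size_t d = p + 1;
    partial_sums s{ std::vector<double>(d * d, 0.0), std::vector<double>(d, 0.0) };
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(x[i].size() == p);
        std::vector<double> row(x[i]);
        row.push_back(1.0);
        for (std::size_t a = 0; a < d; ++a) {
            s.xty[a] += row[a] * y[i];
            for (std::size_t b = 0; b < d; ++b) {
                s.xtx[a * d + b] += row[a] * row[b];
            }
        }
    }
    return s;
}

// Stand-in for an allreduce(sum) over ranks: adds the partial sums element-wise.
partial_sums reduce_sum(const std::vector<partial_sums>& parts) {
    assert(!parts.empty());
    partial_sums total = parts.front();
    for (std::size_t r = 1; r < parts.size(); ++r) {
        assert(parts[r].xtx.size() == total.xtx.size());
        for (std::size_t i = 0; i < total.xtx.size(); ++i) total.xtx[i] += parts[r].xtx[i];
        for (std::size_t i = 0; i < total.xty.size(); ++i) total.xty[i] += parts[r].xty[i];
    }
    return total;
}

// Solves (X^T X) beta = X^T y by Gaussian elimination with partial pivoting.
// Returns p slope coefficients followed by the intercept.
std::vector<double> solve_normal_equations(partial_sums s) {
    const std::size_t d = s.xty.size();
    for (std::size_t k = 0; k < d; ++k) {
        std::size_t piv = k;
        for (std::size_t i = k + 1; i < d; ++i) {
            if (std::fabs(s.xtx[i * d + k]) > std::fabs(s.xtx[piv * d + k])) piv = i;
        }
        if (piv != k) {
            for (std::size_t j = 0; j < d; ++j) std::swap(s.xtx[k * d + j], s.xtx[piv * d + j]);
            std::swap(s.xty[k], s.xty[piv]);
        }
        assert(s.xtx[k * d + k] != 0.0); // the sketch assumes a non-singular system
        for (std::size_t i = k + 1; i < d; ++i) {
            const double f = s.xtx[i * d + k] / s.xtx[k * d + k];
            for (std::size_t j = k; j < d; ++j) s.xtx[i * d + j] -= f * s.xtx[k * d + j];
            s.xty[i] -= f * s.xty[k];
        }
    }
    std::vector<double> beta(d, 0.0);
    for (std::size_t k = d; k-- > 0;) {
        double acc = s.xty[k];
        for (std::size_t j = k + 1; j < d; ++j) acc -= s.xtx[k * d + j] * beta[j];
        beta[k] = acc / s.xtx[k * d + k];
    }
    return beta;
}
```

In this sketch, each simulated rank would call `compute_partial` on its own block of rows, the results would be combined with `reduce_sum`, and `solve_normal_equations` would then yield the same coefficients as a single-process fit on the full data. A real distributed run would replace `reduce_sum` with a collective over the communicator.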

Checklist:

Completeness and readability

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

cpp/oneapi/dal/algo/linear_regression/backend/cpu/train_kernel_norm_eq.cpp

samples/oneapi/dpc/mpi/sources/linear_regression_distr_mpi.cpp

Alexandr-Solovev · 2025-11-27T13:56:14Z

@david-cortes-intel Can you please review this PR one more time? I will add conda-recipe testing and optimize dataset usage in the next PR.

Alexandr-Solovev · 2025-11-27T13:59:00Z

/intelci: run

INSTALL.md

makefile

david-cortes-intel · 2025-11-28T09:46:20Z

@Alexandr-Solovev Could you please add another line after this one:

oneDAL/INSTALL.md

Line 350 in b217715

cmake `# required to build the examples only`

With requirements like this:

    cmake `# required to build the examples only` \
    impi-devel impi_rt `# required to build the samples only`

david-cortes-intel · 2025-11-28T14:39:25Z

@Alexandr-Solovev I see the CPU distributed sample depends on the DPC component, even though it doesn't use it:

-- Missed required DAL component: onedal_dpc
--   _dal_lib-NOTFOUND must exist.
-- Missed required DAL component: onedal_parameters_dpc
--   _dal_lib-NOTFOUND must exist.

david-cortes-intel · 2025-11-28T15:03:17Z

Also I get this error despite having the MPI dependencies installed:

-- Found MPI_C: /export/users/dcortes/miniforge3/envs/icxconda/lib/libmpi.so (found version "4.1")
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) 
CMake Error at /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND) (found version "4.1")
Call Stack (most recent call first):
  /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
  /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindMPI.cmake:1842 (find_package_handle_standard_args)
  /export/users/dcortes/repos/oneDAL/__release_lnx/daal/latest/samples/cmake/setup_samples.cmake:31 (find_package)
  CMakeLists.txt:35 (find_dependencies)

theComputeKid · 2025-11-28T21:42:44Z

Are there tests added to ensure this runs on AArch64 as expected?

Alexandr-Solovev · 2025-12-02T14:43:03Z

/intelci: run

Alexandr-Solovev · 2025-12-03T12:43:30Z

> @Alexandr-Solovev I see the CPU distributed sample depends on the DPC component, even though it doesn't use it:
>
> -- Missed required DAL component: onedal_dpc
> --   _dal_lib-NOTFOUND must exist.
> -- Missed required DAL component: onedal_parameters_dpc
> --   _dal_lib-NOTFOUND must exist.

I have fixed this issue, thanks! The sample now has no dependencies on SYCL/DPC.

Alexandr-Solovev · 2025-12-03T12:43:53Z

> Also I get this error despite having the MPI dependencies installed:
>
> -- Found MPI_C: /export/users/dcortes/miniforge3/envs/icxconda/lib/libmpi.so (found version "4.1")
> -- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
> CMake Error at /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
>   Could NOT find MPI (missing: MPI_CXX_FOUND) (found version "4.1")
> Call Stack (most recent call first):
>   /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
>   /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindMPI.cmake:1842 (find_package_handle_standard_args)
>   /export/users/dcortes/repos/oneDAL/__release_lnx/daal/latest/samples/cmake/setup_samples.cmake:31 (find_package)
>   CMakeLists.txt:35 (find_dependencies)

Can you please double-check with the latest commit?

Alexandr-Solovev · 2025-12-03T12:45:22Z

> Are there tests added to ensure this runs on AArch64 as expected?

There are no tests, but I believe it should work. I will double-check it.

david-cortes-intel · 2025-12-03T14:55:44Z

INSTALL.md


 DPC++ examples (running on devices supported by SYCL, such as GPU) from oneAPI are also auto-generated within these folders when oneDAL is built with DPC++ support (target `oneapi` in the Makefile), but be aware that it requires a DPC++ compiler such as ICX, and executing the examples requires the DPC++ runtime as well as the GPGPU drivers. The DPC++ examples can be found under `examples/oneapi/dpc`.

+oneDAL samples are also auto-generated in `daal/latest/samples/oneapi/cpp/`(Multi-CPU) and `daal/latest/samples/oneapi/dpc/`(Multi-GPU) when oneDAL is built with DPC++ support (target oneapi in the Makefile). Note that building and running the samples requires a DPC++ compiler such as ICX, and MPI/CCL.


I guess this last part shouldn't apply anymore to the CPU samples?

Note that building and running the samples requires a DPC++ compiler such as ICX, and MPI/CCL.

samples/oneapi/cpp/ccl/CMakeLists.txt

david-cortes-intel · 2025-12-03T15:00:10Z

> > Also I get this error despite having the MPI dependencies installed:
> >
> > -- Found MPI_C: /export/users/dcortes/miniforge3/envs/icxconda/lib/libmpi.so (found version "4.1")
> > -- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
> > CMake Error at /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
> >   Could NOT find MPI (missing: MPI_CXX_FOUND) (found version "4.1")
> > Call Stack (most recent call first):
> >   /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:603 (_FPHSA_FAILURE_MESSAGE)
> >   /export/users/dcortes/miniforge3/envs/icxconda/share/cmake-3.31/Modules/FindMPI.cmake:1842 (find_package_handle_standard_args)
> >   /export/users/dcortes/repos/oneDAL/__release_lnx/daal/latest/samples/cmake/setup_samples.cmake:31 (find_package)
> >   CMakeLists.txt:35 (find_dependencies)
>
> Can you please double-check with the latest commit?

The example works for me now.

david-cortes-intel · 2025-12-03T15:02:14Z

samples/oneapi/cpp/ccl/sources/linear_regression_distr_ccl.cpp

+        dal::preview::infer(comm, lr_desc, x_test_vec.at(rank_id), result_train.get_model());
+
+    if (comm.get_rank() == 0) {
+        std::cout << "Prediction results:\n" << result_infer.get_responses() << std::endl;


Perhaps it could be made clearer here that only the data from the rank with ID 0 is printed. It could also print the coefficients.

Alexandr-Solovev · 2025-12-04

For now, I have aligned this output with the corresponding GPU sample: https://github.com/uxlfoundation/oneDAL/blob/main/samples/oneapi/dpc/mpi/sources/linear_regression_distr_mpi.cpp
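
As an illustration of the suggestion above, the output could state which rank it comes from and include the coefficients. The following is only a sketch in plain C++ with no oneDAL or communicator calls; `format_rank_report` is a hypothetical helper that does not exist in the sample, and it assumes the coefficients and the local responses have already been copied into plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of oneDAL or the sample): builds the text that
// one rank would print. Only rank 0 produces output, and the header says so
// explicitly; every other rank gets an empty string.
std::string format_rank_report(int rank,
                               int rank_count,
                               const std::vector<double>& coefficients,
                               const std::vector<double>& responses) {
    assert(rank >= 0 && rank < rank_count);
    if (rank != 0) {
        return std::string();
    }
    std::ostringstream out;
    out << "Coefficients:";
    for (std::size_t i = 0; i < coefficients.size(); ++i) {
        out << ' ' << coefficients[i];
    }
    out << '\n';
    // Make it explicit that only this rank's block of predictions is shown.
    out << "Prediction results (rank 0 of " << rank_count << ", local block only):";
    for (std::size_t i = 0; i < responses.size(); ++i) {
        out << ' ' << responses[i];
    }
    out << '\n';
    return out.str();
}
```

In a sample, the returned string would be written to `std::cout` by every rank, so only rank 0 would actually print anything.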

david-cortes-intel

LGTM, pending the changes that will be left for the next PR.

Alexandr-Solovev · 2025-12-04T08:56:51Z

/intelci: run

Alexandr-Solovev added 3 commits October 27, 2025 13:31

fixes

2433ebc

Merge branch 'main' into dev/asolovev_spmd_cpu

a70133b

fixes

587e56a

Alexandr-Solovev marked this pull request as ready for review October 28, 2025 14:54

Alexandr-Solovev requested review from Alexsandruss, KateBlueSky, Vika-F, avolkov-intel, david-cortes-intel, ethanglaser, icfaust and maria-Petrova as code owners October 28, 2025 14:54

Alexandr-Solovev added the dpc++ label (Issue/PR related to DPC++ functionality) Oct 28, 2025

fixes

18de8a2

Vika-F mentioned this pull request Oct 29, 2025

oneDAL API for SPMD linear and ridge regression on CPU #3255

Closed

13 tasks

Alexandr-Solovev added 2 commits October 30, 2025 13:15

Merge branch 'main' into dev/asolovev_spmd_cpu

d73e127

minor fix

686c39e

Alexandr-Solovev marked this pull request as draft November 4, 2025 14:49

Alexandr-Solovev added 2 commits November 4, 2025 06:49

fixes

cd72243

fixes

cb98883

Alexandr-Solovev force-pushed the dev/asolovev_spmd_cpu branch from a2b408e to cb98883 Compare November 4, 2025 15:26

Alexandr-Solovev added 4 commits November 4, 2025 16:26

Merge branch 'main' into dev/asolovev_spmd_cpu

faad3eb

fixes

c936caf

fixes

74dadc1

fixes

3b9c2d4

Alexandr-Solovev marked this pull request as ready for review November 5, 2025 09:13

Alexandr-Solovev marked this pull request as draft November 5, 2025 09:21

Vika-F reviewed Nov 5, 2025

cpp/oneapi/dal/algo/linear_regression/backend/cpu/train_kernel_norm_eq.cpp (resolved)

Vika-F reviewed Nov 5, 2025

samples/oneapi/dpc/mpi/sources/linear_regression_distr_mpi.cpp (outdated, resolved)

Merge branch 'main' into dev/asolovev_spmd_cpu

b7720f2

Alexandr-Solovev added 3 commits November 26, 2025 05:40

fixes

6e4f3f8

fixes

2197800

docs update

4ad6339

Alexandr-Solovev requested a review from syakov-intel as a code owner November 27, 2025 13:51

minor update

3ea4eb7

david-cortes-intel reviewed Nov 27, 2025

INSTALL.md (outdated, resolved)

INSTALL.md (resolved)

makefile (outdated, resolved)

fixes

22517dc

fixes

e8aae0c

Alexandr-Solovev added 3 commits December 1, 2025 10:11

Merge branch 'main' into dev/asolovev_spmd_cpu

1ed1f7f

Merge branch 'main' into dev/asolovev_spmd_cpu

00bfc1b

fixes

729af95

Alexandr-Solovev added 2 commits December 2, 2025 07:30

fixes

3f82f6a

Merge branch 'main' into dev/asolovev_spmd_cpu

d191a04

david-cortes-intel reviewed Dec 3, 2025

docs fixes

d0dcfbc

david-cortes-intel approved these changes Dec 4, 2025

Alexandr-Solovev merged commit 14ba9e2 into uxlfoundation:main Dec 4, 2025
22 of 27 checks passed



5 participants
