
Conversation

@olegkkruglov (Contributor)

Description

Adds a documentation update regarding the `n_jobs` parameter. Contains content from #2453 and addresses the review comments from there.


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

codecov bot commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure ?
github 82.10% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.
See 31 files with indirect coverage changes.


`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`__.

When Scikit-learn's utilities with built-in parallelism are used (for example, `GridSearchCV` or `VotingClassifier`),
|sklearnex| tries to determine the optimal number of threads per job using hints provided by `joblib`.
Contributor:

Could you point to the code where this happens? How does it detect that it is running under joblib? (they have multiple threading backends).

Contributor (Author):

def get_suggested_n_threads(n_cpus):

If several instances of sklearnex are run via joblib, `n_threads` is equal to the number of CPUs divided by the number of instances.
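The behaviour described here (threads per instance = CPUs divided by the number of concurrent instances) can be sketched as follows. `get_suggested_n_threads` is the name from the linked code, but the signature and body below are an illustrative assumption, not the actual implementation:

```python
import os

def get_suggested_n_threads(n_cpus, n_joblib_workers=1):
    # Illustrative sketch only: split the available CPUs evenly among the
    # sklearnex instances running concurrently under joblib, never going
    # below one thread per instance.
    return max(1, n_cpus // max(1, n_joblib_workers))

# e.g. 8 CPUs shared by 4 concurrent joblib workers -> 2 threads each
suggested = get_suggested_n_threads(os.cpu_count() or 1, n_joblib_workers=4)
```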

Contributor:

Thanks. I played a bit with it, and from what I can tell, it works under both joblib jobs and threadpool contexts. Including contexts that only limit BLAS like this:

threadpoolctl.threadpool_limits(limits=2, user_api='blas')

.. which actually contradicts some of the other points here:

|sklearnex| threading doesn't automatically avoid nested parallelism when used in conjunction with OpenMP and/or with joblib or python threads.

@david-cortes-intel (Contributor)

Thanks for looking into it. A couple points from the earlier PR:

  • It's not clear to me what happens if one would try to control BLAS/MKL threads through threadpoolctl independently of sklearn settings. I would guess it'd have no effect but this would be important to document, since it also differs from sklearn.
  • In the same vein, I guess using mkl_service also wouldn't have any effect on the static-linked MKL used by oneDAL.
  • It's missing some settings that are global, like the daal4py threads. I think currently T-SNE is the only algorithm whose code takes number of threads from different sources in oneDAL, but not sure if there's any effect there.
    • In this regard, it could mention also what would happen if using sklearnex estimators in python threads. I think passing n_jobs ends up modifying global settings regardless, which means there'd be issues if passing different n_jobs from different python threads (perhaps @Vika-F might have some insights on what would happen).
  • Since TBB works differently from joblib, it could mention here what happens if executing sequential calls to estimators with different numbers of threads. I think currently there is some logic when first passing a large number of threads and then a smaller one that the initial process-wide thread pool is not re-created and can have an impact on performance, but perhaps @avolkov-intel could comment.
    • And it could also mention that the first call to something multi-threaded will need to set up the process-wide thread pool, which adds some overhead to the first call of whatever runs multi-threaded (since this is also different from how joblib parallelization works).
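The concern about python threads above can be illustrated with a small stdlib-only sketch. The global variable below is a stand-in for whatever process-wide setting `n_jobs` ends up writing; it is not sklearnex code:

```python
import threading
import time

GLOBAL_NUM_THREADS = 0  # stand-in for a process-wide thread-count setting

def fit_with_n_jobs(n_jobs, observed):
    # Hypothetical "fit": writes n_jobs to the global setting, computes,
    # then records which value was in effect when the computation ran.
    global GLOBAL_NUM_THREADS
    GLOBAL_NUM_THREADS = n_jobs
    time.sleep(0.01)  # the actual computation would happen here
    observed.append(GLOBAL_NUM_THREADS)  # may be another thread's value

observed = []
threads = [threading.Thread(target=fit_with_n_jobs, args=(n, observed))
           for n in (1, 2, 4, 8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# `observed` frequently ends up as four copies of the last value written,
# not [1, 2, 4, 8]: each thread clobbers the shared setting for the others.
```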


* `n_jobs` parameter is supported for all estimators patched by |sklearnex|,
while |sklearn| enables it for selected estimators only.
* `n_jobs` estimator parameter sets the number of threads used by the underlying |oneDAL|.
Contributor:

Suggested change
* `n_jobs` estimator parameter sets the number of threads used by the underlying |oneDAL|.
* `n_jobs` estimator parameter sets the number of threads used by the underlying |onedal|.

Macros are case-sensitive.

* If `n_jobs` is not specified |sklearnex| uses all available threads whereas |sklearn| is single-threaded by default.

|sklearnex| follows the same rules as |sklearn| for
`the calculation of the :term:`n_jobs` parameter value.
Contributor:

Suggested change
`the calculation of the :term:`n_jobs` parameter value.
the calculation of the :term:`n_jobs` parameter value.

|sklearnex| follows the same rules as |sklearn| for
`the calculation of the :term:`n_jobs` parameter value.

When Scikit-learn's utilities with built-in parallelism are used
Contributor:

Suggested change
When Scikit-learn's utilities with built-in parallelism are used
When |sklearn|'s utilities with built-in parallelism are used

|sklearnex| threading doesn't automatically avoid nested parallelism when used in conjunction with OpenMP and/or with joblib or python threads.

To track the actual number of threads used by estimators from the |sklearnex|,
set the `DEBUG` :ref:`verbosity setting <verbose>`.
Contributor:

I do not see any log with the number of threads when doing this.

Example:

import os
os.environ["SKLEARNEX_VERBOSE"] = "DEBUG"
import numpy as np
from sklearnex.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
rng = np.random.default_rng(seed=123)
X = rng.standard_normal(size=(100,10))
y = rng.standard_normal(X.shape[0])
Ridge().fit(X, y)

Output:

DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.train' to 'BaseLinearRegression.train'
DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.infer' to 'BaseLinearRegression.infer'
DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.model' to 'BaseLinearRegression.model'
DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.partial_train_result' to 'BaseIncrementalLinear.partial_train_result'
DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.partial_train' to 'BaseIncrementalLinear.partial_train'
DEBUG:sklearnex: Assigned method '<host_backend>.linear_model.regression.finalize_train' to 'BaseIncrementalLinear.finalize_train'
DEBUG:sklearnex: Assigned method '<host_backend>.logistic_regression.classification.train' to 'LogisticRegression.train'
DEBUG:sklearnex: Assigned method '<host_backend>.logistic_regression.classification.infer' to 'LogisticRegression.infer'
DEBUG:sklearnex: Assigned method '<host_backend>.logistic_regression.classification.model' to 'LogisticRegression.model'
INFO:sklearnex: sklearn.linear_model.Ridge.fit: running accelerated version on CPU
DEBUG:sklearnex: Dispatching function 'linear_model.regression.train' with policy <onedal._onedal_py_host.host_policy object at 0x7fb689e02770> to Backend(<module 'onedal._onedal_py_host' from '/home/dcortes/repos/scikit-learn-intelex/onedal/_onedal_py_host.cpython-311-x86_64-linux-gnu.so'>, is_dpc=False, is_spmd=False)


When Scikit-learn's utilities with built-in parallelism are used
(for example, :obj:`sklearn.model_selection.GridSearchCV` or :obj:`sklearn.model_selection.VotingClassifier`),
|sklearnex| tries to determine the optimal number of threads per job using hints provided by `joblib`.
Contributor:

Suggested change
|sklearnex| tries to determine the optimal number of threads per job using hints provided by `joblib`.
|sklearnex| tries to determine the optimal number of threads per job using hints provided by :mod:`joblib` / ``threadpoolctl``.

Seems to work with both.
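For intuition, here is a stdlib-only sketch of honouring an externally imposed thread limit. It models only the environment-variable channel (joblib sets variables like `OMP_NUM_THREADS` for its workers); `threadpoolctl` adjusts limits through runtime APIs instead, which a real implementation would have to query directly. The function name and logic are illustrative assumptions, not sklearnex code:

```python
import os

def effective_thread_limit(default):
    # Illustrative: take the strictest limit among the default and any
    # thread-count environment variables set by an outer parallel layer.
    limits = [default]
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        value = os.environ.get(var, "")
        if value.isdigit():
            limits.append(int(value))
    return max(1, min(limits))
```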
