【Posted】: 2019-11-16 04:38:12
【Question】:
I see that both XGBClassifier() and sklearn.model_selection.RandomizedSearchCV() have an n_jobs parameter. I ran the CV and saw that by setting n_jobs = -1 (in both) I made use of all 16 workers I have:
Fitting 5 folds for each of 30 candidates, totalling 150 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 13.7min
[Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 20.4min
[Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 23.7min
[Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 28.7min
[Parallel(n_jobs=-1)]: Done 53 tasks | elapsed: 36.1min
[Parallel(n_jobs=-1)]: Done 66 tasks | elapsed: 43.4min
[Parallel(n_jobs=-1)]: Done 81 tasks | elapsed: 47.6min
[Parallel(n_jobs=-1)]: Done 96 tasks | elapsed: 50.8min
[Parallel(n_jobs=-1)]: Done 113 tasks | elapsed: 60.0min
[Parallel(n_jobs=-1)]: Done 135 out of 150 | elapsed: 73.1min remaining: 8.1min
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed: 85.7min finished
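The "150 fits" in the log header is 30 candidates times 5 folds, and each fit is an independent task. As a toy model only, assuming a standard-library thread pool as a stand-in for the loky backend and a hypothetical placeholder `fit_one` instead of real model fitting, the task list can be pictured like this:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_CANDIDATES = 30  # sampled parameter settings, from the log header
N_FOLDS = 5        # cross-validation folds, from the log header
N_WORKERS = 16     # concurrent workers reported in the log


def fit_one(task):
    """Hypothetical placeholder: a real task would fit one candidate
    on the training folds and score it on the held-out fold."""
    candidate, fold = task
    return candidate, fold


# One independent task per (candidate, fold) pair.
tasks = list(product(range(N_CANDIDATES), range(N_FOLDS)))

# Assumption: a thread pool stands in for the process-based backend.
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    results = list(pool.map(fit_one, tasks))

print(len(tasks), "tasks dispatched to", N_WORKERS, "workers")
```

This only illustrates how the tasks are counted and handed out; it says nothing about how the real backend schedules or batches them.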
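I can't rerun the analysis right now, but I assume the parallelization happened because of n_jobs=-1 in RandomizedSearchCV().
I know very little about parallel computing. I understand that RandomizedSearchCV() runs each parameter setting independently, but how exactly does that work when it is parallelized? And what about n_jobs=-1 in XGBClassifier()? Does it make sense to set this parameter in both functions?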
【Discussion】:
Tags: python parallel-processing scikit-learn cross-validation xgboost