【Question title】: Why is cross_val_score not producing consistent results?
【Posted】: 2023-01-20 21:50:03
【Question description】:

Each time this code is run, it prints a different result. Where does the randomness come from?

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

seed = 42
iris = datasets.load_iris()
X = iris.data
y = iris.target

pipeline = Pipeline([('std', StandardScaler()), 
                     ('pca', PCA(n_components = 4)), 
                     ('Decision_tree', DecisionTreeClassifier())], 
                    verbose = False)

kfold = KFold(n_splits = 10, random_state = seed, shuffle = True)
results = cross_val_score(pipeline, X, y, cv = kfold)
print(results.mean())


0.9466666666666667
0.9266666666666665
0.9466666666666667
0.9400000000000001
0.9266666666666665

【Question comments】:

    Tags: python scikit-learn cross-validation


    【Solution 1】:

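    You passed the seed to KFold but not to DecisionTreeClassifier, so the folds are fixed while the tree itself can change from run to run. With the default max_features=None the tree considers every column at each split (sqrt of the number of columns is the RandomForestClassifier default, not the tree's), but it still randomly permutes the features at each split. When several candidate splits give the same improvement, a different one can be picked on each run. Pass random_state to DecisionTreeClassifier to make the results reproducible. PCA also accepts random_state, although it only has an effect with the 'arpack' and 'randomized' SVD solvers.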

    See the scikit-learn documentation for DecisionTreeClassifier and PCA.
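
A minimal sketch of the fix, assuming the same pipeline as in the question: the seed is passed to every component that accepts random_state, not only to KFold. The helper name mean_cv_score is hypothetical and exists only to run the same evaluation twice for comparison.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

seed = 42
X, y = datasets.load_iris(return_X_y=True)


def mean_cv_score(random_state):
    """Hypothetical helper: rebuild the question's pipeline with every
    random component seeded, then return the mean 10-fold CV accuracy."""
    pipeline = Pipeline([
        ('std', StandardScaler()),
        # random_state only matters for the 'arpack'/'randomized' solvers,
        # but setting it is harmless.
        ('pca', PCA(n_components=4, random_state=random_state)),
        # This is the component that was left unseeded in the question.
        ('Decision_tree', DecisionTreeClassifier(random_state=random_state)),
    ])
    kfold = KFold(n_splits=10, shuffle=True, random_state=random_state)
    return cross_val_score(pipeline, X, y, cv=kfold).mean()


first = mean_cv_score(seed)
second = mean_cv_score(seed)
print(first, second)  # the two values should now be identical
```

With the tree seeded, repeated runs should print the same mean accuracy. The exact value depends on the seed and the scikit-learn version, so none is quoted here.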

    【Comments】:
