【Posted】: 2018-05-23 20:01:57
【Question】:
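This question is related to another one: How to binarize RandomForest to plot a ROC in python? I also used the code provided in the scikit-learn documentation: ROC multiclass problem
So I want to plot a ROC curve. But when I run 10x10 cross-validation, do I have to average the probabilities ("predict_proba"), since I will end up with 100 y_score arrays, each of them a 15x3 array (15 test samples by 3 classes)?
Look at this line in the code: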
y_score = clf.fit(x_train, y_train).predict_proba(x_test)
The code starts here:
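from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, label_binarize

# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Binarize the output (one indicator column per class)
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
result_list = []  # stores the average of the inner loops (preliminary)
yscore_list = []  # one y_score array per split
# Scale the features, then fit one random forest per class
clf = Pipeline([('rcl', RobustScaler()),
                ('clf', OneVsRestClassifier(RandomForestClassifier(random_state=0, n_jobs=-1)))])
print("4 epochs x subject in test_size", "\n")
xSSSmean84 = []  # 4 epochs x subject => test_size=84, or 0.1 (10%)
for i in range(1):
    sss = StratifiedShuffleSplit(2, test_size=0.1, random_state=i)
    scoresSSS = model_selection.cross_val_score(clf, X, y, cv=sss)
    xSSSmean84.append(scoresSSS.mean())
    for train_index, test_index in sss.split(X, y):
        x_train, x_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        # Per-class scores for the held-out samples of this split
        y_score = clf.fit(x_train, y_train).predict_proba(x_test)
        yscore_list.append(y_score)
        print(y_score)
        print("")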
This is what y_score looks like. With cross-validation I will have many of them:
[[ 0. 1. 0.1]
[ 0. 0. 1. ]
[ 0. 1. 0. ]
[ 0. 0. 1. ]
[ 1. 0. 0. ]
[ 0. 0. 1. ]
[ 0. 0. 1. ]
[ 0. 1. 0.1]
[ 0. 1. 0. ]
[ 1. 0. 0. ]
[ 0. 0. 1. ]
[ 1. 0. 0. ]
[ 1. 0. 0. ]
[ 1. 0. 0. ]
[ 0. 1. 0. ]]
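One thing that may help frame the question: each split scores a different set of test samples, so the rows of two y_score arrays do not refer to the same observations, and an element-wise mean of the arrays would mix unrelated samples. Below is a minimal sketch of two commonly used alternatives, not a definitive answer: (1) compute a ROC curve per split and average the true positive rates on a shared false positive rate grid, or (2) pool the held-out labels and scores from all splits and compute a single micro-averaged curve. The names mean_fpr, tprs, pooled_true and pooled_score are illustrative, n_splits=10 is an assumed value, n_jobs is left at its default, and the split is stratified on the original 1-D labels rather than on the binarized matrix.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, label_binarize

iris = datasets.load_iris()
X = iris.data
labels = iris.target                      # 1-D labels, used only for stratifying
y = label_binarize(labels, classes=[0, 1, 2])
n_classes = y.shape[1]

clf = Pipeline([('rcl', RobustScaler()),
                ('clf', OneVsRestClassifier(RandomForestClassifier(random_state=0)))])

# Assumption: 10 splits for illustration; the question uses a different setup.
sss = StratifiedShuffleSplit(n_splits=10, test_size=0.1, random_state=0)

mean_fpr = np.linspace(0, 1, 101)         # shared FPR grid for averaging
tprs = {k: [] for k in range(n_classes)}  # interpolated TPR curves per class
pooled_true, pooled_score = [], []        # held-out labels and scores per split

for train_index, test_index in sss.split(X, labels):
    y_test = y[test_index]
    y_score = clf.fit(X[train_index], y[train_index]).predict_proba(X[test_index])
    pooled_true.append(y_test)
    pooled_score.append(y_score)
    for k in range(n_classes):
        fpr, tpr, _ = roc_curve(y_test[:, k], y_score[:, k])
        interp_tpr = np.interp(mean_fpr, fpr, tpr)  # resample onto the shared grid
        interp_tpr[0] = 0.0                          # force the curve through (0, 0)
        tprs[k].append(interp_tpr)

# Option 1: average the per-split curves of each class on the shared grid.
mean_tpr = {k: np.mean(tprs[k], axis=0) for k in range(n_classes)}
mean_auc = {k: auc(mean_fpr, mean_tpr[k]) for k in range(n_classes)}

# Option 2: pool all held-out predictions and compute one micro-averaged curve.
y_true_all = np.vstack(pooled_true)
y_score_all = np.vstack(pooled_score)
fpr_micro, tpr_micro, _ = roc_curve(y_true_all.ravel(), y_score_all.ravel())
micro_auc = auc(fpr_micro, tpr_micro)

for k in range(n_classes):
    print("class", k, "mean AUC:", round(mean_auc[k], 3))
print("micro-averaged AUC over pooled predictions:", round(micro_auc, 3))
```

Either mean_fpr with mean_tpr[k], or fpr_micro with tpr_micro, can then be passed to a plotting call. Averaging curves keeps the per-split variability visible (for example as a standard deviation band), while pooling gives one curve built from more points; which one suits the 10x10 setup is a judgment call.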
【Discussion】:
- Did I answer your question?
Tags: python numpy scikit-learn random-forest roc