Scikit Learn: RandomForest: clf.predict works with float, but not clf.score

【Question title】: Scikit Learn: RandomForest: clf.predict works with float, but not clf.score
【Posted】: 2016-04-24 07:36:19
【Question】:

I'm working on a classification problem. The labels I'm trying to predict:

df3['relevance'].unique()
array([ 3.  ,  2.5 ,  2.33,  2.67,  2.  ,  1.  ,  1.67,  1.33,  1.25,
        2.75,  1.75,  1.5 ,  2.25])

When I call predict with the features I built, it works fine:

clf = RandomForestClassifier()
clf.fit(df3[features], df3['relevance'])
pd.crosstab(clf.predict(df3[features]), df3['relevance'])

But when I call clf.score:

clf.score(df3[features], df3['relevance'])

I get ValueError: continuous is not supported

Should I convert the relevance labels I'm trying to predict to another data type? Thanks for your help.

【Comments】:

Tags: pandas scikit-learn random-forest

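【Solution 1】:

The problem you're facing is most likely that your relevance column consists of continuous numbers, which the classifier's scoring does not accept as class labels.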
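To see how scikit-learn classifies a target column, here is a small sketch using `type_of_target` from `sklearn.utils.multiclass`. The `relevance` array just copies the unique values from the question. I believe this is the same kind of check that produces the "continuous is not supported" message, but treat that link as an assumption rather than a statement about the library's internals.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Unique label values copied from the question.
relevance = np.array([3.0, 2.5, 2.33, 2.67, 2.0, 1.0, 1.67,
                      1.33, 1.25, 2.75, 1.75, 1.5, 2.25])

# Floats with fractional parts are reported as a continuous target.
float_kind = type_of_target(relevance)

# Integer class ids are reported as a multiclass target.
int_kind = type_of_target(np.array([0, 1, 2]))

print(float_kind, int_kind)  # continuous multiclass
```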

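If you're trying to predict a continuous number, I'd suggest switching to RandomForestRegressor(). Otherwise, convert the variable to 1s and 0s based on some threshold.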
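A minimal sketch of both options, assuming synthetic data in place of `df3[features]` and `df3['relevance']`. The arrays `X` and `relevance`, the threshold of 2.0, and the `n_estimators` and `random_state` values are all made up for illustration. Note that `score` returns R^2 for the regressor and accuracy for the classifier, so the two numbers are not comparable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Synthetic stand-in for the question's data (assumption, not the real df3).
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
relevance = np.round(1.0 + 2.0 * X[:, 0], 2)  # continuous scores in [1, 3]

# Option 1: treat relevance as a continuous target and fit a regressor.
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(X, relevance)
r2 = reg.score(X, relevance)  # R^2 on the training data

# Option 2: binarize relevance at an arbitrary threshold and fit a classifier.
threshold = 2.0  # hypothetical cut-off
y_binary = (relevance >= threshold).astype(int)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y_binary)
acc = clf.score(X, y_binary)  # accuracy on the training data

print(r2, acc)
```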

【Comments】:

Thanks @ericmjl - the regressor does indeed work. Reading up on it now.

【Solution 2】:

Just encode your labels as integers and everything will work fine. Floats imply regression.

In particular, you can use LabelEncoder http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

>>> from sklearn.ensemble import RandomForestClassifier as RF
>>> import numpy as np
>>> X = np.array([[0], [1], [1.2]])
>>> y = [0.5, 1.2, -0.1]
>>> clf = RF()
>>> clf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
>>> print(clf.score(X, y))
Traceback (most recent call last):
[.....]
ValueError: continuous is not supported
>>> y = [0, 1, 2]
>>> clf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
>>> print(clf.score(X, y))
1.0
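The LabelEncoder suggestion can be sketched as follows. The toy `X` and `y_float` arrays are assumptions chosen to resemble the relevance values in the question, and `n_estimators` and `random_state` are arbitrary. The encoder maps each distinct float to an integer class id, and `inverse_transform` maps predictions back to the original values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Toy data (assumption): one feature, three distinct float labels.
X = np.array([[0.0], [1.0], [1.2], [0.1], [0.9], [1.3]])
y_float = np.array([1.0, 2.33, 3.0, 1.0, 2.33, 3.0])

# Map each distinct float label to an integer class id.
encoder = LabelEncoder()
y_int = encoder.fit_transform(y_float)  # array([0, 1, 2, 0, 1, 2])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y_int)

# score now receives integer class ids, so it no longer sees a continuous target.
accuracy = clf.score(X, y_int)

# Convert predicted class ids back to the original relevance values.
predicted_relevance = encoder.inverse_transform(clf.predict(X))
print(accuracy, predicted_relevance)
```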

Or compute .score yourself, since it is a very trivial function (the fraction of predictions that match the labels):

print(np.mean(clf.predict(X) == y))
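As a quick check of that claim, here is a sketch comparing the hand-rolled mean against `accuracy_score` and `clf.score` on integer labels. The data mirrors the toy example above, and the `random_state` value is an arbitrary choice so that the run is repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X = np.array([[0.0], [1.0], [1.2]])
y = np.array([0, 1, 2])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)

# Fraction of predictions that equal the true label.
manual_accuracy = np.mean(predictions == y)

# The same quantity from the metrics module and from the estimator itself.
library_accuracy = accuracy_score(y, predictions)
builtin_accuracy = clf.score(X, y)

print(manual_accuracy, library_accuracy, builtin_accuracy)
```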

【Comments】: