【Question title】: How do I handle a multiclass decision tree?
【Posted】: 2020-10-17 02:51:25
【Question】:

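I'm new to Python and ML, but I'm trying to build a decision tree with sklearn. I have a number of categorical features, which I have converted to numeric variables. However, my target variable is multiclass, and I'm getting an error. How should I handle a multiclass target?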

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
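The message comes from the F1 metric rather than from the decision tree itself. A minimal sketch that reproduces it by calling `f1_score` directly, assuming made-up three-class labels (`y_true` and `y_pred` are illustrative, not the data from the question):

```python
from sklearn.metrics import f1_score

# Hypothetical three-class labels, invented for illustration only.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# The default average='binary' is only defined for two-class targets,
# so this call raises the ValueError quoted above.
try:
    f1_score(y_true, y_pred)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Choosing a multiclass-aware averaging mode avoids the error.
macro_f1 = f1_score(y_true, y_pred, average='macro')
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
print(macro_f1, weighted_f1)
```

Here the two averages coincide only because every class has the same number of samples; on imbalanced data they generally differ.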

from sklearn.model_selection import train_test_split

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.30,  # default is 0.25 (75%/25% split)
                                                    # shuffle=True by default
                                                    stratify=y,  # preserve target proportions
                                                    random_state=123)  # fix random seed for reproducibility

print(X_train.shape, X_test.shape)


from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion='gini', max_depth=3, min_samples_split=4, min_samples_leaf=2)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# criterion : split quality measure, "gini" or "entropy".
# max_depth : the maximum depth of the tree.
# min_samples_split : the minimum number of samples required to split an internal node.
# min_samples_leaf : the minimum number of samples required to be at a leaf node.

#DEFINE YOUR CLASSIFIER and THE PARAMETERS GRID
from sklearn.tree import DecisionTreeClassifier
import numpy as np

classifier = DecisionTreeClassifier()
parameters = {'criterion': ['entropy','gini'], 
              'max_depth': [3,4,5],
              'min_samples_split': [5,10],
              'min_samples_leaf': [2]}

from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(classifier, parameters, cv=3, scoring = 'f1', verbose=50, n_jobs=-1, refit=True)


【Comments】:

    Tags: python machine-learning decision-tree sklearn-pandas gridsearchcv


    【Solution 1】:

    You should specify the scoring function manually:

    from sklearn.metrics import f1_score, make_scorer
    
    f1 = make_scorer(f1_score, average='weighted')
    
    ....
    
    gs = GridSearchCV(classifier, parameters, cv=3, scoring=f1, verbose=50, n_jobs=-1, refit=True)
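For reference, a self-contained sketch of this approach. It assumes the iris dataset as a stand-in for the data in the question (which is not shown) and reuses the parameter grid from the question; `verbose` and `n_jobs` are omitted to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris (3 classes) stands in for the asker's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=123)

parameters = {'criterion': ['entropy', 'gini'],
              'max_depth': [3, 4, 5],
              'min_samples_split': [5, 10],
              'min_samples_leaf': [2]}

# Wrap f1_score so the grid search passes average='weighted' on every call.
f1 = make_scorer(f1_score, average='weighted')

gs = GridSearchCV(DecisionTreeClassifier(random_state=123), parameters,
                  cv=3, scoring=f1, refit=True)
gs.fit(X_train, y_train)

print(gs.best_params_)

# Score the refitted best model on the held-out test set with the same metric.
test_f1 = f1_score(y_test, gs.predict(X_test), average='weighted')
print(test_f1)
```

The weighted average weights each class's F1 by its support, which may suit imbalanced targets; `average='macro'` treats all classes equally instead.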
    

    【Comments】:

    • Thanks for the suggestion. I just tried it and got the same error.
    • I've just adjusted my code example. Could you try it?
    【Solution 2】:

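    Thank you very much for your help. I figured it out. The problem was actually on the gs line: the scoring argument needed the adjustment you mentioned, so I changed it to scoring='f1_macro' (passed as a string):

    gs = GridSearchCV(classifier, parameters, cv=3, scoring='f1_macro', verbose=50, n_jobs=-1, refit=True)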
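scikit-learn accepts several predefined scorer names for multiclass F1, so no custom scorer is needed for the common averages. A small sketch comparing them with `cross_val_score`, again assuming the iris dataset as a stand-in for the real data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris (3 classes) stands in for the asker's data.
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=123)

# Each string selects f1_score with the matching `average` setting.
scores = {}
for name in ['f1_micro', 'f1_macro', 'f1_weighted']:
    scores[name] = cross_val_score(model, X, y, cv=3, scoring=name).mean()
    print(name, round(scores[name], 3))
```

Any of these strings can be passed as `scoring` to `GridSearchCV` in the same way.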
    

    【Comments】:

    • You're welcome! I'd appreciate it if you marked the answer as the solution.