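【Posted】: 2016-05-12 01:15:46
【Question】:
I am using pandas and sklearn to build a decision tree that learns from a dataset, and my approach to pruning the tree is to retry with different maximum depths. I believe everything is working, but I can't seem to get the results plotted with pyplot. Can someone help me with this?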
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn.model_selection import cross_val_score, train_test_split

features = ['birad', 'age', 'Shape', 'margin', 'density', 'severity']
# Missing values are marked with '?' in the file, so read them as NaN
df = pd.read_csv('mammographic_masses.data', header=None, names=features, na_values='?')
# Drop rows with a missing feature value (severity is deliberately left unfiltered)
df = df.dropna(subset=features[:-1])

x = df[features[:-1]]
y = df['severity']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

depth = []       # (max_depth, test accuracy) pairs
best_depth = 3
best_score = 0
best_clf = []
for i in range(1, 20):
    clf = tree.DecisionTreeClassifier(max_depth=i)
    clf = clf.fit(x_train, y_train)
    # 10-fold CV scores on the training split (computed but not used below)
    scores = cross_val_score(clf, x_train, y_train, cv=10)
    ascore = clf.score(x_test, y_test)
    depth.append((i, ascore))
    if ascore > best_score:
        best_score, best_depth = ascore, i
        best_clf.append(clf)
print(best_depth, ' ', best_score)
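The loop above collects one score per depth but never calls any plotting function, which may be why nothing shows up. Below is a minimal sketch of one way to plot accuracy against `max_depth`. It is not the original poster's code: the data is a synthetic stand-in generated with `make_classification` (an assumption, since `mammographic_masses.data` is not available here), the names `cv_scores`, `test_scores` and the output file `depth_vs_score.png` are hypothetical, and the best depth is picked from the mean cross-validation score rather than the test score.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: 5 numeric features, binary target)
X, y = make_classification(n_samples=800, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

depths = list(range(1, 20))
cv_scores = []    # mean 10-fold CV accuracy on the training split
test_scores = []  # accuracy on the held-out test split
for d in depths:
    clf = DecisionTreeClassifier(max_depth=d, random_state=0)
    cv_scores.append(cross_val_score(clf, X_train, y_train, cv=10).mean())
    test_scores.append(clf.fit(X_train, y_train).score(X_test, y_test))

# Choose the depth by CV score so the test split stays untouched for evaluation
best_depth = depths[cv_scores.index(max(cv_scores))]

fig, ax = plt.subplots()
ax.plot(depths, cv_scores, marker="o", label="mean 10-fold CV accuracy")
ax.plot(depths, test_scores, marker="s", label="test accuracy")
ax.axvline(best_depth, linestyle="--", color="gray",
           label=f"best depth by CV = {best_depth}")
ax.set_xlabel("max_depth")
ax.set_ylabel("accuracy")
ax.legend()
fig.savefig("depth_vs_score.png")
```

With the real data, `X` and `y` would be replaced by the `x` and `y` frames from the question. In an interactive session, calling `plt.show()` instead of `fig.savefig(...)` should open the figure window.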
【Comments】:
Tags: python machine-learning decision-tree cross-validation