【Question title】: Decision Tree nltk
【Posted】: 2017-04-30 01:40:44
【Question】:

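I'm trying out different learning methods (decision tree, NaiveBayes, MaxEnt) to compare their relative performance and find out which one works best. How do I implement the decision tree and get its accuracy?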

import string
import nltk.classify.util
# nltk's DecisionTreeClassifier provides the train() classmethod used below;
# sklearn.tree.DecisionTreeClassifier has no train() method
from nltk.classify import DecisionTreeClassifier, MaxentClassifier, NaiveBayesClassifier
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures
from nltk.probability import FreqDist, ConditionalFreqDist
from nltk.corpus import movie_reviews
from nltk.corpus import movie_reviews as mr
from nltk.corpus import stopwords

stop = stopwords.words('english')
words = [([w for w in mr.words(i) if w.lower() not in stop and w.lower() not in string.punctuation], i.split('/')[0]) for i in mr.fileids()]

def word_feats(words):
    # Bag-of-words features: every word in the review maps to True
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# Integer division so the cutoffs are valid slice indices in Python 3
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

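trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
# Hold out the remaining quarter of each class for testing
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
DecisionTree_classifier = DecisionTreeClassifier.train(trainfeats, binary=True, depth_cutoff=20, support_cutoff=20, entropy_cutoff=0.01)
print(nltk.classify.util.accuracy(DecisionTree_classifier, testfeats))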
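For the comparison the question is after (decision tree vs. NaiveBayes vs. MaxEnt), it may help to score every classifier on the same held-out split. Below is a minimal pure-Python sketch of such a harness. It splits each label's examples 3/4 : 1/4, mirroring negcutoff/poscutoff above. `split_per_label`, `compare` and `MajorityClassifier` are hypothetical helpers written for illustration, and `accuracy` is only a stand-in for `nltk.classify.util.accuracy`. The harness assumes each trained classifier has a `classify(featureset)` method, which nltk classifiers provide.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Labeled = Tuple[dict, Hashable]  # (featureset, label): the pair format nltk trainers take


def split_per_label(labeled: List[Labeled], train_ratio: float = 0.75) -> Tuple[List[Labeled], List[Labeled]]:
    """Put the first train_ratio of each label's examples in train and the rest in test."""
    by_label: Dict[Hashable, List[Labeled]] = defaultdict(list)
    for item in labeled:
        by_label[item[1]].append(item)
    train: List[Labeled] = []
    test: List[Labeled] = []
    for items in by_label.values():
        cutoff = int(len(items) * train_ratio)  # an int, so it is a valid slice index
        train.extend(items[:cutoff])
        test.extend(items[cutoff:])
    return train, test


def accuracy(classifier, test: List[Labeled]) -> float:
    """Fraction of test items whose predicted label equals the gold label."""
    if not test:
        return 0.0
    correct = sum(1 for featureset, gold in test if classifier.classify(featureset) == gold)
    return correct / len(test)


def compare(trainers: Dict[str, Callable[[List[Labeled]], object]],
            labeled: List[Labeled], train_ratio: float = 0.75) -> Dict[str, float]:
    """Train every classifier on the same split and report its held-out accuracy."""
    train, test = split_per_label(labeled, train_ratio)
    return {name: accuracy(train_fn(train), test) for name, train_fn in trainers.items()}


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def __init__(self, train: List[Labeled]):
        self.label = Counter(label for _, label in train).most_common(1)[0][0]

    def classify(self, featureset: dict) -> Hashable:
        return self.label


# Toy data: 8 positive and 4 negative reviews -> 9 training items, 3 test items
data = [({"good": True}, "pos")] * 8 + [({"bad": True}, "neg")] * 4
results = compare({"majority": MajorityClassifier}, data)
print(results)
```

With nltk installed, the `trainers` dict could hold entries such as `NaiveBayesClassifier.train` and `DecisionTreeClassifier.train`, because both are classmethods that take a list of (featureset, label) pairs. A majority-class baseline is worth keeping in the comparison as a floor that the real classifiers should beat.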

【Discussion】:

  • There is an extra closing parenthesis at the end of your statement.

Tags: python classification decision-tree text-classification maxent


【Answer 1】:

You will have to look at the nltk3 source code (or its docstrings). It is also possible that the example in the NLTK book works without any changes; see http://www.nltk.org/book/ch06.html#DecisionTrees
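The stopping parameters in the question's `train()` call (`depth_cutoff`, `support_cutoff`, `entropy_cutoff`) are easier to tune once you see what they test. The sketch below is a simplified, pure-Python reading of how nltk3's `DecisionTreeClassifier` appears to use them when deciding whether to grow a subtree under a branch. Check `decisiontree.py` in your nltk version to confirm the details. `label_entropy` and `should_grow_subtree` are hypothetical helpers, not nltk API.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def label_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the label distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def should_grow_subtree(node_size: int, branch_labels: Sequence[Hashable], depth_left: int,
                        entropy_cutoff: float, support_cutoff: int) -> bool:
    """Decide whether a branch gets its own subtree instead of staying a single leaf label.

    Assumed reading of nltk3's stopping rules; not nltk's actual code.
    """
    if node_size <= support_cutoff:  # too few training examples reached this node
        return False
    if depth_left <= 0:  # depth budget (depth_cutoff) used up
        return False
    # Keep splitting only while the branch's labels are still mixed
    return label_entropy(branch_labels) > entropy_cutoff


print(label_entropy(["pos", "pos", "pos", "neg"]))
```

Under this reading, a very small `entropy_cutoff` such as 0.01 lets almost any impure branch keep splitting until the depth or support limit stops it.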

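Alternatively, just run the classifier on a held-out test sample and count the false positives and false negatives yourself. The fraction of test items that are neither is your accuracy.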
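Counting those errors by hand takes only a few lines. This sketch assumes binary labels with `'pos'` treated as the positive class, and takes `gold` and `predicted` as plain lists. With nltk you could build them as `[label for _, label in testfeats]` and `classifier.classify_many([fs for fs, _ in testfeats])`. `confusion_counts` is a hypothetical helper.

```python
from typing import Hashable, Sequence, Tuple


def confusion_counts(gold: Sequence[Hashable], predicted: Sequence[Hashable],
                     positive: Hashable = "pos") -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for a binary task, treating `positive` as the positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1  # predicted positive, actually negative
        elif g == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn


gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]
tp, fp, tn, fn = confusion_counts(gold, predicted)
accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of items that are neither FP nor FN
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)
print(tp, fp, tn, fn, accuracy)  # 2 1 1 1 0.6
```

Keeping the four counts separate is useful when comparing classifiers: two models with the same accuracy can trade false positives for false negatives very differently.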
