【Posted】: 2016-03-15 00:43:23
【Question】:
I am trying to estimate the accuracy of a Naive Bayes classifier on the NLTK movie reviews corpus.
from nltk.corpus import movie_reviews
import random
import nltk
from sklearn import cross_validation
from nltk.corpus import stopwords
import string
from nltk.classify import apply_features

def document_features(document):
    document_words = set(document)
    features = {}
    for word in unigrams:
        features['contains({})'.format(word)] = (word in document_words)
    return features

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
stop = stopwords.words('english')
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words()
                          if w.lower() not in stop and w.lower() not in string.punctuation)
unigrams = list(all_words)[:200]
featuresets = [(document_features(d), c) for (d,c) in documents]
I am trying to run 10-fold cross-validation, based on an example I took from sklearn.
training_set = nltk.classify.apply_features(featuresets, documents)
cv = cross_validation.KFold(len(training_set), n_folds=10, shuffle=True, random_state=None)
for traincv, testcv in cv:
    classifier = nltk.NaiveBayesClassifier.train(training_set[traincv[0]:traincv[len(traincv)-1]])
    result = nltk.classify.util.accuracy(classifier, training_set[testcv[0]:testcv[len(testcv)-1]])
    print 'Accuracy:', result
But I get an error on this line:
classifier = nltk.NaiveBayesClassifier.train(training_set[traincv[0]:traincv[len(traincv)-1]])
'list' object is not callable
Any idea what I am doing wrong?
【Discussion】:
Tags: python scikit-learn
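Two things look suspect in the question's code; this is a reading of it, not a verified fix. First, nltk.classify.apply_features expects a feature function as its first argument and the labeled tokens as its second. Here it receives featuresets, which is already a list, so the lazy map would try to call that list the first time an element is read, which would explain "'list' object is not callable". Second, KFold with shuffle=True yields index arrays that are generally not contiguous, so slicing from traincv[0] to traincv[len(traincv)-1] does not select the fold (and drops its last item even when it is contiguous). Below is a minimal sketch of the index handling in plain Python, so it runs without NLTK or the corpus. The helper kfold_indices, the toy documents, and the four-word unigrams list are hypothetical stand-ins and are not part of either library.

```python
import random

# Hypothetical stand-in for the question's 200-word vocabulary.
unigrams = ['good', 'bad', 'fun', 'dull']

def document_features(document):
    """Map a token list to boolean 'contains(word)' features."""
    document_words = set(document)
    return {'contains({})'.format(w): (w in document_words) for w in unigrams}

def kfold_indices(n_items, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) index lists for shuffled k-fold splitting.

    Illustrative helper only; it mimics the idea of KFold, not its API.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for k in range(n_folds):
        test_idx = indices[k::n_folds]      # every n_folds-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx

# Toy labeled documents (made-up data, 20 items).
documents = [(['good', 'fun'], 'pos'), (['bad', 'dull'], 'neg')] * 10

# Issue 1: featuresets is already the finished list of (features, label) pairs.
# The lazy NLTK equivalent would pass the function, not the list:
#   training_set = nltk.classify.apply_features(document_features, documents)
featuresets = [(document_features(d), c) for (d, c) in documents]

# Issue 2: select items by index, not by a slice from the first to the last index.
fold_sizes = []
for train_idx, test_idx in kfold_indices(len(featuresets), n_folds=10):
    train = [featuresets[i] for i in train_idx]
    test = [featuresets[i] for i in test_idx]
    fold_sizes.append((len(train), len(test)))
```

With NLTK available, train would go to nltk.NaiveBayesClassifier.train and test to nltk.classify.util.accuracy inside the loop, in place of the size bookkeeping.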