【Question title】: TypeError from MultinomialNB: float() argument must be a string or a number
【Posted】: 2018-08-31 02:18:22
【Question description】:

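I'm trying to compare the performance of the multinomial, binomial, and Bernoulli classifiers, but I get this error:

TypeError: float() argument must be a string or a number, not 'set'

The code below is the MultinomialNB part: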

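import random

import nltk
from nltk.classify.scikitlearn import SklearnClassifier
from nltk.corpus import movie_reviews
from sklearn.naive_bayes import MultinomialNB

# pair each review's word list with its category
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]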

random.shuffle(documents)

#print(documents[1])

all_words = []

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def look_for_features(document):
    words = set(document)
    features = {}
    for x in word_features:
        features[x] = {x in words}
    return features

# each feature set pairs a review's features with its category
featuresets = [(look_for_features(rev), category) for (rev, category) in documents]

training_set = featuresets[:1400]
testing_set = featuresets[1400:]

#Multinomial
MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(training_set)
print ("Accuracy: ", (nltk.classify.accuracy(MNB_classifier,testing_set))*100)

The error seems to come from MNB_classifier.train(training_set). It resembles the error described here.

【Comments】:

    Tags: python-3.x machine-learning scikit-learn text-classification naivebayes


    【Solution 1】:

    Change...

    features[x] = {x in words}
    

    to...

    features[x] = x in words
    

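    The first line stores every feature value as a set, so featuresets ends up holding entries of the form word: {True} or word: {False}. SklearnClassifier does not expect a set as a feature value; as the error message shows, float() is applied to the value and fails. With the second line the value is a plain boolean, which converts without trouble.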
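The difference can be reproduced without NLTK or scikit-learn. The sketch below uses toy data in place of the question's word_features and documents, and it assumes that training converts each feature value with float(), which is what the error message suggests; try_float is a hypothetical helper added only for illustration.

```python
# Toy stand-ins for the question's word_features and one review's words.
word_features = ["good", "bad", "plot"]
words = set(["good", "plot", "movie"])

# Buggy version: braces around the expression build a one-element set.
buggy = {x: {x in words} for x in word_features}

# Fixed version: the value is the boolean itself.
fixed = {x: x in words for x in word_features}


def try_float(value):
    """Return float(value), or the TypeError message if conversion fails.

    Hypothetical helper; it only mimics the float() call named in the traceback.
    """
    try:
        return float(value)
    except TypeError as exc:
        return str(exc)


buggy_result = try_float(buggy["good"])  # a TypeError message mentioning 'set'
fixed_result = try_float(fixed["good"])  # True converts to 1.0

print(type(buggy["good"]).__name__, "->", buggy_result)
print(type(fixed["good"]).__name__, "->", fixed_result)
```

The exact wording of the message varies slightly between Python versions, but in each case it names 'set' as the offending type.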


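    The code looks very much like the code in "Creating a module for Sentiment Analysis with NLTK". The author there writes the expression in parentheses, (x in words). Without a trailing comma that is not a tuple, so it is no different from x in words.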
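A short check of the parentheses point, again with toy values rather than the movie review data: parentheses alone only group the expression, while a trailing comma would create a tuple and bring back a non-scalar feature value.

```python
words = {"good", "plot"}
x = "good"

plain = x in words            # bool
parenthesized = (x in words)  # still a bool; parentheses only group
one_tuple = (x in words,)     # the trailing comma makes a 1-element tuple

print(type(plain).__name__, type(parenthesized).__name__, type(one_tuple).__name__)
```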

    【Comments】:
