Python Training and Testing Error

【Question title】: Python Training and Testing Error
【Posted】: 2017-04-11 23:56:19
【Question】:

When I try to run the last part of the following code, I get an error and I can't figure out why.

import random
combined_list = h_sub_text + s_sub_text
print(len(combined_list))
random.shuffle(combined_list)

training_part = int(len(combined_list) * .7)
print(len(combined_list))
training_set = combined_list[:training_part]
test_set =  combined_list[training_part:]
print (len(train_set))
print (len(test_set))

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier

classifier = NaiveBayesClassifier.train(train_set)

accuracy = nltk.classify.util.accuracy(classifier, test_set)

print("Accuracy is: ", accuracy * 100)

I get this error:

ValueError             Traceback (most recent call last)
<ipython-input-57-151936e75238> in <module>()
  2 from nltk.classify import NaiveBayesClassifier

----> 4 classifier = NaiveBayesClassifier.train(training_set)

  C:\Program Files (x86)\Anaconda3\lib\site-packages\nltk\classify\naivebayes.py in train(cls, labeled_featuresets, estimator)

--> 194         for featureset, label in labeled_featuresets:
195             label_freqdist[label] += 1
196             for fname, fval in featureset.items():

ValueError: too many values to unpack (expected 2)

Thanks in advance.

【Comments】:

Replace train_set with training_set? train_set is not defined anywhere in the code you provided.
Sorry, it's "NaiveBayesClassifier.train(training_set)". The traceback shows the correct object.

Tags: python python-3.x machine-learning anaconda training-data

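【Solution 1】:

The root of the problem is the value of training_set that is passed to NaiveBayesClassifier.train(). To know for sure, we would need to see what it looks like. Whatever it is, it makes the unpacking loop inside the nltk module fail.

From the NLTK source code at http://www.nltk.org/_modules/nltk/classify/naivebayes.html: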

@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist):
   """
   :param labeled_featuresets: A list of classified featuresets,
       i.e., a list of tuples ``(featureset, label)``.

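The argument to train() is a list of (featureset, label) tuples. The error says that too many values were being unpacked where exactly 2 were expected, so that is not what you are passing in. Each item in your list is probably something with more than 2 elements, for example a plain list of tokens or a string, rather than a 2-tuple.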
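To illustrate the point, here is a minimal sketch in plain Python that mimics the unpacking loop shown in the traceback. It is not the asker's actual data: the contents of h_sub_text and s_sub_text, the labels "h" and "s", and the helpers count_labels and to_featureset are all assumptions made up for the example.

```python
# Hypothetical stand-ins for h_sub_text and s_sub_text from the question;
# each document is assumed to be a list of tokens.
h_sub_text = [["free", "money", "now"], ["win", "a", "prize"]]
s_sub_text = [["meeting", "at", "noon"], ["see", "you", "tomorrow"]]


def count_labels(labeled_featuresets):
    """Mimic the unpacking loop from the traceback (for featureset, label in ...)."""
    counts = {}
    for featureset, label in labeled_featuresets:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Passing raw token lists fails: each item holds 3 values, not 2.
try:
    count_labels(h_sub_text + s_sub_text)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)


def to_featureset(tokens):
    """Hypothetical feature extractor: a simple bag-of-words dict."""
    return {token: True for token in tokens}


# Build (featureset, label) pairs; "h" and "s" are assumed label names.
labeled = [(to_featureset(doc), "h") for doc in h_sub_text]
labeled += [(to_featureset(doc), "s") for doc in s_sub_text]

label_counts = count_labels(labeled)
print(label_counts)
```

A list shaped like `labeled` is what the docstring above describes, so shuffling and splitting that list (instead of the raw texts) should give train() the pairs it expects.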

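【Discussion】:

Maybe the NLTK approach isn't a good fit for this. How would I apply sklearn's train_test_split in this code?
I'm not familiar with NLTK. To start, you could print training_set to see what is actually being passed in, and read the documentation to find out what it should be.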
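The suggestion to print training_set can be taken one step further with a small check. The following is only a sketch: find_malformed is a hypothetical helper (not part of NLTK), and the sample training_set is invented data containing one well-formed item and two malformed ones.

```python
def find_malformed(labeled_featuresets, limit=5):
    """Return up to `limit` (index, item) pairs that are not (dict, label) 2-tuples."""
    problems = []
    for index, item in enumerate(labeled_featuresets):
        is_pair = isinstance(item, (tuple, list)) and len(item) == 2
        if not is_pair or not isinstance(item[0], dict):
            problems.append((index, item))
            if len(problems) >= limit:
                break
    return problems


# Hypothetical sample data, not the asker's real training_set.
training_set = [
    ({"free": True, "money": True}, "h"),  # well-formed (featureset, label) pair
    ["meeting", "at", "noon"],             # raw token list, no label
    "see you tomorrow",                    # plain string
]

print(training_set[:3])  # look at the first few items
malformed = find_malformed(training_set)
for index, item in malformed:
    print(f"item {index} is not a (featureset, label) pair: {item!r}")
```

If the check reports any items, those are the ones the unpacking loop in train() would choke on.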