【Posted】:2020-03-16 21:34:56
【Question】:
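I am using the NLTK NaiveBayesClassifier for sentiment classification. I trained and tested the model on labeled data. Now I want to predict the sentiment of unlabeled data, but I get an error. The line that raises the error is: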
score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))
The error is:
ValueError: not enough values to unpack (expected 2, got 1)
Here is the full code:
import random
import pandas as pd
data = pd.read_csv("label data for testing .csv", header=0)
sentiment_data = list(zip(data['Articles'], data['Sentiment']))
random.shuffle(sentiment_data)
new_data = pd.read_csv("Japan Data.csv", header=0)
train_x, train_y = zip(*sentiment_data[:350])
test_x, test_y = zip(*sentiment_data[350:])
from unidecode import unidecode
from nltk import word_tokenize
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import extract_unigram_feats
TRAINING_COUNT = 350
def clean_text(text):
    text = text.replace("<br />", " ")
    return text
analyzer = SentimentAnalyzer()
vocabulary = analyzer.all_words([(word_tokenize(unidecode(clean_text(instance))))
                                 for instance in train_x[:TRAINING_COUNT]])
print("Vocabulary: ", len(vocabulary))
print("Computing Unigram Features ...")
unigram_features = analyzer.unigram_word_feats(vocabulary, min_freq=10)
print("Unigram Features: ", len(unigram_features))
analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_features)
# Build the training set
_train_X = analyzer.apply_features([(word_tokenize(unidecode(clean_text(instance))))
                                    for instance in train_x[:TRAINING_COUNT]], labeled=False)
# Build the test set
_test_X = analyzer.apply_features([(word_tokenize(unidecode(clean_text(instance))))
                                   for instance in test_x], labeled=False)
trainer = NaiveBayesClassifier.train
classifier = analyzer.train(trainer, zip(_train_X, train_y[:TRAINING_COUNT]))
score = analyzer.evaluate(list(zip(_test_X, test_y)))
print("Accuracy: ", score['Accuracy'])
score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))
print(score_1)
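I understand that the problem arises because the line that raises the error must be given two values per item rather than one, but I don't know how to do that.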
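The unpacking failure can be reproduced without NLTK, which may make the cause easier to see. This is a minimal sketch: `articles` stands in for `new_data['Articles']`, and `unpack_pairs` is a hypothetical helper that mimics how an evaluation routine would iterate over (features, label) pairs. It is not NLTK's actual code.

```python
# Stand-in for new_data['Articles']; the real column comes from the CSV file.
articles = ["first article text", "second article text"]

# zip() over a single iterable yields 1-tuples, not (features, label) pairs.
rows = list(zip(articles))
print(rows)


def unpack_pairs(pairs):
    """Hypothetical helper: split (x, y) pairs the way an evaluation routine would."""
    xs, ys = [], []
    for x, y in pairs:  # fails here when each item holds only one value
        xs.append(x)
        ys.append(y)
    return xs, ys


try:
    unpack_pairs(rows)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Each item produced by `zip(new_data['Articles'])` holds only the article text, so there is no label to unpack. Evaluation needs gold labels to compare against, and unlabeled data has none.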
Thanks in advance.
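One possible direction, sketched here under assumptions rather than as a confirmed fix: unlabeled data cannot be scored for accuracy, but a trained classifier can still predict labels for it. The example below uses `NaiveBayesClassifier.train` and `classify_many` from NLTK. The feature dictionaries and the `"pos"`/`"neg"` labels are toy stand-ins for what `analyzer.apply_features(..., labeled=False)` would produce from the real articles.

```python
from nltk.classify import NaiveBayesClassifier

# Toy labeled feature sets (assumed shape: one dict of features per document).
train_set = [
    ({"contains(good)": True, "contains(bad)": False}, "pos"),
    ({"contains(good)": True, "contains(bad)": False}, "pos"),
    ({"contains(good)": False, "contains(bad)": True}, "neg"),
    ({"contains(good)": False, "contains(bad)": True}, "neg"),
]
classifier = NaiveBayesClassifier.train(train_set)

# Unlabeled feature sets: features only, no label attached.
unlabeled = [
    {"contains(good)": True, "contains(bad)": False},
    {"contains(good)": False, "contains(bad)": True},
]

# classify_many() predicts one label per feature set; no gold labels are needed.
predictions = classifier.classify_many(unlabeled)
print(predictions)
```

In the code from the question, the equivalent step would presumably be to run the `new_data['Articles']` texts through the same `clean_text`, `unidecode`, `word_tokenize`, and `analyzer.apply_features(..., labeled=False)` pipeline used for the test set, then pass the result to `classifier.classify_many` instead of `analyzer.evaluate`.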
【Comments】:
Tags: nltk python-3.7 sentiment-analysis predict naivebayes