How to count the total number of positive and negative sentences in a data set?

【Question title】: How to count total number of positive and negative sentences from a data set?
【Posted】: 2018-11-23 15:41:03
【Question description】:

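I want to get the total number of positive and negative sentences from the data set I am testing on. How can I count the total number of positive and negative sentences?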

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

moviedirt = r'C:\\Users\\premier\\Downloads\\Reviews\\test'
movie_test = load_files(moviedirt, shuffle=True)
movie_test.target_names
movie_test.data[0:10000]

# movie_train is not defined in this snippet; it is assumed to be loaded
# with load_files from the training directory in the same way as movie_test.

# Pipeline: feature extraction followed by the classifier
pipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(fit_prior=False))])
clf = pipeline.fit(movie_train.data, movie_train.target)  # train the classifier
predict1 = clf.predict(movie_test.data)
for review, category in zip(movie_test.data, predict1):  # print each review with its predicted label
    print('%r => %s' % (review, movie_train.target_names[category]))

This is the complete test code. This is the output:

b"Don't hate Heather Graham because she's beautiful, hate her because she's 
fun to watch in this movie. Like the hip clothing and funky surroundings, the 
actors in this flick work well together. Casey Affleck is hysterical and 
Heather Graham literally lights up the screen. The minor characters - Goran 
Visnjic {sigh} and Patricia Velazquez are as TALENTED as they are gorgeous. 
Congratulations Miramax & Director Lisa Krueger!" => pos

b'I don\'t know how this movie has received so many positive comments. One 
can call it "artistic" and "beautifully filmed", but those things don\'t make 
up for the empty plot that was filled with sexual innuendos. I wish I had not 
wasted my time to watch this movie. Rather than being biographical, it was a 
poor excuse for promoting strange and lewd behavior. It was just another 
Hollywood attempt to convince us that that kind of life is normal and OK. 
From the very beginning I asked my self what was the point of this movie,and 
I continued watching, hoping that it would change and was quite disappointed 
that it continued in the same vein. I am so glad I did not spend the money to 
see this in a theater!' => neg

【Comments】:

You can use a Counter on predict1.
I got it. Thanks!
@VivekKumar Sir, could you also tell me how to display this graphically?
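
The Counter suggestion in the comments could look roughly like the sketch below. It uses a small made-up list in place of predict1 and assumes the class names are ['neg', 'pos'], which is what load_files would produce for folders named neg and pos. The helper names count_labels and plot_counts are hypothetical. plot_counts is one possible way to show the counts graphically; it assumes matplotlib is installed and only imports it when called.

```python
from collections import Counter

# Hypothetical stand-ins for movie_test.target_names and predict1.
target_names = ['neg', 'pos']
predicted = [1, 0, 1, 1, 0, 1]


def count_labels(predictions, names):
    """Return a dict mapping each class name to how often it was predicted."""
    counts = Counter(int(label) for label in predictions)
    # A Counter returns 0 for missing keys, so unseen classes are reported as 0.
    return {name: counts[index] for index, name in enumerate(names)}


def plot_counts(label_counts):
    """Draw a bar chart of the counts (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.bar(list(label_counts.keys()), list(label_counts.values()))
    plt.ylabel('Number of reviews')
    plt.title('Predicted sentiment')
    plt.show()


label_counts = count_labels(predicted, target_names)
print(label_counts)
```

With the real data, count_labels(predict1, movie_test.target_names) would give the totals, and passing the result to plot_counts would open a bar chart window.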

Tags: python-3.x scikit-learn

【Solution 1】:

import numpy as np

# Number of pos/neg samples in your training set
print(np.unique(movie_train.target, return_counts=True))

# Number of pos/neg samples in your predictions
print(np.unique(predict1, return_counts=True))
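
np.unique with return_counts=True returns a tuple of two arrays (the distinct labels and their counts), so the printed result shows numeric labels rather than class names. A minimal sketch of pairing the two with the names follows; the predicted array is made up for illustration, and the names ['neg', 'pos'] are an assumption about what movie_test.target_names contains.

```python
import numpy as np

# Hypothetical stand-ins for movie_test.target_names and predict1.
target_names = ['neg', 'pos']
predicted = np.array([1, 0, 1, 1, 0, 1])

# labels holds the distinct class indices, counts how often each occurs.
labels, counts = np.unique(predicted, return_counts=True)

# Map each class index back to its name.
named_counts = {target_names[label]: int(count)
                for label, count in zip(labels, counts)}
print(named_counts)
```

Note that np.unique only reports labels that actually occur, so a class that is never predicted will be missing from the result.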

【Discussion】: