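【Posted】: 2018-11-23 15:41:03
【Question】:
I want to get the total number of positive and negative sentences in the dataset I am testing on. How can I count the totals of positive and negative sentences?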
import sklearn
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline  # chain feature extraction and the classifier

moviedirt = r'C:\Users\premier\Downloads\Reviews\test'
movie_test = load_files(moviedirt, shuffle=True)
movie_test.target_names
movie_test.data[0:10000]

# NOTE: movie_train (the training set) is loaded elsewhere; that code is not shown in the question.
pipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(fit_prior=False))])
clf = pipeline.fit(movie_train.data, movie_train.target)  # train the classifier
predict1 = clf.predict(movie_test.data)
for review, category in zip(movie_test.data, predict1):  # print each review with its predicted label
    print('%r => %s' % (review, movie_train.target_names[category]))
This is the complete test code. This is the output:
b"Don't hate Heather Graham because she's beautiful, hate her because she's
fun to watch in this movie. Like the hip clothing and funky surroundings, the
actors in this flick work well together. Casey Affleck is hysterical and
Heather Graham literally lights up the screen. The minor characters - Goran
Visnjic {sigh} and Patricia Velazquez are as TALENTED as they are gorgeous.
Congratulations Miramax & Director Lisa Krueger!" => pos
b'I don\'t know how this movie has received so many positive comments. One
can call it "artistic" and "beautifully filmed", but those things don\'t make
up for the empty plot that was filled with sexual innuendos. I wish I had not
wasted my time to watch this movie. Rather than being biographical, it was a
poor excuse for promoting strange and lewd behavior. It was just another
Hollywood attempt to convince us that that kind of life is normal and OK.
From the very beginning I asked my self what was the point of this movie,and
I continued watching, hoping that it would change and was quite disappointed
that it continued in the same vein. I am so glad I did not spend the money to
see this in a theater!' => neg
【Comments】:
-
You can use a Counter on predict1.
-
Got it. Thanks!
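The Counter suggestion above could look roughly like the sketch below. It uses a short hand-written list in place of the real predict1 array, and it assumes the class folders are named neg and pos (as the output in the question suggests), so that load_files, which sorts folder names, maps neg to 0 and pos to 1.

```python
from collections import Counter

# Hypothetical stand-ins for the question's objects:
#   predict1     -> the array returned by clf.predict(movie_test.data)
#   target_names -> movie_train.target_names (assumed to be ['neg', 'pos'])
target_names = ['neg', 'pos']
predict1 = [1, 0, 1, 1, 0]

# Count how many times each predicted label index occurs.
label_counts = Counter(predict1)

# Map the label indices back to readable class names.
named_counts = {target_names[label]: count for label, count in label_counts.items()}

print(named_counts)
```

With the real NumPy array from clf.predict, Counter(predict1.tolist()) should give the same kind of result with plain Python ints as keys.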
-
@VivekKumar Sir, but please tell me how to display this graphically?
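One possible way to show the counts graphically is a simple bar chart. This is a minimal sketch, assuming matplotlib is installed; the counts and the output filename prediction_counts.png are hypothetical placeholders, not values from the question.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Hypothetical per-class counts; in practice these would come from
# counting the labels in predict1.
named_counts = {'neg': 2, 'pos': 3}

fig, ax = plt.subplots()
# One bar per predicted class, with the bar height equal to its count.
bars = ax.bar(list(named_counts.keys()), list(named_counts.values()))
ax.set_xlabel('Predicted class')
ax.set_ylabel('Number of reviews')
ax.set_title('Predicted sentiment counts')

# Hypothetical output filename; in an interactive session, drop the
# matplotlib.use('Agg') line and call plt.show() instead.
fig.savefig('prediction_counts.png')
```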