【Question title】: Classification of text using naive Bayes in Python
【Posted】: 2018-03-06 10:15:41
【Question description】:

I built a model in which I run naive Bayes to get the expected output.

from textblob.classifiers import NaiveBayesClassifier as NBC
from textblob import TextBlob
training_corpus = [
('Agree Completely Agree Strongly Agree Somewhat Disagree Somewhat Disagree Strongly Completely Disagree','TRUE'),
('Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
('1 - disagree strongly 2 - disagree somewhat 3 - neither agree nor disagree 4 - agree somewhat 5 - agree strongly','TRUE'),
('1 - doesn\'t apply at all 2 3 4 5 6 7 - applies completely','TRUE'),
('1 - extremely new and different 2 3 4 5 6 7 - not at all new & different','TRUE'),
('1 - extremely relevant 2 3 4 5 6 7 - not at all relevant','TRUE'),
('1 - I don\'t want brands to engage with me at all on social media 2 3 4 5 6 7 - I love to engage with brands on social media','TRUE'),
    ('1 - Most Important 2 3 4 5 - Least Important','TRUE'),    
    ('pepsi','FALSE'),
    ('coca cola','FALSE'),
    ('hyundai','FALSE'),        
    ('Audio quality','FALSE'),
    ('Product features ','FALSE'),
    ('Content ','FALSE')
]
test_corpus = [
    ('1 - Agree Completely 2 - Agree Strongly 3 - Agree Somewhat 4 - Disagree Somewhat 5 - Disagree Strongly 6 - Completely Disagree','TRUE'),
    ('1 - Concerned 2 3 4 5 6 7 - Comfortable','TRUE'),
    ('Content ','FALSE'),
    ('Ease of navigation','FALSE')
]
model = NBC(training_corpus) 
print(model.classify('pepsi'))
print(model.accuracy(test_corpus)*100)

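When I run this code, it reports 100% accuracy, but it returns FALSE every time. I am not sure what is wrong, but this is not the expected output.

【Question discussion】:

    Tags: python machine-learning naivebayes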


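    【Solution 1】:

    Your model is fine; the behavior comes from your data and the classifier.
    In other words, given the training data you provided, it works well. Let's test it: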

    def test(s):
        # Print the class probabilities the trained model assigns to s
        prob_dist = model.prob_classify(s)
        print("classifying", s)
        print("probability of being FALSE:", round(prob_dist.prob("FALSE"), 2),
              "probability of being TRUE:", round(prob_dist.prob("TRUE"), 2))
        print('-' * 70)

    test_cases = ['1', '1 - ', '2', '2 3 4 5', '1- 2 3 4 5', 'pepsi', 'coca', 'BMW']
    for tc in test_cases:
        test(tc)


    Here is the output, which looks reasonable:

    classifying 1
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    classifying 1 - 
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    classifying 2
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    classifying 2 3 4 5
    probability of being FALSE: 0.05 probability of being TRUE: 0.95
    ----------------------------------------------------------------------
    classifying 1- 2 3 4 5
    probability of being FALSE: 0.0 probability of being TRUE: 1.0
    ----------------------------------------------------------------------
    classifying pepsi
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    classifying coca
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    classifying BMW
    probability of being FALSE: 1.0 probability of being TRUE: 0.0
    ----------------------------------------------------------------------
    

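    Now, you may wonder why the classifier behaves this way. Look at your code: where do you specify a feature vector? Nowhere, so it falls back to the default function for extracting feature vectors, as explained in the TextBlob documentation. (You can also look at the source code.)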
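To make the default behavior concrete, here is a minimal sketch of "bag of words" features of the `contains(word)` kind. It is not TextBlob's actual code: the helper names `vocabulary` and `contains_features` are made up for illustration, and plain whitespace splitting stands in for the library's tokenizer.

```python
def vocabulary(train_set):
    """Collect every whitespace-separated token seen in the training texts."""
    words = set()
    for text, _label in train_set:
        words.update(text.split())
    return words


def contains_features(document, vocab):
    """Map each vocabulary word to True/False depending on whether the document has it."""
    tokens = set(document.split())
    return {"contains({})".format(word): (word in tokens) for word in sorted(vocab)}


# A small subset of the training corpus from the question
train = [
    ("1 - Most Important 2 3 4 5 - Least Important", "TRUE"),
    ("pepsi", "FALSE"),
    ("coca cola", "FALSE"),
]

vocab = vocabulary(train)
features = contains_features("pepsi", vocab)
print(features["contains(pepsi)"])
print(features["contains(2)"])
```

With features like these, a short input such as `'1'` sets almost every `contains(...)` feature to False, which is the pattern the FALSE training examples share. That would explain why short strings are pushed toward FALSE.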

    For example, you can display your model's most informative features like this:

    model.show_informative_features()
    
    
    >>> Most Informative Features
                 contains(4) = False           FALSE : TRUE   =      5.6 : 1.0
                 contains(3) = False           FALSE : TRUE   =      5.6 : 1.0
                 contains(5) = False           FALSE : TRUE   =      5.6 : 1.0
                 contains(2) = False           FALSE : TRUE   =      5.6 : 1.0
                 contains(1) = False           FALSE : TRUE   =      3.3 : 1.0
                 contains(7) = False           FALSE : TRUE   =      2.4 : 1.0
                 contains(6) = False           FALSE : TRUE   =      2.4 : 1.0
                contains(at) = False           FALSE : TRUE   =      1.9 : 1.0
               contains(all) = False           FALSE : TRUE   =      1.9 : 1.0
               contains(not) = False           FALSE : TRUE   =      1.3 : 1.0
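The ratios in this table can be reproduced by hand. The sketch below assumes the underlying NLTK classifier uses its default expected likelihood estimate, that is, add-0.5 smoothing over the two feature values True and False. The function name `ele_prob` is hypothetical, and the counts come from the training corpus in the question (6 FALSE examples, 8 TRUE examples).

```python
def ele_prob(count, total, bins=2, gamma=0.5):
    """Smoothed estimate of P(feature value | label): (count + gamma) / (total + gamma * bins)."""
    return (count + gamma) / (total + gamma * bins)


# None of the 6 FALSE examples contains a digit, so contains(d) = False in all 6.
p_given_false = ele_prob(6, 6)

# Number of the 8 TRUE examples that do NOT contain each token.
true_examples_without = {"4": 1, "1": 2, "7": 3}

ratios = {
    token: p_given_false / ele_prob(missing, 8)
    for token, missing in true_examples_without.items()
}

for token, ratio in ratios.items():
    print("contains({}) = False  FALSE : TRUE = {:.1f} : 1.0".format(token, ratio))
```

Under these assumptions the computed ratios come out at 5.6, 3.3, and 2.4, matching the rows for `contains(4)`, `contains(1)`, and `contains(7)` above. The absence of a digit is therefore strong evidence for FALSE.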
    

    【Discussion】:

    • Thanks Iman... I am working on it and will let you know if I run into any problems.