【Question title】: TextBlob and NLTK POS tagging accuracy
【Posted】: 2019-03-24 18:19:11
【Question】:

So far I have the following code:

from textblob import TextBlob


class BrinBot:

    def __init__(self, message):  # Accepts the message from the user as the argument
        parse(message)


class parse:
    def __init__(self, message):
        self.message = message
        blob = TextBlob(self.message)
        print(blob.tags)  # list of (word, POS tag) pairs


BrinBot("Handsome Bob's dog is a beautiful Chihuahua")

Here is the output:

[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

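My problem is that TextBlob apparently thinks "Handsome" is a singular proper noun (NNP), which is incorrect, because "Handsome" should be an adjective (JJ). Is there a way to fix this? I also tried NLTK and got the same result.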

【Question discussion】:

    Tags: python python-3.x nlp nltk textblob


    【Solution 1】:

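    This happens because the capitalization of "Handsome" causes it to be treated as part of Bob's name (both are tagged NNP). That is not necessarily an incorrect analysis, but if you want to force the adjective reading, you can lowercase "Handsome", as in text2 and text3 below.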
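The grouping described above is visible directly in the tagger output from the question. A minimal sketch, assuming `blob.tags` returns a list of (token, tag) pairs as in the printed output; `proper_noun_runs` is a hypothetical helper, not part of TextBlob or NLTK, and it only groups adjacent NNP/NNPS tags rather than reproducing how the tagger decided on them:

```python
from typing import List, Tuple

# (token, Penn Treebank tag) pairs, shaped like TextBlob's `blob.tags` output
Tagged = List[Tuple[str, str]]


def proper_noun_runs(tagged: Tagged) -> List[List[str]]:
    """Group consecutive NNP/NNPS tokens into runs (hypothetical helper)."""
    runs: List[List[str]] = []
    current: List[str] = []
    for token, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(token)  # extend the current proper-noun run
        else:
            if current:
                runs.append(current)  # any other tag closes the run
            current = []
    if current:
        runs.append(current)  # flush a run that ends the sentence
    return runs


# Tagger output copied from the question
question_tags: Tagged = [('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'),
                         ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'),
                         ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]

print(proper_noun_runs(question_tags))  # [['Handsome', 'Bob'], ['Chihuahua']]
```

"Handsome" and "Bob" come out as a single run, which is the "part of Bob's name" reading.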

    text = "Handsome Bob's dog is a beautiful Chihuahua"
    
    BrinBot(text)
    [('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]
    
    text2 = "handsome bob's dog is a beautiful chihuahua"
    
    BrinBot(text2)
    [('handsome', 'JJ'), ('bob', 'NN'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN')]
    
    text3 = "That beautiful chihuahua is handsome Bob's dog"
    
    BrinBot(text3)
    [('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('handsome', 'JJ'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
    
    text4 = "That beautiful chihuahua is Handsome Bob's dog"
    
    BrinBot(text4)
    [('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
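Comparing text2 and text3 above, lowercasing the whole sentence also turns "bob" into a common noun (NN), whereas lowercasing only "handsome" keeps "Bob" as NNP. A minimal sketch of that selective approach follows. `lowercase_words` and `tag_with_lowercased` are hypothetical helper names, and the default tagger assumes TextBlob is installed along with its NLTK corpora (e.g. via `python -m textblob.download_corpora`). Any callable that returns (token, tag) pairs can be passed in instead:

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

Tagged = List[Tuple[str, str]]


def lowercase_words(text: str, words: Iterable[str]) -> str:
    """Lowercase only the given words (whole-word, case-insensitive match)."""
    for word in words:
        pattern = r"\b" + re.escape(word) + r"\b"
        # A function replacement means nothing in it is read as a regex escape
        text = re.sub(pattern, lambda m: m.group(0).lower(), text,
                      flags=re.IGNORECASE)
    return text


def tag_with_lowercased(text: str, words: Iterable[str],
                        tagger: Optional[Callable[[str], Tagged]] = None) -> Tagged:
    """POS-tag `text` after lowercasing only `words`; other capitals are kept."""
    if tagger is None:
        # Imported lazily so lowercase_words works without TextBlob installed
        from textblob import TextBlob
        tagger = lambda s: TextBlob(s).tags
    return tagger(lowercase_words(text, words))


if __name__ == "__main__":
    # Only "Handsome" is lowercased; "Bob" and "Chihuahua" keep their capitals
    print(lowercase_words("Handsome Bob's dog is a beautiful Chihuahua", ["Handsome"]))
```

This trades generality for precision: you have to know which words to lowercase, and the helper does not decide by itself when a capitalized word is really an adjective.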
    
