【Question title】: Finding the POS of the root of a noun_chunk with spaCy
【Posted】: 2020-06-08 23:58:16
【Question】:

With spaCy you can easily iterate over the noun phrases of a text, like this:

import spacy

S = 'This is an example sentence that should include several parts and also make clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)

[chunk.text for chunk in doc.noun_chunks]
# = ['an example sentence', 'several parts', 'Natural language Processing']

You can also get the "root" of each noun chunk:

[chunk.root.text for chunk in doc.noun_chunks]
# = ['sentence', 'parts', 'Processing']

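How can I get the POS of each of these root words (even though the root of a noun phrase seems to always be a noun), and how can I get the lemma, the shape, and the singular form of that particular word?

Is that even possible?

Thanks.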

【Comments on the question】:

    Tags: nlp root spacy chunks lemmatization


    【Solution 1】:

    Each chunk.root is a Token, from which you can read various attributes, including lemma_ and pos_ (or tag_ if you prefer the Penn Treebank POS tags).

    import spacy
    S='This is an example sentence that should include several parts and also make ' \
      'clear that studying Natural language Processing is not difficult'
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(S)
    for chunk in doc.noun_chunks:
        print('%-12s %-6s  %s' % (chunk.root.text, chunk.root.pos_, chunk.root.lemma_))
    
    sentence     NOUN    sentence
    parts        NOUN    part
    Processing   NOUN    processing
    

    By the way, in this sentence "Processing" is a noun, so its lemma is "processing", not "process", which is the lemma of the verb form "processing".
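The question also asks about the shape and the singular form, which the answer above does not print. Below is a minimal sketch of how those could be read from the same Token, assuming the standard `shape_` and `tag_` attributes alongside `pos_` and `lemma_`. The helper names `root_info` and `demo` are hypothetical and not part of spaCy. Treating `lemma_` as the singular form is also an assumption: it usually holds for regular English plurals (as with "parts" and "part" above), but it depends on the model's lemmatizer.

```python
def root_info(doc):
    """Collect a few Token attributes for the root of every noun chunk in `doc`."""
    rows = []
    for chunk in doc.noun_chunks:
        root = chunk.root  # the chunk's syntactic head, a spaCy Token
        rows.append({
            "chunk": chunk.text,
            "root": root.text,
            "pos": root.pos_,      # coarse-grained POS tag, e.g. NOUN
            "tag": root.tag_,      # fine-grained tag (Penn Treebank style for English models)
            "lemma": root.lemma_,  # base form; for a plural noun this is typically the singular
            "shape": root.shape_,  # orthographic shape of the word
        })
    return rows


def demo(model_name="en_core_web_sm"):
    """Hypothetical usage helper; spaCy is imported here so root_info() has no import-time dependency."""
    import spacy
    nlp = spacy.load(model_name)
    doc = nlp("This is an example sentence that should include several parts")
    for row in root_info(doc):
        print(row)
```

Calling `demo()` requires spaCy and the `en_core_web_sm` model to be installed. The exact tags and lemmas printed may differ between model versions.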

    【Comments】:

    • omg, each chunk.root is a Token, it was right in front of my eyes. Thanks.