Singularize noun phrases with spacy (answers)

【Question title】: singularize noun phrases with spacy
【Posted】: 2020-06-09 20:37:46
【Question】:

I'm looking for a way to singularize noun chunks with spacy:

import spacy

S = 'There are multiple sentences that should include several parts and also make clear that studying Natural language Processing is not difficult '
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)

[chunk.text for chunk in doc.noun_chunks]
# = ['multiple sentences', 'several parts', 'Natural language Processing']

You can also get the "root" of each noun chunk:

[chunk.root.text for chunk in doc.noun_chunks]
# = ['sentences', 'parts', 'Processing']

I'm looking for a way to singularize the roots of these chunks.

Goal, singularized: ['sentence', 'part', 'Processing']

Is there an obvious way to do this? Does it always depend on the part of speech of each root?

Thanks

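Note: I found this: https://www.geeksforgeeks.org/nlp-singularizing-plural-nouns-and-swapping-infinite-phrases/ but it seems to me that this approach would require many different methods, and of course different ones for each language. (I work in EN, FR and DE.)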

【Comments】:

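Isn't this a duplicate of the question you asked yesterday?
Hi @bivouac0, no, it isn't. One is about finding the POS (part of speech); the other is about converting tokens or noun phrases to their singular form. Cheers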

Tags: nlp spacy chunks lemmatization

【Solution 1】:

To get the base form of each word, you can use the ".lemma_" attribute of a chunk or of a token.

I'm using spaCy version 2.x:

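import spacy

# In spaCy 2.x lemmas are assigned together with the POS tags, so the parser
# and NER can be disabled here (doc.noun_chunks, however, needs the parser).
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
doc = nlp('did displaying words')
print(" ".join([token.lemma_ for token in doc]))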

Output:

do display word

Hope this helps :)
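Solution 1 lemmatizes every token. If the goal is to singularize the whole noun phrase rather than only its root (e.g. "several parts" becomes "several part"), one option is to rebuild the chunk text and swap in the lemma only at the root position. A minimal sketch, assuming tokens are passed as plain `(text, lemma, whitespace)` tuples so it runs without spaCy installed; `lemmatize_root_only` and `chunk_to_tokens` are hypothetical helper names, and the adapter relies only on spaCy's `Token.text`, `Token.lemma_`, `Token.whitespace_`, `Token.i`, `Span.root` and `Span.start`:

```python
from typing import List, Tuple

# Hypothetical token representation: (text, lemma, trailing_whitespace).
# From a spaCy Span this could be built as
# [(t.text, t.lemma_, t.whitespace_) for t in chunk].
Token = Tuple[str, str, str]


def lemmatize_root_only(tokens: List[Token], root_offset: int) -> str:
    """Rebuild a noun chunk with only its root token replaced by the lemma."""
    parts = []
    last = len(tokens) - 1
    for i, (text, lemma, ws) in enumerate(tokens):
        # Modifiers keep their surface form; only the head noun is lemmatized.
        word = lemma if i == root_offset else text
        # Drop the whitespace after the last token, as Span.text does.
        parts.append(word + (ws if i < last else ""))
    return "".join(parts)


def chunk_to_tokens(chunk) -> Tuple[List[Token], int]:
    """Hypothetical adapter for a spaCy Span (spaCy itself is not imported)."""
    tokens = [(t.text, t.lemma_, t.whitespace_) for t in chunk]
    # chunk.root.i is an index into the whole doc; subtract chunk.start
    # to get the root's position inside the chunk.
    return tokens, chunk.root.i - chunk.start


if __name__ == "__main__":
    print(lemmatize_root_only([("several", "several", " "), ("parts", "part", " ")], 1))
```

With a parsed doc this would be used as `[lemmatize_root_only(*chunk_to_tokens(c)) for c in doc.noun_chunks]`. In spaCy 2.x pronoun lemmas are `-PRON-`, so chunks rooted at a pronoun may need special-casing.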

【Comments】:

【Solution 2】:

There is! You can take the lemma of the root (head word) of each noun chunk:

[chunk.root.lemma_ for chunk in doc.noun_chunks]
# = ['sentence', 'part', 'processing']
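The output above lowercases "processing", whereas the goal in the question keeps 'Processing'. The question also asks whether this depends on the POS of each root; one way to use the POS is to substitute the lemma only when the root is tagged as a plural noun and otherwise keep the original text unchanged. A minimal sketch under these assumptions: spaCy's English models expose Penn Treebank tags via `token.tag_`, where plural nouns are `NNS`/`NNPS`; for FR and DE, where those English tags do not apply, it assumes spaCy v3, where `str(token.morph)` returns a Universal Dependencies feature string such as `Gender=Fem|Number=Plur`. `singularize_root` and `is_plural_ud` are hypothetical helper names:

```python
# Penn Treebank tags for plural nouns (as used in tag_ by spaCy's English models).
PLURAL_PENN_TAGS = {"NNS", "NNPS"}


def singularize_root(text: str, lemma: str, tag: str) -> str:
    """Use the lemma only for plural nouns; otherwise keep the text and its casing."""
    return lemma if tag in PLURAL_PENN_TAGS else text


def is_plural_ud(feats: str) -> bool:
    """Return True if a UD feature string such as 'Gender=Fem|Number=Plur' marks plural."""
    for feature in feats.split("|"):
        name, _, value = feature.partition("=")
        # UD allows comma-separated multi-values, e.g. 'Number=Plur,Sing'.
        if name == "Number" and "Plur" in value.split(","):
            return True
    return False


# Usage with an English spaCy doc (not executed here):
# [singularize_root(c.root.text, c.root.lemma_, c.root.tag_) for c in doc.noun_chunks]
# Usage for FR/DE, assuming spaCy v3 (str(token.morph) yields the UD feature string):
# [c.root.lemma_ if is_plural_ud(str(c.root.morph)) else c.root.text for c in doc.noun_chunks]
```

Whether "Processing" keeps its capital letter with this approach depends on the tag the model actually assigns to it; any root not tagged as a plural noun is returned as written.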

【Comments】: