Extract words with specific tags from a list of arrays

【Question title】: Extract words with specific tags from a list of arrays
【Posted】: 2019-08-27 00:07:41
【Question】:

I want to get only the words tagged "NNP" from a list like the following:

[[('Original', 'JJ'), [('Respectfully', 'RB'), (',', ',')], [('Detective', 'NNP'), ('.', '.'), ('H.', 'NNP'), ('!', '.'), ('Thompson', 'NNP'), ('#', '#'), ('1032', 'CD')]]

I tried:

nouns = [word for (word, pos) in pos_sentences if pos == 'NNP']
Traceback (most recent call last):

  File "<ipython-input-187-0de3a4db4bba>", line 1, in <module>
    nouns = [word for (word, pos) in pos_sentences if pos == 'NNP']

  File "<ipython-input-187-0de3a4db4bba>", line 1, in <listcomp>
    nouns = [word for (word, pos) in pos_sentences if pos == 'NNP']

ValueError: too many values to unpack (expected 2)

I only want the words tagged "NNP", but I don't know how to iterate over a list like this.

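【Comments】:

You may want to flatten your nested list into a single list first; then your code above should work. You have a list of lists of tuples, but your code is written for a list of tuples.
The pos_sentences sample you provided is not in a standard format: [[('Original', 'JJ'), [('Respectfully', 'RB'), (',', ',')], [('Detective', 'NNP'), ('.', '.'), ('H.', 'NNP'), ('!', '.'), ('Thompson', 'NNP'), ('#', '#'), ('1032', 'CD')]] so your attempt to filter the list items will be very difficult, if not outright impossible. Have you fixed your sample input?
@jcmack You're right. I can't get the correct tags from the text. My input comes from a .docx file. I tried different encodings, but it was not converted to the correct format. Do you have any suggestions?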

Tags: python list pos-tagger

【Solution 1】:

Flatten the list first:

import functools
pos_sentences = functools.reduce(lambda x, y: x + y, pos_sentences) # Flattens the list
nouns = [word for (word, pos) in pos_sentences if pos == 'NNP'] # Do as you did before
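
Note that the flatten above removes only one level of nesting, so it assumes pos_sentences is strictly a list of lists of tuples. If the nesting depth is mixed, as the comments suggest it might be, a recursive walk may be more forgiving. The following is only a minimal sketch, assuming every leaf is a (word, tag) tuple of two strings; `iter_tagged` is a hypothetical helper name, and the sample data is a balanced version of the list from the question, not the asker's actual input:

```python
def iter_tagged(node):
    """Yield (word, tag) pairs from arbitrarily nested lists of tuples."""
    # Treat a 2-tuple of strings, e.g. ('Thompson', 'NNP'), as a leaf.
    if (isinstance(node, tuple) and len(node) == 2
            and all(isinstance(item, str) for item in node)):
        yield node
    # Any other list or tuple is a container: descend into its children.
    elif isinstance(node, (list, tuple)):
        for child in node:
            yield from iter_tagged(child)


# Assumed, well-formed version of the sample from the question.
pos_sentences = [
    [('Original', 'JJ')],
    [('Respectfully', 'RB'), (',', ',')],
    [('Detective', 'NNP'), ('.', '.'), ('H.', 'NNP'), ('!', '.'),
     ('Thompson', 'NNP'), ('#', '#'), ('1032', 'CD')],
]

# Keep only the words whose tag is exactly 'NNP'.
nouns = [word for word, tag in iter_tagged(pos_sentences) if tag == 'NNP']
print(nouns)  # ['Detective', 'H.', 'Thompson']
```

Because the walk checks the shape of each element instead of assuming a fixed depth, it also accepts input where some tuples sit directly at the top level next to nested lists.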

【Discussion】: