How to get the sentence after chunking in NLTK? Answers

【Question title】: How to get sentence after chunking in NLTK?
【Posted】: 2023-03-10 01:28:01
【Question description】:

I have a text as follows:

txt =  "i am living in the West Bengal and my brother live in New York. My name is John Smith"

What I need is:

Get the chunks labeled GPE/LOCATION and join the words of each chunk with "_".
Get the chunks labeled PERSON and remove them.

The output I need:

preprocessed_txt =  "i am living in the West_Bengal and my brother live in New_York. My name is "

I used the code from "NLTK Named Entity recognition to a Python list" to get the labels of the chunks.

import nltk

# Print the label and the underscore-joined words of every named-entity chunk
for sent in nltk.sent_tokenize(txt):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            print(chunk.label(), '_'.join(c[0] for c in chunk))

This returns the output:

LOCATION West_Bengal
GPE New_York
PERSON John_Smith

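What should I do next?

【Comments】:

Try '_'.join(c[0] for c in chunk)
That gives the output: LOCATION West_Bengal GPE New_York PERSON John_Smith
You have to recode it: capture the tokens in a list, then extract the names and use them to replace the original names in the token list
@YashvanderBamel... How do I do that? That is exactly where I am stuck.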

Tags: python nlp nltk

【Solution 1】:

This should be what you need:

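import nltk

tokens = nltk.word_tokenize(txt)  # txt is the string from the question
new = []
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
    try:
        if chunk.label().lower() == 'person':
            continue  # drop PERSON chunks entirely
        new.append('_'.join(c[0] for c in chunk))  # join multi-word entities
    except AttributeError:
        new.append(chunk[0])  # plain (word, tag) tuple: keep the word

print(' '.join(new))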
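
One detail worth noting: joining the tokens with a single space leaves a space before punctuation (for example "New_York ." instead of "New_York."). The sketch below is one possible way to tidy that up, not part of the original answer. The helper names `rebuild_text` and `preprocess` and the `drop_labels` parameter are hypothetical, chosen only for illustration, and the regex covers only a few common punctuation marks. Which labels the chunker assigns depends on the NLTK models installed, so the result may differ from the desired output in the question.

```python
import re


def rebuild_text(chunks, drop_labels=("PERSON",)):
    """Rebuild a sentence from the items yielded by nltk.ne_chunk.

    Named-entity subtrees expose .label(); plain tokens are (word, tag) tuples.
    """
    words = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            if chunk.label() in drop_labels:
                continue  # skip the whole entity, e.g. "John Smith"
            words.append("_".join(word for word, _tag in chunk))
        else:
            words.append(chunk[0])
    text = " ".join(words)
    # Remove the space that tokenization leaves before common punctuation
    return re.sub(r"\s+([.,!?;:])", r"\1", text)


def preprocess(txt):
    """Apply rebuild_text sentence by sentence (assumes the NLTK data is installed)."""
    import nltk  # imported here so rebuild_text can be used without NLTK

    parts = []
    for sent in nltk.sent_tokenize(txt):
        tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)))
        parts.append(rebuild_text(tree))
    return " ".join(parts)
```

Calling `preprocess(txt)` on the string from the question should give text along the lines of the requested `preprocessed_txt`, provided the chunker tags "West Bengal" and "New York" as entities and "John Smith" as PERSON.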

【Comments】: