【Question title】: How to extract specific words from a string using regex in Python
【Posted】: 2017-09-02 23:17:59
【Question】:

I have two strings containing words and their part-of-speech tags:

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'

I'd like to extract any words tagged /NNP or /CDP that follow a word tagged /IN. This is my code so far (it still only works for the /NNP tag):

import re

def entityExtractPreposition(text):
    # Match a word tagged /IN, then any following tokens that do not start
    # another /IN, ending at the last /NNP tag reached
    return re.findall(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/NNP\b)', text)

text1 = 'Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP'
prepo1 = entityExtractPreposition(text1)

text2 = 'Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN'
prepo2 = entityExtractPreposition(text2)

print(text1)
print(prepo1)
print()
print(text2)
print(prepo2)

Output of the code so far:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']

As we can see for the first string (text1), entityExtractPreposition still fails to capture 33/CDP. How can I make entityExtractPreposition work with the /CDP tag in text1 as well as the /NNP tags in text2?
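The reason 33/CDP is missed is visible in the pattern itself: it ends in the literal /NNP\b, so a match can only finish on an /NNP word. A minimal sketch of the smallest change, assuming Python 3 and keeping the question's pattern otherwise intact, replaces that literal with the alternation /(?:NNP|CDP). The snake_case name entity_extract_preposition is only an illustrative rename of the question's function.

```python
import re

# Same structure as the question's pattern; only the final tag is changed
# from the literal /NNP to the alternation /(?:NNP|CDP).
PATTERN = re.compile(r'([^\s/]*/IN\b[^/]*(?:/(?!IN\b)[^/]*)*/(?:NNP|CDP)\b)')

def entity_extract_preposition(text):
    """Return every span that starts at a word tagged /IN and ends at the
    last /NNP or /CDP word before the next /IN (or the end of the string)."""
    return PATTERN.findall(text)

text1 = ('Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, '
         'Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN '
         'kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP')
text2 = ('Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN '
         'at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN')

print(entity_extract_preposition(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(entity_extract_preposition(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```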

The expected result is:

Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP
['at/IN Yasmin/NNP 33/CDP']

Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN
['at/IN Jl/NNP Halimun/NNP Raya/NNP']
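If a regex is not a hard requirement, a token-based approach can produce the same two expected lists: split the string on whitespace, read each token's tag after its last slash, and for every /IN word collect the run of immediately following /NNP or /CDP words. This is a sketch that assumes every token has the word/TAG form shown above; extract_after_preposition and its keep parameter are hypothetical names, not part of the question.

```python
def extract_after_preposition(text, keep=('NNP', 'CDP')):
    """Hypothetical helper: for each token tagged IN, collect the run of
    immediately following tokens whose tag is in `keep`."""
    # rpartition splits at the LAST slash, so ',/,' yields tag ',' and
    # '?/.' yields tag '.'; a token with no slash would be treated as its own tag.
    tagged = [(tok, tok.rpartition('/')[2]) for tok in text.split()]
    results = []
    for i, (tok, tag) in enumerate(tagged):
        if tag != 'IN':
            continue
        run = []
        for next_tok, next_tag in tagged[i + 1:]:
            if next_tag not in keep:
                break  # stop at the first token with any other tag
            run.append(next_tok)
        if run:  # skip /IN words with no /NNP or /CDP right after them
            results.append(' '.join([tok] + run))
    return results

text1 = ('Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, '
         'Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN '
         'kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP')
text2 = ('Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN '
         'at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN')

print(extract_after_preposition(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(extract_after_preposition(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```

The design differs from the regex versions: this helper stops at the first word with any other tag, so an input such as 'at/IN Yasmin/NNP ,/, 33/CDP' yields only 'at/IN Yasmin/NNP', whereas a greedy regex that runs to the last /NNP or /CDP before the next /IN would also include the comma and 33/CDP.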

Thanks

【Comments】:

    Tags: python regex string text-extraction


    【Answer 1】:
    \b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b
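A short sketch of how the answer's pattern above might be used with re.findall, assuming Python 3 and the two sample strings from the question. The part (?:(?!/IN\b).)* consumes characters one at a time but refuses to start a new /IN, so a match never crosses into the next /IN word; because it is greedy, the engine then backs off to the last /NNP or /CDP tag before that point. The pattern has no capturing groups, so findall returns whole matches. ANSWER_PATTERN is an illustrative name.

```python
import re

# The answer's pattern, annotated:
#   \b[^\s/]+/IN\b     the word tagged /IN, e.g. "at/IN"
#   (?:(?!/IN\b).)*    any characters, as long as no new "/IN" tag begins
#   /(?:NNP|CDP)\b     the match must finish on a /NNP or /CDP tag
ANSWER_PATTERN = re.compile(r'\b[^\s/]+/IN\b(?:(?!/IN\b).)*/(?:NNP|CDP)\b')

text1 = ('Mau/VBT ngasih/NN hadiah/NN untuk/IN Anniv/NN ,/, Graduation/NN ,/, '
         'Birthday/NN ,/, Wedding/NN ,/, dll/VBT ?/. Nih/DT ,/, ada/VBI hadiah/NN '
         'kece/JJ yang/SC at/IN Yasmin/NNP 33/CDP')
text2 = ('Yang/SC kelaparan/NN habis/VBI latihan/NN ilovenaylambem/NN '
         'at/IN Jl/NNP Halimun/NNP Raya/NNP ,/, Menteng/NN')

# No capturing groups, so findall returns the full matched substrings.
print(ANSWER_PATTERN.findall(text1))  # ['at/IN Yasmin/NNP 33/CDP']
print(ANSWER_PATTERN.findall(text2))  # ['at/IN Jl/NNP Halimun/NNP Raya/NNP']
```

Here untuk/IN in text1 produces no match, because no /NNP or /CDP word appears between it and at/IN.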
    

    【Comments】:
