【Question title】:Find word after a match if word in list in python
【Posted】:2018-04-10 07:11:12
【Question】:

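How can I find the word that follows a match, provided that word is in a list? For example, I want to find the word after match1 if that word is in this list: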

r = ["word1", "word2", "word3"]

If it is found, return word(i). If not, return Unknown.

Toy example:

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"

I want to look at the word after match1 and keep it only if it is in the list r = ["word1", "word2", "word3"]. Expected results: Unknown for Text1, word1 for Text2, and Unknown for Text3.

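So far I have only managed to extract "word1" when it follows one of the first two occurrences of the match. With Text4 (below) I cannot extract it, because my code only goes as far as the second occurrence. Nesting more if-else statements to go deeper does not seem like the way to go, since word1 may not be present at all.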

Text4 = "example match1 example match1 example match1 word1"

def get_labels(text):
    q = ["match1"] #Here the idea is to have several, but its the same logic
    r = ["word1", "word2", "word3"]
    labels = []
    for i,item in enumerate(q):
        label = text[text.find(q[i])+len(q[i]):].split()[0]
        if label in r:
            labels.append(label)
        else:
            texto_temp = text[text.find(q[i])+len(q[i]):]
            label2 = texto_temp[texto_temp.find(q[i])+len(q[i]):].split()[0]
            labels.append(label2)
    return labels

Any ideas would be appreciated.

【Comments】:

  • Nested for loops? You can have a for loop inside a for loop.

Tags: python string


【Solution 1】:

You can use regular expressions to find the matches.

Code

from __future__ import print_function
import re

def get_labels(text, match, words):
    # Capture every listed word that directly follows `match`, separated by whitespace.
    tmp = re.findall(r'(?<={})\s+({})'.format(match, '|'.join(words)), text)

    return tmp if tmp else "Unknown"

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"

match = "match1"
words = ["word1", "word2", "word3"]

print(get_labels(Text1, match, words))
print(get_labels(Text2, match, words))
print(get_labels(Text3, match, words))
print(get_labels(Text4, match, words))

Console output

Unknown
['word1']
Unknown
['word1']

Ask if you need more details.
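
The pattern above inserts `match` and `words` into the regex unescaped, and it has no word boundaries, so "word1" would also be captured from "word10". A minimal sketch of a stricter variant, assuming whole-word matching is wanted; the name `get_labels_escaped` is hypothetical and not part of the answer:

```python
import re

def get_labels_escaped(text, match, words):
    # Hypothetical variant: escape the inputs and require whole-word matches.
    alternatives = '|'.join(re.escape(w) for w in words)
    pattern = r'\b{}\s+({})\b'.format(re.escape(match), alternatives)
    found = re.findall(pattern, text)
    return found if found else "Unknown"

words = ["word1", "word2", "word3"]
print(get_labels_escaped("This is a match1 word1 example", "match1", words))   # ['word1']
print(get_labels_escaped("This is a match1 word10 example", "match1", words))  # Unknown
```

This version drops the lookbehind and matches `match` directly, which is enough here because only the captured group is returned by `re.findall`.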

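【Comments】:

  • Thanks @sven-krüger, now how can I add "Unknown" when nothing is found, as in Text1 and Text3? Thanks!
  • @juanman Feel free to accept my answer (the green check mark below the up/down vote buttons).
  • Thanks Sven, I used this approach while iterating over the list of words to match.
【Solution 2】: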

If I understand you correctly, this should work:

def get_labels(text):
    q = ['match1']
    r = ['word1', 'word2', 'word3']
    labels = []
    terms = text.split()
    for i, term in enumerate(terms[:-1]):
        if term in q and terms[i+1] in r:
            labels.append(terms[i+1])
    return labels if labels else 'Unknown'
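
The question's code comment mentions that `q` is meant to hold several match words eventually. A minimal sketch of how the same token-pair idea might report results per match word; the name `labels_by_match` and the dictionary return shape are assumptions for illustration, not something the question asks for:

```python
def labels_by_match(text, matches, words):
    # Hypothetical helper: map each match word to the listed words that directly follow it.
    terms = text.split()
    result = {m: [] for m in matches}
    # zip pairs every token with the token that comes right after it.
    for current, following in zip(terms, terms[1:]):
        if current in result and following in words:
            result[current].append(following)
    # Match words with no listed follower are reported as 'Unknown'.
    return {m: (found if found else 'Unknown') for m, found in result.items()}

text = "a match1 word1 b match2 other c match1 word3"
print(labels_by_match(text, ["match1", "match2"], ["word1", "word2", "word3"]))
# {'match1': ['word1', 'word3'], 'match2': 'Unknown'}
```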

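【Comments】:

  • Thanks @wjk2a1, but how do I now print something like "Unknown" if nothing is found? Thanks!
  • If labels is empty you can return Unknown: return labels if labels else 'Unknown'
【Solution 3】: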

You can try a positive lookbehind, (?<=match1\s):

import re
pattern=r'(?<=match1\s)[a-zA-Z0-9]+'

Text1 = "This is a match1 for example match1 random text match1 anotherword"
Text2 = "This is a match1 word1 example"
Text3 = "This is an example without word of interest"
Text4 = "example match1 example match1 example match1 word1"

r = ["word1", "word2", "word3"]

def word_checker(list_):
    data=re.findall(pattern,list_)
    list_data=[i for i in data if i in r]
    if list_data:
        return list_data[0]
    else:
        return 'Unknown'

Usage:

print(word_checker(Text1))
print(word_checker(Text2))
print(word_checker(Text3))
print(word_checker(Text4))

Output:

Unknown
word1
Unknown
word1
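
To make the two steps inside `word_checker` easier to follow, here is a small sketch that prints the intermediate result: the lookbehind first collects every word that follows "match1", and the list `r` then filters those candidates. The variable names `candidates` and `text4` are illustrative only:

```python
import re

pattern = r'(?<=match1\s)[a-zA-Z0-9]+'
r = ["word1", "word2", "word3"]

text = "This is a match1 for example match1 random text match1 anotherword"
# Step 1: every word directly preceded by "match1" and one whitespace character.
candidates = re.findall(pattern, text)
print(candidates)                          # ['for', 'random', 'anotherword']
# Step 2: keep only the candidates that are in the list.
print([w for w in candidates if w in r])   # []

text4 = "example match1 example match1 example match1 word1"
print(re.findall(pattern, text4))          # ['example', 'example', 'word1']
```

Because the lookbehind contains a single `\s`, a word separated from "match1" by two or more whitespace characters would not be picked up by this pattern.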

【Comments】:
