How do I return more than one match from a list of text? Answers

【Question title】: How to return more than one match from a list of text?
【Posted】: 2022-01-17 02:31:23
【Question】:

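I currently have a function that returns a term and the sentence it appears in. Right now, the function only retrieves the first match from the list of terms. I'd like it to retrieve all matches, not just the first.

For example, with list_of_matches = ["heart attack", "cardiovascular", "hypoxia"] and the sentences text_list = ["A heart attack is a result of cardiovascular...", "Chronic intermittent hypoxia is the..."]

The desired output is: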

['heart attack', 'a heart attack is a result of cardiovascular...'],
['cardiovascular', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']

# this is the current function
def find_word(list_of_matches, line):
    for words in list_of_matches:
        if any([words in line]):
            return words, line

# returns list of 'term, matched string'
key_vals = [list(find_word(list_of_matches, line.lower())) for line in text_list
            if find_word(list_of_matches, line.lower()) != None]

# output is currently 
['heart attack', 'a heart attack is a result of cardiovascular...'],
['hypoxia', 'chronic intermittent hypoxia is the...']
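The cause is the early return: the loop exits as soon as the first term in list_of_matches is found in a line, so "cardiovascular" is never checked for the first sentence. A minimal reproduction with the example data from the question (the original any([...]) test is written as a plain "in" check, which behaves the same):

```python
list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]

def find_word(list_of_matches, line):
    for words in list_of_matches:
        if words in line:
            # return leaves the function on the first hit,
            # so later terms in the same line are never examined
            return words, line
    # no match: the function falls through and implicitly returns None

key_vals = [list(find_word(list_of_matches, line.lower()))
            for line in text_list
            if find_word(list_of_matches, line.lower()) is not None]
# key_vals holds only two pairs: 'heart attack' and 'hypoxia'
```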

【Question discussion】:

Tags: python search nlp

【Solution 1】:

You'll want to use regular expressions here.

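import re

def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
        # re.escape makes characters such as '.' match literally;
        # re.search returns None when the word is absent, so check before .group()
        match = re.search(re.escape(word), text)
        if match is not None:
            matches.append(match.group())
    return [matches, text]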

Note that this returns a nested list: the list of all matched words, followed by the text itself.
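A variant of the regex idea, offered as an alternative rather than what the answer above does: compile all terms into one alternation, escape each with re.escape, wrap it in \b word boundaries so a term is not matched inside a longer word, and collect the hits with re.findall. The helper name find_all_terms is hypothetical. Note that findall returns a term once per occurrence, so repeated terms appear more than once.

```python
import re

def find_all_terms(terms, text):
    """Hypothetical helper: every term found in text, in order of appearance."""
    # Longest terms first so a longer phrase wins over a shorter overlapping one;
    # re.escape makes punctuation inside a term match literally.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    # \b on both sides: 'hypoxia' will not match inside 'hypoxias'
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern.findall(text)

terms = ["heart attack", "cardiovascular", "hypoxia"]
print(find_all_terms(terms, "A heart attack is a result of cardiovascular..."))
# ['heart attack', 'cardiovascular']
```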

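【Discussion】:

Hi Sam, thanks for the reply! But how do I call the function? I tried using it as [list(find_all_matches(list_of_matches, line.lower())) for line in text_list if find_all_matches(list_of_matches, line.lower()) != None] and got the following error: "AttributeError: 'NoneType' object has no attribute 'group'"
Say you have the keywords to search for: words_to_search = ["heart attack", "cardiovascular", "hypoxia"] and the text: text = 'a heart attack is a result of cardiovascular...'. Then call: result = find_all_matches(words_to_search, text)
If the text is a list, is there a way to iterate over it?
Could you post more examples of input and desired output so we can see what you mean?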
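For the follow-up about a list of texts: a minimal sketch, assuming the goal is the [term, sentence] pairs from the question. It uses a None-safe find_all_matches (terms that re.search does not find are skipped, since calling .group() on None is what raises the AttributeError) and loops over text_list. The helper name match_pairs is hypothetical.

```python
import re

def find_all_matches(words_to_search, text):
    matches = []
    for word in words_to_search:
        # re.search returns None for an absent word; skip it instead of calling .group()
        match = re.search(re.escape(word), text)
        if match is not None:
            matches.append(match.group())
    return [matches, text]

def match_pairs(words_to_search, text_list):
    """Hypothetical helper: one [term, sentence] pair per match across all texts."""
    pairs = []
    for line in text_list:
        matched, text = find_all_matches(words_to_search, line.lower())
        for term in matched:
            pairs.append([term, text])
    return pairs

list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]
for pair in match_pairs(list_of_matches, text_list):
    print(pair)
```

Sentences with no matching term simply contribute no pairs, so no None check is needed at the call site.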

【Solution 2】:

The solution takes 2 steps:

1. Fix the function
2. Process the output

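Given that the output you want follows the pattern

output = [ [word1, sentence1], [word2, sentence1], [word3, sentence2], ]

Fix the function: move the return out of the for loop, so the loop iterates over every word in list_of_matches and collects all matching words instead of stopping at the first. It should look like this: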

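def find_word(list_of_matches, line):
    answer = []
    for words in list_of_matches:
        # collect every matching term instead of returning on the first one
        if words in line:
            answer.append([words, line])
    # return only after all terms have been checked
    return answer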

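With the function above, the output will be:

key_vals = [
    [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
    ],
    [
        ['hypoxia', 'chronic intermittent hypoxia is the...'],
    ],
]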

Process the output: now take the variable key_vals and flatten the per-sentence lists with the following code:

output = []
for word_sentence_list in key_vals:
    for word_sentence in word_sentence_list:
        output.append(word_sentence)
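The nested loop removes exactly one level of nesting. The standard library's itertools.chain.from_iterable does the same; a minimal sketch, assuming key_vals has the nested shape produced by step 1:

```python
from itertools import chain

key_vals = [
    [
        ['heart attack', 'a heart attack is a result of cardiovascular...'],
        ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
    ],
    [
        ['hypoxia', 'chronic intermittent hypoxia is the...'],
    ],
]

# chain.from_iterable yields the items of each inner list in turn,
# flattening one level: a list of [term, sentence] pairs
output = list(chain.from_iterable(key_vals))
print(len(output))  # 3
```

An equivalent one-liner without the import is [pair for sentence_pairs in key_vals for pair in sentence_pairs].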

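Finally, the output will be:

output = [
    ['heart attack', 'a heart attack is a result of cardiovascular...'],
    ['cardiovascular', 'a heart attack is a result of cardiovascular...'],
    ['hypoxia', 'chronic intermittent hypoxia is the...'],
]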
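The two steps can also be collapsed into one nested list comprehension that builds the flat list directly, with no intermediate key_vals. A minimal sketch using the question's data; like find_word, it relies on a plain substring test, so a term would also match inside a longer word.

```python
list_of_matches = ["heart attack", "cardiovascular", "hypoxia"]
text_list = [
    "A heart attack is a result of cardiovascular...",
    "Chronic intermittent hypoxia is the...",
]

# Outer loop over lowercased sentences, inner loop over terms: every hit is kept,
# so a sentence containing two terms yields two [term, sentence] pairs
output = [
    [word, line]
    for line in (text.lower() for text in text_list)
    for word in list_of_matches
    if word in line
]
```

Sentences with no matching term produce no entries, so no None filtering is needed.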

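【Discussion】:

Thanks a lot! The output of step 1 works perfectly! But when I run step 2, the second match, "cardiovascular" in this case, disappears.
The output I posted was copied and pasted from running the function. If you're not getting the same result, you may have mixed up the output variables: the output of step 1 has to be the key_vals used in step 2 for this to work.
I edited the answer so the output variable names match.
Got it, thanks!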