【Question title】: What is the fast way to match words in text?
【Posted】: 2022-12-17 19:17:06
【Question description】:

I have a list of regular expressions, for example:

regex_list = [".+rive.+",".+ll","[0-9]+ blue car.+"......] ## list of length 3000

What is the best way to match all of these regular expressions against my text?

For example:

text : Hello, Owning 2 blue cars for a single driver

So in the output, I want a list of the matched words:

matched_words = ["Hello","4 blue cars","driver"]  ##Hello <==>.+llo

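【Discussion】:

Tags: python regex string list

【Solution 1】:

Well, first of all, you may want to adjust your regex_list, because as written, matching these patterns tends to return the entire text as the match. This is because of .+, which matches one or more of any character and is greedy, so it extends as far across the text as it can. Here is what I did: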

import re

regex_list = [".rive.", ".+ll.", "[0-9]+ blue car."]
text = "Hello, Owning 2 blue cars for a single driver"

# Collect the span of the first match for each pattern.
# re.search returns None when a pattern does not match, so skip those.
matches = (re.search(regex_item, text) for regex_item in regex_list)
spans = [m.span() for m in matches if m is not None]

# Sort the spans by start position so results follow their order in the text.
spans.sort()

# Slice the text by each span to recover the matched substrings.
matching_texts = [text[start:end] for start, end in spans]

print(matching_texts)

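I adjusted your regex_list slightly so that the patterns no longer match the entire text. I then collect the spans of all matches against the text, sort the spans by where they first occur, and finally slice the text by each span and print the result. You get the following: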

['Hello', '2 blue cars', 'driver']

Note: I am not sure why you expect "4 blue cars" to match, since that does not appear in your text.
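
The greedy behavior described at the start of this answer can be checked with a short sketch. It compares one pattern from the question with its adjusted form from this answer; the variable names greedy and local are only illustrative.

```python
import re

text = "Hello, Owning 2 blue cars for a single driver"

# Question-style pattern: the leading and trailing ".+" are greedy,
# so the match stretches across the whole line.
greedy = re.search(r".+rive.+", text)

# Adjusted pattern: a single "." on each side keeps the match local.
local = re.search(r".rive.", text)

print(greedy.group())  # the entire text
print(local.group())   # driver
```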

【Comments】:

Hello, thank you very much for your help, but running search over the 3000 regular expressions in the list takes a lot of time.
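
One option that might reduce the cost, offered here only as a sketch and not something proposed in the thread, is to join the patterns into a single alternation and compile it once, so the per-pattern loop runs inside the regex engine instead of in Python. This assumes none of the patterns rely on numbered backreferences or global inline flags, and it changes the semantics slightly: finditer returns non-overlapping matches, and at each position the first alternative that matches wins. The helper name find_matches is hypothetical, and whether this is actually faster for a given list should be measured.

```python
import re

# Sample patterns from the answer above; the real list would hold about 3000 entries.
regex_list = [".rive.", ".+ll.", "[0-9]+ blue car."]
text = "Hello, Owning 2 blue cars for a single driver"

# Wrap each pattern in a non-capturing group and join with "|" so the
# whole list is compiled once into a single pattern object.
combined = re.compile("|".join(f"(?:{p})" for p in regex_list))


def find_matches(compiled, s):
    """Hypothetical helper: return non-overlapping matches in text order."""
    return [m.group() for m in compiled.finditer(s)]


matched_words = find_matches(combined, text)
print(matched_words)  # ['Hello', '2 blue cars', 'driver']
```

If the same list is applied to many texts, the compiled object can be reused across calls, which avoids recompiling the patterns each time.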