Using regex to match words in a list, answers

【Question title】: Using Regex to match words in a list
【Posted】: 2020-08-03 11:55:41
【Question】:

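I'm very new to regex. I'm trying to find, in a text, all phrases whose words begin with the letters of a word from a list, one word per letter.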

For example, I have this list:

[' MRI', 'fMRI ', 'PPE', 'FFE']

I want to use the letters of each entry to find matching phrases in the text, and ignore the entry if nothing matches.

So for the list above, I want to find out whether the text contains:

Magnetic resonance imaging
functional Magnetic resonance imaging
personal protection equipment
None

I've found several ways to do this, but none that work when the words come from a list.

Can anyone help? It would be much appreciated.

【Question comments】:

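What exactly is your expected output for this test case? For example, does the third line count as a match for PPE even though the case differs?
My expected output is a list containing the unabbreviated terms. I see what you mean; I'd like the search to be case-insensitive.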
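Going by that expected output (a flat list of expanded terms, matched case-insensitively), here is a minimal sketch. The helper name `find_expansions` and the sample text are illustrative assumptions, not from the question. It uses `\b` so that each letter has to sit at the start of a word, and `\w*` for the rest of that word.

```python
import re

def find_expansions(acronyms, text):
    """Return phrases whose words start with each acronym's letters, in order."""
    results = []
    for raw in acronyms:
        acronym = raw.strip()  # entries like ' MRI' or 'fMRI ' carry stray spaces
        # \b keeps each letter at the start of a word; \w* consumes the rest of it.
        words = [r"\b" + re.escape(letter) + r"\w*" for letter in acronym]
        # \s+ allows spaces or newlines between consecutive words.
        pattern = r"\s+".join(words)
        results.extend(re.findall(pattern, text, flags=re.I))
    return results

text = """Patients had a functional magnetic resonance imaging scan.
Staff wore personal protection equipment."""
result = find_expansions([' MRI', 'fMRI ', 'PPE', 'FFE'], text)
print(result)
# ['magnetic resonance imaging', 'functional magnetic resonance imaging',
#  'personal protection equipment']
```

Entries with no match (here `FFE`) simply contribute nothing to the list.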

Tags: python regex list

【Solution 1】:

Use the re library. For case-insensitive matching, pass the flags=re.I option to its functions.
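A small illustration of what `flags=re.I` changes; the sample sentence is made up for the example.

```python
import re

text = "Wear Personal Protective Equipment on site."
pattern = r"personal protective equipment"

# Matching is case-sensitive by default, so the capitalised phrase is not found.
strict = re.findall(pattern, text)
# flags=re.I (short for re.IGNORECASE) ignores differences in case.
relaxed = re.findall(pattern, text, flags=re.I)

print(strict)   # []
print(relaxed)  # ['Personal Protective Equipment']
```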

import re
acronyms=['  MRI', 'fMRI', 'PPE', 'FFE']
text="""pull porous experiment
 public protection expertise
personal protective 
equipment
here is a magnetic resonance interglobular section
with a certain energy measure is on a table"""
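matched = {}
for a in acronyms:
    acronym = a.strip()  # remove stray leading/trailing spaces, e.g. '  MRI'
    pattern = ''
    for letter in acronym:
        # optional spaces, the required first letter, the rest of the word,
        # then one or more spaces/newlines before the next word
        pattern += '[ ]*{}[^ \n]+[ \n]+'.format(re.escape(letter))
    print(acronym, pattern)  # show the pattern generated for each acronym
    # flags=re.I makes the search case-insensitive
    matched[acronym] = re.findall(pattern, text, flags=re.I)

print(matched)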
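To see what the loop builds, the per-letter piece can be written out with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments (spaces inside a character class such as `[ ]` are still kept). `letter_pattern` is a hypothetical helper name; it is intended to behave the same as the one-line pattern above.

```python
import re

def letter_pattern(letter):
    """Return the verbose regex for one word of the expansion."""
    return r"""
        [ ]*      # optional spaces before the word
        {}        # the word has to start with this letter
        [^ \n]+   # the rest of the word: anything except a space or newline
        [ \n]+    # at least one space or newline before the next word
    """.format(re.escape(letter))

pattern = "".join(letter_pattern(ch) for ch in "PPE")
text = "pull porous experiment\n public protection expertise\n"
matches = re.findall(pattern, text, flags=re.I | re.VERBOSE)
print(matches)  # ['pull porous experiment\n ', 'public protection expertise\n']
```

Because there is no word boundary before the letter, a match could in principle start partway through a word; adding `\b` before the letter prevents that.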

matched should now hold a dictionary that maps each acronym to its list of matches.

The resulting matched is (note that leading and trailing spaces have been stripped from the acronyms):

{'MRI': [' magnetic resonance interglobular '], 'fMRI': [], 'PPE': ['pull porous experiment\n ', 'public protection expertise\n', 'personal protective \nequipment\n'], 'FFE': []}

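This lets a result span multiple lines, but the line-ending characters (\n) are then included in the matches. If you'd rather have them as spaces, you can, for example, use re.sub to replace [\n ]+ with a single space.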
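A sketch of that clean-up, applied to the `matched` dictionary shown in the output above. It also strips the leftover space at either end, which is an extra step beyond the plain `re.sub`.

```python
import re

# The matched dictionary printed by the answer's code.
matched = {'MRI': [' magnetic resonance interglobular '], 'fMRI': [],
           'PPE': ['pull porous experiment\n ', 'public protection expertise\n',
                   'personal protective \nequipment\n'], 'FFE': []}

# Replace each run of newlines/spaces with one space, then trim both ends.
cleaned = {acronym: [re.sub(r'[\n ]+', ' ', m).strip() for m in found]
           for acronym, found in matched.items()}
print(cleaned)
# {'MRI': ['magnetic resonance interglobular'], 'fMRI': [],
#  'PPE': ['pull porous experiment', 'public protection expertise',
#          'personal protective equipment'], 'FFE': []}
```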

Here is the reference for the re library: https://docs.python.org/3/library/re.html. And here is one of many generally useful explanations of regular expressions: https://docs.python.org/3/howto/regex.html#regex-howto.

【Comments】:

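This looks along the right lines, though I get this error: re.error: missing ), unterminated subpattern at position 22
Would you mind explaining the regex pattern so I can understand what it means? Thanks
You need to insert a tab (indent the line) before pattern += ' )'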
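The earlier version of the code that raised this error isn't shown, so this is only an illustration under an assumption: if a `(` is appended for every letter inside the loop but the closing `)` is not appended in the same iteration, the groups stay open and `re` raises the "missing ), unterminated subpattern" error. `compiles` is a hypothetical helper used only for the demonstration.

```python
import re

def compiles(pattern):
    """Hypothetical helper: (True, '') if pattern compiles, else (False, error text)."""
    try:
        re.compile(pattern)
        return True, ''
    except re.error as exc:
        return False, str(exc)

# '(' opened once per letter, but no ')' inside the loop: the groups stay open.
broken = ''
for letter in 'MRI':
    broken += '(' + re.escape(letter) + r'\w*\s*'
broken_ok, broken_msg = compiles(broken)
print(broken_ok, broken_msg)  # False missing ), unterminated subpattern at position ...

# Closing the group in the same iteration keeps every '(' matched by a ')'.
fixed = ''
for letter in 'MRI':
    fixed += '(' + re.escape(letter) + r'\w*\s*' + ')'
fixed_ok, _ = compiles(fixed)
print(fixed_ok)  # True
```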
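I tried this code but it doesn't work... am I doing something wrong?
Strangely, this doesn't return anything. Could it matter that some of the text contains hyphens, e.g. "BOLD" is "blood-oxygen-level-dependent"?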
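Hyphens would indeed break the answer's pattern for terms like this: `[^ \n]+` swallows `blood-oxygen-level-dependent` as a single word, and `[ \n]+` only accepts spaces or newlines between words. Below is a minimal sketch that also accepts hyphens as separators. It assumes `\w*` (letters, digits, underscore) is an acceptable definition of "the rest of the word"; `find_expansions` and the sample sentence are hypothetical.

```python
import re

def find_expansions(acronym, text):
    """Hypothetical helper: phrases whose words start with the acronym's letters."""
    # \b + letter + \w*: a word that starts with the given letter.
    words = [r"\b" + re.escape(letter) + r"\w*" for letter in acronym.strip()]
    # Words may be separated by whitespace and/or hyphens ('-' is literal at the end of a class).
    pattern = r"[\s-]+".join(words)
    return re.findall(pattern, text, flags=re.I)

text = "We measured the blood-oxygen-level-dependent (BOLD) signal."

# The answer's pattern only accepts spaces/newlines between words, so it finds nothing here.
old_pattern = "".join("[ ]*{}[^ \n]+[ \n]+".format(ch) for ch in "BOLD")
old_style = re.findall(old_pattern, text, flags=re.I)
expansions = find_expansions("BOLD", text)
print(old_style)   # []
print(expansions)  # ['blood-oxygen-level-dependent']
```

If other separators appear in your text (for example slashes), they can be added to the `[\s-]` class in the same way.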