How do I write a universal/flexible regex in Python? Answers

【Question title】: How do I write a universal/flexible regex in Python?
【Posted】: 2021-07-15 16:44:17
【Question】:

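I am learning regular expressions. As you know, a person may or may not have a middle name. I want to write one flexible regex that I can compile and reuse later, but I have not been able to. Any suggestions or help would be appreciated. Below is my regex for names without a middle name.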

import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?P<lastname>\w+)")
name = "John Drell"
m = p.search(name)

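I have no problem with names that have no middle name. However, I cannot write a correct, flexible pattern for names that may or may not have a middle name. Here is one of my test attempts.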

import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?:P<middlename>[A-Z]*)(?P<lastname>\w+)")
name = "John M. Drell"
m = p.search(name)

This script only works for names that have a middle name; otherwise I get the error: 'NoneType' object has no attribute 'groups'.

I would appreciate it if you could correct me.

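【Comments】:

Try ^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+))?\s+(?P<lastname>\S+)$, see demo. Also, why not just use .split()?
Yes, splitting on whitespace is the right way to solve this. The first lesson of regex is to avoid it whenever possible ;)
Falsehoods Programmers Believe About Names
But even when splitting on whitespace, keep in mind that people can have several middle names, not just zero or one, and that last names can contain spaces.
@WiktorStribiżew Thanks! I got it. If you post your comments as an answer, I will mark it as the accepted answer.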
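
A quick check of the pattern suggested in the first comment, as a minimal sketch. The sample names are chosen here for illustration (the four-word one is not from the thread). It shows the optional group handling zero or one middle name, and the limitation raised in the comments once there is more than one.

```python
import re

# Pattern from the first comment: at most one optional middle name.
pattern = re.compile(
    r"^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+))?\s+(?P<lastname>\S+)$"
)

# Illustrative sample names (the last one is an assumption, not from the thread).
samples = ["John Drell", "John M. Drell", "John Ronald Reuel Tolkien"]

results = {}
for name in samples:
    m = pattern.search(name)
    # search() returns None when nothing matches, so check before using it.
    results[name] = m.groupdict() if m else None
    print(name, "->", results[name])
```

The name with two middle names gives no match at all, because \S+ cannot span the space between them.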

Tags: python python-3.x regex

【Solution 1】:

Use split():

names = ["John M. Drell", "John Drell"]
for name in names:
    firstname, *middlenames, lastname = name.split()
    print(f'First name: {firstname}, Middle name(s): {" ".join(middlenames)}, Last name: {lastname}')

See Python proof.
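
One caveat with the starred unpacking: it needs at least two whitespace-separated parts, otherwise Python raises ValueError. A minimal sketch of a guarded version follows. split_name is a hypothetical helper name, not part of the original answer, and "Cher" is just an illustrative single-word input.

```python
def split_name(full_name):
    """Split a full name on whitespace into (first, middle names, last).

    Hypothetical helper: returns None when there are fewer than two
    parts, because the starred unpacking would raise ValueError then.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    first, *middle, last = parts
    return first, middle, last


print(split_name("John M. Drell"))
print(split_name("John Drell"))
print(split_name("Cher"))
```

Returning the middle names as a list keeps the "zero, one, or many" cases uniform for the caller.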

With a regex, learn to use optional groups and \S, which matches any non-whitespace character:

^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+(?: +\S+)*))?\s+(?P<lastname>\S+)$

See regex proof.
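
For use from Python, the pattern can be compiled once and wrapped so a failed match does not raise the 'NoneType' error from the question. This is a minimal sketch; NAME_RE and parse_name are names introduced here for illustration, not part of the original answer.

```python
import re

# Pattern from the answer above, compiled once for reuse.
NAME_RE = re.compile(
    r"^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+(?: +\S+)*))?\s+(?P<lastname>\S+)$"
)


def parse_name(full_name):
    """Return a dict of name parts, or None if the pattern does not match."""
    m = NAME_RE.search(full_name)
    if m is None:
        # Calling m.groups() here would raise
        # AttributeError: 'NoneType' object has no attribute 'groups'
        return None
    return m.groupdict()


print(parse_name("John Drell"))
print(parse_name("John M. Drell"))
print(parse_name("John"))
```

When the optional group does not take part in the match, groupdict() reports middlename as None, so callers can tell "no middle name" apart from "no match".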

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?P<firstname>           group and capture to "firstname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "firstname"
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?P<middlename>          group and capture to "middlename":
--------------------------------------------------------------------------------
      \S+                      non-whitespace (all but \n, \r, \t,
                               \f, and " ") (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
--------------------------------------------------------------------------------
         +                       ' ' (1 or more times (matching the
                                 most amount possible))
--------------------------------------------------------------------------------
        \S+                      non-whitespace (all but \n, \r, \t,
                                 \f, and " ") (1 or more times
                                 (matching the most amount possible))
--------------------------------------------------------------------------------
      )*                       end of grouping
--------------------------------------------------------------------------------
    )                        end of "middlename"
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?P<lastname>            group and capture to "lastname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "lastname"
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
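
The breakdown above can also live inside the pattern itself by using re.VERBOSE. The following is a sketch under that assumption; NAME_RE_VERBOSE is a name introduced here, and the sample inputs are illustrative. One detail to watch: verbose mode ignores unescaped whitespace, so the literal space of the original pattern is written as [ ].

```python
import re

# Same pattern written with re.VERBOSE so the explanation lives in the code.
# Unescaped whitespace is ignored in verbose mode, hence [ ] for the space.
NAME_RE_VERBOSE = re.compile(
    r"""
    ^                          # beginning of the string
    (?P<firstname>\S+)         # first name: one or more non-whitespace chars
    (?:                        # optional non-capturing group
        \s+                    #   whitespace before the middle name(s)
        (?P<middlename>
            \S+                #   first middle name
            (?:[ ]+\S+)*       #   any further middle names, space-separated
        )
    )?
    \s+                        # whitespace before the last name
    (?P<lastname>\S+)          # last name: one or more non-whitespace chars
    $                          # end of the string, or just before a final newline
    """,
    re.VERBOSE,
)

m = NAME_RE_VERBOSE.search("John Ronald Reuel Tolkien")
parts = m.groupdict() if m else None
print(parts)

# $ also matches just before a single trailing newline.
with_newline = NAME_RE_VERBOSE.search("John Drell\n")
print(with_newline is not None)
```

If a trailing newline should be rejected, \Z can be used in place of $.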

【Comments】: