【Question title】: How do I write a universal/flexible regex in Python?
【Posted】: 2021-07-15 16:44:17
【Question】:

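I am learning regular expressions. As you know, a person may or may not have a middle name. I want to write a flexible regex that I can compile and reuse later, but I have not managed to do so. Any suggestions or help would be greatly appreciated. Below is my regex for names without a middle name.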

import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?P<lastname>\w+)")
name = "John Drell"
m = p.search(name)

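I have no trouble with names that lack a middle name. However, I cannot write a correct, flexible pattern for names that may or may not have one. Here is one of my test scripts.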

import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?:P<middlename>[A-Z]*)(?P<lastname>\w+)")
name = "John M. Drell"
m = p.search(name)

This script only works for names that include a middle name; otherwise I get the error: 'NoneType' object has no attribute 'groups'.

I would be grateful if you could correct me.

【Comments】:

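  • Try ^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+))?\s+(?P<lastname>\S+)$. Also, why not just use .split()?
  • Yes, splitting on whitespace is the right way to solve this. The first lesson of regex is to avoid it whenever possible ;)
  • But even when splitting on whitespace, keep in mind that people can have several middle names, not just zero or one, and that a surname can contain spaces.
  • @WiktorStribiżew Thanks! I got it. If you post your comments as an answer, I will mark it as the best answer.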

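A minimal sketch of how the pattern from the first comment could be used without triggering the AttributeError described in the question. The helper name parse_name is hypothetical and not part of the original thread; the key point is that search() returns None when nothing matches, so the result has to be checked before reading groups.

```python
import re

# Pattern from the first comment: the middle name sits in an optional group.
NAME_RE = re.compile(
    r"^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+))?\s+(?P<lastname>\S+)$"
)


def parse_name(name):
    """Return a dict of name parts, or None if the pattern does not match.

    parse_name is a hypothetical helper used only for illustration.
    """
    m = NAME_RE.search(name)
    if m is None:
        # search() returns None on failure; calling .groups() on None
        # is what raises the AttributeError.
        return None
    return m.groupdict()


print(parse_name("John Drell"))
print(parse_name("John M. Drell"))
print(parse_name("John"))
```

With this pattern a name with two or more middle names does not match at all, which is the limitation the third comment points out.
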
Tags: python python-3.x regex


【Solution 1】:

Use split():

names = ["John M. Drell", "John Drell"]
for name in names:
    firstname, *middlenames, lastname = name.split()
    print(f'First name: {firstname}, Middle name(s): {" ".join(middlenames)}, Last name: {lastname}')


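The starred unpacking in the split() approach needs at least two words; a single-word input raises a ValueError with a fairly generic message. The following is a small sketch, assuming a hypothetical helper named split_name, that wraps the same unpacking and reports such input more clearly. It also shows that several middle names are collected into one list.

```python
def split_name(name):
    """Split a full name into (first, [middle names], last).

    split_name is a hypothetical helper, not part of the original answer.
    """
    parts = name.split()
    if len(parts) < 2:
        raise ValueError(
            f"expected at least a first and a last name, got {name!r}"
        )
    # The starred target absorbs zero or more middle names.
    firstname, *middlenames, lastname = parts
    return firstname, middlenames, lastname


for name in ["John Drell", "John M. Drell", "John Ronald Reuel Tolkien"]:
    print(split_name(name))
```

Like any whitespace split, this treats the final word as the whole last name, so surnames containing spaces are not handled.
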
With a regular expression, use an optional group and \S, which matches any non-whitespace character:

^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+(?: +\S+)*))?\s+(?P<lastname>\S+)$


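A short sketch of how this pattern might be compiled and applied in Python. The names NAME_RE and parse are illustrative assumptions; the third test string is added here only to show how several middle names end up in the single middlename group.

```python
import re

# The pattern from this answer, compiled once for reuse.
NAME_RE = re.compile(
    r"^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+(?: +\S+)*))?\s+(?P<lastname>\S+)$"
)


def parse(name):
    """Return the named groups as a dict, or None when there is no match."""
    m = NAME_RE.search(name)
    return m.groupdict() if m else None


for name in ["John Drell", "John M. Drell", "John Ronald Reuel Tolkien"]:
    print(name, "->", parse(name))
```

When the optional group does not take part in the match, middlename is None rather than an empty string.
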
Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?P<firstname>           group and capture to "firstname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "firstname"
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?P<middlename>            group and capture to "middlename":
--------------------------------------------------------------------------------
      \S+                      non-whitespace (all but \n, \r, \t,
                               \f, and " ") (1 or more times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
--------------------------------------------------------------------------------
         +                       ' ' (1 or more times (matching the
                                 most amount possible))
--------------------------------------------------------------------------------
        \S+                      non-whitespace (all but \n, \r, \t,
                                 \f, and " ") (1 or more times
                                 (matching the most amount possible))
--------------------------------------------------------------------------------
      )*                       end of grouping
--------------------------------------------------------------------------------
    )                        end of "middlename"
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?P<lastname>             group and capture to "lastname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "lastname"
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
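
The breakdown above can also be kept next to the pattern itself by compiling with re.VERBOSE. This is only a sketch of one possible layout: in verbose mode unescaped whitespace in the pattern is ignored, so the literal space from the original pattern is written as [ ] here.

```python
import re

NAME_RE = re.compile(
    r"""
    ^                        # beginning of the string
    (?P<firstname> \S+ )     # first name: one or more non-whitespace characters
    (?:                      # optional non-capturing group
        \s+                  #   whitespace separator
        (?P<middlename>      #   one or more middle names
            \S+              #     first middle name
            (?: [ ]+ \S+ )*  #     further middle names, separated by spaces
        )
    )?
    \s+                      # whitespace before the last name
    (?P<lastname> \S+ )      # last name: one or more non-whitespace characters
    $                        # end of the string, or just before a final newline
    """,
    re.VERBOSE,
)

m = NAME_RE.search("John M. Drell")
if m:
    print(m.group("firstname"), m.group("middlename"), m.group("lastname"))
```

If a trailing newline should not be accepted, \Z can be used in place of $, since \Z matches only at the very end of the string.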

【Comments】:
