【Question title】: Unable to get customized result using regex
【Posted】: 2019-07-07 15:03:50
【Question】:

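I'm trying to figure out how to modify an existing regex pattern, or create a new one, to find every line that contains dear. For each matching line, the script should print everything after the ": " up to the end of that line. String manipulation is not an option here.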

I've tried:

import re

instr = """
Expression: It's been a while man.
Expression: How have you been moron?
Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?
"""
pattern = r'.*(?<=dear)'

for item in instr.splitlines():
    if re.search(pattern, item):
        print(item)

The result I'm getting:

Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?

What I'd like to get:

Good to see you dear.
How is everything dear?
Hi dear, how are you?

How can I get this customized result using regex?

【Comments】:

    Tags: python regex python-3.x


    【Solution 1】:

    You can use a positive lookbehind to capture only what follows the colon. Something like this should work:

    (?<=:).*\bdear\b.*
    


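    I used the word-boundary assertion \b to avoid matching words such as "endear" that merely contain dear. If that's not the desired behavior, feel free to remove them.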
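As a rough illustration of the lookbehind and the \b assertions, the sketch below runs the pattern as posted next to two variants. The variants, the matches helper, and the "dearest" line are assumptions added for comparison, not part of the answer. Note that the posted pattern starts matching immediately after the colon, so the result keeps the leading space; a lookbehind of (?<=:\s) skips it.

```python
import re

lines = [
    "Expression: How have you been moron?",
    "Expression: Good to see you dear.",
    "Greeting: Hi dear, how are you?",
    "Greeting: Hello dearest friend.",  # hypothetical line: "dear" only inside a longer word
]

def matches(pattern, lines):
    """Return the matched text for every line where the pattern is found."""
    found = []
    for line in lines:
        m = re.search(pattern, line)
        if m:
            found.append(m.group(0))
    return found

# Pattern as posted: the match begins right after ':' and keeps the space.
as_posted = matches(r'(?<=:).*\bdear\b.*', lines)
# Assumed variant: the lookbehind also consumes one whitespace character.
skip_space = matches(r'(?<=:\s).*\bdear\b.*', lines)
# Assumed variant without \b: "dearest" now matches as well.
no_boundary = matches(r'(?<=:\s).*dear.*', lines)

print(as_posted)    # [' Good to see you dear.', ' Hi dear, how are you?']
print(skip_space)   # ['Good to see you dear.', 'Hi dear, how are you?']
print(no_boundary)  # ['Good to see you dear.', 'Hi dear, how are you?', 'Hello dearest friend.']
```

Python only accepts fixed-width lookbehinds, so (?<=:\s) handles exactly one whitespace character after the colon, which is all the sample input needs.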

    【Comments】:

      【Solution 2】:

      Another option is to use the anchor ^ and a capturing group:

      ^[^:]*:\s*(.*\bdear\b.*)
      

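      Explanation

      • ^ Start of string
      • [^:]*: Match 0+ times any character except :, then match :
      • \s* Match 0+ whitespace characters
      • ( Capturing group
        • .*\bdear\b.* Match dear between word boundaries, with any characters except a newline on either side
      • ) Close the capturing group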
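The breakdown above can also be kept next to the pattern itself. This is a minimal sketch using re.VERBOSE, which ignores whitespace in the pattern and allows # comments; the helper name text_after_label is hypothetical and only there for illustration.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each piece carries its comment.
pattern = re.compile(r"""
    ^                    # start of the string
    [^:]* :              # 0+ characters other than a colon, then the colon itself
    \s*                  # 0+ whitespace characters
    (                    # open the capturing group
      .* \b dear \b .*   # the word dear with any non-newline characters around it
    )                    # close the capturing group
""", re.VERBOSE)

def text_after_label(line):
    """Return the captured text after the colon, or None when the line does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(text_after_label("Greeting: Hi dear, how are you?"))       # Hi dear, how are you?
print(text_after_label("Expression: How have you been moron?"))  # None
```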


      For example:

      import re
      
      instr = """
      Expression: It's been a while man.
      Expression: How have you been moron?
      Expression: Good to see you dear.
      Greeting: How is everything dear?
      Greeting: Hi dear, how are you?
      """
      pattern = r'^[^:]*:\s*(.*\bdear\b.*)'
      
      for item in instr.splitlines():
          res = re.search(pattern, item)
          if res:
              print(res.group(1))
      

      Result

      Good to see you dear.
      How is everything dear?
      Hi dear, how are you?
      

      【Comments】:

        【Solution 3】:
        >>> for m in re.finditer(r'^[^:]+:\s*(.*dear.*)', instr, flags=re.M):
        ...     print(m[1])
        ... 
        Good to see you dear.
        How is everything dear?
        Hi dear, how are you?
        
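        • re.finditer iterates over all matches
        • flags=re.M makes the ^ and $ anchors match at the start and end of every line rather than only once for the whole string
        • ^[^:]+:\s* covers the text from the start of the line up to the : and any optional whitespace
        • (.*dear.*) matches the rest of the line if it contains dear (note that . does not match a newline by default)
        • Since the wanted text is inside a capturing group, m[1] gives only that part rather than the whole line
          • For Python versions below 3.6, use m.group(1)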
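To see what flags=re.M contributes, here is a small script version of the session above, run once without and once with the flag. It uses m.group(1) so that it also works on Python versions before 3.6; the variable names are illustrative only.

```python
import re

instr = """
Expression: It's been a while man.
Expression: How have you been moron?
Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?
"""

pattern = r'^[^:]+:\s*(.*dear.*)'

# Without re.M, ^ matches only at the very start of the string, where the
# first line has no "dear", so nothing is found.
single_line = [m.group(1) for m in re.finditer(pattern, instr)]

# With re.M, ^ matches at the start of every line.
multi_line = [m.group(1) for m in re.finditer(pattern, instr, flags=re.M)]

print(single_line)  # []
print(multi_line)   # ['Good to see you dear.', 'How is everything dear?', 'Hi dear, how are you?']
```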

        【Comments】:

          【Solution 4】:

          This expression,

          (?=:.*\bdear\b):\s*(.*)
          

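          might work here.

          The expression is explained in the top-right panel of this demo if you want to explore or modify it further, and in this link you can watch step by step how it matches against some sample inputs.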
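The lookahead in this expression acts as a filter: it only lets the match start at a colon that has the word dear somewhere after it on the same line. A minimal sketch of that effect, comparing the expression with and without the lookahead (the capture helper and the shortened line list are assumptions for illustration):

```python
import re

lines = [
    "Expression: How have you been moron?",
    "Greeting:   Hi dear, how are you?",
    "dear: Hi there, how are you?",
]

def capture(pattern, line):
    """Return group 1 of the first match in the line, or None if there is no match."""
    m = re.search(pattern, line)
    return m.group(1) if m else None

# Without the lookahead, every line with a colon yields a capture.
without_lookahead = [capture(r':\s*(.*)', line) for line in lines]
# With the lookahead, only lines that have "dear" after the colon do.
with_lookahead = [capture(r'(?=:.*\bdear\b):\s*(.*)', line) for line in lines]

print(without_lookahead)  # ['How have you been moron?', 'Hi dear, how are you?', 'Hi there, how are you?']
print(with_lookahead)     # [None, 'Hi dear, how are you?', None]
```

The last line shows that a dear appearing before the colon does not count, because the lookahead is evaluated from the colon onward.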

          Test with re.findall

          import re
          
          regex = r"(?=:.*\bdear\b):\s*(.*)"
          
          test_str = ("Expression: It's been a while man.\n"
              "Expression: How have you been moron?\n"
              "Expression: Good to see you dear.\n"
              "Greeting:      How is everything dear?\n"
              "Greeting: Hi dear, how are you?\n"
              "Greeting:   Hi dear, how are you?\n"
              "dear: Hi there, how are you?")
          
          print(re.findall(regex, test_str))
          

          Test with re.finditer

          import re
          
          regex = r"(?=:.*\bdear\b):\s*(.*)"
          
          test_str = ("Expression: It's been a while man.\n"
              "Expression: How have you been moron?\n"
              "Expression: Good to see you dear.\n"
              "Greeting:      How is everything dear?\n"
              "Greeting: Hi dear, how are you?\n"
              "Greeting:   Hi dear, how are you?\n"
              "dear: Hi there, how are you?")
          
          matches = re.finditer(regex, test_str, re.MULTILINE)
          
          for matchNum, match in enumerate(matches, start=1):
              print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

              # Group numbers start at 1; group 0 is the whole match.
              for groupNum in range(1, len(match.groups()) + 1):
                  print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
          

          【Comments】:
