【Question title】: How to ignore strings that start with a certain pattern using a regular expression in Python?
【Posted】: 2021-06-10 22:10:34
【Question】:

Accept and return @something, but reject first@last.

r'@([A-Z][A-Z0-9_]*[A-Z0-9])'

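The regex above accepts @something (it starts with a letter, ends with a letter or digit, may contain underscores in between, and is at least 2 characters long) and returns the part after the @ sign.

I do not want to return strings that have a letter or digit (A-Z0-9) immediately before the @ sign.

Spaces, newlines, special characters, and so on are allowed before the @.

Code: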

re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
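To make the problem concrete, here is a minimal sketch of what the call above returns. The sample sentence is the one used in a later answer on this page; the variable names are only illustrative.

```python
import re

# Sample sentence taken from a later answer in this thread.
text = "Accept and return @something but reject first@last."

# The original pattern puts no constraint on what precedes '@',
# so the part after '@' in "first@last" is captured as well.
matches = re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
print(matches)  # ['something', 'last']
```

The unwanted 'last' is what the answers below try to exclude.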

【Comments】:

    Tags: python python-3.x regex


    【Solution 1】:

    Use

    re.findall(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
    


    Explanation

    --------------------------------------------------------------------------------
      (?<!                     look behind to see if there is not:
    --------------------------------------------------------------------------------
        [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
    --------------------------------------------------------------------------------
      )                        end of look-behind
    --------------------------------------------------------------------------------
      @                        '@'
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        [A-Z]                    any character of: 'A' to 'Z'
    --------------------------------------------------------------------------------
        [A-Z0-9_]*               any character of: 'A' to 'Z', '0' to
                                 '9', '_' (0 or more times (matching the
                                 most amount possible))
    --------------------------------------------------------------------------------
        [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
    --------------------------------------------------------------------------------
      )                        end of \1
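As a rough check of the breakdown above, the following sketch runs the lookbehind pattern against a few inputs. The sample strings and the names `HANDLE` and `samples` are hypothetical, chosen only to exercise each rule from the question.

```python
import re

# Pattern from this answer: '@' must not be preceded by a letter or digit.
HANDLE = re.compile(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)

# Hypothetical inputs mapped to the result each one should give.
samples = {
    "@something": ["something"],  # start of string: nothing precedes '@'
    "hi @user_1!": ["user_1"],    # a space before '@' is allowed
    "(@ab)": ["ab"],              # a special character before '@' is allowed
    "first@last": [],             # a letter before '@' is rejected
    "9@last": [],                 # a digit before '@' is rejected
    "@a": [],                     # too short: at least 2 characters are required
    "@name_": ["name"],           # a trailing underscore is left out of the match
}

results = {s: HANDLE.findall(s) for s in samples}
for s, found in results.items():
    print(f"{s!r} -> {found}")
```

Note that with `re.I` the classes written as `A-Z` also match lowercase letters, which is why "first@last" is rejected even though the `t` before `@` is lowercase.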
    

    【Discussion】:

      【Solution 2】:

      You can use

      \B@([A-Z][A-Z0-9_]*[A-Z0-9])
      

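      The pattern matches:

      • \B asserts a position that is not a word boundary, i.e. where \b would not match (so @ cannot directly follow a word character)
      • @ matches the character literally
      • ( starts capture group 1
        • [A-Z][A-Z0-9_]*[A-Z0-9] matches a letter, then any number of letters, digits, or underscores, then a final letter or digit
      • ) closes group 1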


      import re
      
      text = "Accept and return @something but reject first@last."
      print(re.findall(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I))
      

      Output

      ['something']
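The two patterns on this page agree on the sample sentence, but they are not strictly equivalent. The sketch below compares them on a few hypothetical inputs. The names `LOOKBEHIND`, `NON_BOUNDARY`, and `samples` are illustrative. The case worth noting is an underscore before `@`: `_` counts as a word character for `\B`, but it is not in `[A-Z0-9]`.

```python
import re

# Solution 1: reject '@' only when a letter or digit comes right before it.
LOOKBEHIND = re.compile(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)
# Solution 2: reject '@' when any word character (including '_') comes right before it.
NON_BOUNDARY = re.compile(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)

# Hypothetical inputs for comparison.
samples = ["@start", "first@last", "snake_@case"]

comparison = {
    s: (LOOKBEHIND.findall(s), NON_BOUNDARY.findall(s)) for s in samples
}
for s, (lb, nb) in comparison.items():
    print(f"{s!r}: lookbehind={lb} non_boundary={nb}")
```

Which behaviour is preferable depends on whether an underscore before `@` should count as a "special character"; the question does not say, so treat this as a design choice rather than a correction of either answer.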
      

      【Discussion】:
