【Question title】: R regular expression repetition ignores upper bound
【Posted】: 2014-04-16 17:51:39
【Question】:

I am trying to write a regular expression to help me filter strings such as

blah_blah_suffix

where suffix is any string 2 to 5 characters long. So I want to accept the strings

blah_blah_aa
blah_blah_abcd

but discard

blah_blah_a
blah_aaa
blah_blah_aaaaaaa

I use grepl as follows:

samples[grepl("blah_blah_.{2,5}", samples)]

But it ignores the upper bound of the repetition (5): it correctly discards blah_blah_a and blah_aaa, but it accepts blah_blah_aaaaaaa.
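
The behaviour described above can be reproduced with a short sketch. The `samples` vector here is an assumption (the question never shows how it is defined); it simply holds the five example strings. The sketch suggests that the upper bound is not ignored: grepl reports whether the pattern occurs anywhere in the string, and the first five suffix characters of blah_blah_aaaaaaa are enough for a match.

```r
# Hypothetical sample vector built from the strings listed in the question
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# grepl() looks for the pattern anywhere inside each string,
# so nothing forces the match to reach the end of the string
unanchored <- samples[grepl("blah_blah_.{2,5}", samples)]
print(unanchored)

# regmatches() shows which part of the long string actually matched:
# the prefix plus five characters, with the last two characters left over
long_string <- "blah_blah_aaaaaaa"
matched <- regmatches(long_string, regexpr("blah_blah_.{2,5}", long_string))
print(matched)
```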

I know there are ways to filter the strings without a regular expression, but I want to understand how to use grepl correctly.

【Question comments】:

    Tags: regex r grepl


    【Answer 1】:

    You need to anchor the expression to the start and end of the line:

    ^blah_blah_.{2,5}$
    

    ^ matches the start of a line and $ matches the end of a line. See a working example here: Regex101
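
As a minimal sketch of the anchored pattern applied through grepl, assuming `samples` is a character vector holding the five strings from the question (that definition is not part of the original post):

```r
# Hypothetical sample vector built from the strings listed in the question
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# With ^ and $ the whole string has to be consumed by the pattern,
# so a suffix longer than five characters can no longer match
kept <- samples[grepl("^blah_blah_.{2,5}$", samples)]
print(kept)
```

With the anchors in place only blah_blah_aa and blah_blah_abcd should remain.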

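    To anchor the expression to the start and end of the whole string (rather than of each line in multi-line mode), use \A and \Z instead of ^ and $.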
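
A hedged sketch of the string anchors in R follows. Two things here are assumptions rather than part of the answer: `perl = TRUE` is passed so that the PCRE engine, where these anchors are documented, handles the pattern, and the test vector `x` is made up for illustration. Backslashes have to be doubled inside an R string literal. In PCRE, \Z also tolerates a single trailing newline, while lowercase \z matches only at the very end of the string.

```r
# Backslashes are doubled in R string literals; perl = TRUE selects PCRE
pattern_Z <- "\\Ablah_blah_.{2,5}\\Z"  # end of string, or just before a final newline
pattern_z <- "\\Ablah_blah_.{2,5}\\z"  # strictly the end of the string

# Hypothetical inputs: a valid string, the same string with a trailing
# newline, and a string whose suffix is too long
x <- c("blah_blah_abcd", "blah_blah_abcd\n", "blah_blah_aaaaaaa")

res_Z <- grepl(pattern_Z, x, perl = TRUE)
res_z <- grepl(pattern_z, x, perl = TRUE)
print(res_Z)
print(res_z)
```

If a trailing newline must cause rejection, \z is the safer choice of the two.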

    Anchors Tutorial

    【Comments】:

      【Answer 2】:
      /^[\w]+_[\w]+_[\w]{2,5}$/
      

      DEMO

      Options: dot matches newline; case insensitive; ^ and $ match at line breaks
      
      Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
      Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match the character “_” literally «_»
      Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match the character “_” literally «_»
      Match a single character that is a “word character” (letters, digits, and underscores) «[\w]{2,5}»
         Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
      Assert position at the end of a line (at the end of the string or before a line break character) «$»
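
The pattern above is written with / delimiters, which R does not use. A rough R translation, offered as a sketch rather than as the answerer's own code, drops the delimiters and the redundant brackets around \w, doubles the backslashes, and passes `perl = TRUE` (an assumption made so that \w is handled by PCRE). The `samples` vector is again hypothetical.

```r
# Hypothetical sample vector built from the strings listed in the question
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# No / delimiters in R; each backslash is doubled in the string literal
word_pattern <- "^\\w+_\\w+_\\w{2,5}$"
kept_word <- samples[grepl(word_pattern, samples, perl = TRUE)]
print(kept_word)

# \w also matches "_", so the prefix is not limited to two segments:
# a string with three underscores is accepted as well
extra <- grepl(word_pattern, "x_y_z_ab", perl = TRUE)
print(extra)
```

Note that this pattern is broader than what the question asks for: it accepts any word-character prefix instead of the literal blah_blah_, and it restricts the suffix to word characters instead of any character.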
      

      【Comments】:
