R regular expression repetition ignores upper bound: answers

【Question title】: R regular expression repetition ignores upper bound
【Posted】: 2014-04-16 17:51:39
【Question】:

I am trying to write a regular expression to help me filter strings such as

blah_blah_suffix

where the suffix is any string 2 to 5 characters long. So I want to accept the strings

blah_blah_aa
blah_blah_abcd

but reject

blah_blah_a
blah_aaa
blah_blah_aaaaaaa

I use grepl as follows:

samples[grepl("blah_blah_.{2,5}", samples)]

But it ignores the upper bound (5) of the repetition: it rejects blah_blah_a and blah_aaa, but accepts blah_blah_aaaaaaa.
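A minimal reproduction in R shows where the extra characters go. It assumes a hypothetical vector samples holding the five strings listed above. grepl() reports TRUE whenever the pattern matches any substring, and for the long string the match simply stops after five suffix characters.

```r
# Hypothetical test vector built from the strings in the question
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

pattern <- "blah_blah_.{2,5}"

# Unanchored: TRUE if the pattern occurs anywhere inside the string
unanchored <- grepl(pattern, samples)
print(unanchored)

# Extract the substring that actually matched in the 7-character-suffix case
long_match <- regmatches(samples[5], regexpr(pattern, samples[5]))
print(long_match)
```

The upper bound is respected: the match is "blah_blah_aaaaa", and the two trailing characters are just left outside it.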

I know there are ways to filter these strings without regular expressions, but I want to understand how to use grepl correctly.
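For comparison, here is one possible regex-free filter. It is only a sketch and assumes the prefix is always the fixed string blah_blah_. It checks the prefix with startsWith() and the length of the remainder with nchar().

```r
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

prefix <- "blah_blah_"
# Length of whatever follows the prefix (negative for strings shorter than the prefix)
suffix_len <- nchar(samples) - nchar(prefix)
# Keep strings that start with the prefix and have a 2 to 5 character suffix
kept_plain <- samples[startsWith(samples, prefix) & suffix_len >= 2 & suffix_len <= 5]
print(kept_plain)
```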

【Comments】:

Tags: regex r grepl

【Answer 1】:

You need to anchor the expression to the start and end of the line:

^blah_blah_.{2,5}$

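^ matches the start of a line and $ matches the end of a line. With both anchors, the suffix must run all the way to the end, so blah_blah_aaaaaaa is rejected. See a working example here: Regex101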
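Applied to the question's grepl call, the anchored pattern gives the intended filter. The sketch below uses a hypothetical samples vector containing the strings from the question.

```r
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# ^ and $ force the pattern to cover the whole string, not just a substring
anchored <- grepl("^blah_blah_.{2,5}$", samples)
kept <- samples[anchored]
print(kept)
```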

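If you want to anchor the expression to the start and end of the whole string rather than of a line (multiline mode), use \A and \Z instead of ^ and $. In R, these PCRE escapes require perl = TRUE.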
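The sketch below contrasts the two kinds of anchor. It assumes perl = TRUE, so that the PCRE escapes \A and \Z are available, and uses a made-up multi-line string. In PCRE, \Z also matches just before a single trailing newline, whereas \z matches only at the very end.

```r
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# \A and \Z are PCRE escapes, so perl = TRUE is passed to grepl()
pcre_kept <- samples[grepl("\\Ablah_blah_.{2,5}\\Z", samples, perl = TRUE)]
print(pcre_kept)

# Hypothetical multi-line string: the matching text sits on the middle line
s <- "junk\nblah_blah_ab\nmore junk"

# (?m) enables multiline mode, where ^ and $ also match around embedded newlines
per_line <- grepl("(?m)^blah_blah_.{2,5}$", s, perl = TRUE)
# \A and \Z still refer to the whole string, even in multiline mode
whole <- grepl("(?m)\\Ablah_blah_.{2,5}\\Z", s, perl = TRUE)
print(c(per_line, whole))
```

For single-line strings like those in the question, both forms give the same result.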

Anchors Tutorial

【Comments】:

【Answer 2】:

/^[\w]+_[\w]+_[\w]{2,5}$/


Options: dot matches newline; case insensitive; ^ and $ match at line breaks

Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “_” literally «_»
Match a single character that is a “word character” (letters, digits, and underscores) «[\w]{2,5}»
   Between 2 and 5 times, as many times as possible, giving back as needed (greedy) «{2,5}»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
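Translated to R, the pattern above might be used as follows. This is a sketch assuming the same hypothetical samples vector and perl = TRUE. The demo's case-insensitive option is omitted because \w already covers both cases. Unlike Answer 1, this pattern does not require the literal prefix blah_blah_, only two underscore-separated word segments.

```r
samples <- c("blah_blah_aa", "blah_blah_abcd",
             "blah_blah_a", "blah_aaa", "blah_blah_aaaaaaa")

# R has no /.../ delimiters: the pattern is a plain string and each backslash is doubled.
# perl = TRUE makes \w inside a character class behave as in the PCRE-style demo.
pattern <- "^[\\w]+_[\\w]+_[\\w]{2,5}$"
kept_w <- samples[grepl(pattern, samples, perl = TRUE)]
print(kept_w)

# Caveat: \w also matches "_", so a string with extra underscores can still pass
extra <- grepl(pattern, "a_b_c_de", perl = TRUE)
print(extra)
```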

【Comments】: