Ruby Regex: empty space at beginning and end of string

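【Question Title】: Ruby Regex: empty space at beginning and end of string
【Posted】: 2021-03-22 15:40:01
【Question】:

I want to find all users whose name has a space at the beginning or at the end. It might look like " Juliet" or "Juliet ". For now my regex only matches when the space is at the end of the string: ^[ab]:[[:space:]]|$ I haven't found how to match a space at the beginning of the string, and I don't know whether both conditions can be handled in a single regex. Thanks for your help.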

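【Comments】:

Try ^ +\w+|\w+ +$ - regex101.com/r/XYy30w/1. Is this the regex you need?
Why do you need it? Do you want to iterate over all users, or query these users from the database?
What is the [ab]: in your regex?
/\A | \z/ would work; it matches strings that start or end with a space.
You could of course simply write str.start_with?(' ') || str.end_with?(' ') (which reads well) or str[0] == ' ' || str[-1] == ' '.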
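The two suggestions from the comments can be tried side by side. This is only a minimal sketch: the sample `names` array and the helper name `edge_space?` are hypothetical and not part of the question.

```ruby
# Hypothetical sample data: leading space, trailing space, no space.
names = [' Juliet', 'Juliet ', 'Juliet']

# Plain String methods, as suggested in the comments.
def edge_space?(str)
  str.start_with?(' ') || str.end_with?(' ')
end

# One regex covering both ends: a space right after the start of the
# string, or a space right before the end of the string.
edge_space_re = /\A | \z/

by_method = names.select { |name| edge_space?(name) }
by_regex  = names.grep(edge_space_re)

p by_method #=> [" Juliet", "Juliet "]
p by_regex  #=> [" Juliet", "Juliet "]
```

Both variants only look for a literal space character; tabs or newlines at the edges would need \s or [[:space:]] in the pattern.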

Tags: regex ruby

【Solution 1】:

Testing for strippable whitespace without Regexp

You can use a small trick with String#strip!, which returns nil if it finds no whitespace to remove. For example:

# returns a one-entry Hash mapping str to true if it has
# leading/trailing whitespace, or to false otherwise
def strippable? str
  { str => !!str.dup.strip! }
end

# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]

test_values.map { |str| strippable? str }
#=> [{" foo"=>true}, {"foo "=>true}, {"foo"=>false}]

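This doesn't rely on regular expressions, but on the properties of String and on coercing the return value of #strip! to a Boolean. Regardless of whether the Ruby engine uses regular expressions under the hood, String methods of this kind are generally faster than a comparable Regexp match, but your mileage and specific use case may vary.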
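To make the return values of String#strip! concrete, here is a small sketch. The predicate name `edge_whitespace?` is hypothetical; unlike the helper in the answer, it returns a plain Boolean rather than a Hash.

```ruby
# String#strip! mutates the receiver and returns nil when nothing changed.
p ' foo'.dup.strip!  #=> "foo"
p 'foo'.dup.strip!   #=> nil

# Hypothetical predicate built on that return value. The dup keeps the
# caller's string untouched, because strip! modifies in place.
def edge_whitespace?(str)
  !str.dup.strip!.nil?
end

p edge_whitespace?(' foo')   #=> true
p edge_whitespace?('foo ')   #=> true
p edge_whitespace?('foo')    #=> false

# strip! removes any leading/trailing whitespace, not only spaces.
p edge_whitespace?("foo\n")  #=> true
p edge_whitespace?("\tfoo")  #=> true
```

If only literal spaces should count, the last two results may be unwanted, and a space-only check would be a better fit.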

Alternatives using regular expressions

Using the same test data as above, you can do something similar with a regular expression. For example:

# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]

# test start/end of string
test_values.grep(/\A\s+|\s+\z/)
#=> [" foo", "foo "]

# test start/end of line
test_values.grep(/^\s+|\s+$/)
#=> [" foo", "foo "]
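The string-anchored and line-anchored patterns give the same result for single-line values, but they can differ once a value contains a newline. A small sketch, using a hypothetical multi-line value and illustrative variable names:

```ruby
# Hypothetical multi-line value: the first line ends with a space,
# but the string as a whole neither starts nor ends with whitespace.
multi = "foo \nbar"

string_anchored = /\A\s+|\s+\z/  # start/end of the whole string
line_anchored   = /^\s+|\s+$/    # start/end of any line

p multi.match?(string_anchored)  #=> false
p multi.match?(line_anchored)    #=> true

# For single-line values the two patterns agree.
p ' foo'.match?(string_anchored) #=> true
p ' foo'.match?(line_anchored)   #=> true
```

For user names, which are normally a single line, \A and \z are probably the safer choice because they always refer to the whole string.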

Benchmarks

require 'benchmark'

ITERATIONS  = 1_000_000
TEST_VALUES = [ ' foo', 'foo ', 'foo' ]

def regex_grep array
  array.grep /^\s+|\s+$/
end

def string_strip array
  array.map { |str| { str => !!str.dup.strip! } }
end

Benchmark.bmbm do |x|
  n = ITERATIONS
  x.report('regex') { n.times { regex_grep   TEST_VALUES } }
  x.report('strip') { n.times { string_strip TEST_VALUES } }
end

            user     system      total        real
regex   1.539269   0.001325   1.540594 (  1.541438)
strip   1.256836   0.001357   1.258193 (  1.259955)

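A quarter of a second over 1 million iterations may not look like much of a difference, but it can add up with much larger data sets or iteration counts. Whether that is enough to care about for this particular use case is up to you, but the general pattern is that native String methods (however the interpreter implements them under the hood) are usually faster than regular expression pattern matching. There are edge cases, of course, but that's what benchmarking is for!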
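To check the pattern on your own machine, the benchmark can be extended with other candidates. The following is only a sketch: the `checks` hash, its labels, and the reduced iteration count are assumptions for illustration, and no timings are shown because they vary by Ruby version and hardware.

```ruby
require 'benchmark'

# Smaller iteration count than the answer's benchmark so the sketch runs quickly.
iterations  = 100_000
test_values = [' foo', 'foo ', 'foo']
edge_ws     = /\A\s+|\s+\z/

# Each check returns an array of Booleans for the same inputs.
# Note: the start/end check only looks for literal spaces, while the
# other two treat any whitespace as a match.
checks = {
  'match?'    => ->(a) { a.map { |s| s.match?(edge_ws) } },
  'strip!'    => ->(a) { a.map { |s| !s.dup.strip!.nil? } },
  'start/end' => ->(a) { a.map { |s| s.start_with?(' ') || s.end_with?(' ') } }
}

Benchmark.bmbm do |x|
  checks.each do |label, check|
    x.report(label) { iterations.times { check.call(test_values) } }
  end
end
```

Regexp matching via String#match? avoids allocating MatchData, so it may narrow the gap to the String methods; measuring is the only way to know for a given workload.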

【Discussion】:

【Solution 2】:

You can use

/\A([a-zA-Z]+ | [a-zA-Z]+)\z/
/\A(?:[[:alpha:]]+[[:space:]]|[[:space:]][[:alpha:]]+)\z/
/\A(?:\p{L}+[\p{Z}\t]|[\p{Z}\t]\p{L}+)\z/

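See the Rubular demo (it uses line anchors instead of string anchors for demonstration purposes).

Details:

\A - start of string anchor
(...) - a capturing group
(?:...) - a non-capturing group (preferred here, since you are only validating, not extracting)
[a-zA-Z]+ - one or more ASCII letters
\p{L}+ - one or more Unicode letters
| - or
\z - end of string anchor.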
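The patterns above can be exercised against a few sample values. This is a sketch with hypothetical names; the ASCII pattern is written here with a non-capturing group, which does not change what it matches.

```ruby
ascii   = /\A(?:[a-zA-Z]+ | [a-zA-Z]+)\z/
unicode = /\A(?:\p{L}+[\p{Z}\t]|[\p{Z}\t]\p{L}+)\z/

p ' Juliet'.match?(ascii)   #=> true
p 'Juliet '.match?(ascii)   #=> true
p 'Juliet'.match?(ascii)    #=> false

# Exactly one whitespace character at exactly one end is required,
# so a name padded on both sides does not match.
p ' Juliet '.match?(ascii)  #=> false

# A name with a non-ASCII letter (here " Zoë") needs the Unicode variant.
accented = " Zo\u00EB"
p accented.match?(ascii)    #=> false
p accented.match?(unicode)  #=> true
```

Because these patterns also constrain the rest of the string to letters, names containing inner spaces, hyphens, or digits will not match either; whether that is desirable depends on the data.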

【Discussion】: