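How to validate that the first char of a string is not an integer? Answers

【Question title】: How to validate that the first char of a string is not an integer?
【Posted】: 2012-12-03 05:54:37
【Question】:

I am using Ruby on Rails 3.2.9 and I would like to validate that the first character of a name (a String) is not a number (an Integer). I am trying to use the following code: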

class User < ActiveRecord::Base
  validates_each :name do |record, attr, value|
    record.errors.add(attr, 'cannot begin with a number') if ... # the first char is a number
  end
end

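How can I do that?

【Comments】:

Do you want to validate before the row is inserted into the database?
@suresh.g - Yes, validate before the row is inserted into the database.

Tags: ruby-on-rails ruby regex ruby-on-rails-3 validation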

【Solution 1】:

record.errors.add(attr, 'cannot begin with a number') if value =~ /^[0-9].*/

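This will match any string whose first character is a digit.

【Discussion】:

Can you explain your regex? For example, how does it match the first character?
In a regex, the ^ character means "match from the beginning of the string", and [0-9] means that the character at this position can be any character from 0 to 9 (commonly you use 0-9, A-Z and a-z). The .* after that says "just match anything". So the regex says "match any string whose first character is a digit, and I don't care about the rest :P"
Technically, it doesn't really do that. You really want: /\A\d/
Just learned something new :) So "ffdsa\n3fdsa\n" =~ /^[0-9].*/ would report an error, because ^ (unlike \A) matches at the start of every line... As for \d and [0-9], they are the same; \d is just a shorthand :)
[0-9] and \d are technically the same, but \d is simpler and easier to parse and understand at a glance, so it is less error-prone. Adding .* just shows you don't understand how patterns work; .* accomplishes nothing for this task but adds extra instructions for the engine.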
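
The difference between ^ and \A discussed above can be checked with a short sketch. The helper name `starts_with_digit?` is hypothetical (it does not appear in the thread), and the sketch assumes the value being validated is a String:

```ruby
# Hypothetical helper (not from the original thread): true when the very
# first character of the string is an ASCII digit.
def starts_with_digit?(value)
  # \A anchors at the start of the whole string; ^ would anchor at the
  # start of every line instead.
  !!(value =~ /\A\d/)
end

multi_line = "ffdsa\n3fdsa\n"

puts starts_with_digit?('1hello')    # true:  first character is a digit
puts starts_with_digit?('hello1')    # false: the digit is not first
puts starts_with_digit?(multi_line)  # false: the digit only starts line two
puts !!(multi_line =~ /^\d/)         # true:  ^ also matches after the newline
```

Inside the `validates_each` block from the question, the same condition would presumably read `if value =~ /\A\d/`.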

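【Solution 2】:

Looking at the answers, there are several different ways to test for this. Some made me wonder whether they would be magically faster, so, as usual, I ran a benchmark: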

require 'benchmark'

puts "Ruby = #{ RUBY_VERSION }"
str = 'foobar'

puts 'correct result should be false...'
puts !!( str =~ /^\d/                           )
puts !!( str =~ /\A\d/                          )
puts !!( str =~ /^[0-9].*/                      )
puts !!( str.split('').first.to_i.is_a?(Fixnum) )
puts !!( (48..57).include?(str[0])              )

puts !!( ('0'..'9') === str[0]                  )
puts !!( str[/^\d/]                             )
puts !!( str[/\A\d/]                            )
puts !!( str[/\A[0-9]/]                         )
puts !!( str =~ /\A[0-9]/                       )

puts

n = 1_000_000

puts "n = 1_000_000"
puts "str = 'foobar'"
Benchmark::bm(17) do |b|
  b.report('^\d regex')         { n.times { str =~ /^\d/                           } }
  b.report('\A\d regex')        { n.times { str =~ /\A\d/                          } }
  b.report('^[0-9].* regex')    { n.times { str =~ /^[0-9].*/                      } }
  b.report('start_with?')       { n.times { str.start_with?(*('0'..'9'))           } }
  b.report("split('')")         { n.times { str.split('').first.to_i.is_a?(Fixnum) } }
  b.report("(48..57).include?") { n.times { (48..57).include?(str[0])              } }

  b.report('range')           { n.times { ('0'..'9') === str[0] } }
  b.report('str[/^\d/]')      { n.times { str[/^\d/]            } }
  b.report('str[/\A\d/]')     { n.times { str[/\A\d/]           } }
  b.report('str[\A[0-9]')     { n.times { str[/\A[0-9]/]        } }
  b.report('\A[0-9] regex')   { n.times { str =~ /\A[0-9]/      } }
end

puts

str = 'foobar' * 1000
puts "str = 'foobar' * 1000"
Benchmark::bm(17) do |b|
  b.report('^\d regex')         { n.times { str =~ /^\d/                 } }
  b.report('\A\d regex')        { n.times { str =~ /\A\d/                } }
  b.report('^[0-9].* regex')    { n.times { str =~ /^[0-9].*/            } }
  b.report('start_with?')       { n.times { str.start_with?(*('0'..'9')) } }
  b.report("(48..57).include?") { n.times { (48..57).include?(str[0])    } }

  b.report('range')           { n.times { ('0'..'9') === str[0] } }
  b.report('str[/^\d/]')      { n.times { str[/^\d/]            } }
  b.report('str[/\A\d/]')     { n.times { str[/\A\d/]           } }
  b.report('str[\A[0-9]')     { n.times { str[/\A[0-9]/]        } }
  b.report('\A[0-9] regex')   { n.times { str =~ /\A[0-9]/      } }
end

Test results:

Ruby = 1.9.3
correct result should be false...
false
false
false
true
false
false
false
false
false
false
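
The single `true` in the list above comes from the `split('')` line, which cannot return `false`. A minimal sketch of why, assuming a current Ruby where `Fixnum` has been merged into `Integer`:

```ruby
str = 'foobar'

first_char = str.split('').first  # "f"
number     = first_char.to_i      # String#to_i gives 0 when there is no leading number

puts first_char.inspect           # "f"
puts number                       # 0
puts number.is_a?(Integer)        # true, whatever the first character was
```

Because `to_i` always returns an integer, the `is_a?` check says nothing about whether the first character was a digit.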

Benchmark results:

n = 1_000_000
str = 'foobar'
                        user     system      total        real
^\d regex           0.590000   0.000000   0.590000 (  0.593534)
\A\d regex          0.560000   0.000000   0.560000 (  0.556304)
^[0-9].* regex      0.580000   0.000000   0.580000 (  0.577662)
start_with?         4.020000   0.000000   4.020000 (  4.025604)
split('')           6.850000   0.000000   6.850000 (  6.872157)
(48..57).include?  17.260000   0.780000  18.040000 ( 18.038887)
range               1.260000   0.000000   1.260000 (  1.258191)
str[/^\d/]          0.680000   0.000000   0.680000 (  0.680291)
str[/\A\d/]         0.660000   0.000000   0.660000 (  0.663305)
str[\A[0-9]         0.670000   0.000000   0.670000 (  0.670242)
\A[0-9] regex       0.570000   0.000000   0.570000 (  0.574152)

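To test whether \A is faster than ^, and to see what effect a long string has, I increased the string size. "split('')" was pulled because it had not finished after more than 60 seconds: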

str = 'foobar' * 1000
                        user     system      total        real
^\d regex          15.010000   0.000000  15.010000 ( 15.020488)
\A\d regex          0.540000   0.010000   0.550000 (  0.539736)
^[0-9].* regex     15.000000   0.000000  15.000000 ( 15.011137)
start_with?         4.010000   0.000000   4.010000 (  4.010340)
(48..57).include?  17.320000   0.770000  18.090000 ( 18.124795)
range               1.250000   0.000000   1.250000 (  1.255724)
str[/^\d/]         15.120000   0.010000  15.130000 ( 15.142242)
str[/\A\d/]         0.650000   0.000000   0.650000 (  0.656198)
str[\A[0-9]         0.650000   0.000000   0.650000 (  0.652306)
\A[0-9] regex       0.550000   0.000000   0.550000 (  0.544415)

I retested with 1.8.7:

Ruby = 1.8.7
correct result should be false...
false
false
false
true
false
false
false
false
false
false

n = 1_000_000
str = 'foobar'
                       user     system      total        real
^\d regex          0.570000   0.000000   0.570000 (  0.565397)
\A\d regex         0.550000   0.000000   0.550000 (  0.552270)
^[0-9].* regex     0.570000   0.000000   0.570000 (  0.574705)
start_with?       38.180000   0.070000  38.250000 ( 39.864171)
split('')          9.750000   0.040000   9.790000 ( 11.025962)
(48..57).include?  0.580000   0.000000   0.580000 (  0.917499)
range              2.420000   0.020000   2.440000 (  3.170774)
str[/^\d/]         0.700000   0.000000   0.700000 (  0.760180)
str[/\A\d/]        0.680000   0.000000   0.680000 (  0.762636)
str[\A[0-9]        0.660000   0.010000   0.670000 (  0.795043)
\A[0-9] regex      0.600000   0.000000   0.600000 (  0.684566)

str = 'foobar' * 1000
                       user     system      total        real
^\d regex          7.900000   0.040000   7.940000 ( 10.735175)
\A\d regex         0.600000   0.010000   0.610000 (  0.784001)
^[0-9].* regex     7.850000   0.020000   7.870000 (  8.251673)
(48..57).include?  0.580000   0.000000   0.580000 (  0.683730)
range              2.380000   0.020000   2.400000 (  2.738234)
str[/^\d/]         7.930000   0.010000   7.940000 (  8.227906)
str[/\A\d/]        0.670000   0.000000   0.670000 (  0.682169)
str[\A[0-9]        0.680000   0.000000   0.680000 (  0.697340)
\A[0-9] regex      0.580000   0.000000   0.580000 (  0.645136)

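Discuss among yourselves.

【Discussion】:

Why is the range example so efficient here for the long string?
Honestly, I don't know. I expected the anchored regexes to run away as the fastest, and was surprised by range.
/^\d/ still has to check the whole string, whereas /\A\d/ can fail immediately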
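
The last comment can be illustrated by looking at where each pattern is allowed to match. This is only a sketch; the sample string is made up for illustration:

```ruby
text = "foobar\n7up"

# ^ anchors at the start of any line, so the engine has to keep looking past
# the first character and finds the digit that follows the newline.
line_start_match = (text =~ /^\d/)

# \A anchors only at the start of the string, so index 0 is the only
# position worth trying.
string_start_match = (text =~ /\A\d/)

puts line_start_match.inspect    # 7
puts string_start_match.inspect  # nil
```

With a long string such as 'foobar' * 1000, the ^ version has more positions to rule out before it can report a failure, which is consistent with the timings above.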

【Solution 3】:

Just found out you can use a Range:

"1hello".start_with?(*('0'..'9')) #=> true

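【Discussion】:

Yes, you can, but it turns the range into an array, which will be less efficient than a simple regex.
@the Tin Man: Not an array (that would cause an error), but an argument list. I benchmarked it, and yes, it is about 5 times slower. Interestingly, on longer strings ("1hello"*100) start_with? is faster
Well, argument list or array, it looks like this in IRB: irb(main):004:0> asdf = *('0'..'9') => ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"], and it quacks like one: irb(main):005:0> asdf.class => Array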
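
Both commenters have a point, which a short sketch may make clearer. The variable name `digits` is only for illustration:

```ruby
# The splat expands the range into its elements; collected here they form an Array.
digits = [*('0'..'9')]
puts digits.class   # Array
puts digits.length  # 10

# start_with? accepts several prefixes and is true if any of them matches,
# so the elements have to be splatted back into an argument list.
puts '1hello'.start_with?(*digits)  # true
puts 'hello1'.start_with?(*digits)  # false

# Passing the Array itself, without the splat, is rejected.
begin
  '1hello'.start_with?(digits)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```

Building the ten-element list on every call is the likely source of the overhead measured in the benchmark; storing it in a constant would avoid that, though that variant was not measured in the thread.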

【Solution 4】:

You can get the first character of the string via array-style access and compare its ASCII value:

1.8.7 :008 > (48..57).include?("5ssdfsdf"[0])
 => true 
1.8.7 :009 > (48..57).include?("ssdfsdf"[0])
 => false 
1.8.7 :010 > (48..57).include?("0sdfsdf"[0])
 => true 
1.8.7 :011 > (48..57).include?("9sdfsdf"[0])
 => true

【Discussion】:

This is fine in 1.8.7, but it performs poorly in 1.9+. See the benchmark.
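
Beyond speed, there is a correctness issue on Ruby 1.9 and later: `str[0]` returns a one-character String rather than an Integer, so the range check no longer matches digits at all. A possible adaptation is sketched below; the helper name `first_byte_is_digit?` is hypothetical, and it assumes that only ASCII digits should count:

```ruby
# Hypothetical helper (not from the original thread): compares the first
# byte of the string with the ASCII codes for '0'..'9' (48..57).
def first_byte_is_digit?(str)
  byte = str.getbyte(0)  # Integer byte value, or nil for an empty string
  !byte.nil? && byte.between?(48, 57)
end

puts '5ssdfsdf'[0].inspect                 # "5" on Ruby 1.9+ (53 on 1.8.7)
puts (48..57).include?('5ssdfsdf'[0])      # false on Ruby 1.9+
puts first_byte_is_digit?('5ssdfsdf')      # true
puts first_byte_is_digit?('ssdfsdf')       # false
puts first_byte_is_digit?('')              # false
```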