MySQL REGEXP with word boundary after forward slash: answers

【Question title】: MySQL REGEXP with word boundary after forward slash
【Posted】: 2016-05-31 15:45:34
【Question description】:

I have some URLs without http:// in a database table:

        url
row #1: 10.1.127.4/
row #2: 10.1.127.4/something

Now, the following filter gives me row #2, which is fine:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/something[[:>:]]'

But the following filter does not give me row #1. Shouldn't it?

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/[[:>:]]'

I should note that escaping the forward slash with a backslash does not return the desired row #1 either:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4\/[[:>:]]'

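【Comments on the question】:

Try '[[:<:]]10.1.127.4/'
Well, I would really like to have the word boundary on the right :)
That makes no sense: there is no word boundary between / and the end of the string.
Please state the exact requirements. If you want to match at the end of the string or before a non-digit character, use '[[:<:]]10[.]1[.]127[.]4/([[:>:]]|$)'. The dots should be escaped or placed in bracket expressions.
If you need to match any digits (1 or more), use [0-9]+. Actually, /([[:>:]] will match / before an alnum character. If you need the opposite, use '[[:<:]]10[.]1[.]127[.]4/([^[:alnum:]]|$)'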
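
The two points made in the comments above (bracketed dots, and `$` as an alternative to the right-hand boundary) can be tried outside the database. The sketch below is only an approximation: `emulate_regexp` is a hypothetical helper, and the lookarounds it substitutes for `[[:<:]]` and `[[:>:]]` are an assumption based on the word-start and word-end definitions given elsewhere in this thread, not MySQL's own engine.

```python
import re

def emulate_regexp(pattern, text):
    """Hypothetical helper: approximate the [[:<:]] / [[:>:]] markers in Python."""
    translated = (
        pattern
        .replace("[[:<:]]", r"(?<!\w)(?=\w)")   # assumed meaning: start of a word
        .replace("[[:>:]]", r"(?<=\w)(?!\w)")   # assumed meaning: end of a word
    )
    # re.ASCII keeps \w to ASCII letters, digits and underscore
    return re.search(translated, text, flags=re.ASCII) is not None

loose = "[[:<:]]10.1.127.4/"                      # unescaped dots match any character
strict = "[[:<:]]10[.]1[.]127[.]4/([[:>:]]|$)"    # literal dots, end of string allowed

print(emulate_regexp(loose, "10x1x127x4/"))    # True
print(emulate_regexp(strict, "10x1x127x4/"))   # False
print(emulate_regexp(strict, "10.1.127.4/"))   # True
```

In this emulation the `$` branch is what lets row #1 match, since no word ends directly after the slash.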

Tags: mysql regex trailing-slash word-boundary

【Solution 1】:

According to the documentation: http://dev.mysql.com/doc/refman/5.7/en/regexp.html

[[:<:]], [[:>:]]

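These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

/ is not a member of alnum, so it is not a word boundary.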
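
To see the quoted definition at work on the two rows from the question, here is a minimal Python sketch. Python's `re` module has no `[[:<:]]` or `[[:>:]]`, so the hypothetical helper `mysql_like_search` rewrites them as lookarounds; that rewrite is an assumption derived from the definition above, and the result is an approximation of the behaviour described in this thread rather than a run against MySQL.

```python
import re

def mysql_like_search(pattern, text):
    """Hypothetical helper: emulate the word-boundary markers with lookarounds."""
    translated = (
        pattern
        .replace("[[:<:]]", r"(?<!\w)(?=\w)")   # not preceded, but followed by a word char
        .replace("[[:>:]]", r"(?<=\w)(?!\w)")   # preceded, but not followed by a word char
    )
    return re.search(translated, text, flags=re.ASCII) is not None

rows = ["10.1.127.4/", "10.1.127.4/something"]

# Row #2 ends in the word character "g", so a word ends at the end of the string.
print(mysql_like_search("[[:<:]]10.1.127.4/something[[:>:]]", rows[1]))  # True

# Row #1 ends in "/", which is not a word character, so no word ends after it.
print(mysql_like_search("[[:<:]]10.1.127.4/[[:>:]]", rows[0]))           # False
```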

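【Discussion】:

"/ is not a member of alnum, so it is not a word boundary." - No, there is no word boundary between / and the end of the string. A word boundary is a zero-width assertion.

【Solution 2】: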

SELECT * FROM mytable WHERE mycolumn REGEXP "[[:<:]][0-9]{1,3}\\.([0-9]{1,3}.?){3}((\\/)?[^ ]*)?[[:>:]]";

[[:<:]][0-9]{1,3}\.([0-9]{1,3}.?){3}((\/)?[^ ]*)?[[:>:]]

Assert position at the beginning of a word (position followed by but not preceded by an ASCII letter, digit, or underscore) «[[:<:]]»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
   Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
Match the character “.” literally «\.»
Match the regex below and capture its match into backreference number 1 «([0-9]{1,3}.?){3}»
   Exactly 3 times «{3}»
      You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match a single character in the range between “0” and “9” «[0-9]{1,3}»
      Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
   Match any single character that is NOT a line break character (line feed) «.?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
Match the regex below and capture its match into backreference number 2 «((\/)?[^ ]*)?»
   Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
   Match the regex below and capture its match into backreference number 3 «(\/)?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
      Match the character “/” literally «\/»
   Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^ ]*»
      Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
      The literal character “ ” « »
Assert position at the end of a word (position preceded by but not followed by an ASCII letter, digit, or underscore) «[[:>:]]»

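【Discussion】:

【Solution 3】:

I found out that [[:>:]] requires a word character to its left and, vice versa, [[:<:]] requires a word character to its right.

A simple test to verify this: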

SELECT 'bla,,123' REGEXP '[[:<:]]bla,[[:>:]]' -- no match
SELECT 'bla,,123' REGEXP '[[:<:]]bla[[:>:]]' -- match
SELECT 'bla,,123' REGEXP '[[:<:]]bla,,123[[:>:]]' -- match

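I think the documentation makes sense when read this way, but I misunderstood it for years:

[...] word boundaries. They match the beginning and end of words, [...]

So a word boundary requires

a non-word character (or the start or end of the string) on one side
and a word character on the other side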
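
The rule above can be made concrete by listing where words start and end in the test string. This is a small Python sketch under the same reading of the markers; the function name `word_boundaries` is made up for illustration, and the lookarounds are an assumed equivalent of `[[:<:]]` and `[[:>:]]`, not MySQL's implementation.

```python
import re

def word_boundaries(text):
    """Return (starts, ends): offsets where a word begins and where a word ends."""
    # Start of a word: no word character before, a word character after.
    starts = [m.start() for m in re.finditer(r"(?<!\w)(?=\w)", text, flags=re.ASCII)]
    # End of a word: a word character before, no word character after.
    ends = [m.start() for m in re.finditer(r"(?<=\w)(?!\w)", text, flags=re.ASCII)]
    return starts, ends

starts, ends = word_boundaries("bla,,123")
print(starts)  # [0, 5]
print(ends)    # [3, 8]
```

The pattern '[[:<:]]bla,[[:>:]]' would need a word to end at offset 4, right after the first comma. That offset is not in the list, which is consistent with the "no match" result in the test above.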

【Discussion】: