Regular expression not matching end of string: answers

【Question title】: Regular expression not matching end of string
【Posted】: 2018-02-03 03:41:53
【Question description】:

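I am trying to parse the following strings. Anything inside {} needs to stay together as one block. The remaining symbols need to be kept as well, each in its own array element.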

$string_1 = '({Product.Depth+1.25=2.5&2.0+2.0=4}+{1.0+2.5=3.0&2.0+3.0=5.0})-(16.75+10.9375)';
$string_2 = 'Product.Width+[{1.0+1.0=2.0|2.0+3.0=4}?100.00:0.00]';
$string_3 = '[1+1=2?10.00:Product.Depth]';

This is what I have so far. It works for the first two strings but not for the third.

preg_match_all("/[()=]|\\{[^\}]+\\}|[+-]|[^=]+$/", $string_to_parse, $matches);

Right now it returns something like this... You can see that it is missing some numbers between keys 7, 8, and 9. Key 14 has also dropped some digits.

array(1) {
[0]=>
array(15) {
[0]=>
string(1) "="
[1]=>
string(1) "("
[2]=>
string(13) "{1+1=2&2+2=4}"
[3]=>
string(1) "+"
[4]=>
string(13) "{1+2=3&2+3=5}"
[5]=>
string(1) ")"
[6]=>
string(1) "="
[7]=>
string(1) "+"
[8]=>
string(1) "+"
[9]=>
string(1) "+"
[10]=>
string(13) "{1+1=2|2+3=4}"
[11]=>
string(1) "-"
[12]=>
string(1) "+"
[13]=>
string(1) "="
[14]=>
string(7) "2?10:0]"
}
}

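How can I fix this?

【Question discussion】:

I'm finding it hard to understand exactly what you are trying to do here. Are you trying to match symbols/operators/groups rather than individual digits? Perhaps be a bit clearer, and show the actual code being tested (e.g. show us $string_to_parse rather than the test strings).
Could you give us the expected output for all 3 of the strings you mention?

Tags: php regex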

【Solution 1】:

Pattern: ~\{[^}]*\}|\d+|.~ (Pattern Demo)

Code: (Demo)

$strings = [
    '({1+1=2&2+2=4}+{1+2=3&2+3=5})-(16+10)',
    '10+[{1+1=2|2+3=4}?100:0]',
    '[1+1=2?10:0]'
];
foreach ($strings as $string) {
    var_export(preg_match_all('~\{[^}]*\}|\d+|.~', $string, $out) ? $out[0] : []);
    echo "\n";
}

Output:

array (
  0 => '(',
  1 => '{1+1=2&2+2=4}',
  2 => '+',
  3 => '{1+2=3&2+3=5}',
  4 => ')',
  5 => '-',
  6 => '(',
  7 => '16',
  8 => '+',
  9 => '10',
  10 => ')',
)
array (
  0 => '10',
  1 => '+',
  2 => '[',
  3 => '{1+1=2|2+3=4}',
  4 => '?',
  5 => '100',
  6 => ':',
  7 => '0',
  8 => ']',
)
array (
  0 => '[',
  1 => '1',
  2 => '+',
  3 => '1',
  4 => '=',
  5 => '2',
  6 => '?',
  7 => '10',
  8 => ':',
  9 => '0',
  10 => ']',
)

As for the extended criteria in your question, just adjust the pattern to handle letter-dot-letter sequences as well as floating-point values.

preg_match_all() (Demo):

preg_match_all('~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i', $string, $out) ? $out[0] : []

Or, if you would like to see preg_split() (Demo):

preg_split('~(\{[^}]*\}|[^\w.])~', $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY)
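
As a quick sanity check, the sketch below runs both calls over the three updated strings from the question and compares the results. The variable names ($matchPattern, $splitPattern, $sameEverywhere) are mine, not part of the original demos, and agreement on these three inputs does not prove the two patterns are equivalent in general (for example, a variable name containing an underscore or digit is kept whole by the preg_split() version but not by the [a-z]+\.[a-z]+ branch).

```php
<?php
// The three strings as posted in the updated question.
$strings = [
    '({Product.Depth+1.25=2.5&2.0+2.0=4}+{1.0+2.5=3.0&2.0+3.0=5.0})-(16.75+10.9375)',
    'Product.Width+[{1.0+1.0=2.0|2.0+3.0=4}?100.00:0.00]',
    '[1+1=2?10.00:Product.Depth]',
];

$matchPattern = '~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i';
$splitPattern = '~(\{[^}]*\}|[^\w.])~';

$sameEverywhere = true;
foreach ($strings as $string) {
    // Approach 1: collect every token as a match.
    $matched = preg_match_all($matchPattern, $string, $out) ? $out[0] : [];
    // Approach 2: split on brace blocks/symbols and keep the delimiters.
    $split = preg_split($splitPattern, $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $sameEverywhere = $sameEverywhere && ($matched === $split);
    echo count($matched), ' tokens; identical: ', var_export($matched === $split, true), "\n";
}
```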

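*Note: if you want/need to recognize signed numbers (positive/negative) by binding a leading sign to its number, while not matching the + and - operators, further adjustments will be required. Unless you explicitly state that this is a requirement of your actual project, I'm not going down that rabbit hole.

As for explaining the patterns: whenever you enter your input string and pattern at regex101.com (or a similar site), a formal explanation is generated automatically.

Beyond that, I can offer a casual explanation: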

~               #Pattern delimiter (There are many valid delimiters, this is a wise choice because the tilde is not used inside the actual pattern.  This avoids having to perform any unnecessary escaping.)
\{[^}]*\}       #Match (as much as possible) { followed by zero or more characters that are not } then match }
|               #Or
\d*\.?\d+       #Match (as much as possible) zero or more digits, followed by an optional dot, followed by one or more digits. (This allows "0.999" and ".1" but not "99." )
|               #Or
[a-z]+\.[a-z]+  #Match (as much as possible) one or more letters, followed by a dot, followed by one or more letters.
|               #Or
.               #Match any single non-newline character (this is intended to pick up all of the symbols/left-overs).
~               #Pattern Delimiter
i               #Case-insensitive pattern modifier: this makes the regex engine treat every [a-z] like [a-zA-Z]
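
To see the alternation order in action, here is a minimal sketch that applies the pattern to the third string from the question. The tokenize() wrapper is a hypothetical helper introduced only for this illustration; it is not part of the original demo.

```php
<?php
// Hypothetical helper (my naming): returns every token matched by the extended pattern.
function tokenize(string $expression): array
{
    // Branches are tried left to right at each position:
    // brace block, then number, then letters.letters, then any single character.
    $pattern = '~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i';
    return preg_match_all($pattern, $expression, $out) ? $out[0] : [];
}

$tokens = tokenize('[1+1=2?10.00:Product.Depth]');
echo implode(' ', $tokens), "\n";
// prints: [ 1 + 1 = 2 ? 10.00 : Product.Depth ]
```

Note that "10.00" and "Product.Depth" each come back as one element, while every bracket and operator falls through to the final dot branch.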

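...take another deep breath...

preg_split() is a more versatile version of explode(). The pattern tells it every place where a split (an "explosion") should occur.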

~          #Pattern delimiter
(          #Start capture group
\{[^}]*\}  #Match (as much as possible) { followed by zero or more characters that are not } then match }
|          #Or
[^\w.]     #Match any single character that is not a letter, number, underscore, or dot (same effect as: "[^a-zA-Z0-9_.]").  This is intended to "catch" all of the symbols that are meant to be singled-out.
)          #End capture group
~          #Pattern delimiter

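In other words, this splits on every curly-brace expression or symbol. That alone does not work as required; flags must also be declared in the function call.

Parameter 3 is 0, which tells preg_split() to split an unlimited number of times. This is the function's default behavior, but we need this placeholder in order to use parameter 4.

Parameter 4 has two parts. Declaring multiple flags requires separating them with a pipe (|).

PREG_SPLIT_DELIM_CAPTURE: this tells the function to keep the substrings used as "explosion points". Without this flag, the output array would not contain any of the curly-brace expressions or symbols. If we weren't going to use this flag, the capture-group parentheses would not be needed in the pattern.
PREG_SPLIT_NO_EMPTY: when two "explosion points" sit side by side, the result is an empty array element. In many cases (this one in particular) those empty elements are undesirable; this flag removes the need to call array_filter() to clean up the mess.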
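
The effect of each flag is easiest to see side by side. This sketch uses a shortened version of the second test string (my abbreviation, chosen so that several delimiters sit next to each other); the variable names are only for illustration.

```php
<?php
$pattern = '~(\{[^}]*\}|[^\w.])~';
$string  = '10+[{1+1=2}?1:0]';

// No flags: delimiters are discarded, and adjacent delimiters leave empty strings.
$noFlags = preg_split($pattern, $string);

// Delimiters kept (thanks to the capture group), but the empty strings remain.
$delimOnly = preg_split($pattern, $string, 0, PREG_SPLIT_DELIM_CAPTURE);

// Both flags: delimiters kept and empty elements dropped.
$both = preg_split($pattern, $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($noFlags);   // '10', '', '', '', '1', '0', ''
echo "\n";
var_export($both);      // '10', '+', '[', '{1+1=2}', '?', '1', ':', '0', ']'
echo "\n";
```

Only the third call gives the token list the question asks for; the first loses every symbol, and the second still needs the empty elements filtered out.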

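【Discussion】:

Perfect! Thank you so much. I'd love to understand how it works, because reading the documentation doesn't really teach someone like me, with a learning disability.
I'd love to chat with you, if you're willing. I believe that if I have someone answer my questions about regular expressions, I'll end up understanding them better.
If you don't mind helping one last time, I overlooked one thing. I need to be able to capture variables such as Product.Width. I have updated the strings in the OP. Variables should stay together, similar to the {} statements.
Does this handle my previous question correctly? \{[^}]*\}|\d+|\w+\.\w+|.
My pattern assumed that all numbers in your strings were integers (no decimals). If that is true, then your new pattern will work fine. I don't know if I'll have time to chat today (certainly not from my phone; typing is too slow/uncomfortable). When I get to my computer, I'll update my answer with an explanation.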