Pattern: ~\{[^}]*\}|\d+|.~ (Pattern Demo)
Code: (Demo)
$strings = [
'({1+1=2&2+2=4}+{1+2=3&2+3=5})-(16+10)',
'10+[{1+1=2|2+3=4}?100:0]',
'[1+1=2?10:0]'
];
foreach ($strings as $string) {
var_export(preg_match_all('~\{[^}]*\}|\d+|.~', $string, $out) ? $out[0] : []);
echo "\n";
}
Output:
array (
0 => '(',
1 => '{1+1=2&2+2=4}',
2 => '+',
3 => '{1+2=3&2+3=5}',
4 => ')',
5 => '-',
6 => '(',
7 => '16',
8 => '+',
9 => '10',
10 => ')',
)
array (
0 => '10',
1 => '+',
2 => '[',
3 => '{1+1=2|2+3=4}',
4 => '?',
5 => '100',
6 => ':',
7 => '0',
8 => ']',
)
array (
0 => '[',
1 => '1',
2 => '+',
3 => '1',
4 => '=',
5 => '2',
6 => '?',
7 => '10',
8 => ':',
9 => '0',
10 => ']',
)
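As for the extended criteria in your question, you only need to adjust the pattern to handle letter-dot-letter sequences and floating-point values.
preg_match_all() (Demo):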
preg_match_all('~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i', $string, $out) ? $out[0] : []
Or, if you would like to see a preg_split() version (Demo):
preg_split('~(\{[^}]*\}|[^\w.])~', $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY)
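*Note: if you want or need to recognize signed numbers (positive/negative) by binding the leading sign to the number, without also matching the + and - operators, further adjustments will be required. I won't go down that rabbit hole unless you explicitly state that this is a requirement of your actual project.
As for explaining the patterns, regex101.com (or a similar site) will automatically provide a formal explanation as soon as you enter the input string and the pattern.
Beyond that, I can offer a more casual explanation: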
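To compare the two calls side by side, here is a minimal sketch. The input string is made up for illustration (it is not one of the strings from the question); it contains a letter-dot-letter sequence and two floats.

```php
<?php
// Hypothetical input: one letter-dot-letter sequence and two floats.
$string = '{a.b=1&c=2}+[x.y>0.5?.75:10]';

// Tokenize by matching every token.
$matched = preg_match_all('~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i', $string, $out) ? $out[0] : [];

// Tokenize by splitting on curly-braced expressions and symbols.
$split = preg_split('~(\{[^}]*\}|[^\w.])~', $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($matched);
echo "\n";
var_export($matched === $split); // true for this input
```

The two approaches agree on this input, but they are not interchangeable in general: preg_split() leaves any run of word characters and dots intact (so "abc" or "1.2.3" stays as one element), whereas the preg_match_all() pattern would break those into smaller pieces.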
~ #Pattern delimiter (There are many valid delimiters, this is a wise choice because the tilde is not used inside the actual pattern. This avoids having to perform any unnecessary escaping.)
\{[^}]*\} #Match (as much as possible) { followed by zero or more characters that are not } then match }
| #Or
\d*\.?\d+ #Match (as much as possible) zero or more digits, followed by an optional dot, followed by one or more digits. (This allows "0.999" and ".1" but not "99." )
| #Or
[a-z]+\.[a-z]+ #Match (as much as possible) one or more letters, followed by a dot, followed by one or more letters.
| #Or
. #Match any single non-newline character (this is intended to pick up all of the symbols/left-overs).
~ #Pattern Delimiter
i #Case-insensitive pattern modifier: this makes the regex engine treat every [a-z] like [a-zA-Z]
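If you like the commented layout above, the pattern can be written that way in real code by adding the x (extended) modifier, which makes the regex engine ignore whitespace in the pattern and treat # as the start of a comment. This is a sketch of my own rather than part of the demos; the one thing to watch is that a comment must not contain the delimiter character.

```php
<?php
// Same tokenizer pattern, spread over several lines thanks to the x modifier.
// Comments inside the pattern must avoid the tilde, since it is the delimiter.
$pattern = '~
    \{[^}]*\}          # a whole curly-braced expression
  | \d*\.?\d+          # an integer or a float such as 0.999 or .1
  | [a-z]+\.[a-z]+     # a letter-dot-letter sequence
  | .                  # any other single non-newline character
~ix';

$string = '10+[{1+1=2|2+3=4}?100:0]';
$tokens = preg_match_all($pattern, $string, $out) ? $out[0] : [];

var_export($tokens);
```

For this string the tokens are the same as in the second output array shown earlier.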
...and take another deep breath...
preg_split() is a more versatile version of explode(). The pattern tells it every place where an explosion should occur.
~ #Pattern delimiter
( #Start capture group
\{[^}]*\} #Match (as much as possible) { followed by zero or more characters that are not } then match }
| #Or
[^\w.] #Match any single character that is not a letter, number, underscore, or dot (same effect as: "[^a-zA-Z0-9_.]"). This is intended to "catch" all of the symbols that are meant to be singled-out.
) #End capture group
~ #Pattern delimiter
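In other words, this explodes on every curly-braced expression or symbol. That alone will not work as required; flags must also be declared on this function call.
Parameter 3 is 0, which tells preg_split() to split an unlimited number of times. That is already the function's default behavior, but the placeholder is needed so that parameter 4 can be supplied.
Parameter 4 has two parts. Declaring multiple flags requires separating them with a pipe (|).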
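- PREG_SPLIT_DELIM_CAPTURE: this tells the function to keep the substrings that served as "explosion points". Without this flag, the output array would not contain any curly-braced expressions or symbols. If we were not going to use this flag, the capture group parentheses would not be needed in the pattern.
- PREG_SPLIT_NO_EMPTY: when two "explosion points" sit side by side, the result is an empty array element. In many cases (and certainly this one), those empty elements are undesirable; this flag removes the need to call array_filter() to clean up the mess.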
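To see what each flag contributes, here is a small sketch that runs the same split three ways on the third sample string. The variable names are only for illustration.

```php
<?php
$string  = '[1+1=2?10:0]';
$pattern = '~(\{[^}]*\}|[^\w.])~';

// No flags: the delimiters are thrown away and empty elements appear at both ends.
$noFlags = preg_split($pattern, $string);

// Keep the delimiters, but still allow empty elements.
$delimOnly = preg_split($pattern, $string, 0, PREG_SPLIT_DELIM_CAPTURE);

// Keep the delimiters and drop the empty elements.
$both = preg_split($pattern, $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

echo json_encode($noFlags), "\n";   // ["","1","1","2","10","0",""]
echo json_encode($delimOnly), "\n"; // ["","[","1","+","1","=","2","?","10",":","0","]",""]
echo json_encode($both), "\n";      // ["[","1","+","1","=","2","?","10",":","0","]"]
```

Here the empty elements come from the brackets at the very start and end of the string; the same thing happens wherever two delimiters touch.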