Although regular expressions are not the best tool for this, here is a solution that uses recursive matching:
(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ]+))(?>[ ]*(?R))*
Broken down, it consists of a group that matches some content, followed by more matches of the same kind, separated by optional spaces.
(?> <---- start matching
... <---- some stuff inside
) <---- end matching
(?>
[ ]* <---- optional spaces
(?R) <---- match the entire thing again
)* <---- can be repeated
From your example 0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3],..., we want to match:
0
(1,2)
(1,2,(1,2)) [1,2,3,[1,2]]
[1,2,3]
...
For the third match, the inner content matches (1,2,(1,2)) and then [1,2,3,[1,2]], which are separated by a space.
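Since a regex is not necessarily the best fit here, the following is a minimal non-regex sketch in Python that produces the same items for the example by splitting at commas that sit outside brackets and quotes. It is a different technique from the recursive pattern, not a translation of it: the function name `split_top_level` is hypothetical, and the sketch assumes well-formed input (it does not reject unbalanced brackets, whereas the regex simply would not match them).

```python
def split_top_level(text):
    """Split text at commas that are outside (), [], {} and "..."."""
    closers = {"(": ")", "[": "]", "{": "}"}
    stack = []          # closing brackets we still expect to see
    in_quotes = False
    items, current = [], []
    for ch in text:
        if in_quotes:
            # Stay inside the quoted string until the closing quote.
            in_quotes = ch != '"'
        elif ch == '"':
            in_quotes = True
        elif ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        elif ch == "," and not stack:
            # A top-level comma ends the current item.
            items.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    items.append("".join(current).strip())
    return items


example = "0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3]"
print(split_top_level(example))
# ['0', '(1,2)', '(1,2,(1,2)) [1,2,3,[1,2]]', '[1,2,3]']
```

The third item keeps both bracketed groups together because the space between them is not a top-level comma, which mirrors what the recursive pattern is meant to do.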
The inner content is a series of alternatives:
(?>
(?>...)| <---- will match balanced ()
(?>...)| <---- will match balanced []
(?>...)| <---- will match balanced {}
(?>...)| <---- will match "..."
(?>...) <---- will match anything else without space or comma
)
Here are the alternatives in detail:
\( <---- literal (
[^()]* <---- any number of chars except ( or )
(?R)? <---- match the entire thing optionally
[^()]* <---- any number of chars except ( or )
\) <---- literal )

\[ <---- literal [
[^[\]]* <---- any number of chars except [ or ]
(?R)? <---- match the entire thing optionally
[^[\]]* <---- any number of chars except [ or ]
\] <---- literal ]

{ <---- literal {
[^{}]* <---- any number of chars except { or }
(?R)? <---- match the entire thing optionally
[^{}]* <---- any number of chars except { or }
} <---- literal }

" <---- literal "
[^"]* <---- any number of chars except "
" <---- literal "

[^(){}[\]", ]+ <---- one or more chars except comma, or space, or these: (){}[]"
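Note that this does not match a comma-separated list as a whole, but rather the individual items in such a list. Excluding the comma and the space in the last alternative above makes the match stop at a comma or a space (apart from the spaces we explicitly allow between repeated matches).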
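The stopping behavior can be seen with only the two non-recursive alternatives (the quoted string and the "anything else" class). The sketch below uses Python's standard `re` module, which does not support `(?R)`, so the bracket alternatives are left out. It also swaps the atomic groups `(?>...)` for plain non-capturing groups `(?:...)` and escapes `[` inside the character class; the names `ATOM` and `ITEM` are made up for illustration.

```python
import re

# One atom: a quoted string, or a run of chars that are not
# a comma, a space, or any of (){}[]"
ATOM = r'(?:"[^"]*"|[^(){}\[\]", ]+)'

# One item: an atom followed by more atoms, separated by optional spaces.
ITEM = re.compile(ATOM + r'(?:[ ]*' + ATOM + r')*')

print(ITEM.findall('foo, "a, b" bar, baz'))
# ['foo', '"a, b" bar', 'baz']
```

Each match ends at a comma, the comma inside the quotes is kept because the quoted alternative consumes it, and the space before bar is absorbed by the explicit `[ ]*` between repeated atoms.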