Regex - get elements to render if statement

【Question title】: Regex - get elements to render if statement
【Posted】: 2016-10-12 20:15:32
【Question】:

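I am designing a script and trying to build an if construct in PHP without using eval.

It is still incomplete but coming along: this is the "if" part of a template engine. Assignment operators are not allowed, but I need to test values without allowing PHP code injection, which is precisely why I am not using eval. The operations between variables have to be performed individually to prevent injection attacks.

The regex must capture: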

[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
    output
[elseif:('b'+'atman'='batman')]
    output2
[elseif:('b'+'atman'='batman')]
    output3
[elseif:('b'+'atman'='batman')]
    output4
[else]
    output5
[endif]

[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
    output6
[else]
    output7
[endif]

The following gets the if, elseif, else and endif blocks as well as the conditional statements:

$regex = '~^\h*\[if:(.*)\]\R(?<if>(?:(?!\[elseif)[\s\S])+)\R^\h*\[elseif:(.*)\]\R(?<elseif>(?:(?!\[else)[\s\S])+)\R^\h*\[else.*\]\R(?<else>(?:(?!\[endif)[\s\S])+)\R^\[endif\]~xm';

Please help with making the elseif and else parts optional.

Then, given a conditional statement, I can get the operations with:

$regex = '~([^\^=<>+\-%/!&|()*]+)([\^+\-%/!|&*])([^\^=<>+\-%/!&|()*]*)~';

However, it only pairs them up, missing every third operator...
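To make the symptom concrete, here is a minimal sketch that runs the pattern above on the first arithmetic group of the sample condition. The subject string comes from the sample input; nothing else is assumed. Each match consumes an operand, an operator and the following operand together, so the operator sitting between two matches is never captured.

```php
<?php
$regex = '~([^\^=<>+\-%/!&|()*]+)([\^+\-%/!|&*])([^\^=<>+\-%/!&|()*]*)~';

// The subject contains four operators: + - / *
preg_match_all($regex, 'a+b-c/d*e', $m);

print_r($m[0]); // full matches: a+b, c/d
print_r($m[2]); // captured operators: +, /   (the - and * are skipped)
```

The trailing `e` is not matched at all, because the pattern requires an operator after the first operand.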

Thanks for your help.

【Comments】:

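The construct [\S \t]+ matches any character except [\r\n\f], and your [\s\S]+ is greedy, so it will not stop if there is more than one block to match. Please provide a better input/expected output sample. You could try (?s)(\[if:([^\]]+))\](.*?)\[endif\]
Please see the bounty comments.
PCRE regexes are easy to put together, even complex recursive ones. The problem is judging intent from code. If you think you know what you need, just say it in English, in simple pseudo-code, without the details of the host language code. The two have to be kept separate: regex is a language unto itself, and single-minded. One more thing: using regex recursion also requires recursive use of the host language.
I am fine with code that uses the host language; forgive me if it did not look that way. What I meant is that I do not want eval or PHP injection, hence testing every part of the code.
Yes, it looks like you are trying to get the operators and the non-operator/paren/equals characters directly surrounding them. So basically, all of those similar regexes and code can be condensed into a single preg_match_all() using this: ~([^\^=<>+\-%/!&|()*]+)([\^+\-%/!|&*])([^\^=<>+\-%/!&|()*]*)~ where capture group 2 contains the operator, which you can test with if-then-else logic. I wish I could help further, but I do not quite understand what you are doing.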

Tags: php regex preg-match-all

【Solution 1】:

You have different possibilities here.

Regex version

^\h*\[if.*\]\R                        # if in the first line
(?<if>(?:(?!\[elseif)[\s\S])+)\R      # output
^\h*\[elseif.*\]\R                    # elseif
(?<elseif>(?:(?!\[else)[\s\S])+)\R    # output
^\h*\[else.*\]\R                      # else
(?<else>(?:(?!\[endif)[\s\S])+)\R     # output
^\[endif\]

Afterwards, you have three named capture groups (if, elseif and else).
See a demo for this one on regex101.com.

In PHP, this would be:

<?php
$code = <<<EOF
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
output
[elseif:('b'+'atman'='batman')]
output2
out as well
[else]
output3
some other output here
[endif]
EOF;

$regex = '~
            ^\h*\[if.*\]\R                        # if in the first line
            (?<if>(?:(?!\[elseif)[\s\S])+)\R      # output
            ^\h*\[elseif.*\]\R                    # elseif
            (?<elseif>(?:(?!\[else)[\s\S])+)\R    # output
            ^\h*\[else.*\]\R                      # else
            (?<else>(?:(?!\[endif)[\s\S])+)\R     # output
            ^\[endif\]
          ~xm';

preg_match_all($regex, $code, $parts);
print_r($parts);
?>

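Programming logic

It may be better to walk through the lines, look for [if...], capture everything up to [elseif...] in a string, and glue the pieces together afterwards.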

<?php

$code = <<<EOF
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
output
[elseif:('b'+'atman'='batman')]
output2
out as well
[else]
output3
some other output here
[endif]
EOF;

// functions, shamelessly copied from http://stackoverflow.com/questions/834303/startswith-and-endswith-functions-in-php
function startsWith($haystack, $needle) {
    // search backwards starting from haystack length characters from the end
    return $needle === "" || strrpos($haystack, $needle, -strlen($haystack)) !== false;
}

function endsWith($haystack, $needle) {
    // search forward starting from end minus needle length characters
    return $needle === "" || (($temp = strlen($haystack) - strlen($needle)) >= 0 && strpos($haystack, $needle, $temp) !== false);
}

$code = explode("\n", $code);
$buffer = array("if" => null, "elseif" => null, "else" => null);

$pointer = false;
for ($i=0;$i<count($code);$i++) {
	$save = true;
	if (startsWith($code[$i], "[if")) {$pointer = "if"; $save = false;}
	elseif (startsWith($code[$i], "[elseif")) {$pointer = "elseif"; $save = false; }
	elseif (startsWith($code[$i], "[else")) {$pointer = "else"; $save = false; }
	elseif (startsWith($code[$i], "[endif")) {$pointer = false; $save = false; }

	if ($pointer && $save) $buffer[$pointer] .= $code[$i] . "\n";

}
print_r($buffer);

?>
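The line walker above keeps a single buffer per keyword, so several [elseif] branches end up concatenated and the conditions are dropped. A possible variation, sketched here under the assumption that blocks are not nested and that each keyword sits on its own line, collects an ordered list of branches instead. The function name `splitBranches` and the array keys are made up for this illustration.

```php
<?php
// Hypothetical helper: split one non-nested [if]...[endif] block into ordered branches.
function splitBranches(string $code): array
{
    $branches = [];
    $current = null; // index of the branch currently being filled, or null outside a block

    foreach (preg_split('~\R~', $code) as $line) {
        $trimmed = trim($line);
        if (preg_match('~^\[(if|elseif):(.*)\]$~', $trimmed, $m)) {
            // [if:...] or [elseif:...] opens a new branch and keeps its condition.
            $branches[] = ['type' => $m[1], 'condition' => $m[2], 'body' => ''];
            $current = count($branches) - 1;
        } elseif ($trimmed === '[else]') {
            $branches[] = ['type' => 'else', 'condition' => null, 'body' => ''];
            $current = count($branches) - 1;
        } elseif ($trimmed === '[endif]') {
            $current = null;
        } elseif ($current !== null) {
            // Any other line belongs to the branch that was opened last.
            $branches[$current]['body'] .= $line . "\n";
        }
    }
    return $branches;
}

$code = "[if:(x-y)]\n    output\n[elseif:('a'='a')]\n    output2\n"
      . "[elseif:(1=1)]\n    output3\n[else]\n    output4\n[endif]";

$branches = splitBranches($code);
foreach ($branches as $b) {
    echo $b['type'], ' | ', $b['condition'] ?? '-', ' | ', trim($b['body']), "\n";
}
```

With this shape, elseif and else are optional by construction: a block with only [if] and [else] simply yields two entries.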

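【Comments】:

Thanks, this is great, but it is only half the battle. I still need the regex to get the operators/values/variables, separate them, apply PEDMAS and perform the tests. I am asking quite a lot and you have put effort into this; bonus points if you give me the full answer :) Edit: I cannot raise the bounty :(
Does it take recursion into account? (Can there be an if inside an if?)
What does PEDMAS mean?
PEMDAS, sorry: parentheses, exponents, multiplication/division, addition/subtraction, i.e. the order of operations.
Get an array of all values (strings or numbers)/variables/operators and resolve it by testing the condition. As for the variables, they are part of my template, so leave them alone and only test values, e.g.: [if:c=c] is true, [if:1=1] is true, [if:1.5+2.5=8 /2] is true, [if:'string1' = 'string1'] is true, [if:'string1' != 'string1']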
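The eval-free testing the asker describes can be done with a small tokenizer and a precedence-climbing parser. The following is only a sketch under stated assumptions, not the method from either answer: the class name `ConditionEvaluator` is hypothetical, template variables are assumed to have been substituted already (any unknown input is rejected), `+` between two quoted strings concatenates, all numbers are treated as floats and compared strictly, and the logic operators `!`, `&` and `|` are left out. PHP 8 is assumed.

```php
<?php
final class ConditionEvaluator
{
    // Numbers, single-quoted strings, "!=", then single-character operators and parentheses.
    private const TOKEN = "\d+(?:\.\d+)?|'[^']*'|!=|[-+*/%^=()]";
    // Binding strength per operator; higher binds tighter.
    private const PREC = ['=' => 1, '!=' => 1, '+' => 2, '-' => 2, '*' => 3, '/' => 3, '%' => 3, '^' => 4];

    private array $tokens;
    private int $pos = 0;

    private function __construct(string $src)
    {
        // Reject anything that is not a known token or whitespace.
        if (preg_replace('~' . self::TOKEN . '|\s+~', '', $src) !== '') {
            throw new InvalidArgumentException("Unexpected input in: $src");
        }
        preg_match_all('~' . self::TOKEN . '~', $src, $m);
        $this->tokens = $m[0];
    }

    public static function test(string $src): bool
    {
        $parser = new self($src);
        $value = $parser->expr(1);
        if ($parser->pos < count($parser->tokens)) {
            throw new InvalidArgumentException('Unexpected trailing tokens');
        }
        return (bool) $value;
    }

    private function peek(): ?string { return $this->tokens[$this->pos] ?? null; }
    private function take(): ?string { return $this->tokens[$this->pos++] ?? null; }

    private function expr(int $minPrec)
    {
        $left = $this->atom();
        while (true) {
            $op = $this->peek();
            $prec = $op === null ? 0 : (self::PREC[$op] ?? 0);
            if ($prec < $minPrec) { break; }
            $this->take();
            // '^' is right-associative, every other operator is left-associative.
            $right = $this->expr($op === '^' ? $prec : $prec + 1);
            $left = $this->apply($op, $left, $right);
        }
        return $left;
    }

    private function apply(string $op, $a, $b)
    {
        if ($op === '=')  { return $a === $b; }
        if ($op === '!=') { return $a !== $b; }
        if ($op === '+' && is_string($a) && is_string($b)) { return $a . $b; }
        if (!is_float($a) || !is_float($b)) {
            throw new InvalidArgumentException("Numbers expected around $op");
        }
        if (($op === '/' || $op === '%') && $b == 0.0) {
            throw new DivisionByZeroError('Division by zero');
        }
        switch ($op) {
            case '+': return $a + $b;
            case '-': return $a - $b;
            case '*': return $a * $b;
            case '/': return $a / $b;
            case '%': return fmod($a, $b);
            default:  return $a ** $b; // '^'
        }
    }

    private function atom()
    {
        $tok = $this->take();
        if ($tok === '(') {
            $value = $this->expr(1);
            if ($this->take() !== ')') { throw new InvalidArgumentException('Missing )'); }
            return $value;
        }
        if ($tok !== null && $tok[0] === "'") { return substr($tok, 1, -1); }
        if ($tok !== null && is_numeric($tok)) { return (float) $tok; }
        throw new InvalidArgumentException('Operand expected');
    }
}

var_dump(ConditionEvaluator::test("1.5+2.5=8 /2"));           // bool(true)
var_dump(ConditionEvaluator::test("('b'+'atman'='batman')")); // bool(true)
var_dump(ConditionEvaluator::test("'string1' != 'string1'")); // bool(false)
```

Because the input is only ever tokenized and interpreted, never executed, nothing the template author writes can reach the PHP runtime as code. Strict float comparison is a deliberate simplification; a tolerance would be needed for conditions such as 0.1+0.2=0.3.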

【Solution 2】:

(edit: added a simple if/elseif body parsing regex at the bottom)

Using PCRE, I think this recursive regex should handle nested if/elseif/else/endif constructs.

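In its current form it is a loose parse, in that it does not define the form of [if/elseif: body ] very well. For example, is [if: the beginning delimiter construct and ] the end? What should happen if errors occur, and so on. A strict parse can be done this way if it is needed. Right now it basically uses [if: body ] as the beginning delimiter and [endif] as the end delimiter to find the nested constructs.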

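Also, it loosely defines body as [^\]]*, which in a serious parsing situation would have to be fleshed out to account for quotes and other things. Like I said, breaking it out like that is doable, but it is much more involved. I have done this at the language level, and it is not trivial.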
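As a small illustration of what "accounting for quotes" could look like, the sketch below compares the loose body [^\]]* with a stricter alternative that consumes quoted strings as units. The stricter pattern is an assumption made for this example, not part of the answer's regex, and it ignores escaped quotes inside strings.

```php
<?php
// Loose body, as in the answer: stops at the first ']'.
$loose = '~\[if:([^\]]*)\]~';

// Hypothetical stricter body: single- or double-quoted strings are consumed whole,
// so a ']' inside a string no longer ends the body.
$strict = '~\[if:((?:[^\]\'"]|\'[^\']*\'|"[^"]*")*)\]~';

$line = "[if:('a]b'='a]b')]";

preg_match($loose, $line, $m1);
preg_match($strict, $line, $m2);

echo $m1[1], "\n"; // ('a
echo $m2[1], "\n"; // ('a]b'='a]b')
```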

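There is a host language usage pseudo-code sample at the bottom. The language recursion demonstrates how to extract nested content correctly.

The regex matches the current outer shell of a core, where the core is the inner nested content.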

Each call to ParseCore() is initiated inside ParseCore() itself (except for the initial call from main()).
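The interplay between regex recursion and host-language recursion can be sketched in PHP as follows. To keep the example short it uses a much simpler recursive pattern than the one in this answer (an assumption for illustration): it only recognizes [if:...] and [endif], captures the if body in group 1 and the core in group 2, and ignores elseif/else and error reporting. The function name `parseCore` mirrors the pseudo-code but is otherwise hypothetical.

```php
<?php
// Simplified stand-in for the answer's regex:
//   group 1 = if body, group 2 = core (everything up to the matching [endif]).
// (?R) recurses into the whole pattern, so nested blocks are skipped as a unit.
const RX_BLOCK = '~\[if:([^\]]*)\]((?:(?!\[if:|\[endif\]).|(?R))*)\[endif\]~s';

function parseCore(string $core, int $level = 0): array
{
    $found = [];
    preg_match_all(RX_BLOCK, $core, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Record the outer shell at the current nesting level.
        $found[] = str_repeat('  ', $level) . 'if:' . $m[1];
        // The host-language recursion: parse the captured core one level deeper.
        $found = array_merge($found, parseCore($m[2], $level + 1));
    }
    return $found;
}

$tpl = "[if:a]x[if:b]y[endif][if:c]z[endif][endif][if:d]w[endif]";
print_r(parseCore($tpl));
// if:a
//   if:b
//   if:c
// if:d
```

The regex only ever reports the outermost blocks of the string it is given; the nested ones become visible when the function calls itself on the core.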

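Since the scope seems to be unspecified, I have made assumptions, which you can see littered throughout the comments.

There is a placeholder for the captured if/elseif body, which can then be parsed for the (operations) part. That is really part 2 of this exercise, which I have not done yet.
Note - I will try to do this, but I do not have time today.

Let me know if you have any questions.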

(?s)(?:(?<Content>(?&_content))|\[elseif:(?<ElseIf_Body>(?&_ifbody)?)\]|(?<Else>(?&_else))|(?<Begin>\[if:(?<If_Body>(?&_ifbody)?)\])(?<Core>(?&_core)|)(?<End>\[endif\])|(?<Error>(?&_keyword)))(?(DEFINE)(?<_ifbody>(?>[^\]])+)(?<_core>(?>(?<_content>(?>(?!(?&_keyword)).)+)|(?(<_else>)(?!))(?<_else>(?>\[else\]))|(?(<_else>)(?!))(?>\[elseif:(?&_ifbody)?\])|(?>\[if:(?&_ifbody)?\])(?:(?=.)(?&_core)|)\[endif\])+)(?<_keyword>(?>\[(?:(?:if|elseif):(?&_ifbody)?|endif|else)\])))

Formatted and tested:

 (?s)                               # Dot-all modifier

 # =====================
 # Outer Scope
 # ---------------

 (?:
      (?<Content>                        # (1), Non-keyword CONTENT
           (?&_content) 
      )
   |                                   # OR,
      # --------------
      \[ elseif:                         # ELSE IF
      (?<ElseIf_Body>                    # (2), else if body
           (?&_ifbody)? 
      )
      \]
   |                                   # OR
      # --------------
      (?<Else>                           # (3), ELSE
           (?&_else) 
      )
   |                                   # OR
      # --------------
      (?<Begin>                          # (4), IF
           \[ if: 
           (?<If_Body>                        # (5), if body
                (?&_ifbody)? 
           )
           \]
      )
      (?<Core>                           # (6), The CORE
           (?&_core) 
        |  
      )
      (?<End>                            # (7)
           \[ endif \]                        # END IF
      )
   |                                   # OR
      # --------------
      (?<Error>                          # (8), Unbalanced If, ElseIf, Else, or End
           (?&_keyword) 
      )
 )

 # =====================
 #  Subroutines
 # ---------------

 (?(DEFINE)

      # __ If Body ----------------------
      (?<_ifbody>                        # (9)
           (?> [^\]] )+
      )

      # __ Core -------------------------
      (?<_core>                          # (10)
           (?>
                #
                # __ Content ( non-keywords )
                (?<_content>                       # (11)
                     (?>
                          (?! (?&_keyword) )
                          . 
                     )+
                )
             |  
                #
                # __ Else
                # Guard:  Only 1 'else'
                # allowed in this core !!

                (?(<_else>)
                     (?!)
                )
                (?<_else>                          # (12)
                     (?> \[ else \] )
                )
             |  
                #
                # __ ElseIf
                # Guard:  Not Else before ElseIf
                # allowed in this core !!

                (?(<_else>)
                     (?!)
                )
                (?>
                     \[ elseif:
                     (?&_ifbody)? 
                     \]
                )
             |  
                #
                # IF  (block start)
                (?>
                     \[ if: 
                     (?&_ifbody)? 
                     \]
                )
                # Recurse core
                (?:
                     (?= . )
                     (?&_core) 
                  |  
                )
                # END IF  (block end)
                \[ endif \] 
           )+
      )

      # __ Keyword ----------------------
      (?<_keyword>                       # (13)
           (?>
                \[ 
                (?:
                     (?: if | elseif )
                     : (?&_ifbody)? 
                  |  endif
                  |  else
                )
                \]
           )
      )
 )

Host language pseudo-code

 bool bStopOnError = false;
 regex RxCore("....."); // Above regex ..

 bool ParseCore( string sCore, int nLevel )
 {
     // Locals
     bool bFoundError = false;
     bool bBeforeElse = true;
     match _matcher;

     while ( search ( sCore, RxCore, _matcher ) )
     {
       // Content
         if ( _matcher["Content"].matched == true )
           // Print non-keyword content
           print ( _matcher["Content"].str() );

           // OR, Analyze content.
           // If this 'content' has error's and wish to return.
           // if ( bStopOnError )
           //   bFoundError = true;

         else

       // ElseIf
         if ( _matcher["ElseIf_Body"].matched == true )
         {
             // Check if we are not in a recursion
             if ( nLevel <= 0 )
             {
                // Report error, this 'elseif' is outside an 'if/endif' block
                // ( note - will only occur when nLevel == 0 )
                print ("\n>> Error, 'elseif' not in block, body = " + _matcher["ElseIf_Body"].str() + "\n");

                // If this 'elseif' error will stop the process.
                if ( bStopOnError == true )
                   bFoundError = true;
             }
             else
             {
                 // Here, we are inside a core recursion.
                 // That means we have not hit an 'else' yet
                 // because all elseif's precede it.
                 // Print 'elseif'.
                 print ( "ElseIf: " );

                 // TBD - Body regex below
                 // Analyze the 'elseif' body.
                 // This is where it's body is parsed.
                 // Use body parsing (operations) regex on it.
                 string sElIfBody = _matcher["ElseIf_Body"].str();

                // If this 'elseif' body error will stop the process.
                if ( bStopOnError == true )
                   bFoundError = true;
             }
         }

         else

       // Else
         if ( _matcher["Else"].matched == true )
         {
             // Check if we are not in a recursion
             if ( nLevel <= 0 )
             {
                // Report error, this 'else' is outside an 'if/endif' block
                // ( note - will only occur when nLevel == 0 )
                print ("\n>> Error, 'else' not in block\n");

                // If this 'else' error will stop the process.
                if ( bStopOnError == true )
                   bFoundError = true;
             }
             else
             {
                 // Here, we are inside a core recursion.
                 // That means there can only be 1 'else' within
                 // the relative scope of a single core.
                 // Print 'else'.
                 print ( _matcher["Else"].str() );

                 // Set the state of 'else'.
                 bBeforeElse = false;
             }
         }

         else

       // Error ( will only occur when nLevel == 0 )
         if ( _matcher["Error"].matched == true )
         {
             // Report error
             print ("\n>> Error, unbalanced " + _matcher["Error"].str() + "\n");
             // If this unbalanced 'if/endif' error will stop the process.
             if ( bStopOnError == true )
                 bFoundError = true;
         }

         else

       // If/EndIf block
         if ( _matcher["Begin"].matched == true )
         {
             // Print 'If'
             print ( "If:" );

             // Analyze 'if body' for error and wish to return.

             // TBD - Body regex below.
             // Analyze the 'if' body.
             // This is where it's body is parsed.
             // Use body parsing (operations) regex on it.
             string sIfBody = _matcher["If_Body"].str();

             // If this 'if' body error will stop the process.
              if ( bStopOnError == true )
                  bFoundError = true;
              else
              {

                  //////////////////////////////
                  // Recurse a new 'core'
                  bool bResult = ParseCore( _matcher["Core"].str(), nLevel+1 );
                  //////////////////////////////

                  // Check recursion result. See if we should unwind.
                  if ( bResult == false && bStopOnError == true )
                      bFoundError = true;
                  else
                      // Print 'end'
                      print ( "EndIf" );
              }
         }

         else
         {
            // Reserved placeholder, won't get here at this time.
         }

       // Error-Return Check
         if ( bFoundError == true && bStopOnError == true )
              return false;
     }

     // Finished this core!! Return true.
     return true;
 }

 ///////////////////////////////
 // Main

 string strInitial = "...";

 bool bResult = ParseCore( strInitial, 0 );
 if ( bResult == false )
    print ( "Parse terminated abnormally, check messages!\n" );

Output sample of the outer core matches
Note that there will be more matches when the inner cores are matched.

 **  Grp 0               -  ( pos 0 , len 211 ) 
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
    output
[elseif:('b'+'atman'='batman')]
    output2
[elseif:('b'+'atman'='batman')]
    output3
[elseif:('b'+'atman'='batman')]
    output4
[else]
    output5
[endif]  
 **  Grp 1 [Content]     -  NULL 
 **  Grp 2 [ElseIf_Body] -  NULL 
 **  Grp 3 [Else]        -  NULL 
 **  Grp 4 [Begin]       -  ( pos 0 , len 31 ) 
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]  
 **  Grp 5 [If_Body]     -  ( pos 4 , len 26 ) 
(a+b-c/d*e)|(x-y)&!(z%3=0)  
 **  Grp 6 [Core]        -  ( pos 31 , len 173 ) 

    output
[elseif:('b'+'atman'='batman')]
    output2
[elseif:('b'+'atman'='batman')]
    output3
[elseif:('b'+'atman'='batman')]
    output4
[else]
    output5

 **  Grp 7 [End]         -  ( pos 204 , len 7 ) 
[endif]  
 **  Grp 8 [Error]       -  NULL 
 **  Grp 9 [_ifbody]     -  NULL 
 **  Grp 10 [_core]       -  NULL 
 **  Grp 11 [_content]    -  NULL 
 **  Grp 12 [_else]       -  NULL 
 **  Grp 13 [_keyword]    -  NULL 

-----------------------------

 **  Grp 0               -  ( pos 211 , len 4 ) 



 **  Grp 1 [Content]     -  ( pos 211 , len 4 ) 



 **  Grp 2 [ElseIf_Body] -  NULL 
 **  Grp 3 [Else]        -  NULL 
 **  Grp 4 [Begin]       -  NULL 
 **  Grp 5 [If_Body]     -  NULL 
 **  Grp 6 [Core]        -  NULL 
 **  Grp 7 [End]         -  NULL 
 **  Grp 8 [Error]       -  NULL 
 **  Grp 9 [_ifbody]     -  NULL 
 **  Grp 10 [_core]       -  NULL 
 **  Grp 11 [_content]    -  NULL 
 **  Grp 12 [_else]       -  NULL 
 **  Grp 13 [_keyword]    -  NULL 

-----------------------------

 **  Grp 0               -  ( pos 215 , len 74 ) 
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]
    output6
[else]
    output7
[endif]  
 **  Grp 1 [Content]     -  NULL 
 **  Grp 2 [ElseIf_Body] -  NULL 
 **  Grp 3 [Else]        -  NULL 
 **  Grp 4 [Begin]       -  ( pos 215 , len 31 ) 
[if:(a+b-c/d*e)|(x-y)&!(z%3=0)]  
 **  Grp 5 [If_Body]     -  ( pos 219 , len 26 ) 
(a+b-c/d*e)|(x-y)&!(z%3=0)  
 **  Grp 6 [Core]        -  ( pos 246 , len 36 ) 

    output6
[else]
    output7

 **  Grp 7 [End]         -  ( pos 282 , len 7 ) 
[endif]  
 **  Grp 8 [Error]       -  NULL 
 **  Grp 9 [_ifbody]     -  NULL 
 **  Grp 10 [_core]       -  NULL 
 **  Grp 11 [_content]    -  NULL 
 **  Grp 12 [_else]       -  NULL 
 **  Grp 13 [_keyword]    -  NULL

This is the If/ElseIf Body regex

Raw

(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())

Stringed

'~(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())~'

Expanded

 (?|                                           # Branch Reset
      (                                             # (1 start), Operand
           (?: \s* [^\^=<>+\-%/!&|()\[\]*\s] \s* )+
      )                                             # (1 end)
      ( [\^+\-%/*=]+ )                              # (2), Forward Operator
      (?= \s* [^\^=<>+\-%/!&|()\[\]*\s] )
   |  
      \G 
      (?! ^ )
      (?<= [\^+\-%/*=] )
      (                                             # (1 start), Last Operand
           (?: \s* [^\^=<>+\-%/!&|()\[\]*\s] \s* )+
      )                                             # (1 end)
      ( )                                           # (2), Last-Empty Forward Operator
 )

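How this works:
It assumes very simple constructs.
It only parses the math operand/operator parts.
It does not parse any enclosing parenthesis blocks, nor any logic or math operators between them.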

If needed, parse any parenthesis blocks ahead of time, i.e. \( [^)]* \) or similar. Or split on the logic operators, such as |.
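A minimal sketch of that pre-processing step, using the sample condition from the question. It assumes flat (non-nested) parentheses and treats both | and & as logic operators to split on; the variable names are only for illustration.

```php
<?php
$condition = "(a+b-c/d*e)|(x-y)&!(z%3=0)";

// Split on the logic operators | and & (top level only; nested parentheses are not handled).
$clauses = preg_split('~[|&]~', $condition);

// Alternatively, pull out the contents of each flat parenthesis block.
preg_match_all('~\(([^)]*)\)~', $condition, $m);

print_r($clauses); // (a+b-c/d*e), (x-y), !(z%3=0)
print_r($m[1]);    // a+b-c/d*e, x-y, z%3=0
```

Each extracted piece can then be handed to the body regex on its own.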

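The body regex uses a branch reset to get the operand/operator sequence.
It always matches two things: group 1 contains the operand and group 2 contains the operator.

If group 2 is empty, group 1 is the last operand in the sequence.

The valid operators are ^ + - % / * =.
The equals sign = is included because it separates clusters of operations and can be treated as a separator.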

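The conclusion about this body regex is that it is very simple and only suitable for simple usage. For anything more complex, this is not the way to go.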
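For reference, the stringed form of the body regex can be consumed from PHP roughly as follows. This is a sketch: the helper name `operandOperatorPairs` is made up for the example, and it simply turns each match into an [operand, operator] pair, with an empty operator marking the last operand.

```php
<?php
$rx = '~(?|((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)([\^+\-%/*=]+)(?=\s*[^\^=<>+\-%/!&|()\[\]*\s])|\G(?!^)(?<=[\^+\-%/*=])((?:\s*[^\^=<>+\-%/!&|()\[\]*\s]\s*)+)())~';

// Hypothetical helper: returns a list of [operand, operator] pairs in source order.
function operandOperatorPairs(string $rx, string $body): array
{
    preg_match_all($rx, $body, $matches, PREG_SET_ORDER);
    $pairs = [];
    foreach ($matches as $m) {
        // Thanks to the branch reset, both alternatives fill groups 1 and 2.
        $pairs[] = [trim($m[1]), $m[2] ?? ''];
    }
    return $pairs;
}

foreach (operandOperatorPairs($rx, "(a+b-c/d*e)") as [$operand, $operator]) {
    echo $operand, ' ', $operator === '' ? '(last operand)' : $operator, "\n";
}
```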

Input/output sample 1:

(a+b-c/d*e)

 **  Grp 1 -  ( pos 1 , len 1 ) 
a  
 **  Grp 2 -  ( pos 2 , len 1 ) 
+  
------------
 **  Grp 1 -  ( pos 3 , len 1 ) 
b  
 **  Grp 2 -  ( pos 4 , len 1 ) 
-  
------------
 **  Grp 1 -  ( pos 5 , len 1 ) 
c  
 **  Grp 2 -  ( pos 6 , len 1 ) 
/  
------------
 **  Grp 1 -  ( pos 7 , len 1 ) 
d  
 **  Grp 2 -  ( pos 8 , len 1 ) 
*  
------------
 **  Grp 1 -  ( pos 9 , len 1 ) 
e  
 **  Grp 2 -  ( pos 10 , len 0 )  EMPTY

Input/output sample 2:

('b'+'atman'='batman')

 **  Grp 1 -  ( pos 1 , len 3 ) 
'b'  
 **  Grp 2 -  ( pos 4 , len 1 ) 
+  
------------
 **  Grp 1 -  ( pos 5 , len 7 ) 
'atman'  
 **  Grp 2 -  ( pos 12 , len 1 ) 
=  
------------
 **  Grp 1 -  ( pos 13 , len 8 ) 
'batman'  
 **  Grp 2 -  ( pos 21 , len 0 )  EMPTY

【Comments】: