【Question title】: PHP - Recursive regex to get a complete div class with its inner content
【Posted】: 2021-03-11 08:52:32
【Question】:

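I have searched but could not find a working solution. I tried using DOM, but the result differs from the source (different whitespace and tag elements; the differences are minor, but I need output identical to the source for further pattern searching), so I would like to try a regular expression. Is this possible? (I know it is not the best solution, but I would like to try.) For example, is it possible to return the whole div with class "want-this-entire-div-class", including its inner content: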

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

My attempt stops after the first </div>:

preg_match('/

/s', $html, $match); Thanks

【Comments】:

  • Maybe something like this for each '
    , but I don't know how to do that with regex.

Tags: php regex


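【Solution 1】:

One way to solve this is with a state machine. You enumerate all possible states, then act according to the state you are in. In this case, the states are:

  1. Line to ignore
  2. Target opening div
  3. Line to add
  4. Extra opening div
  5. Extra closing div
  6. Target closing div

I don't think this is robust (it works line by line and counts at most one opening and one closing div per line), but it does work for the given example: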

<?php
function inner_div(string $html_s, string $cont_s): string {
   $html_a = explode("\n", $html_s);
   $div_b = false;
   $div_n = 0;
   $out_a = [];   # collected lines; initialised so implode() always receives an array
   foreach ($html_a as $tok_s) {
      # state 2: target open div (match the $cont_s argument; str_contains() needs PHP 8+)
      if (str_contains($tok_s, $cont_s)) {
         $div_b = true;
      }
      # state 1: line to ignore
      if (! $div_b) {
         continue;
      }
      # state 3: line to add
      $out_a[] = $tok_s;
      # state 4: extra open div
      if (str_contains($tok_s, '<div')) {
         $div_n++;
      }
      # state 5: extra close div
      if (str_contains($tok_s, '</div>')) {
         $div_n--;
      }
      # state 6: target close div
      if ($div_n == 0) {
         break;
      }
   }
   return implode("\n", $out_a);
}
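
If the markup is not laid out one tag per line, the same depth-counting idea can be applied to tag offsets instead of lines. The following is only a sketch: `outer_div` is a hypothetical helper, not part of the answer above, and it assumes the class attribute is written exactly as `class="..."` with double quotes and that no `>` appears inside attribute values or comments. Because it returns a substring of the input, whitespace and tags come back byte-for-byte as in the source.

```php
<?php
// Hypothetical helper: returns the exact source substring of the first <div>
// whose class attribute equals $class, or null if none is found or the divs
// are unbalanced.
function outer_div(string $html, string $class): ?string {
    // Locate the opening tag of the target div and its byte offset.
    $open = '/<div\b[^>]*\bclass="' . preg_quote($class, '/') . '"[^>]*>/i';
    if (!preg_match($open, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $start = $m[0][1];

    // Collect every opening and closing div tag from the target onwards.
    preg_match_all('/<div\b[^>]*>|<\/div\s*>/i', $html, $tags, PREG_OFFSET_CAPTURE, $start);

    $depth = 0;
    foreach ($tags[0] as [$tag, $pos]) {
        // A tag whose second character is "/" closes a div; anything else opens one.
        $depth += ($tag[1] === '/') ? -1 : 1;
        if ($depth === 0) {
            // Depth is back to zero: this is the closing tag of the target div.
            return substr($html, $start, $pos + strlen($tag) - $start);
        }
    }
    return null; // ran out of tags before the target div was closed
}

// Usage with a small sample in the shape of the question's input.
$sample = '<div class="outer">
<div class="want">
<p>x</p>
<div class="in">y</div>
</div>
<div class="tail">';
echo outer_div($sample, 'want'), "\n";
```

The first tag collected is the target's own opening tag, so the counter starts at 1 and the loop stops at the matching close.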

【Discussion】:

    【Solution 2】:

    Have you considered using an off-the-shelf HTML parsing library? For context on parsing HTML with regular expressions, see "RegEx match open tags except XHTML self-contained tags".
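
That said, PCRE (the engine behind `preg_match`) does support recursive subpatterns, so the kind of regex the question asks for can at least be sketched. The helper name `match_div_recursive` and the pattern below are assumptions for illustration, not something from the question or the answers. The sketch assumes `$class` is a single class name without whitespace, that the attribute is written as `class="..."`, and that no `<` or `>` occurs inside attribute values. On large or badly unbalanced input it may hit PCRE's backtracking or recursion limits, which is one reason a parser is usually preferred.

```php
<?php
// Hypothetical helper: matches the first <div> whose class attribute equals
// $class, together with all properly nested inner divs, via PCRE recursion.
function match_div_recursive(string $html, string $class): ?string {
    $pattern = '~
        <div\b[^>]*\bclass="' . preg_quote($class, '~') . '"[^>]*>  # target opening tag
        (?<body>                               # balanced content, reused below
            (?: [^<]++                         # text up to the next tag
              | <(?!/?div\b)                   # a "<" that does not start a div tag
              | <div\b[^>]*+> (?&body) </div>  # nested div: recurse into body
            )*+
        )
        </div>                                 # closing tag of the target
    ~xi';
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $html, $m) === 1 ? $m[0] : null;
}

// Usage with a small sample in the shape of the question's input.
$sample = '<div class="outer">
<div class="want">
<p>x</p>
<div class="in">y</div>
</div>
<div class="tail">';
echo match_div_recursive($sample, 'want'), "\n";
```

The `(?&body)` call re-enters the named group, so each nested `<div>` must find its own `</div>` before the outer match can close.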

    【Discussion】:

      【Solution 3】:

      Input

      $html = '<div class="not-want">
              <div class="also-not-want">
          <div class="want-this-entire-div-class">
      <button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
      <div class="dropdown-menu j-dropdown">
      <div class="header-search">
              <input type="text" name="search" value="" placeholder="Search entire site here..." 
      class="search-input" data-category_id=""/>
        <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
      route=product/search&amp;search="></button>
      </div>
      </div>
      </div>
      <div class="not-want-this-also">
      <div class="or-this">';
      

      Code

      $document   = new DOMDocument();            // Create DOM object
      $document->loadHTML($html);                 // Load html into object
      $class_name = "want-this-entire-div-class"; // Set class name to be found
      $xpath      = new DOMXPath($document);      // Create XPath object
      $node = $xpath->query("//div[@class='{$class_name}']")->item(0); // Run query on loaded html
      echo $document->saveHTML($node);            // Print result to page
      

      Output

      <div class="want-this-entire-div-class">
      <button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
      <div class="dropdown-menu j-dropdown">
      <div class="header-search">
              <input type="text" name="search" value="" placeholder="Search entire site here..." class="search-input" data-category_id=""><button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
      route=product/search&amp;search="></button>
      </div>
      </div>
      </div>
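
The query above compares the whole class attribute, so it would miss a div that carries several classes. A possible variant is sketched below; `find_div_by_class` is a hypothetical helper, and it assumes `$class` contains no quote characters because the value is interpolated into the XPath expression. As the output above already shows (`/>` became `>` and the line break between attributes was dropped), the serialised result is re-generated by the parser and is not guaranteed to be byte-identical to the source.

```php
<?php
// Hypothetical helper: serialises the first <div> that has $class as one of
// its whitespace-separated class tokens, or returns null if there is none.
function find_div_by_class(string $html, string $class): ?string {
    // Collect libxml warnings internally instead of emitting them for broken markup.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Pad the normalised attribute with spaces so " $class " matches a whole token.
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $node = $xpath->query($query)->item(0);

    return $node === null ? null : $document->saveHTML($node);
}

// Usage: the target div carries more than one class.
$sample = '<div class="outer"><div class="box want big"><p>x</p></div></div>';
echo find_div_by_class($sample, 'want'), "\n";
```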
      

      【Discussion】:
