【Question title】: Parse the h2 and the next tag in PHP
【Posted】: 2020-01-31 05:09:47
【Question description】:

I need to create an array from the following string.

$body = '<h2>Heading one</h2>
         <p>Lorem ipsum dolor</p>

         <h2>Heading two</h2>
         <ul>
           <li>list item one.</li>
           <li>List item two.</li>
         </ul>

         <h2>Heading three</h2>
         <table class="table">
           <tbody>
             <tr>
               <td>Table data one</td>
               <td>Description of table data one</td>
             </tr>
             <tr>
               <td>Table data two</td>
               <td>Description of table data two</td>
             </tr>
           </tbody>
         </table>';

I can get the 'question' value by using the h2 tags as the first index.

$dom = new \DOMDocument();
$dom->loadHTML($body);
$xPath = new \DOMXpath($dom);

$question_answer = [];
$tags = $dom->getElementsByTagName('h2');
foreach ($tags as $tag) {
  $next_element = $xPath->query('./following-sibling::p', $tag);
  $question_answer[] = [
    'question' => $tag->nodeValue,
    'answer' =>  $next_element->item(0)->nodeValue,
  ];
}

echo '<pre>';
print_r($question_answer);
echo '</pre>';

With @Kevin's suggestion incorporated, this works well for p tags and produces the following output:

Array
(
    [0] => Array
        (
            [question] => Heading one
            [answer] => Lorem ipsum dolor
        )

    [1] => Array
        (
            [question] => Heading two
            [answer] => 
        )

    [2] => Array
        (
            [question] => Heading three
            [answer] => 
        )

)

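Now I just need to handle the cases where the tag that follows the heading, and so holds the answer, is an unordered list or a table. For the table, I'm only interested in the td tags.

【Comments】:

    Tags: php html dom domdocument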


    【Solution 1】:

    Since you are iterating over each h2 tag, use following-sibling::p relative to the current tag:

    foreach ($tags as $tag) {
        $next_element = $xPath->query('./following-sibling::p', $tag);
        if ($next_element->length <= 0) continue; //skip it if p not found
        $question_answer[] = [
            'question' => $tag->nodeValue,
            'answer' => $next_element->item(0)->nodeValue,
        ];
    }
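
One caveat: `following-sibling::p` matches every later `<p>` sibling, not only the element directly after the heading. A heading followed by a list can therefore pick up a paragraph that belongs to a later heading. Below is a minimal sketch of a stricter query, assuming the answer is always the first element sibling after the `<h2>`. The sample markup and the names `$loose` and `$strict` are illustrative and not from the question.

```php
<?php
// Illustrative markup: the first heading is followed by a list, not a paragraph.
$body = '<h2>Heading one</h2>
<ul><li>List item one.</li></ul>
<h2>Heading two</h2>
<p>Lorem ipsum dolor</p>';

$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$loose = [];
$strict = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // Any later <p> sibling, however far away it is.
    $any_p = $xPath->query('./following-sibling::p', $tag);
    // Only the element right after the heading, and only if it is a <p>.
    $next_p = $xPath->query('./following-sibling::*[1][self::p]', $tag);

    $loose[$tag->nodeValue] = $any_p->length > 0 ? $any_p->item(0)->nodeValue : null;
    $strict[$tag->nodeValue] = $next_p->length > 0 ? $next_p->item(0)->nodeValue : null;
}

print_r($loose);
print_r($strict);
```

With this markup the loose query pairs "Heading one" with the paragraph that sits under "Heading two", while the strict query leaves it unmatched.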
    

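    【Comments】:

    • Works for p tags. Can I add an "or" option to the xPath query?
    • @esod Yes, you can. Here is a fiddle: tehplayground.com/hzWjPoVwUrURA365
    • Thanks. I need to work the ul into the array so that its items are combined into the 'answer'. It's fine if they are simply joined, like List item one. List item two. The same could probably be done for Heading three, so that it is also read into a single string like Table data one. Description of table data one. Table data two. Description of table data two. Table data three. Description of table data three. Here is a fiddle with your code working: tehplayground.com/QHtYKNADDJA0K3bu
    • We won't be using the table markup in this case. I've updated the playground, tehplayground.com/cAdc7P59bHyiRyVI, and will update the code in this question. Thanks again.
    【Solution 2】: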

    We are leaving out the table markup for now, since it is probably not relevant to this use case. The content is as follows:

    $body = '<h2>Heading one</h2>
           <p>Lorem ipsum dolor</p>
    
           <h2>Heading two</h2>
           <ul>
             <li>List item one.</li>
             <li>List item two.</li>
           </ul>';
    

    The solution code is as follows:

    $dom = new \DOMDocument();
    $dom->loadHTML($body);
    $xPath = new \DOMXpath($dom);
    
    $question_answer = [];
    $tags = $dom->getElementsByTagName('h2');
    foreach ($tags as $tag) {
      $possible_answer = $xPath->query('./following-sibling::p | ./following-sibling::ul', $tag);
    
      if ($possible_answer->length <= 0) {
        continue;
      }
    
      if ($possible_answer->item(0)->tagName === 'p') {
        $answer = $possible_answer->item(0)->nodeValue;
        $question_answer[] = [
          'question' => $tag->nodeValue,
          'answer' => $answer,
        ];
      }
    
      elseif ($possible_answer->item(0)->tagName === 'ul') {
        $li_dom = [];
        foreach ($possible_answer->item(0)->getElementsByTagName('li') as $li) {
          $li_dom[] = $li->nodeValue;
        }
        $li_dom = implode(" ", $li_dom);
    
        $question_answer[] = [
          'question' => $tag->nodeValue,
          'answer' => $li_dom,
        ];
      }
    }
    
    echo '<pre>';
    print_r($question_answer);
    echo '</pre>';
    

    Here is the output:

    Array
    (
        [0] => Array
            (
                [question] => Heading one
                [answer] => Lorem ipsum dolor
            )
    
        [1] => Array
            (
                [question] => Heading two
                [answer] => List item one. List item two.
            )
    
    )
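
The question also asked about tables, where only the td cells matter. The following is a hedged sketch that extends the same idea to `<p>`, `<ul>`, and `<table>`. It assumes the answer is the element immediately after each `<h2>` and that joining the pieces with a single space is acceptable, as suggested in the comments. The helper name `extract_answer` is hypothetical, and the table in the sample is shortened to one row.

```php
<?php
// Hypothetical helper: flattens a <p>, <ul>, or <table> node into one string.
function extract_answer(DOMElement $node): string
{
    // For lists and tables, collect only the <li> or <td> descendants.
    $child_tag = ['ul' => 'li', 'table' => 'td'][$node->tagName] ?? null;
    if ($child_tag === null) {
        return trim($node->nodeValue);
    }
    $parts = [];
    foreach ($node->getElementsByTagName($child_tag) as $child) {
        $parts[] = trim($child->nodeValue);
    }
    return implode(' ', $parts);
}

$body = '<h2>Heading one</h2>
<p>Lorem ipsum dolor</p>
<h2>Heading two</h2>
<ul><li>List item one.</li><li>List item two.</li></ul>
<h2>Heading three</h2>
<table class="table"><tbody>
<tr><td>Table data one</td><td>Description of table data one</td></tr>
</tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($body);
$xPath = new DOMXPath($dom);

$question_answer = [];
foreach ($dom->getElementsByTagName('h2') as $tag) {
    // First element sibling after the heading, if it is one of the supported tags.
    $next = $xPath->query('./following-sibling::*[1][self::p or self::ul or self::table]', $tag);
    if ($next->length === 0) {
        continue; // no usable answer directly after this heading
    }
    $question_answer[] = [
        'question' => $tag->nodeValue,
        'answer' => extract_answer($next->item(0)),
    ];
}

print_r($question_answer);
```

Restricting the query to `*[1]` keeps each heading paired with its own block. Using the union `following-sibling::p | following-sibling::ul` instead would fall back to a later heading's content whenever the immediate sibling is some other tag.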
    
