PHP - preg_match unable to get all elements from HTML URL: answers

【Question title】: PHP - preg_match unable to get all elements from html url
【Posted】: 2017-04-07 03:15:20
【Question description】:

I have been trying to get the inner text of an HTML tag from a URL (defimedia.info), but I only get one output. The code I tried is:

$html = file_get_contents("http://www.defimedia.info");
preg_match("'<h3>(.*?)</h3>'si", $html, $match);
echo($match[1]);

Even when I try a foreach loop or try $match[2], it does not work. Any help would be much appreciated.

Regards,
bhaamb

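【Question comments】:

Maybe using an HTML parser would be a good idea. If the tag has a class, as in <h3 class="large">, your regex will not match that h3.
@MartinGottweis Sir, it has no class.
For parsing HTML I would use an HTML parser (such as simplehtmldom.sourceforge.net) rather than a regex; imho it is simpler and easier to use. It does all the heavy lifting for you.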

Tags: php html

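【Solution 1】:

You need the preg_match_all function. preg_match stops after the first match, while preg_match_all returns every match. It is documented here: http://php.net/manual/en/function.preg-match-all.php

Try it like this.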

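<?php
$html = file_get_contents("http://www.defimedia.info");
// The closing tag is </h3>; its slash is escaped because / is the regex delimiter.
preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $match);
// $match[0] holds the full matches, $match[1] the text captured inside each <h3>.
print_r($match);
?>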
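
To see the difference without fetching the live page, here is a minimal sketch that runs both functions on an inline string. The `$sample` markup and its headlines are made up for illustration and are not taken from defimedia.info.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$sample = '<h3>First</h3><p>text</p><h3>Second</h3><h3>Third</h3>';

// '~' is used as the delimiter so the '/' in '</h3>' needs no escaping.
$pattern = '~<h3>(.*?)</h3>~si';

// preg_match stops after the first match.
preg_match($pattern, $sample, $first);

// preg_match_all keeps going; $all[1] holds every captured group.
preg_match_all($pattern, $sample, $all);

echo $first[1], PHP_EOL;               // First
echo implode(', ', $all[1]), PHP_EOL;  // First, Second, Third
```

This also shows why $match[2] was empty in the question: with a single capture group, preg_match only ever fills indexes 0 and 1.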

【Comments】:

【Solution 2】:

Regex is not the correct tool for parsing HTML/XML; use DOMDocument instead, like this:

$html = file_get_contents("http://www.defimedia.info");
$dom = new DOMDocument();

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$h3s = $dom->getElementsByTagName('h3');
foreach ($h3s as $h3) {
    echo $h3->nodeValue."<br>";
}

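Why did I use libxml_use_internal_errors(true)? Real-world HTML is often malformed, and without this call loadHTML emits a PHP warning for each problem it finds.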
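
A minimal sketch of what that call changes, using a made-up snippet instead of the live page. With internal errors enabled, libxml stores parse problems instead of raising PHP warnings, and they can be read back with libxml_get_errors(). The `$sample` string is an assumption for illustration, and how many issues get recorded depends on the installed libxml version.

```php
<?php
// Hypothetical markup with an unknown tag and an unclosed <p>,
// the kind of input libxml may complain about.
$sample = '<h3 class="large">First</h3><foo>x</foo><p>text<h3>Second</h3>';

$dom = new DOMDocument();

// Collect parse errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($sample);
$errors = libxml_get_errors();          // array of LibXMLError objects
libxml_clear_errors();                  // free the stored errors
libxml_use_internal_errors($previous);  // restore the earlier setting

// Gather the text of every <h3>, with or without attributes.
$titles = [];
foreach ($dom->getElementsByTagName('h3') as $h3) {
    $titles[] = trim($h3->textContent);
}

echo implode(', ', $titles), PHP_EOL;
echo count($errors), ' parse issue(s) recorded', PHP_EOL;
```

Unlike the `<h3>` regex, the parser also finds the heading that carries a class attribute, which is the concern raised in the question comments.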

【Comments】: