【Posted】: 2016-02-20 13:31:44
【Question】:
I'm trying to use XPath to extract 2 pieces of data:
- the text node value, and
- the hyperlink.
Here is my code:
<?php
$curl = curl_init('http://www.livescore.com/soccer/england/league-2/');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);
if (!$html)
{
    die("something's wrong!");
}
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = $xpath->query("/html/body/div[2]/div[5]/div[contains(@class, 'row')]");
var_dump($result);
foreach ($result as $row)
{
    $text = $row->nodeValue;
    $href = $row->getAttribute("href");
    //getAttribute("href")
    $array[] = array
    (
        'text' => trim($text),
        'href' => $href
    );
}
print "<pre>";
var_dump($array);
?>
I just can't extract the href links! Any help would be greatly appreciated. Many thanks.
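One possible cause, offered as a guess rather than a confirmed diagnosis: the query above selects `<div>` rows, and a `<div>` normally carries no `href` attribute, so `getAttribute("href")` returns an empty string. If the link sits on an `<a>` element nested inside each row, a second query relative to the row can reach it. The sketch below runs against made-up inline markup (the `/soccer/match-1/` path and the team names are hypothetical, and the real page structure may differ):

```php
<?php
// Hypothetical markup standing in for the live page; not taken from livescore.com.
$html = <<<HTML
<html><body>
<div class="row"><a href="/soccer/match-1/">Team A 1 - 0 Team B</a></div>
<div class="row"><span>Team C v Team D</span></div>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = $xpath->query("//div[contains(@class, 'row')]");

$array = [];
foreach ($rows as $row) {
    // Assumption: the href is on an <a> inside the row, not on the row <div>.
    // The leading "." makes the query relative to $row (the context node).
    $link = $xpath->query(".//a[@href]", $row)->item(0);
    $array[] = [
        'text' => trim($row->nodeValue),
        // Rows without a link get null rather than an empty string.
        'href' => $link ? $link->getAttribute('href') : null,
    ];
}
var_dump($array);
```

Passing `$row` as the second argument to `DOMXPath::query()` scopes the search to that row, so each text value stays paired with its own link.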
【Comments】:
Tags: php xpath web-scraping