【Posted】: 2020-11-19 06:22:14
【Question】:
I am trying to use simplehtmldom to extract information from https://benthamopen.com/browse-by-title/B/1/.
Specifically, I want to access this part of the page markup:
<div style="padding:10px;">
<strong>ISSN: </strong>1874-1207<br><div class="sharethis-inline-share-buttons" style="padding-top:10px;" data-url="https://benthamopen.com/TOBEJ/home/" data-title="The Open Biomedical Engineering Journal"></div>
</div>
I have this code:
$html = file_get_html('https://benthamopen.com/browse-by-title/B/1/');
// Dump every <div> whose style attribute is exactly "padding:10px;"
foreach ($html->find('div[style=padding:10px;]') as $ele) {
    echo "<pre>" . print_r($ele, true) . "</pre>";
}
...which returns (I am showing only one item from the page):
simplehtmldom\HtmlNode Object
(
[nodetype] => HDOM_TYPE_ELEMENT (1)
[tag] => div
[attributes] => Array
(
[style] => padding:10px;
)
[nodes] => Array
(
[0] => simplehtmldom\HtmlNode Object
(
[nodetype] => HDOM_TYPE_ELEMENT (1)
[tag] => strong
[attributes] => none
[nodes] => none
)
[1] => simplehtmldom\HtmlNode Object
(
[nodetype] => HDOM_TYPE_TEXT (3)
[tag] => text
[attributes] => none
[nodes] => none
)
[2] => simplehtmldom\HtmlNode Object
(
[nodetype] => HDOM_TYPE_ELEMENT (1)
[tag] => br
[attributes] => none
[nodes] => none
)
[3] => simplehtmldom\HtmlNode Object
(
[nodetype] => HDOM_TYPE_ELEMENT (1)
[tag] => div
[attributes] => Array
(
[class] => sharethis-inline-share-buttons
[style] => padding-top:10px;
[data-url] => https://benthamopen.com/TOBEJ/home/
[data-title] => The Open Biomedical Engineering Journal
)
[nodes] => none
)
)
)
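I am not sure how to proceed from here. I want to extract:
- the ISSN text (which is not displayed by the echo statement; I am not sure why) [1874-1207 in the example above]. In the dump it corresponds to the text node at index 1 of [nodes], directly after the <strong> label at index zero
- the 'data-url' attribute [https://benthamopen.com/TOBEJ/home/ in the example above]
- the 'data-title' attribute [The Open Biomedical Engineering Journal in the example above]
It may be that my understanding of PHP objects and arrays is not what it should be; I do not know why the ISSN does not appear in the echoed output.
I have tried many different things, but I am struggling to extract the data from the element.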
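For illustration, here is a minimal sketch of the extraction described above. It deliberately uses PHP's built-in DOMDocument and DOMXPath rather than simplehtmldom, so it runs without any third-party library. It works on the HTML fragment quoted in the question instead of fetching the live page, and the variable names ($journals, $issnNode, $share) and the XPath expressions are assumptions made for this example only. The real page may contain other markup that needs different selectors.

```php
<?php
// Sample markup copied from the question; the live page may differ.
$html = <<<'HTML'
<div style="padding:10px;">
<strong>ISSN: </strong>1874-1207<br><div class="sharethis-inline-share-buttons" style="padding-top:10px;" data-url="https://benthamopen.com/TOBEJ/home/" data-title="The Open Biomedical Engineering Journal"></div>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$journals = [];

// Match the same blocks the question selects with div[style=padding:10px;].
foreach ($xpath->query('//div[@style="padding:10px;"]') as $block) {
    // The ISSN value is the text node that directly follows <strong>.
    $issnNode = $xpath->query('strong/following-sibling::text()[1]', $block)->item(0);
    // data-url and data-title are attributes of the nested share-buttons <div>.
    $share = $xpath->query('div[contains(@class, "sharethis-inline-share-buttons")]', $block)->item(0);

    $journals[] = [
        'issn'  => $issnNode ? trim($issnNode->nodeValue) : null,
        'url'   => $share instanceof DOMElement ? $share->getAttribute('data-url') : null,
        'title' => $share instanceof DOMElement ? $share->getAttribute('data-title') : null,
    ];
}

print_r($journals);
```

The point of the sketch is that the ISSN lives in a text node that is a sibling of the <strong> element, while the URL and title are attributes of the inner <div>, so the two kinds of data have to be read in different ways.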
【Comments】:
Tags: php html dom simple-html-dom