【Posted】: 2021-11-13 22:03:04
【Problem description】:
I fetched a web page in headless mode; this is the relevant part of the inner HTML it returned.
<div class="product__aside">
        <div class="slider-pdp">
          <div class="slider__clip">
            <div class="slides slick-initialized slick-slider slick-dotted" role="toolbar">
<div aria-live="polite" class="slick-list draggable" style="padding: 0px 24.47%;"><div class="slick-track" role="listbox" style="opacity: 1; width: 6010px; transform: translate3d(-1202px, 0px, 0px);"><div class="slide slick-slide slick-cloned" data-slick-index="-2" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="-1" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg">
  </div>
</div><div class="slide slick-slide slick-current slick-active slick-center" data-slick-index="0" aria-hidden="false" tabindex="-1" role="option" aria-describedby="slick-slide00" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="1" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide01" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="2" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide02" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_600--150275930.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_1365-1432102351.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="3" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide03" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_600--102741357.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_1365-1955701010.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="4" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide04" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="5" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide05" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned slick-center" data-slick-index="6" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="7" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg">
  </div>
</plaintext>
</div></div></div>
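From this markup I need the src value that contains the string "PRODUCT_LEAD". To get it I wrote the code below. If I dd($imgs), it reports a length of 10, but the foreach loop does not give me the src value. $pageBody holds the inner HTML of the page.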
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
ini_set('user_agent', 'My-Application/2.5');
libxml_use_internal_errors(true);
$doc->loadHTML($pageBody);
$xpath = new \DOMXPath($doc);
$imgs = $xpath->query('//*[@class="slide__image"]');
foreach ($imgs as $img) {
    $imgurl = $img->getAttribute('src');
}
dd($imgurl); // This returns nothing
【Discussion】:
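One likely cause, judging only from the snippet in the question: the XPath `//*[@class="slide__image"]` matches the ten wrapper `<div>` elements, not the `<img>` elements inside them. A `<div>` has no `src` attribute, so `getAttribute('src')` returns an empty string, and `$imgurl` is also overwritten on every iteration. Below is a minimal sketch of one way to select the images directly and filter on `PRODUCT_LEAD`. The trimmed `$pageBody` sample and the `$leadUrls` variable are assumptions made for illustration and are not part of the original code.

```php
<?php
// Hypothetical stand-in for $pageBody: a trimmed copy of the slider markup.
$pageBody = <<<'HTML'
<div class="slick-track">
  <div class="slide slick-slide slick-cloned" data-slick-index="-1">
    <div class="slide__image">
      <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="">
    </div>
  </div>
  <div class="slide slick-slide slick-current" data-slick-index="0">
    <div class="slide__image">
      <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="">
    </div>
  </div>
  <div class="slide slick-slide slick-cloned" data-slick-index="6">
    <div class="slide__image">
      <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="">
    </div>
  </div>
</div>
HTML;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($pageBody);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the <img> children of the wrappers, keeping only those whose
// src contains "PRODUCT_LEAD".
$imgs = $xpath->query('//div[@class="slide__image"]/img[contains(@src, "PRODUCT_LEAD")]');

// Accumulate every match instead of overwriting a single variable.
$leadUrls = [];
foreach ($imgs as $img) {
    $leadUrls[] = $img->getAttribute('src');
}

// The slider clones slides, so the same URL can appear more than once.
$leadUrls = array_values(array_unique($leadUrls));

print_r($leadUrls);
```

With the sample above, the lead image appears in both the current slide and a cloned slide, so de-duplicating leaves a single URL. If only one value is needed, `$leadUrls[0] ?? null` would pick it, assuming an empty result should map to `null`.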
Tags: php laravel dom xpath domcrawler