【Question title】: Get img src that contains a certain word from XPath
【Posted】: 2021-11-13 22:03:04
【Question】:

I fetch the web page in headless mode, and this is the relevant part of the resulting inner HTML.

<div class="product__aside">
        <div class="slider-pdp">
          <div class="slider__clip">
            <div class="slides slick-initialized slick-slider slick-dotted" role="toolbar">
<div aria-live="polite" class="slick-list draggable" style="padding: 0px 24.47%;"><div class="slick-track" role="listbox" style="opacity: 1; width: 6010px; transform: translate3d(-1202px, 0px, 0px);"><div class="slide slick-slide slick-cloned" data-slick-index="-2" aria-hidden="true" tabindex="-1" style="width: 601px;"> 
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="-1" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg">
  </div>
</div><div class="slide slick-slide slick-current slick-active slick-center" data-slick-index="0" aria-hidden="false" tabindex="-1" role="option" aria-describedby="slick-slide00" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="1" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide01" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="2" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide02" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_600--150275930.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_02--IMG_1365-1432102351.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="3" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide03" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_600--102741357.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_03--IMG_1365-1955701010.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="4" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide04" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_600-1812358633.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_04--IMG_1365--489680014.jpg">
  </div>
</div><div class="slide slick-slide" data-slick-index="5" aria-hidden="true" tabindex="-1" role="option" aria-describedby="slick-slide05" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_600-251567441.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_05--IMG_1365--146353341.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned slick-center" data-slick-index="6" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_600--951538759.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_LEAD--IMG_1365--973725436.jpg">
  </div>
</div><div class="slide slick-slide slick-cloned" data-slick-index="7" aria-hidden="true" tabindex="-1" style="width: 601px;">
  <div class="slide__image">
    <img src="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_600--1234110023.jpg" alt="" data-zoom-image="https://tnuck.ips.photos/images/skus/P31637-PRODUCT_01--IMG_1365-140785407.jpg">
  </div>
</div></div></div>

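From this I need to get the src value that contains the string "PRODUCT_LEAD". To do so, I wrote the following code. If I call dd($imgs), it reports a length of 10, but the for loop does not give me the src values. $pageBody is the inner HTML of the web page.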

                            $doc = new DOMDocument;
                            $doc->preserveWhiteSpace = false;
                            $doc->strictErrorChecking = false;
                            $doc->recover = true;

                            ini_set('user_agent', 'My-Application/2.5');
                            libxml_use_internal_errors(true);
                            $doc->loadHTML($pageBody);
                            $xpath = new \DOMXPath($doc);
                            $imgs  = $xpath->query('//*[@class="slide__image"]');
                            foreach($imgs as $img)
                            {
                                $imgurl = $img->getAttribute('src');
                            }
                            dd($imgurl); // This returns nothing

【Comments】:

    Tags: php laravel dom xpath domcrawler


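    【Answer 1】:

    Try $imgs = $xpath->query('//*[@class="slide__image"]/img/@src[contains(., "PRODUCT_LEAD")]');

    The part in square brackets is a "predicate" that determines which nodes are selected. The . refers to the current node, which here is the src attribute being tested.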
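
    As a minimal sketch of how this predicate behaves, the snippet below runs the same XPath against a trimmed-down copy of the markup from the question. The shortened example.com URLs and the $html variable are assumptions made for illustration only. Because the slider clones some slides, the same URL can match more than once, so the sketch also removes duplicates.

```php
<?php
// Trimmed-down stand-in for the slider markup (URLs shortened for illustration).
$html = <<<'HTML'
<div class="slide__image"><img src="https://example.com/P31637-PRODUCT_05--IMG_600.jpg" alt=""></div>
<div class="slide__image"><img src="https://example.com/P31637-PRODUCT_LEAD--IMG_600.jpg" alt=""></div>
<div class="slide__image"><img src="https://example.com/P31637-PRODUCT_01--IMG_600.jpg" alt=""></div>
<div class="slide__image"><img src="https://example.com/P31637-PRODUCT_LEAD--IMG_600.jpg" alt=""></div>
HTML;

libxml_use_internal_errors(true);   // collect libxml parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Inside the predicate, "." is the src attribute node currently being tested.
$nodes = $xpath->query('//*[@class="slide__image"]/img/@src[contains(., "PRODUCT_LEAD")]');

$urls = [];
foreach ($nodes as $attr) {
    $urls[] = $attr->nodeValue;     // each result is a DOMAttr; nodeValue holds the URL
}
$urls = array_values(array_unique($urls)); // cloned slides repeat the same URL

print_r($urls);
```

    Note that the query returns attribute nodes rather than elements, so the value is read with nodeValue instead of getAttribute('src').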

    【Comments】:

      【Answer 2】:

      Try this code:

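      // Assumes $imgs holds the <img> elements themselves, e.g. from
      // $xpath->query('//*[@class="slide__image"]/img'); the wrapping <div> has no src.
      $imgurl = [];
      
      for($x = 0; $x < $imgs->length; $x++) {
          $imgurl[] = $imgs->item($x)->getAttribute('src');
      }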
      

      【Comments】:

        【Answer 3】:
        $doc = new DOMDocument;
        $doc->preserveWhiteSpace = false;
        $doc->strictErrorChecking = false;
        $doc->recover = true;
        
        ini_set('user_agent', 'My-Application/2.5');
        libxml_use_internal_errors(true);
        $doc->loadHTML($pageBody);
        $xpath = new \DOMXPath($doc);
        $imgs  = $xpath->query('//*[@class="slide__image"]/img/@src');
        $leadImage = null; // stays null if no src contains PRODUCT_LEAD
        foreach($imgs as $img)
        {
            // str_contains() requires PHP 8.0 or later
            if(str_contains($img->nodeValue,'PRODUCT_LEAD'))
            {
               $leadImage = $img->nodeValue;
            }
        }
        

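        I changed the code like this, querying the src attribute nodes instead of calling getAttribute(). This works fine. But I would like to know whether I can get this URL directly from query(), with something like //img[@src(contains())].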
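
        One possible way to get the URL directly from the XPath layer, sketched under the assumption that only the first match is needed: DOMXPath::evaluate() can return a typed result, so wrapping the expression in the XPath string() function yields the value of the first matching node, or an empty string when nothing matches. The findLeadImage() helper name and the sample markup below are hypothetical and not part of the original code.

```php
<?php
/**
 * Hypothetical helper: returns the first img src containing $needle, or null.
 * $needle is embedded in double quotes, so it must not contain a double quote itself.
 */
function findLeadImage(string $pageBody, string $needle): ?string
{
    libxml_use_internal_errors(true);   // collect libxml parse warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($pageBody);
    $xpath = new DOMXPath($doc);

    // contains(@src, ...) filters the <img> elements; /@src then selects the attribute.
    // string() converts the node-set to the string value of its first node in document order.
    $expr = sprintf('string(//img[contains(@src, "%s")]/@src)', $needle);
    $url = $xpath->evaluate($expr);

    return $url !== '' ? $url : null;
}

// Sample markup with a shortened, made-up URL.
$sample = '<div class="slide__image"><img src="https://example.com/P31637-PRODUCT_LEAD--IMG_600.jpg" alt=""></div>';
var_dump(findLeadImage($sample, 'PRODUCT_LEAD'));
var_dump(findLeadImage($sample, 'NO_SUCH_WORD'));
```

        The expression //img[contains(@src, "PRODUCT_LEAD")]/@src also works with query() when every match is wanted; evaluate() with string() is just a shortcut for the single-value case.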

        【Comments】:
