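[Posted]: 2021-03-17 17:41:45
[Question]:
I am trying to extract some product data from Etsy.com. I am not sure whether I cannot extract the data because I have the wrong parent class or because of some other problem. I have tried several classes as the parent; the current one only lets me pull one row.
Link: Etsy.com
I wait for the page to load and scroll down the page to make sure everything loads properly and is not held back by the lazy loader. However, I can still only extract one row of data.
My code below usually works for me: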
Set Html = objIE.document
Set elements = Html.getElementsByClassName("bg-white display-block pb-xs-2 mt-xs-0") ' parent CLASS
'FOR LOOP
For Each element In elements
    ''' Element 1
    If element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0) Is Nothing Then ' Get CLASS
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-" 'If Nothing then Hyphen in CELL
    Else
        HtmlText = element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0).href 'Get CLASS
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText 'return value in column
    End If
    ''' Element 2
    If element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0) Is Nothing Then ' Get CLASS
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-" 'If Nothing then Hyphen in CELL
    Else
        HtmlText = element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0).innerText ' Get CLASS
        wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
    End If
    ''' Element 3
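Second parent class
I thought I had solved the problem, so I did not post my original question above. With the parent class below I was able to get through a whole page of 50+ items, with results in column A. I have not changed anything since then, but I cannot reproduce the same result. All I get is one row and I do not understand why. I have been trying to fix this for a while but cannot work out where the problem is. The class below worked once and extracted 50+ results; now it only produces 1 row. I have cleared all browser cache and restarted the PC.
Code with the second parent class: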
Set Html = objIE.document
Set elements = Html.getElementsByClassName("wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container") ' parent CLASS
'FOR LOOP
For Each element In elements
I tried the following classes; only two of them returned any results, as stated in the notes next to them:
'wt-mt-xs-2 wt-text-black
'col-group pl-xs-0 search-listings-group pr-xs-1
'col-xs-12 pl-xs-1 pl-md-3
'responsive-listing-grid wt-grid wt-grid--block wt-justify-content-flex-start wt-list-unstyled pl-xs-0
'bg-white display-block pb-xs-2 mt-xs-0
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'Can only do 1 row
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'I was able to pull of 50+ items now not working
'wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder
'js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none
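Each item has its own li class; see the image below for more information.
Question - Can someone tell me what I am doing wrong? (I once successfully extracted 50+ results with the second parent class, but now it only pulls 1 row and I cannot work out why.)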
<li class="wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder">
<div class="js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none " data-palette-listing-id="973170689" data-shop-id="" data-listing-id="973170689" data-behat-listing-card="" data-listing-card-v2="">
<a class="6dd4c4354676ccda display-inline-block listing-link logged" data-listing-id="973170689" data-palette-listing-image="" href="https://www.etsy.com/uk/listing/973170689/deconstructed-iphone-5-artwork?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=phones&ref=sc_gallery-1-1&plkey=247d3e6c1599979de70c884db995d78e95827f21%3A973170689&frs=1"
data-display-loc="w.0" data-page-num="1" data-position-num="1" data-logging-key="247d3e6c1599979de70c884db995d78e95827f21:973170689" target="etsy.973170689" title="Deconstructed iPhone 5 artwork">
<div class="v2-listing-card__img position-relative">
<div data-listing-card-image="">
<div class="placeholder placeholder-landscape ">
<div class="placeholder-content ">
<div class="placeholder vertically-centered-placeholder placeholder-landscape">
<div class="height-placeholder">
<img data-listing-card-listing-image="" src="https://i.etsystatic.com/27880825/c/2250/1788/0/538/il/116587/2961533797/il_340x270.2961533797_r4pc.jpg" class="width-full wt-height-full display-block position-absolute " alt="">
</div>
</div>
</div>
</div>
</div>
</div>
<div class="v2-listing-card__info
">
<div>
<h3 class="text-gray text-truncate mb-xs-0 text-body ">
Deconstructed iPhone 5 artwork
</h3>
<p>
</p>
<div class="v2-listing-card__shop">
<p class="text-gray-lighter text-body-smaller display-inline-block" aria-hidden="true"><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">A</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">d</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">b</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">y</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>DissectProjects</p>
<p class="screen-reader-only">Ad from shop DissectProjects</p>
<span class="v2-listing-card__rating icon-t-2 display-block">
</span>
</div>
<span class="n-listing-card__price text-gray mt-xs-0 strong display-block
text-body-larger
">
<span class="currency-symbol">£</span><span class="currency-value">120.00</span>
<span class="text-body-smaller no-wrap">
<span class="wt-badge wt-badge--small wt-badge--sale-01">
FREE UK delivery</span>
</span>
</span>
<p></p>
</div>
</div>
</a>
<div data-favorite-button-wrapper="" class="v2-listing-card__actions z-index-1 position-absolute">
<button class="inline-overlay-trigger favorite-item-action position-absolute favorite-listing-button p-xs-1 has-hover-state z-index-1 btn-transparent position-right in-search v2-listing-card__favorite" data-ui="favorite-listing-button" data-listing-id="973170689"
data-accessible-btn-fave="" data-favorite-label="Add to Favourites" data-favorited-label="Remove from Favourites">
<div data-source="search" data-btn-fave="" data-neu-fave="">
<span class="favorite-listing-button-icon-container icon-circle-container bg-white icon-group p-xs-1
" data-favorite-icon-container="">
<span class="etsy-icon icon-smaller text-gray wt-display-block
" data-not-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,21C10.349,21,2,14.688,2,9,2,5.579,4.364,3,7.5,3A6.912,6.912,0,0,1,12,5.051,6.953,6.953,0,0,1,16.5,3C19.636,3,22,5.579,22,9,22,14.688,13.651,21,12,21ZM7.5,5C5.472,5,4,6.683,4,9c0,4.108,6.432,9.325,8,10,1.564-.657,8-5.832,8-10,0-2.317-1.472-4-3.5-4-1.979,0-3.7,2.105-3.721,2.127L11.991,8.1,11.216,7.12C11.186,7.083,9.5,5,7.5,5Z"></path></svg></span>
<span class="etsy-icon icon-smaller text-red wt-display-none
" data-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M16.5,3A6.953,6.953,0,0,0,12,5.051,6.912,6.912,0,0,0,7.5,3C4.364,3,2,5.579,2,9c0,5.688,8.349,12,10,12S22,14.688,22,9C22,5.579,19.636,3,16.5,3Z"></path></svg></span>
</span>
</div>
<!--icon font and display:none; elements -->
<span aria-hidden="true" class="icon"></span>
<span class="screen-reader-only default" data-a11y-label="">
Add to Favourites
</span>
</button>
</div>
</div>
</li>
I use this to scroll the browser down:
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
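A note on the line above, with a sketch: `Scroll 0&, 9999` jumps once to the absolute vertical position 9999, which on a long results page may not be the bottom, and a single jump can skip past content that only loads as it comes into view. One alternative is to scroll in steps, pause after each step, and stop once the number of listing cards has stopped growing. The procedure name `ScrollThroughListings`, the three constants, and the choice of `v2-listing-card` as the class to count are assumptions made for illustration, not values confirmed for this page.

```vba
' Sketch only: scrolls down in steps so lazily loaded listings get a chance to appear.
' Assumes objIE is an InternetExplorer.Application instance with the page already loaded.
Sub ScrollThroughListings(ByVal objIE As Object)
    Const MAX_STEPS As Long = 40        ' assumed hard cap so the loop always ends
    Const STEP_PIXELS As Long = 1500    ' assumed distance scrolled per step
    Const STABLE_LIMIT As Long = 3      ' assumed number of unchanged counts before stopping

    Dim stepNo As Long, stableSteps As Long
    Dim lastCount As Long, currentCount As Long

    lastCount = -1
    For stepNo = 1 To MAX_STEPS
        ' Scroll relative to the current position instead of to a fixed coordinate.
        objIE.document.parentWindow.scrollBy 0, STEP_PIXELS

        ' Pause for one second so the page can load more content.
        Application.Wait Now + TimeSerial(0, 0, 1)
        DoEvents

        ' Count the listing cards currently present in the document.
        currentCount = objIE.document.getElementsByClassName("v2-listing-card").Length

        If currentCount = lastCount Then
            stableSteps = stableSteps + 1
            If stableSteps >= STABLE_LIMIT Then Exit For
        Else
            stableSteps = 0
            lastCount = currentCount
        End If
    Next stepNo
End Sub
```

It would be called as `ScrollThroughListings objIE` before the extraction loop starts. The constants may need tuning if the page loads slowly.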
''##################### Update today ######################
I guess the parent class is v2-listing-card__info, but if I am not mistaken the PRODUCT URL does not sit inside it, so how do I get that?
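One possible reading of the markup posted above, offered as a sketch and not as the fix that was eventually used: the `<a>` carrying the product URL is a direct child of the `div` with class `v2-listing-card`, and `v2-listing-card__info` sits inside that `<a>`. Looping over the `v2-listing-card` elements would therefore reach both the link and the title, one card per product. A parent class that matches only a single container would make the loop run once, which could explain getting a single row. The procedure name `ExtractListingCards` and the parameters `objIE` and `ws` are assumptions for illustration.

```vba
' Sketch only: writes one row per listing card (URL in column A, title in column B).
' Assumes objIE is an InternetExplorer.Application instance with the page loaded
' and ws is the worksheet that receives the results.
Sub ExtractListingCards(ByVal objIE As Object, ByVal ws As Worksheet)
    Dim cards As Object, card As Object
    Dim links As Object, titles As Object
    Dim nextRow As Long

    ' Matches every element whose class list contains the token "v2-listing-card".
    Set cards = objIE.document.getElementsByClassName("v2-listing-card")

    ' Use a single row counter so columns A and B stay aligned.
    nextRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + 1

    For Each card In cards
        Set links = card.getElementsByTagName("a")
        Set titles = card.getElementsByTagName("h3")

        ' Check the collection length before indexing to avoid a runtime error.
        If links.Length > 0 Then
            ws.Cells(nextRow, "A").Value = links(0).href
        Else
            ws.Cells(nextRow, "A").Value = "-"
        End If

        If titles.Length > 0 Then
            ' Trim$ removes the leading and trailing spaces around the title text.
            ws.Cells(nextRow, "B").Value = Trim$(titles(0).innerText)
        Else
            ws.Cells(nextRow, "B").Value = "-"
        End If

        nextRow = nextRow + 1
    Next card
End Sub
```

It would be called as `ExtractListingCards objIE, wsSheet` once the page has finished loading and scrolling.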
''####################### Update, 19 March 2021 ######################
Many thanks to SIM for the support, and thanks to Qharr for the input. Finally solved, thanks everyone.
Result
As always, thanks in advance.
[Comments]:
-
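Aren't you trying to parse the product names and prices from that page? The content appears to be static and available in the page source. You should use xmlhttp requests rather than IE, which is cumbersome.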
-
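Thanks for the update, SIM. I am using IE because I also use the site filters, so I need the browser to be visible. I know IE is not the best and will soon be obsolete, but for now I am fine with IE and will consider switching to xmlhttp later. Right now I just need to extract the data.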
-
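Sorry for the misinformation. The first few items are static, but the rest are lazy-loaded. In that case, unless you already plan to stick with IE, selenium would be a better choice.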
-
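For now it is IE. I load the page and scroll to the bottom, then start extracting, so all items should have been loaded by the lazy loader. I got it working once and now cannot seem to get it working again.
Tags: excel vba web-scraping screen-scraping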