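【Question title】: How to scrape product prices that are regionally specific
【Posted】: 2019-04-04 01:57:58
【Question】:

As an exercise, I am trying to scrape information about washing machines from Lowes: https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977

To get a price, I need to find the div with the class "product-pricing" and then read the text of the span inside it. However, the div I see when I inspect the page in my browser is completely different from the one I get when I scrape the page with BeautifulSoup. When I inspect it, it looks like this: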

<div class="product-pricing">
<div class="pl-price js-pl-price" tabindex="-1">                 

     <!-- Was Price -->
     <div class="v-spacing-mini">
           <span class="h5 js-price met-product-price art-pl-contractPricing0" data-met-type="was">$499.00</span>
     </div>
     <div class="v-spacing-mini">
           <p class="darkMidGrey art-pl-wasPriceLbl0">was: $749.00</p>

              <small class="green small art-pl-saveThruLbl0">SAVE 33% thru 10/30/2018</small><br>
     </div>

  <!-- Start of Product Family Pricing -->

  <!-- Contractor Pack Messaging -->

  <!-- End of Product Family Pricing -->
  </div>
  <div class="v-spacing-small">
     <a role="link" tabindex="-1" data-toggle="popover" aria-haspopup="true" data-trigger="focus" data-placement="bottom auto" data-content="FREE local delivery applies to any major appliance $396 or more, full-size gas grills $498 or more, patio furniture orders $498 or more, and riding and ZTR mowers $999 or more. Applies to standard deliveries in US only. Purchase threshold calculated before taxes, after applicable discounts, if any. Additional fees may apply." data-original-title="Free Delivery" class="js-truck-delivery"><i class="icon-truck" title="FREE Delivery" aria-label="FREE Delivery."></i> <strong>FREE Delivery</strong></a>
  </div>
</div>

But when I scrape the page, I get this instead:

<div class="product-pricing">
<div class="v-spacing-jumbo clearfix">
  <a aria-haspopup="true" class="js-enter-location" data-content="Since Lowes.com is national in scope, we check inventory at your local store first in an effort to fulfill your order more quickly. You may find product or pricing that differ from that of your local store, but we make every effort to minimize those differences so you can get exactly what you want at the best possible price." data-placement="top auto" data-toggle="popover" data-trigger="focus" role="link" tabindex="-1">
     <p class="h6" id="ada-enter-location"><span>Enter your location</span>
        <i aria-hidden="true" class="icon-info royalBlue"></i>
     </p>
  </a>
  <p class="small-type secondary-text" tabindex="-1">for pricing and availability.</p>
</div>
<form action="#" class="met-zip-container js-store-locator-form" data-modal-open="true" data-zip-in="true" id="store-locator-form">
  <input name="redirectUrl" type="hidden" value="/pl/Washing-machines-Washers-dryers-Appliances/4294857977"/>
  <div class="form-group product-form-group">
     <div class="input-group">
        <input aria-label="Enter your zip code" autocompletetype="find-a-store-search" class="form-control js-list-zip-entry-input met-zip-code" name="searchTerm" placeholder="ZIP Code" role="textbox" tabindex="-1" type="text"/>
        <span class="input-group-btn">
        <button class="btn btn-primary js-list-zip-entry-submit met-zip-submit" data-linkid="get-pricing-and-availability-zip-in-modal-submit" tabindex="-1" type="submit">OK</button>
        </span>
     </div>
     <span class="inline-help">ZIP Code</span>
  </div>
 </form>
</div>

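I think this is because the site needs my location to determine the correct price. There seems to be a hidden input: my browser knows my location and tells the site. Is there a way to get BeautifulSoup to scrape the prices that appear after my location has been checked?

Here is the code I am using: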

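from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# The URL must be one unbroken string (no line break inside the quotes).
my_url = 'https://www.lowes.com/pl/Washing-machines-Washers-dryers-Appliances/4294857977'

# Download the raw HTML of the listing page.
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, features="lxml")

# Each product card keeps its pricing markup inside "product-wrapper-right".
containers = page_soup.find_all("div", {"class": "product-wrapper-right"})
for container in containers:
    # The js-price span is missing when the site shows the ZIP code form instead.
    price_span = container.find("span", {"class": "js-price"})
    price = price_span.text if price_span else None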

Edit: the specific code that gives me the second HTML snippet is

container.findAll("div", {"class":"product-pricing"})   
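One way to confirm which of the two variants a request actually returned is to look at the class names inside each pricing block. The sketch below is only an illustration, not part of the original question: `ClassCollector` and `pricing_state` are hypothetical names, and it assumes the `js-price` and `js-enter-location` classes from the two snippets above are reliable markers. It uses only the standard library so that it runs on any HTML string.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def pricing_state(fragment):
    """Classify a pricing block as 'priced', 'needs-location', or 'unknown'.

    Assumption: 'js-price' marks a rendered price and 'js-enter-location'
    marks the ZIP code prompt, as in the two snippets from the question.
    """
    collector = ClassCollector()
    collector.feed(fragment)
    collector.close()
    if "js-price" in collector.classes:
        return "priced"
    if "js-enter-location" in collector.classes:
        return "needs-location"
    return "unknown"
```

With the scraping code from the question, `pricing_state(str(container))` could be called on each container to see whether the response carried prices or the location prompt.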

【Comments】:

    Tags: python html web-scraping beautifulsoup


    【Solution 1】:

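    I'm not 100% sure this will solve your problem, but using Selenium may help, since it drives an actual browser and sends the same data a normal browser sends when it visits a site.

    Link to an introduction to Selenium: https://medium.freecodecamp.org/better-web-scraping-in-python-with-selenium-beautiful-soup-and-pandas-d6390592e251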
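As a rough sketch of what this suggestion could look like in code (the answer itself gives no code, so everything here is an assumption): the page is loaded in a Selenium-driven Chrome browser, the rendered HTML is handed to BeautifulSoup, and the same `product-wrapper-right` and `js-price` selectors from the question are applied. `parse_price` and `fetch_prices` are hypothetical helper names. The sketch assumes Chrome and a matching driver are installed, and the page may still need an explicit wait or a ZIP code entry before prices appear.

```python
import re
from decimal import Decimal

# Matches a dollar amount such as "$499.00" or "$1,299.00".
PRICE_RE = re.compile(r"\$\s*([0-9][0-9,]*(?:\.[0-9]{2})?)")


def parse_price(text):
    """Turn text such as '$499.00' or 'was: $749.00' into a Decimal, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def fetch_prices(url):
    """Load url in a real browser and return the prices found on the page."""
    # Imported here so parse_price stays usable without these packages.
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome plus a matching driver
    try:
        driver.get(url)
        html = driver.page_source  # HTML after the browser has rendered it
    finally:
        driver.quit()

    page_soup = BeautifulSoup(html, "html.parser")
    prices = []
    for container in page_soup.find_all("div", class_="product-wrapper-right"):
        span = container.find("span", class_="js-price")
        if span is not None:  # skip cards that still show the ZIP code form
            prices.append(parse_price(span.get_text()))
    return prices
```

Calling `fetch_prices(my_url)` with the URL from the question would open a browser window, so it is best tried interactively; `parse_price` can be checked on its own against strings such as `"$499.00"`.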

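    【Discussion】:

    • Thanks! Opening the page in Selenium first let me get the correct div, and with it the price!
    • You're welcome. I don't mean to be rude or anything, but if this answered the question, it would be great if you could mark my answer as the accepted answer for this Stack Overflow question!
    • No problem! I'm new to Stack Overflow, so I didn't know. Thanks again!
    • No problem, glad I could help.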