【Question Title】: Beautiful Soup fails to acquire prices from Amazon at times [duplicate]
【Posted】: 2020-07-27 10:32:19
【Question Description】:

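While running a Beautiful Soup script to fetch prices from Amazon, I found that Beautiful Soup often fails, seemingly at random, to get the price; the failure shows up as an empty list in the output.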

import bs4
import requests


def getAmazonPrice(productUrl):
    elems = []
    # Keep requesting until the selector matches; an empty list means no price element was found
    while not elems:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}  # to make the server think it's a web browser and not a bot
        res = requests.get(productUrl, headers=headers)
        res.raise_for_status()

        soup = bs4.BeautifulSoup(res.text, 'lxml')
        elems = soup.select('#mediaNoAccordion > div.a-row > div.a-column.a-span4.a-text-right.a-span-last > span.a-size-medium.a-color-price.header-price')
        print(elems)
    return elems[0].text.strip()


price = getAmazonPrice('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd-ebook/dp/B07VSXS4NK/ref=sr_1_1?crid=30NW5VCV06ZMP&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1586810720&sprefix=automate+the+bo%2Caps%2C288&sr=8-1')
print('The price is ' + price)

Output:

[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[<span class="a-size-medium a-color-price header-price">


                        $26.58



        </span>]
The price is $26.58

【Comments】:

  • You have already received a detailed answer to this question before. Use the API, or just use selenium.

Tags: python beautifulsoup python-requests css-selectors http-headers


【Solution 1】:

Just save res.text to an HTML file and you will see that you are being blocked by a CAPTCHA.
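A minimal sketch of that check, assuming the blocked response can be recognized by a couple of marker strings. The helper names `save_html` and `looks_like_captcha`, the default file name, and the marker strings are illustrative assumptions, not something given in the question; open the saved file to confirm what your own blocked response actually contains.

```python
from pathlib import Path

# Assumed marker strings for the CAPTCHA page; verify them against the
# HTML you actually saved, since the page wording may differ or change.
CAPTCHA_MARKERS = ("validateCaptcha", "Enter the characters you see below")


def save_html(html, path="amazon_response.html"):
    """Write the response body to disk so it can be opened in a browser."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out


def looks_like_captcha(html):
    """Return True if any assumed CAPTCHA marker appears in the HTML."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in CAPTCHA_MARKERS)
```

Inside the loop from the question, calling `save_html(res.text)` right after `requests.get(...)` and printing `looks_like_captcha(res.text)` would show whether an empty `elems` list coincides with a CAPTCHA page rather than with a selector problem.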

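【Discussion】:

  • But then how do I sometimes get past the CAPTCHA, so that it returns the price?
  • Good question, but you would have to ask Amazon when they decide to block requests with a CAPTCHA. Even real users sometimes get a CAPTCHA: reddit.com/r/amazon/comments/606jut/amazon_captcha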