Python requests-HTML library doesn't work sometimes

【Question title】: Python requests-HTML library doesn't work sometimes
【Posted】: 2021-08-05 13:36:27
【Question description】:

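Definition

I am using the requests_html library to scrape a website. The get_product_info(url: str) -> dict function I wrote returns the product names, prices, and product URLs found on the page.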

Problem

I noticed that when I run the function several times with the same URL, it does not always return results.

Example

What exactly is the problem?

Code

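from requests_html import HTMLSession

session = HTMLSession()

# MAIN_URL is not defined in the original post; assumed here to be the site root.
MAIN_URL = 'https://www.sokmarket.com.tr'
sub_cat2_link = 'https://www.sokmarket.com.tr/bulasik-c-1442'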


def get_product_info(url: str) -> dict:
    r2 = session.get(url)
    r2.html.render()
    product_names = [item.text for item in r2.html.find('main.listing-results strong')]
    product_prices = [item.text for item in r2.html.find('main.listing-results div.pricetag')]
    product_links = [MAIN_URL + item.links.pop() for item in r2.html.find('main.listing-results a.productbox-wrap')]
    return {"prod": product_names, "price": product_prices, "prod_link": product_links}


result = get_product_info(sub_cat2_link)
print(result)

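【Comments】:

I'm not familiar with the requests_html module. However, if it behaves like the ubiquitous requests module, your call to get() may be returning an HTTP status code other than 200. If the site you are trying to access applies rate limiting, this could be something like 503 or 429.
I checked. The status code is 200, but there is no data. I also added sleep calls between the lines of code, but nothing changed.
You obviously need to change your code, but my only suggestion is to use the requests module. I may be missing something, but I can't see how or why an HTTP status of 200 would come back with no content.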

Tags: python web-scraping python-requests python-requests-html

【Solution 1】:

I ran into the same problem. I ended up retrying the render, and so far it has worked for me.

import time

for attempt in range(3):
    try:
        r2.html.render()
        # do something with the rendered page
    except Exception:
        time.sleep(5)  # not sure if this is needed
        print(attempt)
    else:
        break  # render succeeded, stop retrying
else:
    print('all attempts failed')
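
The question reports a 200 status with no data rather than an exception, so a loop that only retries on errors may never retry in that case. As a minimal sketch (not the answerer's code), the retry can also treat an empty result as a failure. The helper name fetch_with_retry and its parameters are hypothetical; it assumes the fetch function returns a dict with a "prod" list, as get_product_info does in the question.

```python
import time
from typing import Callable, Dict, List


def fetch_with_retry(
    fetch: Callable[[str], Dict[str, List[str]]],
    url: str,
    attempts: int = 3,
    delay: float = 5.0,
) -> Dict[str, List[str]]:
    """Call fetch(url) until it returns at least one product name.

    Hypothetical helper: `fetch` is assumed to return a dict with a
    "prod" key, like get_product_info in the question.
    """
    result: Dict[str, List[str]] = {}
    for attempt in range(1, attempts + 1):
        try:
            result = fetch(url)
        except Exception as exc:  # rendering or network error
            print(f"attempt {attempt} raised {exc!r}")
        else:
            if result.get("prod"):
                return result  # got data, stop retrying
            print(f"attempt {attempt} returned no products")
        if attempt < attempts:
            time.sleep(delay)  # give the site a moment before retrying
    # All attempts failed or came back empty; return the last result seen.
    return result
```

With the question's code this would be called as fetch_with_retry(get_product_info, sub_cat2_link). If the page content is filled in by JavaScript after load, passing a sleep value to render(), for example r2.html.render(sleep=2), may also help, although whether that is the cause here is an assumption.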

【Discussion】: