Can't run JavaScript using requests-html library on Python: Answers

【Question Title】: Can't run JavaScript using requests-html library on Python
【Posted】: 2020-01-23 23:22:19
【Question Description】:

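I need to extract some information from links whose pages contain JavaScript code. I know how to do this with Selenium, but it takes a lot of time, and I need a more efficient way to achieve this.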

I looked through the requests-html library, and it looks very robust for my purposes, but unfortunately I can't get it to run JavaScript.

I read the documentation at https://requests-html.readthedocs.io/en/latest/

and tried the following code:

from requests_html import HTMLSession, HTML
from bs4 import BeautifulSoup

session = HTMLSession()
resp = session.get("https://drive.google.com/file/d/1rZ-DhTFPCen6DvJXlNl3Bxuwj4-ULwoa/view")

resp.html.render()

soup = BeautifulSoup(resp.html.html, 'lxml')

email = soup.find_all('img', {'class': 'ndfHFb-c4YZDc-MZArnb-BA389-YLEF4c'})
print(email)

After running this code I don't get any results, even though the class exists when I open the link in a browser.

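I also tried adding headers to my request, but that didn't help. I tried the same code (with different HTML tags, of course) on another link (https://web.archive.org/web/*/stackoverflow.com), but I got back some HTML that includes a message saying my browser must support JavaScript. My code for this part: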

from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
resp = session.get("https://web.archive.org/web/*/stackoverflow.com")

resp.html.render()

soup = BeautifulSoup(resp.html.html, 'lxml')


print(soup)

The response I got:

<div class="no-script-message">
        The Wayback Machine requires your browser to support JavaScript, please email <a href="mailto:info@archive.org">info@archive.org</a><br/>if you have any questions about this.
      </div>
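The response above is the fallback message the Wayback Machine shows when its JavaScript has not run. Instead of printing the whole soup, a quick check for that message tells you whether `render()` produced the page you expected. A minimal sketch using only the standard library's `html.parser` (swapped in for BeautifulSoup so it runs without extra packages); `is_unrendered_fallback` is a hypothetical helper, and it assumes the page removes the `no-script-message` element once its scripts run, which is not verified here:

```python
from html.parser import HTMLParser


class _ClassFinder(HTMLParser):
    """Records whether any start tag carries the target CSS class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute may list several space-separated classes.
            if name == "class" and value and self.target_class in value.split():
                self.found = True


def is_unrendered_fallback(html: str, marker_class: str = "no-script-message") -> bool:
    """Heuristic: True if the HTML still contains the "requires JavaScript" element.

    Assumes the page removes that element once its scripts run; a page that only
    hides it with CSS would still return True here.
    """
    finder = _ClassFinder(marker_class)
    finder.feed(html)
    finder.close()
    return finder.found


sample = '<div class="no-script-message">The Wayback Machine requires your browser to support JavaScript</div>'
print(is_unrendered_fallback(sample))  # True
print(is_unrendered_fallback('<div class="results">x</div>'))  # False
```

After `resp.html.render()`, calling `is_unrendered_fallback(resp.html.html)` would flag this case. A `True` result suggests the scripts did not run or did not finish, though a page that merely hides the message with CSS would also return `True`.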

Any help would be appreciated. Thanks!

【Question Discussion】:

Tags: javascript python-3.x python-requests-html

【Solution 1】:

Add the sleep argument to render():

resp.html.render(sleep=2)
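`sleep` is the number of seconds `render()` pauses after the initial render before the HTML is captured, which gives pages that load content asynchronously time to finish. A minimal sketch wrapping this up, assuming requests-html and its headless Chromium (downloaded on the first `render()` call) are available; `render_page` is a hypothetical helper name, and the `sleep=2` / `timeout=20` defaults are guesses to tune per site, not values recommended by the library:

```python
def render_page(url: str, sleep: int = 2, timeout: float = 20) -> str:
    """Fetch url, execute its JavaScript in headless Chromium, and return the final HTML."""
    # Imported inside the function so this module can be imported without requests-html.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        resp = session.get(url)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
        # sleep: seconds to pause after the initial render so async content can load
        # timeout: seconds to wait for the page to load before render() gives up
        resp.html.render(sleep=sleep, timeout=timeout)
        return resp.html.html
    finally:
        session.close()  # also closes the Chromium instance that render() started
```

For example, `BeautifulSoup(render_page("https://web.archive.org/web/*/stackoverflow.com"), 'lxml')` would parse the rendered page. Closing the session in `finally` keeps a stray Chromium process from lingering if rendering fails.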

【Discussion】:

For me, this was the answer.

【Solution 2】:

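This should work on the website. But you mention that the code works for StackOverflow but not for other URLs? That could be because the server didn't respond, or because the tag you're looking for wasn't available at that moment. Either way, requests-html should give you an error.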
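A selector that simply matches nothing does not raise in either library: BeautifulSoup's `find_all` and requests-html's `find` both return an empty list, which is consistent with the first snippet in the question producing no results rather than an exception. If you want a missing element to fail loudly, one option is to check the result yourself. A minimal sketch; `ElementNotFound` and `require_matches` are hypothetical names, not part of either library:

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class ElementNotFound(LookupError):
    """Raised when a lookup matched no elements (hypothetical helper, not a library class)."""


def require_matches(matches: Sequence[T], description: str) -> Sequence[T]:
    """Return matches unchanged, or raise ElementNotFound if the lookup came back empty.

    Works with any list-like result, e.g. soup.find_all(...) or resp.html.find(...).
    """
    if len(matches) == 0:
        raise ElementNotFound(f"no elements matched {description!r}")
    return matches


# Plain lists stand in for find_all()/find() results here.
print(require_matches(["<img>"], "img.some-class"))  # ['<img>']
try:
    require_matches([], "img.ndfHFb-c4YZDc-MZArnb-BA389-YLEF4c")
except ElementNotFound as exc:
    print(exc)  # no elements matched 'img.ndfHFb-c4YZDc-MZArnb-BA389-YLEF4c'
```

In the question's first snippet, wrapping the `soup.find_all(...)` call in `require_matches` would turn the silent empty result into an explicit error that names the missing class.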

I was about to look into your problem and add it to my blog post "How to use Requests-HTML", but unfortunately the link you provided is invalid.

【Discussion】: