Why doesn't render / requests-html scrape dynamic content? Answers

【Question title】: Why doesn't render / requests-html scrape dynamic content?
【Posted】: 2020-04-27 03:57:30
【Question】:

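Long story short: I'm switching from Selenium to Requests(-html).

It works fine, but not in every case.

Page: https://www.winamax.fr/paris-sportifs/sports/1/1/1

When it loads, the page fills in dynamic content with English matches (e.g. Sheffield United - West Ham United).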

But when I try this:

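from requests_html import HTMLSession

session = HTMLSession()
# note: this path lacks the "sports/" segment of the page URL above
r = session.get('https://www.winamax.fr/paris-sportifs/1/1/1')
r.html.render()  # run the page's JavaScript in headless Chromium
print(r.html.text)  # I also tried print(r.html.html)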

The matches don't show up in the output.

Why? Thanks!

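【Discussion】:

Does this answer your question? How to retrieve the values of dynamic html content using Python
Simply because the content you're looking for isn't in the page's HTML. In the browser, it is added to the page's DOM by JavaScript. Requests doesn't run JavaScript; Selenium works because it drives a real browser.
But isn't requests-html supposed to support JavaScript? requests-html.kennethreitz.org/#javascript-support
I tried to find the AJAX JSON that carries the data, but had no luck (I'm not sure how to do that).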
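To illustrate the point about JavaScript: a plain HTTP response only contains markup and script source code, so text that a script would inject never appears when you extract the document's text. The sketch below uses hypothetical markup (not Winamax's actual HTML) and only the standard library's html.parser.

```python
from html.parser import HTMLParser

# Hypothetical markup resembling what a plain HTTP GET might return for a
# JavaScript-driven page: an empty container plus a script that would fill it.
RAW_HTML = """
<html><body>
  <div id="matches"></div>
  <script>
    document.getElementById("matches").textContent = "Sheffield United - West Ham";
  </script>
</body></html>
"""


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside <script>/<style>.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


parser = VisibleTextParser()
parser.feed(RAW_HTML)
parser.close()
visible_text = " ".join(parser.chunks)

# The match name exists only as JavaScript source, so without a JS engine
# it never becomes part of the document's text.
print(repr(visible_text))  # ''
print("Sheffield United" in RAW_HTML)  # True (but only inside the script)
```

A browser (Selenium, or the headless Chromium that requests-html starts in render()) executes that script and updates the DOM; plain requests never does.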
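For the AJAX/JSON route, the usual approach is the browser's developer tools: open the Network tab, filter on XHR/Fetch, reload, and look for a response containing the match names. Some sites instead embed the initial data as a JSON object assigned to a JavaScript variable in an inline script. The sketch below handles that second case; the variable name `PRELOADED_STATE` and the sample markup are hypothetical, and it only works if the assigned value is strict JSON (quoted keys, no trailing commas).

```python
import json
import re


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var_name` in inline JavaScript, or None.

    Assumes markup like `var NAME = {...};`; `var_name` is whatever name the
    target page actually uses (hypothetical here).
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value from the start of the string and
    # ignores whatever follows it (e.g. the trailing ';' and '</script>').
    obj, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return obj


# Hypothetical page source, standing in for requests.get(url).text
SAMPLE = (
    '<script>var PRELOADED_STATE = '
    '{"matches": [{"title": "Sheffield United - West Ham"}]};</script>'
)
state = extract_embedded_json(SAMPLE, "PRELOADED_STATE")
print(state["matches"][0]["title"])  # Sheffield United - West Ham
```

If the data does live in the raw HTML like this, no browser is needed at all, which sidesteps render() entirely.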

Tags: python python-requests python-requests-html

【Answer 1】:

Add a timeout and it should work. Sorry, this should be a comment, but I can't comment yet.

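from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
# render() defaults to timeout=8 (seconds); give the page's scripts longer
r.html.render(timeout=20)
print(r.html.html)
session.close()  # also shuts down the headless browser started by render()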

【Discussion】:

When I run it, I get this RuntimeWarning: C:\Applications__\python372\lib\site-packages\pyee\_base.py:81: RuntimeWarning: coroutine 'Browser._targetCreated' was never awaited f(*args, **kwargs) RuntimeWarning: Enable tracemalloc to get the object allocation traceback. Then nothing happens and it never finishes running. I can't even stop the process with CTRL+C.
How do I use this with BeautifulSoup?
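After r.html.render(...), r.html.html holds the rendered markup as a string, so one option is to hand that string to BeautifulSoup. A minimal sketch, assuming bs4 is installed; the CSS selector `div.match` and the sample markup are hypothetical stand-ins for whatever the real page uses.

```python
from bs4 import BeautifulSoup


def parse_rendered(html, selector):
    """Parse already-rendered HTML and return the stripped text of each match.

    `html` would be `r.html.html` after `r.html.render(...)`; `selector` is a
    hypothetical CSS selector to be replaced with one from the real page.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


# Offline stand-in for r.html.html so the sketch runs without a browser.
rendered = '<div class="match">Sheffield United - West Ham</div>'
print(parse_rendered(rendered, "div.match"))  # ['Sheffield United - West Ham']
```

requests-html can also query the rendered page directly with r.html.find(selector), so BeautifulSoup is optional here.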