Scraping comments from the Genius site with beautifulsoup4 (answers)

【Question Title】: Scraping comments from genius site with beautifulsoup4
【Posted】: 2022-01-18 21:55:32
【Question Description】:

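I'd like to know whether, in your opinion, it is possible to scrape the comments from genius.com with beautifulsoup4. I'm asking because when I scrape a page with bs4, I can't find the comments section, since it sits behind an expandable container. If I look at the page's HTML in the browser, I can see the comments even without clicking the "Expand" button, but when I scrape the page with bs4, they are not in the HTML source.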
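A likely explanation for the mismatch: the browser's element inspector (DevTools) shows the live DOM after the page's JavaScript has run, while `requests` and the browser's "View Page Source" return only the HTML the server originally sent. The minimal sketch below checks an HTML string for a comment element with BeautifulSoup. The selector `div.comment` and the sample markup are hypothetical placeholders; copy the real selector from the inspector.

```python
from bs4 import BeautifulSoup


def has_element(html: str, css_selector: str) -> bool:
    """Return True if at least one element in `html` matches `css_selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(css_selector) is not None


# Hypothetical markup: what a server might send vs. the DOM after JavaScript runs.
server_html = "<div id='lyrics'>...</div>"
rendered_html = "<div id='lyrics'>...</div><div class='comment'>Great song</div>"

print(has_element(server_html, "div.comment"))    # False
print(has_element(rendered_html, "div.comment"))  # True
```

If the check returns False for `requests.get(url).text` while the inspector shows the element, the comments are being inserted by JavaScript and bs4 alone will never see them in that response.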

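How can I solve this? Is there a way to scrape the comments with bs4, or should I use Selenium? (I'd like to avoid Selenium because I have to scrape a lot of data, and Selenium could be very slow.)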
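One commonly used way to avoid a browser entirely: when comments are loaded by JavaScript, the script usually fetches them from a JSON endpoint, which you can find in the browser DevTools Network tab while expanding the comments. Requesting that endpoint directly is typically far cheaper than rendering the page. The sketch below assumes a hypothetical pagination scheme (`?page=N` plus a `next_page` key that is null on the last page); the real endpoint, parameters and keys must be read from the actual requests, and this is not a documented Genius API.

```python
import json
from typing import Callable, Iterator, Optional
from urllib.request import Request, urlopen


def fetch_json(url: str) -> dict:
    """GET `url` and decode the JSON body (stdlib only)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def iter_pages(
    endpoint: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield each JSON page from a paginated endpoint.

    ASSUMPTION (hypothetical scheme): pages are requested with ?page=N and
    each response has a "next_page" key that is null on the last page.
    """
    page: Optional[int] = 1
    while page is not None and page <= max_pages:
        data = fetch(f"{endpoint}?page={page}")
        yield data
        page = data.get("next_page")
```

Usage would look like `for data in iter_pages("https://example.com/api/comments"): ...`, where the URL is a placeholder for whatever the Network tab shows. The `fetch` parameter is injectable so the pagination logic can be tested without network access, and `max_pages` guards against an endpoint that never reports a last page.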

【Question Discussion】:

Tags: python html web-scraping beautifulsoup expand

【Solution 1】:

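The page is most likely rendered by JavaScript: the comments are added to the DOM only after the page's scripts run in a browser, so the plain HTML that bs4 receives doesn't contain them. You need Selenium (or another real browser) to execute that JavaScript. If you don't want to use Selenium's own element-lookup methods, you can pass driver.page_source to BeautifulSoup and parse the rendered HTML as usual.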

Here is some sample code:

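from bs4 import BeautifulSoup
from selenium import webdriver

url = 'https://www.siteURL.com'

driver = webdriver.Chrome()
driver.get(url)
# page_source is the HTML of the DOM after the browser has run the page's JavaScript
soup = BeautifulSoup(driver.page_source, 'html.parser')  # or 'lxml' if it is installed
driver.quit()  # quit() ends the browser session; close() only closes the current window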
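Regarding the speed concern: a large part of Selenium's overhead is launching the browser, so a common mitigation is to start one headless driver and reuse it for every URL, handing each `page_source` to BeautifulSoup. The sketch below assumes a recent Chrome (the `--headless=new` flag) and uses hypothetical helper names `make_headless_chrome` and `scrape_many`. It also assumes the comments are present right after `driver.get()`; if they load lazily, an explicit `WebDriverWait` on the comment element would be needed first.

```python
from typing import Callable, Dict, Iterable, TypeVar

from bs4 import BeautifulSoup

T = TypeVar("T")


def make_headless_chrome():
    """Start one headless Chrome session (needs selenium and Chrome installed)."""
    from selenium import webdriver  # imported here so parsing code works without selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def scrape_many(
    urls: Iterable[str],
    driver,
    parse: Callable[[BeautifulSoup], T],
) -> Dict[str, T]:
    """Load each URL in the same driver and apply `parse` to the rendered HTML."""
    results: Dict[str, T] = {}
    for url in urls:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        results[url] = parse(soup)
    return results
```

Typical use is `driver = make_headless_chrome()`, then `scrape_many(urls, driver, parse)` inside a `try` block with `driver.quit()` in `finally`, so the browser is shut down even if a page fails. Taking the driver as a parameter keeps the loop independent of Selenium, which also makes it easy to test with a stand-in object.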
