Getting comments in an article with phantomjs

【Question title】: Getting comments in article with phantomjs
【Posted】: 2015-12-08 07:43:28
【Question】:

I am trying to extract the comments from a website.

I first tried urllib for this, but to no avail. Then I realized that JavaScript has to be enabled, so I used selenium with phantomjs to extract the comments, as in the following python3 code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS('phantomjs')

url='http://www.faz.net/aktuell/politik/inland/sterbehilfe-im-bundestag-unmoralisches-angebot-13887916.html'

driver.get(url)
htm_doc = driver.page_source
soup = BeautifulSoup(htm_doc, 'html.parser')
print (soup.find('div', attrs={'id','lesermeinungen'}))

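Since the comments are loaded together with the page, I simply grab the page source and check whether there are any comments under the element with id "lesermeinungen", because that is the section that shows up when I inspect the comments area.

However, the result is None.

Update: I also tried the code below.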

from bs4 import BeautifulSoup
import selenium.webdriver.support.ui as ui
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.PhantomJS('phantomjs')

url='http://www.faz.net/aktuell/politik/inland/sterbehilfe-im-bundestag-unmoralisches-angebot-13887916.html'

driver.get(url)

wait = ui.WebDriverWait(driver,3)

try:
    wait.until(driver.find_element_by_id('lesermeinungen'))
    htm_doc = driver.page_source
    soup = BeautifulSoup(htm_doc, 'html.parser')
    print (soup.find('div', attrs={'id','lesermeinungen'}))

except TimeoutException:
    print ("Loading took too much time!")

Still no result, even after 2 hours.

【Comments】:

What is the question?
@Vaviloff Hi, could you check the edit?

Tags: python-3.x selenium selenium-webdriver phantomjs

【Answer 1】:

There is a typo where you search for the element with BeautifulSoup. Instead of

print (soup.find('div', attrs={'id','lesermeinungen'}))

there should be a colon, not a comma:

print (soup.find('div', attrs={'id' : 'lesermeinungen'}))

With this correction, your first example works for me.
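To see why the comma matters: in Python, braces around comma-separated values build a set, while braces around `key: value` pairs build a dict. BeautifulSoup documents `attrs` as a dictionary of attribute filters, so only the colon form says "the `id` attribute equals `lesermeinungen`". A minimal sketch in plain Python (the variable names `wrong` and `right` are only for illustration):

```python
# Comma inside braces: a set containing two unrelated strings.
wrong = {'id', 'lesermeinungen'}

# Colon inside braces: a dict mapping an attribute name to its expected value.
right = {'id': 'lesermeinungen'}

print(type(wrong).__name__)  # set
print(type(right).__name__)  # dict

# Only the dict keeps the "attribute name -> value" pairing that attrs needs.
print(right['id'])  # lesermeinungen
```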

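【Comments】:

Silly me. Thanks a lot.
@Ekoji Hey, glad to help! If my answer was not only correct but also useful, you can upvote it to show that. Also: a good question, really, with enough information to analyze and find a solution.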
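A side note on the updated snippet in the question: `WebDriverWait.until` expects a callable, which it invokes repeatedly with the driver. `driver.find_element_by_id(...)` instead runs once, immediately, and passes along an element (or raises). Below is a minimal sketch of the usual explicit-wait pattern, assuming the comment container really has the id `lesermeinungen` as the question states. The helper name `wait_for_comment_section` and the 10-second default are illustrative and not from the original post.

```python
def wait_for_comment_section(driver, timeout=10):
    """Wait until the element with id 'lesermeinungen' is present and return it.

    `driver` is any Selenium WebDriver instance created by the caller.
    Raises selenium.common.exceptions.TimeoutException if the element
    does not appear within `timeout` seconds.
    """
    # Imported here so the sketch can be loaded even where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # presence_of_element_located returns a callable; until() polls it with the driver.
    condition = EC.presence_of_element_located((By.ID, 'lesermeinungen'))
    return WebDriverWait(driver, timeout).until(condition)
```

Once the helper returns, `driver.page_source` can be handed to BeautifulSoup as in the first example.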