Python: get HTML from JS using Selenium

【Question title】: python Get HTML from JS using selenium
【Posted】: 2016-10-06 07:44:55
【Question】:

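I am trying to get the HTML of a div on this page: https://www.workday.com/en-us/company/careers/open-positions.html#?q=

However, the div that lists the job openings is filled in by an XHR request issued from granite.min.js, so it is not part of the initial page source.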

from selenium import webdriver
from bs4 import BeautifulSoup
from pprint import pprint


path_to_chromedriver = "/Users/RichWin/Documents/chromedriver.exe"
browser = webdriver.Chrome(executable_path=path_to_chromedriver)

driver = browser.get('https://www.workday.com/en-us/company/careers/open-positions.html#?q=')

elem = driver.find_element_by_id('template-content')

soup = BeautifulSoup(elem.get_text, "html.parser")

for tag in soup.find_all('div'):
    pprint(tag)

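Can anyone help me?

【Comments】:

Please update your question as described here: stackoverflow.com/help/how-to-ask
What is wrong with my question?
Nobody is going to do the work for you -> evidence of effort is required, i.e. show us the code you are having trouble with.
I have edited the question. Sorry, I am new here.

Tags: javascript python selenium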

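【Solution 1】:

OK, so there are a couple of problems with your code.

a) You need to wait for the template-content div to load its content. In the code below, I use implicitly_wait with a 30-second timeout.
b) find_element_by_id does not return HTML; it returns a Selenium WebElement object, so it cannot be passed directly to BeautifulSoup for parsing. Read the element's markup with get_attribute('innerHTML') and parse that string instead.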

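from pprint import pprint
from bs4 import BeautifulSoup
from selenium import webdriver


url = 'https://www.workday.com/en-us/company/careers/open-positions.html#?q='
path_to_chromedriver = "/Users/RichWin/Documents/chromedriver.exe"

# Selenium 3-style API: later Selenium 4 releases removed both
# executable_path and the find_element_by_* helpers.
browser = webdriver.Chrome(executable_path=path_to_chromedriver)
# Let element lookups retry for up to 30 seconds before failing.
browser.implicitly_wait(30)
# get() returns None, so keep using `browser`, not its return value.
browser.get(url)

# The lookup returns a WebElement; innerHTML gives its markup as a string.
elem = browser.find_element_by_id('template-content')
elem_html = elem.get_attribute('innerHTML')

# Parse the extracted markup and print every nested div.
soup = BeautifulSoup(elem_html, "html.parser")
for tag in soup.find_all('div'):
    pprint(tag)

browser.quit()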
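
One caveat on the wait: implicitly_wait only makes element lookups retry until the element exists. If the template-content div is already in the page before the XHR finishes, its innerHTML could still be empty when it is read. Below is a minimal sketch of polling until the div actually has content. The names wait_for_inner_html and main are my own, the loop is a hand-rolled stand-in for Selenium's WebDriverWait, and it assumes the find_elements(by, value) call, where "id" is the string behind By.ID.

```python
import time


def wait_for_inner_html(driver, element_id, timeout=30.0, poll=0.5):
    """Poll until the element with `element_id` has non-empty innerHTML.

    `driver` is anything exposing Selenium's find_elements(by, value);
    "id" is the string value behind selenium's By.ID.
    """
    deadline = time.monotonic() + timeout
    while True:
        # find_elements returns [] instead of raising when nothing matches.
        matches = driver.find_elements("id", element_id)
        html = matches[0].get_attribute("innerHTML") if matches else None
        if html and html.strip():
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"#{element_id} still empty after {timeout}s")
        time.sleep(poll)


def main():
    # Needs selenium installed plus a Chrome driver that Selenium can locate.
    from selenium import webdriver

    url = 'https://www.workday.com/en-us/company/careers/open-positions.html#?q='
    browser = webdriver.Chrome()
    try:
        browser.get(url)
        print(wait_for_inner_html(browser, 'template-content'))
    finally:
        browser.quit()
```

Calling main() runs it against a real browser. The returned string can be handed to BeautifulSoup exactly like elem_html above. "Non-empty" is only a rough signal that the job list has arrived, so a stricter check on a child element may be needed for this page.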

【Comments】:

Oh. So that is why it said: AttributeError: 'NoneType' object has no attribute 'find_element_by_class_name'. Thank you very much!
No problem. Feel free to accept the answer: stackoverflow.com/help/someone-answers