You should locate the element once it has loaded and read its HTML through arguments[0], rather than pulling the whole page through document:
html_of_interest=driver.execute_script('return arguments[0].innerHTML',element)
sel_soup=BeautifulSoup(html_of_interest, 'html.parser')
There are two practical cases:
1
The element has not been loaded into the DOM yet, so you need to wait for it:
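from time import sleep
from bs4 import BeautifulSoup
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

delay = 10  # placeholder: maximum number of seconds to wait for the element
driver.get("url")  # driver is your existing WebDriver instance
sleep(1)  # placeholder, tune by experiment: get() usually returns only after the page has loaded, but sometimes JS keeps running after the load event
try:
    element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'your_id_of_interest')))
    print("element is ready, do the thing!")
    html_of_interest = driver.execute_script('return arguments[0].innerHTML', element)
    sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
except TimeoutException:
    print("Something's wrong!")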
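To make the waiting idea concrete, here is a minimal sketch of the polling loop that an explicit wait performs. It is not Selenium's implementation; wait_until, condition and poll_interval are hypothetical names I made up for illustration.

```python
import time


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    This is an illustrative helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition is satisfied: hand back whatever it produced.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Not ready yet: pause briefly before checking again.
        time.sleep(poll_interval)
```

With a real driver you could pass something like lambda: driver.find_elements(By.ID, 'your_id_of_interest'), since find_elements returns an empty (falsy) list while nothing matches. In practice WebDriverWait already does this for you, so the sketch is only meant to show what is happening under the hood.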
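2
The element sits inside a shadow root, and you need to expand the shadow root first. This is probably not your case, but I mention it here because it is relevant for future reference. For example: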
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

def expand_shadow_element(element):
    # Return the shadow root attached to the given host element
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root

driver.get("chrome://settings")
root1 = driver.find_element(By.TAG_NAME, 'settings-ui')
html_of_interest = driver.execute_script('return arguments[0].innerHTML', root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
sel_soup  # empty: the shadow root has not been expanded yet
shadow_root1 = expand_shadow_element(root1)
html_of_interest = driver.execute_script('return arguments[0].innerHTML', shadow_root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
sel_soup  # now holds the markup inside the shadow root
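If you do not know in advance whether an element hosts a shadow root, the two cases can be folded into one helper. This is only a sketch: rendered_html is a hypothetical name, and it assumes driver is a Selenium WebDriver whose execute_script accepts the returned shadow root as an argument, as in the example above. In the browser, shadowRoot is null for elements with no shadow root (or a closed one), which arrives in Python as None.

```python
GET_SHADOW_ROOT = 'return arguments[0].shadowRoot'
GET_INNER_HTML = 'return arguments[0].innerHTML'


def rendered_html(driver, element):
    """Return the innerHTML of the element's shadow root if it has an open one,
    otherwise the innerHTML of the element itself.

    Hypothetical helper; `driver` only needs an execute_script(script, *args) method.
    """
    shadow_root = driver.execute_script(GET_SHADOW_ROOT, element)
    # None means there is no (open) shadow root, so fall back to the element.
    target = element if shadow_root is None else shadow_root
    return driver.execute_script(GET_INNER_HTML, target)
```

The result can be handed to BeautifulSoup(..., 'html.parser') exactly as before. Nested shadow roots would still need to be expanded one level at a time.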