【Posted】: 2018-11-03 18:42:07
【Question】:
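How can I grab an element via XPath when the element is not displayed until I call the .click() method, and it sits inside a JavaScript-generated part of the page (called BODY_BLOCK_JQUERY_REFLOW)?
I am trying to access this part of the HTML: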
```html
<div class="ui_radio item" data-value="it" data-tracker="Italian">
<input id="filters_detail_language_filterLang_it" type="radio" name="filters_detail_language_filterLang_1" value="it" onchange="widgetEvCall('handlers.updateFilter', event, this);">
<label for="filters_detail_language_filterLang_it" class="label">Italian <span class="count">(11)</span>
</label>
</div>
```
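That entry carries stable `data-value` and `data-tracker` attributes. One option is to select it by those attributes instead of by its position under `taplc_detail_filters_ar_responsive_0` or `BODY_BLOCK_JQUERY_REFLOW`. This is a sketch and has not been verified against the live page. It uses the standard library's `xml.etree.ElementTree`, whose limited XPath subset supports `[@attr='value']` predicates, as a stand-in for lxml. The markup is the snippet above with the `<input>` self-closed so it parses as XML. The helper name `language_entry` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Italian entry from the question; <input> is self-closed so that the
# XML parser accepts it (lxml.html would not need this change).
SNIPPET = """
<div class="ui_radio item" data-value="it" data-tracker="Italian">
<input id="filters_detail_language_filterLang_it" type="radio" name="filters_detail_language_filterLang_1" value="it" onchange="widgetEvCall('handlers.updateFilter', event, this);"/>
<label for="filters_detail_language_filterLang_it" class="label">Italian <span class="count">(11)</span>
</label>
</div>
"""


def language_entry(root, code):
    """Hypothetical helper: find a language option by its data-value attribute,
    not by its position in the tree. Returns (name, count) or None."""
    item = root.find(f".//div[@data-value='{code}']")
    if item is None:
        return None
    label = item.find("label")
    span = label.find("span[@class='count']")
    name = (label.text or "").strip()   # "Italian " -> "Italian"
    count = int(span.text.strip("()"))  # "(11)" -> 11
    return name, count


# Wrap in a dummy root so ".//" searches below the top-level element.
root = ET.fromstring(f"<root>{SNIPPET}</root>")
print(language_entry(root, "it"))  # ('Italian', 11)
```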
I can access languages 1 to 3, but for the 4th language (and beyond) I cannot get the XPath to match, because those languages are displayed in an overlay.
```python
import os

from lxml import html
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1080,720")  # Chrome's flag is --window-size=W,H
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-proxy-server")
chrome_driver = os.path.join(os.getcwd(), "chromedriver")
driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)
driver.get("https://www.tripadvisor.com/Attraction_Review-g60776-d117416-Reviews-Colorado_National_Monument-Fruita_Colorado.html")
# Here we click on the "more languages" element
driver.find_element(By.XPATH, '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div[1]/div[5]').click()
html_thing = driver.page_source
innerHTML = driver.execute_script("return document.body.innerHTML")  # alternative DOM snapshot, not used below
parser = html.fromstring(html_thing)
# These XPaths work, since they are part of the DOM on initial load
XPATH_LANG1 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div[1]/div[2]/label/text()'
XPATH_LANG_COUNT1 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div/div[2]/label/span//text()'
XPATH_LANG2 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div[1]/div[3]/label/text()'
XPATH_LANG_COUNT2 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div/div[3]/label/span//text()'
XPATH_LANG3 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div[1]/div[4]/label/text()'
XPATH_LANG_COUNT3 = '//*[@id="taplc_detail_filters_ar_responsive_0"]/div/div[1]/div/div[2]/div[4]/div/div[2]/div[1]/div[4]/label/span//text()'
# Unfortunately, this XPath doesn't work. I'm assuming it's because it is inside this jQuery block.
XPATH_LANG4 = '//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[12]/div[2]/div/div[5]/label/text()'
print(XPATH_LANG4, 'this is lang 4')
raw_lang1 = parser.xpath(XPATH_LANG1)
print(raw_lang1)
raw_lang_count1 = parser.xpath(XPATH_LANG_COUNT1)
print(raw_lang_count1)
raw_lang2 = parser.xpath(XPATH_LANG2)
print(raw_lang2)
raw_lang_count2 = parser.xpath(XPATH_LANG_COUNT2)
print(raw_lang_count2)
raw_lang3 = parser.xpath(XPATH_LANG3)
print(raw_lang3)
raw_lang_count3 = parser.xpath(XPATH_LANG_COUNT3)
print(raw_lang_count3)
raw_lang4 = parser.xpath(XPATH_LANG4)
if not raw_lang4:
    print(raw_lang4, '<--------------- THIS IS EMPTY')
else:
    print(raw_lang4, 'It actually showed up')
driver.quit()  # closes every window and ends the WebDriver session
```
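Timing may be a factor, though this is an assumption and has not been confirmed for this page. The script reads `driver.page_source` immediately after `.click()`, possibly before the overlay's markup has been inserted. The usual remedy is to poll until the content appears. The stdlib-only sketch below shows that polling pattern. It uses a hypothetical `wait_until` helper and a simulated page instead of a real browser:

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Hypothetical helper: call condition() until it returns a truthy value,
    re-checking every `interval` seconds; raise TimeoutError after `timeout`."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Simulated page: the overlay content only "appears" on the third check.
checks = {"n": 0}


def overlay_rendered():
    checks["n"] += 1
    return "Italian" if checks["n"] >= 3 else None


print(wait_until(overlay_rendered, timeout=5, interval=0.01))  # Italian
```

Selenium has a built-in equivalent: `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@data-tracker='Italian']")))`. Here `WebDriverWait` comes from `selenium.webdriver.support.ui` and `EC` is `selenium.webdriver.support.expected_conditions`. If timing is the problem, `driver.page_source` read after that call returns should include the overlay.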
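I have tried `driver.find_element_by_xpath("""""")`, I have tried the lxml parser, and everything else I can think of.
The problem seems to be that although the language (in this case "Italian", the 4th language in the overlay) is in the page source, the XPath does not see it. This is a challenge because the page uses dynamic ids, or no ids at all.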
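Because the ids are unreliable, one approach is to avoid them entirely. The query collects every element that carries a `data-tracker` attribute, wherever it sits, so languages in the inline filter list and in the overlay are matched the same way. The page below is hypothetical and simplified. The container layout, the "All languages" and "English" entries, and the count 40 are placeholders; only the Italian entry comes from the question. `collect_languages` is an illustrative name, and `xml.etree.ElementTree` again stands in for lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: some options rendered inline, the rest in
# an overlay placed elsewhere in <body> (ids and layout are placeholders).
PAGE = """
<body>
  <div id="filters_inline">
    <div class="ui_radio item" data-value="ALL" data-tracker="All languages">
      <label class="label">All languages</label>
    </div>
    <div class="ui_radio item" data-value="en" data-tracker="English">
      <label class="label">English <span class="count">(40)</span></label>
    </div>
  </div>
  <div id="BODY_BLOCK_JQUERY_REFLOW">
    <div class="overlay">
      <div class="ui_radio item" data-value="it" data-tracker="Italian">
        <label class="label">Italian <span class="count">(11)</span></label>
      </div>
    </div>
  </div>
</body>
"""


def collect_languages(root):
    """Map each option's data-tracker name to its count, matching every
    <div data-tracker=...> in the tree regardless of its container or position.
    Options without a count span (e.g. "All languages") are skipped."""
    langs = {}
    for item in root.iterfind(".//div[@data-tracker]"):
        span = item.find("label/span[@class='count']")
        if span is not None:
            langs[item.get("data-tracker")] = int(span.text.strip("()"))
    return langs


root = ET.fromstring(PAGE)
print(collect_languages(root))  # {'English': 40, 'Italian': 11}
```

With lxml, as in the question, the corresponding queries would be `parser.xpath('//div[@data-tracker]')` for the entries and `item.xpath('string(label/span)')` for each count text. lxml supports full XPath 1.0, so functions such as `contains()` and `text()` are also available there.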
【Discussion】:
Tags: javascript python selenium xpath selenium-chromedriver