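【Question title】: Scrape data from a CSV downloaded after right-clicking on a webpage using Selenium Python
【Posted】: 2021-07-05 01:55:03
【Question description】:
I want to scrape data from a webpage using Python and Selenium. The page has a CSV download option that only appears after right-clicking inside the chart frame. I have not been able to right-click the page and select the CSV download option with Selenium.
This is the link to the webpage I am trying to get data from: https://datastudio.google.com/reporting/d97f5736-2b85-4f39-beba-6dc386c24429/page/Z3ToB
I have tried the following code: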
from selenium import webdriver
from selenium.webdriver import ActionChains
options = webdriver.ChromeOptions()
options.binary_location = r"<Path where chrome application is installed>"
driver = webdriver.Chrome(r"<path to chrome driver>", chrome_options=options)
driver.get("https://datastudio.google.com/reporting/d97f5736-2b85-4f39-beba-6dc386c24429/page/Z3ToB")
timeout = 10
action = ActionChains(driver)
# This lookup is the step that fails: the XPath is not found.
action.move_to_element(driver.find_element_by_xpath("//lego-canvas-container[@class='lego-canvas-container']")).perform()
action.context_click().perform()
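With this, the given XPath cannot be found; I also tried class names such as the report area. Can anyone explain how to right-click anywhere in the frame and then locate the download CSV option in it?
【Comments】:
Tags: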
python
selenium
webdriver
webdriverwait
【Answer 1】:
The menu is rendered by JavaScript only after the right-click, so its XPath cannot be found before right-clicking. Try this code; it worked for me.
from selenium import webdriver
from selenium.webdriver import ActionChains
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("disable-infobars")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(executable_path = 'chromedriver.exe',options = chrome_options)
driver.implicitly_wait(10)
driver.get("https://datastudio.google.com/reporting/d97f5736-2b85-4f39-beba-6dc386c24429/page/Z3ToB")
action = ActionChains(driver)
action.pause(1)
action.move_by_offset(150,150).perform()
action.context_click().perform()
action.move_to_element(driver.find_element_by_xpath('//*[@id="mat-menu-panel-0"]/div/span[5]/button')).perform()
action.click().perform()
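If you need to read the file afterwards, it helps to know where Chrome saves it. A minimal sketch, assuming you want downloads to go to a fixed folder without a save prompt: `build_download_prefs` and the `downloads` folder name are hypothetical names I made up for illustration, while `download.default_directory` and `download.prompt_for_download` are Chrome preference keys.

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Build a Chrome "prefs" dict that saves downloads to download_dir without prompting."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


prefs = build_download_prefs("downloads")  # "downloads" is a hypothetical folder name

# With Selenium, the dict would be attached before the driver is created:
# chrome_options.add_experimental_option("prefs", prefs)
```

The function only builds the dictionary, so it can be checked without starting a browser; the commented line shows where it would plug into the options object used above.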
【Answer 2】:
Use the XPath below to locate the element, right-click it, then find the CSV button and click it.
driver.get("https://datastudio.google.com/reporting/d97f5736-2b85-4f39-beba-6dc386c24429/page/Z3ToB")
time.sleep(5) #delay to load page properly. you can use explicit wait as well
element=driver.find_element_by_xpath("//div[@class='drop-zone-text']")
action = ActionChains(driver)
action.move_to_element(element).perform()
action.context_click().perform()
#To click on download csv
WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(.,'Download CSV')]"))).click()
You need to import the following libraries:
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
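【Answer 3】:
I found it a bit easier to right-click and then use the arrow keys to select the appropriate option.
So you can perform a right click (context_click) anywhere on the canvas to open the menu popup. You can then use the arrow keys to move up and down and select the "Download CSV" option.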
from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
actions = ActionChains(driver)
# Find the canvas element
element = driver.find_element_by_xpath('//*[@id="body"]/div/div/div[1]/div[2]/div/div[1]/div[1]/div[1]/div/lego-report/lego-canvas-container/div/file-drop-zone/span/content-section/div[3]/canvas-component')
# Right click the element, then press the Down key twice followed by the Enter to move to the Download CSV option and select it.
actions.move_to_element(element).context_click().send_keys([Keys.DOWN, Keys.DOWN, Keys.ENTER]).perform()
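The number of Down presses depends on where "Download CSV" sits in the menu, which may change. A small sketch of building the key sequence from a count instead of hard-coding it: `build_menu_keys` is a hypothetical helper, and the two constants are the code points the WebDriver spec assigns to Down and Enter (the same values as `Keys.DOWN` and `Keys.ENTER`), written out here only so the snippet runs without Selenium installed.

```python
# Code points the W3C WebDriver spec assigns to these keys; Selenium exposes
# the same values as Keys.DOWN and Keys.ENTER.
DOWN = "\ue015"
ENTER = "\ue007"


def build_menu_keys(down_presses):
    """Return `down_presses` Down keys followed by Enter."""
    if down_presses < 0:
        raise ValueError("down_presses must be non-negative")
    return [DOWN] * down_presses + [ENTER]


keys = build_menu_keys(2)  # same sequence as the answer above

# Hypothetical use with the objects from the answer:
# actions.move_to_element(element).context_click().send_keys(*keys).perform()
```

In real code you would use `Keys.DOWN` and `Keys.ENTER` directly rather than redefining them.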
【Answer 4】:
driver.get("https://datastudio.google.com/reporting/d97f5736-2b85-4f39-beba-6dc386c24429/page/Z3ToB")
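sleep(5)  # matches `from time import sleep` in the imports below
wait = WebDriverWait(driver, 10)  # `wait` was not defined in the original; the 10-second timeout is an assumption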
source= wait.until(EC.presence_of_element_located((By.XPATH,"/html/body")))
action = ActionChains(driver)
action.context_click(source).perform()
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#mat-menu-panel-0 > div > span:nth-child(5) > button"))).click()
Oddly, I got it working with this. It seems you need to wait, context-click the body, and then click the menu element.
<button _ngcontent-fys-c1="" class="mat-focus-indicator mat-tooltip-trigger mat-menu-item ng-star-inserted" mat-menu-item="" role="menuitem" tabindex="0" aria-disabled="false"> Download CSV <!----><!----><!----><div class="mat-menu-ripple mat-ripple" matripple=""></div></button>
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from time import sleep
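None of the answers covers the last step in the question, reading the data once "Download CSV" has been clicked. A rough sketch using only the standard library, under these assumptions: the file lands in a folder you know, Chrome marks unfinished downloads with a `.crdownload` suffix, and the CSV is UTF-8 with a header row. `wait_for_csv` and `read_csv_rows` are hypothetical helper names.

```python
import csv
import time
from pathlib import Path


def wait_for_csv(download_dir, timeout=30.0, poll=0.5):
    """Return the newest finished .csv in download_dir, or raise TimeoutError."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: Chrome writes in-progress downloads with a .crdownload suffix.
        in_progress = list(folder.glob("*.crdownload"))
        finished = sorted(folder.glob("*.csv"), key=lambda p: p.stat().st_mtime)
        if finished and not in_progress:
            return finished[-1]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished CSV appeared in {folder}")
        time.sleep(poll)


def read_csv_rows(csv_path):
    """Parse the CSV into a list of dicts keyed by the header row."""
    # utf-8-sig is an assumed encoding; it also tolerates a leading BOM.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

After the click you would call something like `rows = read_csv_rows(wait_for_csv(download_dir))`, where `download_dir` is whatever folder your browser is configured to save into.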