【Posted】: 2022-01-19 16:16:22
【Question】:
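https://fbref.com/en/squads/0cdc4311/Augsburg-Stats provides buttons that convert its tables to CSV, and I want to scrape that output. I click the buttons like this:
elements = driver.find_elements(By.XPATH, '//button[text()="Get table as CSV (for Excel)"]')
for element in elements:
    element.click()
But I get an exception:
ElementNotInteractableException: Message: element not interactable
Here is the full code (I added Adblock Plus as a Chrome extension, which should be configured for local testing):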
import pandas as pd
import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
import time
import os

# activate Adblock Plus
path_to_extension = '/home/andreas/.config/google-chrome/Default/Extensions/cfhdojbkjhnklbpkdaibdccddilifddb/3.11.4_0'
options = Options()
options.add_argument('load-extension=' + path_to_extension)

# uses Chrome driver in usr/bin/ from https://chromedriver.chromium.org/downloads
driver = webdriver.Chrome(options=options)

# wait, then switch back to the tab with the desired source
time.sleep(5)
driver.switch_to.window(driver.window_handles[0])

NO_OF_PREV_SEASONS = 5
df = pd.DataFrame()
urls = ['https://fbref.com/en/squads/247c4b67/Arminia-Stats']

for url in urls:
    driver.get(url)
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, 'html.parser')

    # click button -> accept cookies
    element = driver.find_element(By.XPATH, '//button[text()="AGREE"]')
    element.click()

    for i in range(NO_OF_PREV_SEASONS):
        elements = driver.find_elements(By.XPATH, '//button[text()="Get table as CSV (for Excel)"]')
        for element in elements:
            element.click()

        # todo: get data

        # click button -> navigate to next page
        time.sleep(5)
        element = driver.find_element(By.LINK_TEXT, "Previous Season")
        element.click()

driver.quit()
【Discussion】:
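One possible workaround, offered as a sketch rather than a confirmed fix: ElementNotInteractableException is typically raised when the target element exists in the DOM but is hidden or has no visible size. Assuming the CSV buttons on this page sit inside a collapsed menu (this is an assumption about the page, not something the question states), the click can be dispatched through JavaScript, which skips Selenium's interactability check. The helper name click_all_via_js is hypothetical, and the locator strategy is passed as the plain string "xpath", which is the value of By.XPATH.

```python
def click_all_via_js(driver, xpath):
    """Click every element matching `xpath` via JavaScript.

    `driver` is expected to be a Selenium WebDriver. A JavaScript click
    does not require the element to be visible, so it can reach buttons
    that a normal element.click() rejects as not interactable.
    Returns the number of elements clicked.
    """
    # "xpath" is the string value of selenium's By.XPATH
    elements = driver.find_elements("xpath", xpath)
    for element in elements:
        # arguments[0] refers to the element passed after the script
        driver.execute_script("arguments[0].click();", element)
    return len(elements)
```

In the script from the question, the inner loop could then be swapped for click_all_via_js(driver, '//button[text()="Get table as CSV (for Excel)"]'). Whether the page's own click handler reacts to a scripted click still has to be checked in the browser.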
Tags: python selenium web-scraping